<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>
	Comments on: Running bwa vs bwa-mem2	</title>
	<atom:link href="https://oyat.nl/bwa2/feed/" rel="self" type="application/rss+xml" />
	<link>https://oyat.nl/bwa2/</link>
	<description>data science  with a focus on large scale bioinformatics and epidemiology</description>
	<lastBuildDate>Tue, 07 Feb 2023 02:01:07 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>
	<item>
		<title>
		By: Sunkyu Kwon		</title>
		<link>https://oyat.nl/bwa2/#comment-36</link>

		<dc:creator><![CDATA[Sunkyu Kwon]]></dc:creator>
		<pubDate>Tue, 07 Feb 2023 02:01:07 +0000</pubDate>
		<guid isPermaLink="false">https://oyat.nl/?p=51#comment-36</guid>

					<description><![CDATA[Very helpful post! Thank you]]></description>
			<content:encoded><![CDATA[<p>Very helpful post! Thank you</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: maartenk		</title>
		<link>https://oyat.nl/bwa2/#comment-3</link>

		<dc:creator><![CDATA[maartenk]]></dc:creator>
		<pubDate>Thu, 16 Jul 2020 11:23:12 +0000</pubDate>
		<guid isPermaLink="false">https://oyat.nl/?p=51#comment-3</guid>

					<description><![CDATA[Hi Lennart,

The results with 8 cores are indeed a bit off. I think this effect is mostly caused by other running processes. If you look at the first 7 cores, every additional core gives a bit of a loss (less than 3 percent), but more systematic benchmarking is needed to give a good estimate. It might be interesting to know whether this is caused by a lack of resources (memory bandwidth?) or whether one part is executed serially and another part in parallel, so that Amdahl's law kicks in and limits the total speedup as a function of N cores. https://en.wikipedia.org/wiki/Amdahl%27s_law

The effect of loading the files is already removed by subtracting the loading times of the reference and index from the wall-clock time. However, this was done in a crude manner: I subtracted 51 seconds from every wall time and did not look at the real timings. So ZFS will not sort this out.]]></description>
			<content:encoded><![CDATA[<p>Hi Lennart,</p>
<p>The results with 8 cores are indeed a bit off. I think this effect is mostly caused by other running processes. If you look at the first 7 cores, every additional core gives a bit of a loss (less than 3 percent), but more systematic benchmarking is needed to give a good estimate. It might be interesting to know whether this is caused by a lack of resources (memory bandwidth?) or whether one part is executed serially and another part in parallel, so that Amdahl&#8217;s law kicks in and limits the total speedup as a function of N cores. <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law" rel="nofollow ugc">https://en.wikipedia.org/wiki/Amdahl%27s_law</a></p>
<p>The effect of loading the files is already removed by subtracting the loading times of the reference and index from the wall-clock time. However, this was done in a crude manner: I subtracted 51 seconds from every wall time and did not look at the real timings. So ZFS will not sort this out.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Lennart Karssen		</title>
		<link>https://oyat.nl/bwa2/#comment-2</link>

		<dc:creator><![CDATA[Lennart Karssen]]></dc:creator>
		<pubDate>Wed, 15 Jul 2020 15:05:16 +0000</pubDate>
		<guid isPermaLink="false">https://oyat.nl/?p=51#comment-2</guid>

					<description><![CDATA[Hi Maarten,

Interesting post! It is always good to see a head-to-head comparison of a new version of a tool.

Looking at the graph where you investigate speedup vs. number of cores, I see a decline in speedup as you approach the maximum of 8 physical cores. I&#039;m wondering if this is somehow a limitation of your 8 cores (e.g. because one core is used for I/O), whether this is caused by the algorithm, or whether the data set wasn&#039;t large enough (maybe some threads were starving for data?). It would be interesting to see this test on hardware with more cores available.

Another point, related to the compression test at the end: in PolyOmica we use ZFS as the filesystem, with its default lz4 compression enabled (IIRC zstd compression is in the works for a future ZFSonLinux release). It would be interesting to see how your tests are affected by the on-the-fly (de)compression of the input/output files. For example, would we see a flattening of the speedup vs. number of cores/threads graph?

If you want to, you are welcome to use our server for these tests. Just let me know.]]></description>
			<content:encoded><![CDATA[<p>Hi Maarten,</p>
<p>Interesting post! It is always good to see a head-to-head comparison of a new version of a tool.</p>
<p>Looking at the graph where you investigate speedup vs. number of cores, I see a decline in speedup as you approach the maximum of 8 physical cores. I&#8217;m wondering if this is somehow a limitation of your 8 cores (e.g. because one core is used for I/O), whether this is caused by the algorithm, or whether the data set wasn&#8217;t large enough (maybe some threads were starving for data?). It would be interesting to see this test on hardware with more cores available.</p>
<p>Another point, related to the compression test at the end: in PolyOmica we use ZFS as the filesystem, with its default lz4 compression enabled (IIRC zstd compression is in the works for a future ZFSonLinux release). It would be interesting to see how your tests are affected by the on-the-fly (de)compression of the input/output files. For example, would we see a flattening of the speedup vs. number of cores/threads graph?</p>
<p>If you want to, you are welcome to use our server for these tests. Just let me know.</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
