Hi,
I've sent several human DNA extractions to companies like Novogene or Macrogen, and they returned 15x - 30x genome data as FASTQ and/or BAM files aligned to hg19 (paired end, 2 x 150 bases).
Now I want to map the FastQ.gz files to hg38 myself. I have a 16-core (32-thread) 2 GHz machine with 128 GB of RAM and a fast SSD of about 700 GB. Before I get fancy and try BBMap, I wanted to test an old-school approach with bwa mem to see how it works. I compiled the newest bwa from GitHub. Indexing hg38 was fast and easy. However, I now have a bad feeling about the bwa mem mapping, because it has already been running for 15 hours and I don't see any progress. Here is what I have:
thomas@streymoy:/mnt/raid0ssd/align/2987$ ~/src/bwa/bwa mem -t 30 -M /mnt/raid0ssd/refseq/hg38/hg38.fa /mnt/raid0ssd/fastq/2987/S_2987_D16084169_HWM7HCCXX_L1_1.fq.gz /mnt/raid0ssd/fastq/2987/S_2987_D16084169_HWM7HCCXX_L1_2.fq.gz > S_2987_D16084169_HWM7HCCXX_L1_pe.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2000000 sequences (300000000 bp)...
[M::process] read 2000000 sequences (300000000 bp)...
The two paired-end FASTQ files are about 8 GB each and contain 85,069,589 reads each.
Is it right that only 2 million reads have been processed so far?
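For scale, here is a rough sketch of how many such batches the whole run would need. It assumes that each "[M::process] read 2000000 sequences" line is one batch counting reads from both files together, and that every read is 150 bases long. The variable names are just mine for illustration.

```shell
#!/bin/sh
# Read count reported for each of the two FASTQ files.
reads_per_file=85069589
files=2
# Assumption: one log line = one batch of 2,000,000 sequences (both mates counted).
batch_size=2000000
# Assumption: every read is 150 bases (2 x 150 paired end).
read_length=150

total_reads=$((reads_per_file * files))
# Integer ceiling division: number of batches needed to cover all reads.
batches=$(( (total_reads + batch_size - 1) / batch_size ))
# Should match the "(300000000 bp)" figure in the log if the assumptions hold.
bp_per_batch=$((batch_size * read_length))

echo "total reads:      $total_reads"
echo "bases per batch:  $bp_per_batch"
echo "batches required: $batches"
```

If these assumptions are right, the two log lines after 15 hours would cover only a small part of the roughly 86 batches needed, which is why I doubt the run is healthy.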
I wonder if my machine is under-powered or if I'm doing something completely wrong.
htop shows that 30 threads are in fact running at close to 100%.
Thanks for your thoughts.