As a side comment, you can make bwa mem more sensitive by lowering the minimum score for a read to be outputted (-T option, e.g. try -T 20) and/or by making the seed length shorter (-k option). It will be much slower and memory demanding but for a bacterial genome it shouldn't be a problem.
Obviously, if there is a problem with the reads this will not fix it but at least you can get an idea of what's going on...
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Blasting #4 matches 97% to Aeromonas hydrophila YL17 with 4 mismatches.
(via web blast).
#5 matches 98% to several Aeromonas hydrophila strains
Perhaps you can use blast to align your unmapped reads if this is your target genome.
Is this the target? Is it a contaminant species?
I think there's some support for sam output and likely there is support for converting blast output to sam from other utilities.
UCSC blat is also good for checking on problem reads.
Note also that alignment software often has parameters that you can use which will allow more mismatches and gaps.Last edited by Richard Finney; 03-10-2016, 05:35 PM.
Leave a comment:
-
I tried reassembling the reference and then trying reference based assembly using bowtie2 and bwa-mem. The alignment rates improved to 25% and 70% respectively. The adapter content in fastqc is not marked to be a problematic field.
When I extract the unmapped reads in fast format, I see a pattern though. The first and the last few unmapped reads are give below (I inserted the names "identifier 1, 2, 3 4, 5, 6 for the purpose of this post). Some reads clearly have lots (or only) string of NNNNNNs (identifier 1, 2,3) and will be unmappable. But there are reads (identifier 4, 5,6) that have no NNNNNs and still don't map. Is the quality too low for these reads, especially if some of them blast to the right genome?
@Identifier1
NNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNTNTCCATCTTTTNTTTCCTTCGNTNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#####E#####################################################/#E#E//EEAEEE<#E<EE/A//E#E#/############################################
@Identifier2
AGATCTCCGGCTTGATGNNTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNGNNCGNATCGCGCNACTTNTNGTCNTGCGCGGCCAGGGCATCCAGCGTCTGCTTGG
+
EEEEEEEEEEEEEEEEE##EEE#######################################E##6##EE#EEEEEEA#EEEE#E#EEE#EEEEEEEEEEEEEEEEEAEEEEEEEEEEEEE
@Identifier3
CCGGGCGCTACCACCGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANNTNNTNNCANGNGNNGACNNNNATGNGGGNCAGCTGGCCNCTGACGGAGAGCCCCTGCCAGCCGGCAT
+
EEEEEEEEEEEEEEEEE############################################E##E##A##EA#E#<##/AA####EEE#EEE#<EEEEEE<E#E<AEEEEEE<EEEEEEE6AEA<AAAA/E
@Identifier 4
AACAGCAGGCTCCCGATCGAGTAGCCGGCGCCAAACGAGCAGAGCACGCCGCGGGCCCCCGCGGAAAGGTCGCGATTGTACAGATGGAACGCAATAATGGAACCCGCCGAGCTGGTGTTGCCGTAGCTGTC
+
EAEAEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEAAEEEEEEEAEEEE<AEEEEEEEEEEEEEEEEEAE/EEEEAEEEEEE<EEAAEEEEEEAEAEEEEEAAAEAEEEEE<
@Identifier 5
CCTTGGCGTAGTCGCCAGCCCGATACCAGGCATTGCCCTGCCAGACGGGATCCGCAAAGGCGCGGGCGGCCTCCTTATAGTGCTTTTGCAAGAAGGCCTGCCAGGCCTGATTGTCATGGGTAG
+
EEEEEEEEEEEEEEEEEEAEEEEEAEEEEEE/EEEEEEEEE<EE/EEEAEEEEEEEE6EEEEEEEEEEEA<<EEAEEEAEEEE//<AEEEEEE/EEE6EE<A<EEEA/AA//EE<</EEEEEE
@Identifier 6
CAGGCCTGGCAGGCCTTCTTGCAAAAGCACTATAAGGAGGCCGCCCGCGCCTTTGCGGATCCCGTCTGGCAGGGCAATGCCTGGTATCGGGCTGGCGACTACGCCAAGGCGGTCGCCGCCTAT
+
EEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEE<EEEEEEEAEEEEEEEEEAE<<<EEEEEE<E<EEEEEE/EEEEEEEA/AEEEEAEEAAA<A/A/
Thanks very much for your help!
Leave a comment:
-
It is good to know that the possibility of contamination is low/not there. But then why is bowtie2 having trouble aligning the reads (I am not sure if bowtie2 defaults are strict and/or your "reference" is different enough to not allow the reads to map). If the second possibility is true then you may indeed want to try and assemble your own reference. Lab specific strains can turn out to be different from the "reference" out in the wild.
Leave a comment:
-
I have never used BBMap and I am in the process of learning how to install and use it.
But I found and did the following to get the unmapped reads from my bwa output
./samtools view -b -f 4 S25_bwa.bam > S25_unmapped.bam
./samtools bam2fq S25_unmapped.bam > S25_unmapped_bam2fq.fastq
Fortunately or unfortunately, my unmapped reads blast with the expected genome i.e. contamination with unexpected genome does not seem to be the overwhelming issue.
I will now see if I can rerun the analysis by checking for and removing adapter contamination but I also read that unmapped reads is not a problem for SNP analysis (my goal) and that I can proceed with just the mapped reads. But if I do that, doesn't the low alignment rate mean that I would be looking at just a very incomplete picture of how SNPs are distributed across the genome?
Could I just skip assembly with reference and do de-novo assembly? Because I have genomes from time t=0 and time t=end of experiment, would it be possible to make SNPs inferences by comparing the SNPs at initial vs final timepoints?
I would really appreciate any number of insights!
Leave a comment:
-
Thanks Richard and GenoMax. I will try your suggestions. Somewhere I was hoping that it was not a contamination issue. But let's see. Will post here what I find.
Leave a comment:
-
Your check for contamination angle is the thing to do.
You can even *randomly* subsample the unmappeds and paste 'em as fastas into into NCBI web blast just to get your bearings.
If for instance 20% of them map to another bacteria family, then , yeah, it's contamination.
Check out their adapter contamination tool, too.
Leave a comment:
-
I am going to suggest that you try BBMap (I know .. yet another aligner) since it will easily allow you to capture reads that do not map to a file. Leave settings at default values. Only provide a reference index (that you will need to build or build at run time) input files and outputs (to contain the alignment and files for unmapped reads).
I would also suggest that you grab a sample of reads and do blast @NCBI to figure out if there is a possibility of contamination from an unexpected species.Last edited by GenoMax; 03-01-2016, 05:06 PM.
Leave a comment:
-
Low alignment rates with bowtie2 and bwa for a bacterial genome
Apologies for cross-posting.
I am getting really low alignment rates while trying to do SNP analysis.
I sequenced the genome of a bacterial strain. Then conducted an experiment and then sequenced the genome of the strain at the end of the experiment to see what mutations (SNPs) took place during the course of the experiment.
Bowtie2 gives me the following output:
#map the reads
-bash-4.1$ ./bowtie2 -p 1 -x AER -1 S25_R1_001.fastq -2 S25_R2_001.fastq > S25_bowtie2.sam
#output
4240966 reads; of these:
4240966 (100.00%) were paired; of these:
4240777 (100.00%) aligned concordantly 0 times
161 (0.00%) aligned concordantly exactly 1 time
28 (0.00%) aligned concordantly >1 times
----
4240777 pairs aligned concordantly 0 times; of these:
10902 (0.26%) aligned discordantly 1 time
----
4229875 pairs aligned 0 times concordantly or discordantly; of these:
8459750 mates make up the pairs; of these:
8168790 (96.56%) aligned 0 times
49047 (0.58%) aligned exactly 1 time
241913 (2.86%) aligned >1 times
3.69% overall alignment rate
and bwa gives me the following output (I ran samtools flagstat command to see % overall alignment rate, which is 47% if I understand the output correctly)
#map the reads
-bash-4.1$ bwa mem -t 4 AER.fasta S25_R1_001.fastq S25_R2_001.fastq > S25_bwa.sam
#convert to bam
-bash-4.1$ ./samtools view -bS S25_bwa.sam > S25_bwa.bam
#get flagstats
-bash-4.1$ ./samtools flagstat S25_bwa.bam
#output
8816220 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
334288 + 0 supplementary
0 + 0 duplicates
4153225 + 0 mapped (47.11%:-nan%)
8481932 + 0 paired in sequencing
4240966 + 0 read1
4240966 + 0 read2
2605074 + 0 properly paired (30.71%:-nan%)
3513674 + 0 with itself and mate mapped
305263 + 0 singletons (3.60%:-nan%)
901042 + 0 with mate mapped to a different chr
41706 + 0 with mate mapped to a different chr (mapQ>=5)
I have read that bwa me is generally a very aggressive aligner and that probably explains the 47% rate.
I am looking into how to extract unmapped reads so that I can blast them to see any contamination issues.
The fastqc reports look fine (no red crosses), especially after I trim the first ~10 and last ~3 bases. Is there any other quality control I should be doing before mapping? What are all of the reasons for low alignment rates?Tags: None
Latest Articles
Collapse
-
by seqadmin
This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.
The Headliner
The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...-
Channel: Articles
03-03-2025, 01:39 PM -
-
by seqadmin
The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...-
Channel: Articles
02-24-2025, 06:31 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 03-03-2025, 01:15 PM
|
0 responses
179 views
0 likes
|
Last Post
by seqadmin
03-03-2025, 01:15 PM
|
||
Started by seqadmin, 02-28-2025, 12:58 PM
|
0 responses
274 views
0 likes
|
Last Post
by seqadmin
02-28-2025, 12:58 PM
|
||
Started by seqadmin, 02-24-2025, 02:48 PM
|
0 responses
659 views
0 likes
|
Last Post
by seqadmin
02-24-2025, 02:48 PM
|
||
Started by seqadmin, 02-21-2025, 02:46 PM
|
0 responses
268 views
0 likes
|
Last Post
by seqadmin
02-21-2025, 02:46 PM
|
Leave a comment: