
**Richard Finney** · 02-10-2011, 12:51 PM

Just some thoughts:

1) select a more aggressive trim
2) try bwa
3) look at the unmappeds, do they look right?
4) take a random sampling of the unmapped reads and do a blast, are they hitting other organisms?
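For point 4, a minimal Python sketch of drawing a random sample of unmapped reads to paste into BLAST. It assumes the unmapped reads have already been written to a FASTA file in base space; the function names `read_fasta` and `reservoir_sample` are illustrative and not part of any aligner.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Pick k records uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


if __name__ == "__main__":
    # "unmapped.fa" is a hypothetical file name.
    with open("unmapped.fa") as handle:
        for name, seq in reservoir_sample(read_fasta(handle), k=100):
            print(f">{name}\n{seq}")
```

Reservoir sampling is used here so that a file with tens of millions of reads never has to be held in memory.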

**Chipper** · 02-10-2011, 01:45 PM

What settings did you use for bowtie? If you used the standard settings, try -n 3 -l 24 -e 200, as the defaults are tuned for Illumina reads.

Look at the average QV per position. If you see a pattern of positions with low QV (e.g. every 5th base), try BFAST with appropriate masks.
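A rough Python sketch of the per-position check, assuming a SOLiD-style .qual file in which header lines start with ">", comment lines start with "#", and each read is one line of space-separated integer quality values. The function names and the threshold are illustrative, and quality values are averaged as-is with no special handling of missing calls.

```python
from typing import Iterable, List


def mean_quality_per_position(lines: Iterable[str]) -> List[float]:
    """Return the mean quality value at each read position (index 0 = first call)."""
    totals: List[int] = []
    counts: List[int] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith((">", "#")):
            continue  # skip headers, comments and blank lines
        for pos, token in enumerate(line.split()):
            if pos >= len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += int(token)
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def low_quality_positions(means: List[float], threshold: float) -> List[int]:
    """Return 1-based positions whose mean quality is below the threshold."""
    return [i + 1 for i, m in enumerate(means) if m < threshold]


if __name__ == "__main__":
    # "reads_QV.qual" is a hypothetical file name; 15 is an arbitrary cutoff.
    with open("reads_QV.qual") as handle:
        means = mean_quality_per_position(handle)
    for pos, mean in enumerate(means, start=1):
        print(pos, f"{mean:.1f}")
    print("low positions:", low_quality_positions(means, threshold=15))
```

If the low positions come out evenly spaced, that is the kind of periodic pattern a mask would be designed around.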

bwa is not going to help, and neither is BLASTing the unmapped reads, since you do not have the (correct) sequence... I would also try Velvet on the unmapped reads in case there are rRNA sequences not found in the reference.

**Jean** · 02-11-2011, 07:35 AM

Thanks for the ideas.

Bowtie mapping:
Since we have also been mapping to a mixed population, I have used the following to get best-hit mappings:

Code:

bowtie -S -C -3 10 --threads 6 --best -M 1

And as mentioned, for a pure culture I get these results:

Code:

# reads processed: 55208429
# reads with at least one reported alignment: 13639433 (24.71%)
# reads that failed to align: 29756435 (53.90%)
# reads with alignments sampled due to -M: 11812561 (21.40%)

Based on suggestions here, I tried trimming 20bp, and I also tried settings suggested by Chipper:

Code:

bowtie -S -C -3 20 --best -M 1
# reads processed: 55208429
# reads with at least one reported alignment: 15156628 (27.45%)
# reads that failed to align: 27899931 (50.54%)
# reads with alignments sampled due to -M: 12151870 (22.01%)

Code:

bowtie -S -C -t -n 3 -l 24 -e 200
# reads processed: 55208429
# reads with at least one reported alignment: 29658102 (53.72%)
# reads that failed to align: 25550327 (46.28%)

As you can see, this doesn't seem to affect the unmapped portion much.
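To compare the runs above side by side, the unmapped fraction can be computed straight from the summary lines. This is a minimal Python sketch that assumes the summary text has been captured as a string; `parse_bowtie_summary` and `unmapped_fraction` are hypothetical helper names, not bowtie functions.

```python
import re
from typing import Dict

# Matches lines such as "# reads processed: 55208429".
SUMMARY_LINE = re.compile(r"^# (reads [^:]+): (\d+)")


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Map each summary label to its read count."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def unmapped_fraction(counts: Dict[str, int]) -> float:
    """Fraction of processed reads that failed to align."""
    return counts["reads that failed to align"] / counts["reads processed"]


if __name__ == "__main__":
    # Read counts copied from the three runs posted above.
    runs = {
        "-3 10 --best -M 1": (55208429, 29756435),
        "-3 20 --best -M 1": (55208429, 27899931),
        "-n 3 -l 24 -e 200": (55208429, 25550327),
    }
    for label, (processed, failed) in runs.items():
        text = (
            f"# reads processed: {processed}\n"
            f"# reads that failed to align: {failed}\n"
        )
        frac = unmapped_fraction(parse_bowtie_summary(text))
        print(f"{label}: {100 * frac:.2f}% unmapped")
```

Note that in the two -M runs the reads "sampled due to -M" are counted separately from the reads with a reported alignment, so the three categories together add up to the reads processed.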

As for other suggestions:
- Chipper is correct that the converted unmapped reads are not in the right basespace so I have BLASTed them, but they do not match anything (see my "side point" above)
- I have tried assembling with Velvet and did not get significant contigs
- I know half of the mappable reads are 23S, and I have mapped against bacterial databases of 23S and cpn60, but nothing extra is pulled out
- Admittedly I have not looked at the quality scores in depth so I will do that. Would you suggest looking at the SOLiD qual files, or the bowtie output (unmapped reads)?

**mnkyboy** · 02-11-2011, 09:59 AM

We had something similar: half our reads would not map. We were told that, most likely, a lot of chimeric beads are made during the emulsion PCR instead of beads carrying a single read, and those reads just fall out of mapping. So when we do SOLiD we just assume we are going to lose half our reads.

**Jean** · 02-11-2011, 10:10 AM

Originally posted by mnkyboy:

"We had something similar: half our reads would not map. We were told that, most likely, a lot of chimeric beads are made during the emulsion PCR instead of beads carrying a single read, and those reads just fall out of mapping. So when we do SOLiD we just assume we are going to lose half our reads."

I suspect that is the problem here. We were supposed to get 1.4 billion reads but only got 500 million; half of those are unmappable, and half of the mappable reads are rRNA.

**colindaven** · 02-15-2011, 04:04 AM

Not sure how useful it is for SOLiD data (I guess the SAM/BAM input function should work fine after alignment), but I'd recommend FastQC for looking at the per-base quality.

**flashton** · 04-04-2012, 11:01 AM

Something similar happened with my RNA-seq data from a SOLiD 4: we expected billions of reads, got 500 million, and only 10-20% of them mapped. I have heard that RNA-seq mapping rates are always lower than for gDNA. Perhaps Illumina has a higher mapping rate?

Still got good coverage/results - I love working with bacteria!

**samanta** · 04-04-2012, 11:23 AM

A 50% hit rate with SOLiD is not bad. SOLiD machines generate an order of magnitude more reads than Illumina, but they have more noise as well.

**hoagiang** · 09-26-2012, 05:33 AM

I have the same problem with low mapping (SOLiD reads using bowtie). One weird thing I found is that some of the reads reported as unmapped can actually be mapped. I posted a thread about it, but it seems nobody has any idea why.
I know this sounds ridiculous, but you can try adding 20 bp random sequences to the start and end of your reference genomes. Redo the mapping, then filter out reads that map to the 20 bp random sequences. You will probably see an increase in mapping.
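A small Python sketch of the padding trick described here, under the assumption that the reference is a plain nucleotide string and that alignment positions are 1-based starts in the padded coordinates. `pad_reference` and `overlaps_flank` are illustrative names only.

```python
import random


def pad_reference(seq: str, flank: int = 20, seed: int = 0) -> str:
    """Return seq with `flank` random bases added to each end."""
    rng = random.Random(seed)
    left = "".join(rng.choice("ACGT") for _ in range(flank))
    right = "".join(rng.choice("ACGT") for _ in range(flank))
    return left + seq + right


def overlaps_flank(start: int, read_len: int, ref_len: int, flank: int = 20) -> bool:
    """True if an alignment touches either random flank.

    start   : 1-based leftmost position on the padded reference
    ref_len : length of the original (unpadded) reference
    """
    end = start + read_len - 1
    return start <= flank or end > flank + ref_len


if __name__ == "__main__":
    padded = pad_reference("ACGTACGTACGT", flank=20, seed=42)
    print(len(padded))  # original length plus 2 * flank
    # A 50 bp read starting at padded position 21 lies entirely in the real sequence.
    print(overlaps_flank(21, 50, ref_len=1000))
```

Subtracting `flank` from the start of each retained alignment converts it back to coordinates on the original reference.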

**FLeader** · 01-17-2013, 12:11 PM

I strongly suspect a sample preparation problem. We have sequenced many different bacteria using SOLiD 3 & 4 platforms and typically achieve 75-85% mapping. However, we are very careful to reject poor quality samples to start with (garbage in, garbage out) and make sure that all of the QA/QC steps are correct. You should not have to trim your sequences much to get excellent mapping results.

Lots of unmapped reads - SOLiD bacterial RNA-seq and bowtie mapping
