
**yueluo** · 03-30-2015, 01:23 AM

For miRNA data, I would suggest using bowtie (Bowtie 1) instead of bowtie2, since the authors say bowtie can be more sensitive for reads shorter than 50 bp.

Also, I would suggest you build a database from Rfam and NCBI/Ensembl's known non-coding RNAs (mostly rRNA, tRNA, snoRNA, etc.). Map your reads against this first and you'll see how much contamination you have in the data.
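A minimal sketch of how such a contaminant set could be pulled out of a larger ncRNA FASTA before building an index. The keyword list and the example headers are assumptions for illustration; real Rfam or Ensembl headers are formatted differently and the matching rule would need adjusting.

```python
from typing import Iterable, Iterator, Tuple

# Illustrative keywords (an assumption); adapt them to the header style of your download.
CONTAMINANT_KEYWORDS = ("rrna", "trna", "snorna", "snrna")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_contaminants(records, keywords=CONTAMINANT_KEYWORDS):
    """Keep records whose header mentions one of the keywords (case-insensitive)."""
    for header, seq in records:
        lowered = header.lower()
        if any(keyword in lowered for keyword in keywords):
            yield header, seq


# Made-up records, only to show the filtering step.
example = """>seq1 gene_biotype:rRNA
GGCTACGT
>seq2 gene_biotype:protein_coding
ATGCCC
>seq3 gene_biotype:snoRNA
TTGACA
""".splitlines()

contaminants = list(select_contaminants(read_fasta(example)))
for header, seq in contaminants:
    print(f">{header}\n{seq}")
```

The filtered FASTA can then be indexed and used as the first alignment target; the fraction of reads that align there gives a rough contamination estimate.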

**mziemann** · 03-30-2015, 01:31 AM

Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)

microRNA aligners compared: http://genomespot.blogspot.com.au/2014/11/microrna-aligners-compared.html

Alignment of microRNA to the genome poses a particular challenge because the reads are short, and some microRNAs are nearly identical.

**yueluo** · 03-30-2015, 01:46 AM

Originally posted by mziemann:

Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)
http://genomespot.blogspot.com.au/20...-compared.html

Thanks for the input. So bowtie1 has the highest mapping rate but also a substantially higher rate of incorrect alignments. Sounds like a terrible tradeoff.

**Ahaswer** · 03-30-2015, 01:57 AM

I did some research beforehand and found opinions that bowtie1 is no better than bowtie2 for miRNA data. So I decided to use bowtie2 for this purpose, but I still have this problem.

**mziemann** · 03-30-2015, 02:05 AM

Regarding the original question, it's really puzzling that your alignment to miRBase gave such low alignment rates. Did you align to mature.fa or hairpin.fa? I would expect higher alignment rates for hairpin.fa because it will capture most isomiRs. Removing special characters from the FASTA headers might help too.
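A rough sketch of the header clean-up, assuming the goal is to keep only the first whitespace-delimited token and replace anything outside a conservative character set. The record below is made up. The optional U-to-T conversion is my own addition and was not raised in this thread: miRBase FASTA files are written in the RNA alphabet, so it may be worth checking how your index builder treats `U`.

```python
import re
from typing import Tuple


def clean_header(header: str) -> str:
    """Keep the first token of a FASTA header and replace unusual characters with '_'."""
    token = header.lstrip(">").split()[0]
    return re.sub(r"[^A-Za-z0-9_.-]", "_", token)


def clean_record(header: str, seq: str, rna_to_dna: bool = True) -> Tuple[str, str]:
    """Return a cleaned (header, sequence) pair.

    rna_to_dna is an assumption of this sketch: it rewrites U as T so the
    reference uses the same alphabet as DNA-space reads.
    """
    seq = seq.upper()
    if rna_to_dna:
        seq = seq.replace("U", "T")
    return clean_header(header), seq


# Hypothetical record, not a real miRBase entry.
header, seq = clean_record(">xyz-miR-0001* MIMAT_EXAMPLE Hypothetical species", "UGAGGUAGUAGG")
print(f">{header}\n{seq}")
```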

**Ahaswer** · 03-30-2015, 02:11 AM

Yes, I've also aligned the reads to mature.fa, but that gives an overall alignment rate of only 4.39%. As you suggested, I'll try removing the special characters from those files.

**Ahaswer** · 03-30-2015, 01:53 PM

Still the same overall alignment rate. Does anyone have any suggestions about bowtie2 parameters for this task?

**mziemann** · 03-30-2015, 02:12 PM

One explanation could be contamination, as yueluo suggested. Use featureCounts or HTSeq to see where the reads are landing in the reference genome. If they are mostly in mRNA and rRNA, then you have a contamination issue.
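One way to summarise where reads land is to add up a count table by biotype. The sketch below assumes a featureCounts-style tab-delimited table (a `#` comment line, a header row, then one row per gene with the count in the last column) and a gene-to-biotype mapping that you would supply yourself; the gene names, counts and mapping are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable


def biotype_fractions(table_lines: Iterable[str], biotype_of: Dict[str, str]) -> Dict[str, float]:
    """Sum the last-column counts per biotype and return each biotype's share of assigned reads."""
    totals = defaultdict(int)
    header_seen = False
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        if not header_seen:
            header_seen = True  # first non-comment line is the column header
            continue
        fields = line.rstrip("\n").split("\t")
        gene_id, count = fields[0], int(fields[-1])
        totals[biotype_of.get(gene_id, "unknown")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: n / grand_total for biotype, n in totals.items()}


# Invented table and mapping, for illustration only.
table = [
    "# example comment line",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam",
    "geneA\tchr1\t1\t100\t+\t100\t900",
    "geneB\tchr1\t200\t260\t+\t61\t50",
    "geneC\tchr2\t1\t500\t-\t500\t50",
]
biotypes = {"geneA": "rRNA", "geneB": "miRNA", "geneC": "protein_coding"}

fractions = biotype_fractions(table, biotypes)
for biotype, share in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{biotype}\t{share:.1%}")
```

Note that the shares here are relative to assigned reads only; unaligned and unassigned reads would have to be added separately to get proportions of the whole library.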

**Ahaswer** · 03-31-2015, 12:38 AM

As you suggested, I used featureCounts to check where the reads land, and the result is more or less the same (4.6%). So it is a contamination issue; however, I will still try to get some useful data out of it. I guess I can still gain an additional 1-2% by searching for novel miRNAs.

Thank you, yueluo and mziemann, for your interest and help.

But I still wonder what the typical overall alignment rate, or the typical content of miRNA sequences, is in miRNA sequencing data. Can anyone provide such information?

**peterawe** · 03-31-2015, 12:46 AM

Originally posted by mziemann:

Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)
http://genomespot.blogspot.com.au/20...-compared.html

Hi, I read your great blog regularly. I was, however, a little puzzled by this analysis. My concern is what is meant by incorrectly and correctly mapped reads. If I got it right, the sequences for alignment are randomly cut out from miRNA hairpins and mapped back against the reference genome. A possible problem is that, by chance, some of these sequences may match other loci as well. For 100% specificity, the aligner can discard such multi-mappers and thus seem very accurate, while for 100% sensitivity it could just report mappings at all corresponding loci. Thus, the large differences between the aligners may be due to different ways (default parameters) of assigning/reporting multi-mappers and not to actual errors. What are your thoughts on this?

**mziemann** · 03-31-2015, 01:31 AM

@Ahaswer
"But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?"

Your small RNA-seq data should be >75% microRNA as a rule of thumb. If it is lower than that, there is a contamination issue, which can arise in three ways: (1) lots of adapter-only reads; (2) degradation of mRNAs/rRNAs into small fragments that swamp the miRNAs; (3) size selection of the miRNA-containing fragments went wrong.
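A read-length profile taken after adapter trimming may give a hint as to which of these applies. This is only a sketch: the 18-26 nt "miRNA-sized" window is an assumed cut-off, and the interpretation in the comments is a rule of thumb, not a test.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed size window for miRNA-like inserts, in nucleotides.
MIRNA_MIN, MIRNA_MAX = 18, 26


def length_profile(lengths: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of trimmed reads that are short, miRNA-sized, or long."""
    counts = Counter()
    for n in lengths:
        if n < MIRNA_MIN:
            counts["short"] += 1        # many of these suggests adapter-only reads
        elif n <= MIRNA_MAX:
            counts["mirna_sized"] += 1  # where a clean library should peak
        else:
            counts["long"] += 1         # many of these suggests fragments or poor size selection
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {key: counts[key] / total for key in ("short", "mirna_sized", "long")}


# Invented trimmed read lengths.
profile = length_profile([0, 0, 5, 21, 22, 22, 23, 30, 35, 40])
print(profile)
```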

@peterawe
Regarding my analysis, I use featureCounts "-R" option which outputs information regarding which feature each read is mapped to. I purposely use a map quality threshold of 20 to remove multimappers. Reads which are mapped back to the correct microRNA gene (mapq>=20) are "correct" and reads that are mapped to some other part of the genome (mapq>=20) are "incorrect". Reads that are unassigned due to mapq threshold are neither.

The % correct/incorrect proportions are relative to the starting number of reads, so they are an accurate way of showing the performance. One thing I didn't explore was whether MAPQ 20 was really the right threshold or not. Who knows, maybe bowtie1 is really good at MAPQ 30.
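The classification rule described above can be sketched roughly as follows. The `SimRead` record and its field names are hypothetical; in practice the true origin would come from the simulated read name and the assignment from the per-read featureCounts output.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional


class SimRead(NamedTuple):
    true_gene: str                # microRNA gene the simulated read was cut from
    assigned_gene: Optional[str]  # feature the alignment landed in, None if no feature
    mapq: int                     # mapping quality reported by the aligner


def classify(read: SimRead, min_mapq: int = 20) -> str:
    """Label a read 'correct', 'incorrect', or 'unassigned' using a MAPQ threshold."""
    if read.mapq < min_mapq:
        return "unassigned"  # below threshold: counted as neither correct nor incorrect
    if read.assigned_gene == read.true_gene:
        return "correct"
    return "incorrect"       # confidently placed somewhere other than the source gene


def summarise(reads: Iterable[SimRead], min_mapq: int = 20) -> Dict[str, float]:
    """Percentages are relative to the starting number of reads."""
    labels = Counter(classify(r, min_mapq) for r in reads)
    total = sum(labels.values())
    return {k: 100 * labels[k] / total for k in ("correct", "incorrect", "unassigned")}


# Invented reads, for illustration only.
reads = [
    SimRead("mirA", "mirA", 42),
    SimRead("mirA", "mirA", 30),
    SimRead("mirB", "mirB", 23),
    SimRead("mirB", "mirC", 40),
    SimRead("mirC", "mirC", 1),
]
summary = summarise(reads)
print(summary)
```

Re-running `summarise` with a different `min_mapq` is one way to explore whether 20 is the right threshold.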

**peterawe** · 03-31-2015, 02:36 AM


Thanks for your reply, I think this may explain the difference. Running bowtie with default parameters returns a MAPQ of 255 for all alignable reads, regardless of multi-mapping. Thus, this part of the bowtie output is not very informative for downstream filtering.
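A quick way to check whether MAPQ carries any information in a given SAM file is to tabulate it. The sketch below relies only on the standard SAM column layout (FLAG in column 2, MAPQ in column 5, flag bit 4 meaning unmapped); the records are invented.

```python
from collections import Counter
from typing import Iterable


def mapq_histogram(sam_lines: Iterable[str]) -> Counter:
    """Count MAPQ values over aligned records in SAM-formatted text."""
    hist = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 4:
            continue  # read is unmapped
        hist[mapq] += 1
    return hist


def fraction_passing(hist: Counter, min_mapq: int = 20) -> float:
    """Fraction of aligned reads that survive a MAPQ filter."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(n for q, n in hist.items() if q >= min_mapq) / total


# Invented SAM records: three aligned reads with MAPQ 255 and one unmapped read.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t*",
    "r2\t16\tchr2\t500\t255\t20M\t*\t0\t0\tTTGACCATGGTTGACCATGG\t*",
    "r3\t0\tchr3\t900\t255\t20M\t*\t0\t0\tGGCATTGACCGGCATTGACC\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tAAAAAAAAAAAAAAAAAAAA\t*",
]
hist = mapq_histogram(sam)
print(dict(hist), fraction_passing(hist, 20))
```

If every aligned read sits in a single MAPQ bin, as in this toy input, a threshold of 20 removes nothing and cannot separate unique hits from multi-mappers.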

@Ahaswer
miRNA content depends on the biological sample, RNA quality, library preparation protocol and accuracy of gel cutting. Hoen et al. did smRNA-seq on 465 lymphoblastoid cell lines and reported:

"...the relative miRNA content in our samples ranged from 2% to 62% of mapped reads, with a median of 19%..."

Hoen (2013) Nat Biotech. I guess this was an automated protocol; with manual preparations we routinely get over 50%.

**Ahaswer** · 03-31-2015, 06:12 AM

Thanks everybody for your replies. That satisfies my curiosity, especially the papers.
I've also checked the read alignment counts for each miRNA sequence, based on miRBase annotations, which gives a subset ranging from 5 to 20,000 reads per sequence. I guess that, despite the low overall alignment rate mentioned earlier, I can still work with this dataset.

**Brian Bushnell** · 03-31-2015, 10:56 AM

Since this was never mentioned in the thread, are the reads being adapter-trimmed prior to alignment? If not, that would cause a bias toward long rather than short sequences, and a low alignment rate.
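For illustration, a bare-bones 3' adapter trimmer might look like the sketch below. The adapter sequence is an assumption (the Illumina small RNA 3' adapter); substitute whatever your kit used. It only handles exact matches, so a dedicated trimming tool that tolerates sequencing errors is the better choice for real data.

```python
# Assumed adapter: Illumina small RNA 3' adapter. Replace with the one from your kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3p_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Remove an exact 3' adapter match, or an exact partial match at the read end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]  # full adapter found: keep only the insert before it
    # Otherwise look for the longest adapter prefix that ends the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read  # no adapter detected


insert = "TGAGGTAGTAGGTTGTATAGTT"  # example 22 nt insert
full = trim_3p_adapter(insert + ADAPTER)
partial = trim_3p_adapter(insert + ADAPTER[:10])
adapter_only = trim_3p_adapter(ADAPTER)
print(full, partial, repr(adapter_only))
```

Adapter-only reads come back as empty strings here, so they can be counted and discarded before alignment.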


miRNA reads alignment with bowtie2
