-
I got this problem too. Did you figure out why exactly alignment is better for long sequences than miRBase?
-
Ahhh... to clarify, what I meant was that NOT adapter-trimming the reads could cause the bias, but I guess that's ruled out if the reads were successfully trimmed.
-
Hi Brian. Yes, the reads have been adapter-trimmed. Moreover, aligning to long sequences (the whole reference) gives a good alignment rate, while aligning to the short (miRBase) sequences gives a low one.
-
Since this was never mentioned in the thread, are the reads being adapter-trimmed prior to alignment? If not, that would cause a bias toward long rather than short sequences, and a low alignment rate.
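To illustrate the trimming step asked about above, here is a minimal Python sketch of 3' adapter removal. It is not how any particular trimming tool works: it only does exact matching, and the function name, the `min_overlap` parameter, and the adapter and read sequences in the demo are illustrative assumptions rather than anything taken from this thread.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only.

    Hypothetical helper for illustration; real trimmers also tolerate
    sequencing errors, which this sketch does not.
    """
    # Case 1: the full adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a partial adapter (a prefix of it).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # Case 3: no adapter found, return the read unchanged.
    return read


if __name__ == "__main__":
    # Illustrative sequences only (assumptions, not data from the thread).
    adapter = "TGGAATTCTCGGGTGCCAAGG"
    insert = "TAGCTTATCAGACTGATGTTGA"
    untrimmed = insert + adapter + "AAAA"
    print(len(untrimmed), "->", len(trim_3prime_adapter(untrimmed, adapter)))
```

An untrimmed read carries the insert plus adapter sequence, so it cannot match a ~22 nt reference end to end, which is the bias toward long references described above.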
-
Thanks everybody for your replies. That satisfies my curiosity, especially those papers.
I've also checked the aligned read count for each miRNA sequence, based on miRBase annotations, which gives me a subset ranging from 5 to 20,000 reads per sequence. I guess that, despite the low overall alignment rate mentioned before, I can still work with this dataset.
-
@Ahaswer
miRNA content depends on the biological sample, RNA quality, library preparation protocol, and accuracy of gel cutting. Hoen et al. did smRNA-seq on 465 lymphoblastoid cell lines and reported: "...the relative miRNA content in our samples ranged from 2% to 62% of mapped reads, with a median of 19%..."
-
@Ahaswer
"But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?"
As a rule of thumb, your small RNA-seq should be >75% microRNA. If it is lower than that, there is a contamination issue, which can arise in three ways: (1) lots of adapter-only reads; (2) degradation of mRNAs/rRNAs into small fragments that swamp the miRNAs; (3) size selection of the miRNA-containing fragments went wrong.
@peterawe
Regarding my analysis, I use the featureCounts "-R" option, which reports which feature each read is assigned to. I purposely use a mapping quality threshold of 20 to remove multimappers. Reads that are mapped back to the correct microRNA gene (mapq>=20) are "correct", and reads that are mapped to some other part of the genome (mapq>=20) are "incorrect". Reads that are unassigned due to the mapq threshold are neither.
The % correct/incorrect proportions are relative to the starting number of reads, so they are an accurate way of showing the performance. One thing I didn't explore was whether mapq 20 was really the right threshold or not. Who knows, maybe bowtie1 is really good at mapq 30.
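The correct/incorrect/unassigned bookkeeping described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it does not parse real featureCounts output, and the tuple layout `(true_gene, assigned_gene, mapq)` and the function names are hypothetical.

```python
from collections import Counter

MAPQ_THRESHOLD = 20  # threshold from the post; whether 20 is optimal was not tested


def classify_read(true_gene, assigned_gene, mapq, threshold=MAPQ_THRESHOLD):
    """Label one simulated read as 'correct', 'incorrect', or 'unassigned'."""
    if mapq < threshold:
        return "unassigned"  # filtered by the mapq threshold: counted as neither
    if assigned_gene == true_gene:
        return "correct"     # mapped back to the miRNA gene it was simulated from
    return "incorrect"       # confidently mapped somewhere else


def summarise(reads, total_reads=None, threshold=MAPQ_THRESHOLD):
    """Return percentages relative to the starting number of reads.

    `reads` is an iterable of (true_gene, assigned_gene, mapq) tuples,
    a hypothetical layout chosen for this illustration.
    """
    reads = list(reads)
    counts = Counter(classify_read(t, a, q, threshold) for t, a, q in reads)
    total = total_reads if total_reads is not None else len(reads)
    labels = ("correct", "incorrect", "unassigned")
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100.0 * counts[label] / total for label in labels}
```

Calling `summarise` again with `threshold=30` on the same reads is one way to explore the open question of whether mapq 20 is the right cut-off.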
-
Originally posted by mziemann:
Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20 nt reads, Bowtie1 has a 15% error rate, compared with ~2% for Bowtie2.
http://genomespot.blogspot.com.au/20...-compared.html
-
As you suggested, I used featureCounts to check where the reads land, and the result is more or less the same (4.6%). So it is a contamination issue; however, I will still try to get some valuable data out of it. I guess I can gain an additional 1-2% by searching for novel miRNAs.
Thank you, yueluo and mziemann, for your interest and help.
But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?
-
One explanation could be contamination, as Yueluo suggested. Use featureCounts or HTSeq to see where the reads are landing in the reference genome. If they are mostly mRNA and rRNA, then you have a contamination issue.
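Once each read has been assigned to a feature, the composition check suggested above comes down to a tally. The sketch below assumes you already have one biotype label per read; the function name and the labels are hypothetical, and getting the labels out of featureCounts or HTSeq is not shown.

```python
from collections import Counter


def biotype_fractions(read_biotypes):
    """Return the percentage of reads per biotype, most abundant first.

    `read_biotypes` is an iterable with one label per read (for example
    "miRNA", "mRNA", "rRNA"); the labels are assumptions for illustration.
    """
    counts = Counter(read_biotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {biotype: 100.0 * n / total for biotype, n in counts.most_common()}
```

If mRNA and rRNA dominate the result, that points to the contamination scenario described above.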
-
Still the same overall alignment rate. Does anyone have some suggestions about bowtie2 parameters for this task?
-
Yes, I've also aligned the reads to mature.fa, but that gives an overall alignment rate of only 4.39%. As you suggested, I'll try removing special characters from those files.
-
Regarding the original question, it's really puzzling that your alignment to miRBase gave such low alignment rates. Did you align to mature.fa or hairpin.fa? I would expect higher alignment rates for hairpin.fa because it will capture most isomiRs. Removing special characters from the FASTA headers might help too.
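For the header clean-up, a minimal Python sketch could look like the one below. It keeps only the first whitespace-delimited token of each header and replaces anything outside letters, digits, `_`, `.` and `-` with an underscore. Which characters actually cause trouble is an assumption here, and the function names and the example record are hypothetical.

```python
import re


def clean_fasta_header(line):
    """Simplify one FASTA header line (a line starting with '>')."""
    fields = line[1:].strip().split()
    name = fields[0] if fields else ""
    # Replace every character outside a conservative whitelist (an assumption).
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    return ">" + name


def clean_fasta(lines):
    """Yield FASTA lines with simplified headers; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield clean_fasta_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    # Hypothetical record, not a real miRBase entry.
    example = [">example-miR-1* EX0000001 Example record\n", "ACGUACGUACGUACGUACGUAC\n"]
    for out in clean_fasta(example):
        print(out)
```

Sequence lines are left untouched, so the reference content stays exactly as downloaded.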