  • moses122
    replied
    Originally posted by mziemann
    Please don't align directly to mature.fa. Only about 50% of miR reads are the exact mature sequence, so you are missing a lot of sequence diversity that will map clearly to hairpin.fa. Calling 3p and 5p arm switching, on the other hand, is a different story.
    Thanks for replying. I did map to hairpin.fa. The alignment rate is only around 0.01%, while it is 90% for the genome. I used bowtie and allowed one mismatch.

  • mziemann
    replied
    Originally posted by moses122
    I got this problem too. Did you figure out why exactly the alignment rate is better for long sequences than for miRBase?
    Please don't align directly to mature.fa. Only about 50% of miR reads are the exact mature sequence, so you are missing a lot of sequence diversity that will map clearly to hairpin.fa. Calling 3p and 5p arm switching, on the other hand, is a different story.
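
    A toy illustration of the isomiR point above. The sequences are made up, and exact substring matching stands in for a real aligner, so treat this as a sketch of the idea only: a read that is shifted by one base and extended at the 3' end cannot fit end-to-end inside the mature sequence, but it still fits inside the hairpin.

```python
def aligns_end_to_end(read, reference):
    """Toy stand-in for an aligner: True if the whole read occurs in the reference."""
    return read in reference

# Hypothetical 38 nt hairpin (not a real miRBase entry)
hairpin = "CCGGATTGAGGTCAGTAGCTTGTACAGTTAGGCTAACC"

# Pretend the annotated mature miRNA is the 22 nt stretch at positions 7..28
mature = hairpin[7:29]

# An isomiR: starts 1 nt later and runs 2 nt further at the 3' end (23 nt)
isomir = hairpin[8:31]

for name, read in (("mature", mature), ("isomiR", isomir)):
    print(
        name,
        "| fits mature.fa entry:", aligns_end_to_end(read, mature),
        "| fits hairpin.fa entry:", aligns_end_to_end(read, hairpin),
    )
```

    The isomiR is lost when only the mature sequence is used as the reference, which is the diversity being referred to.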

  • moses122
    replied
    I got this problem too. Did you figure out why exactly the alignment rate is better for long sequences than for miRBase?

  • Brian Bushnell
    replied
    Ahhh... to clarify, what I meant was that NOT adapter-trimming the reads could cause the bias, but I guess that's ruled out if the reads were successfully trimmed.

  • Ahaswer
    replied
    Hi Brian. Yes, the reads have been adapter-trimmed. Moreover, aligning to long sequences (the whole reference) gives a good alignment rate, while aligning to short (miRBase) sequences gives a low one.

  • Brian Bushnell
    replied
    Since this was never mentioned in the thread, are the reads being adapter-trimmed prior to alignment? If not, that would cause a bias toward long rather than short sequences, and a low alignment rate.
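
    To make the trimming step concrete, here is a minimal sketch of exact-match 3' adapter removal. The function name, the insert, and the adapter string are placeholders chosen for illustration (use the adapter from your own library kit), and dedicated trimmers such as cutadapt also tolerate mismatches, which this does not.

```python
def trim_3p_adapter(read, adapter, min_overlap=8):
    """Remove an exact-match 3' adapter (full, or partial at the read end)."""
    # Case 1: the full adapter occurs somewhere in the read
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter (longest prefix first)
    longest = min(len(adapter), len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter detected: return the read unchanged
    return read

# Hypothetical 22 nt insert and placeholder adapter sequence
insert = "ACGTTGCAAGGCTTAACGGATC"
adapter = "TGGAATTCTCGGGTGCCAAGG"

raw_full = insert + adapter + "ACGT"   # adapter fully read through
raw_partial = insert + adapter[:14]    # read ended partway into the adapter

print(len(raw_partial), "->", len(trim_3p_adapter(raw_partial, adapter)))
```

    An untrimmed 36 nt read cannot align end-to-end to a 22 nt mature sequence, but it can still find a home in a long reference if the aligner clips or tolerates the tail, which is the bias described above.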

  • Ahaswer
    replied
    Thanks everybody for your replies. That satisfies my curiosity, especially those papers.
    I've also checked the read alignment count for each miRNA sequence, based on miRBase annotations, which gives me a subset ranging from 5 to 20,000 reads per sequence. I guess that, despite the low overall alignment rate mentioned before, I can still work on this dataset.

  • peterawe
    replied
    Originally posted by mziemann
    @Ahaswer
    "But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?"

    Your small RNA-seq should be >75% microRNA as a rule of thumb. If it is lower than that, there is a contamination issue, which can arise in three ways: (1) lots of adapter-only reads; (2) degradation of mRNAs/rRNAs into small fragments that swamp the miRNAs; (3) size selection of the miRNA-containing fragments went wrong.

    @peterawe
    Regarding my analysis, I use the featureCounts "-R" option, which outputs which feature each read is mapped to. I purposely use a mapping quality threshold of 20 to remove multimappers. Reads that are mapped back to the correct microRNA gene (mapq>=20) are "correct" and reads that are mapped to some other part of the genome (mapq>=20) are "incorrect". Reads that are unassigned due to the mapq threshold are neither.

    The % correct/incorrect proportions are relative to the starting number of reads, so they are an accurate way of showing the performance. One thing I didn't explore was whether mapq20 was really the right threshold or not. Who knows, maybe bowtie1 is really good at mapq30.
    Thanks for your reply, I think this may explain the difference. Running bowtie with default parameters returns a MAPQ of 255 for all alignable reads, regardless of multi-mapping. Thus, this part of the bowtie output is not very informative for downstream filtering.

    @Ahaswer
    miRNA content depends on the biological sample, RNA quality, library preparation protocol and accuracy in gel-cutting. Hoen et al. did smRNA-seq on 465 lymphoblastoid cell lines and reported:
    "...the relative miRNA content in our samples ranged from 2% to 62% of mapped reads, with a median of 19%..."
    Hoen (2013) Nat Biotech. I guess this was an automated protocol; doing manual preparations, we routinely get over 50%.

  • mziemann
    replied
    @Ahaswer
    "But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?"

    Your small RNA-seq should be >75% microRNA as a rule of thumb. If it is lower than that, there is a contamination issue, which can arise in three ways: (1) lots of adapter-only reads; (2) degradation of mRNAs/rRNAs into small fragments that swamp the miRNAs; (3) size selection of the miRNA-containing fragments went wrong.

    @peterawe
    Regarding my analysis, I use the featureCounts "-R" option, which outputs which feature each read is mapped to. I purposely use a mapping quality threshold of 20 to remove multimappers. Reads that are mapped back to the correct microRNA gene (mapq>=20) are "correct" and reads that are mapped to some other part of the genome (mapq>=20) are "incorrect". Reads that are unassigned due to the mapq threshold are neither.

    The % correct/incorrect proportions are relative to the starting number of reads, so they are an accurate way of showing the performance. One thing I didn't explore was whether mapq20 was really the right threshold or not. Who knows, maybe bowtie1 is really good at mapq30.
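
    The scoring scheme described above can be sketched roughly as follows. This is only an illustration: the function names and the (mapq, assigned_gene, true_gene) tuple layout are assumptions made for the example and do not reflect the actual featureCounts "-R" output format.

```python
from collections import Counter

def classify_read(mapq, assigned_gene, true_gene, min_mapq=20):
    """Label one simulated read as 'correct', 'incorrect' or 'unassigned'."""
    if mapq < min_mapq:
        # Below the threshold the read is neither correct nor incorrect
        return "unassigned"
    # At or above the threshold: compare where it landed with where it came from
    return "correct" if assigned_gene == true_gene else "incorrect"

def summarise(reads, min_mapq=20):
    """Return each class as a fraction of the starting number of reads."""
    counts = Counter(classify_read(*r, min_mapq=min_mapq) for r in reads)
    total = len(reads)
    return {k: counts[k] / total for k in ("correct", "incorrect", "unassigned")}

# Hypothetical simulated reads: (mapq, assigned_gene, true_gene)
reads = [
    (42, "mirA", "mirA"),   # confident and right
    (30, "mirB", "mirA"),   # confident but wrong gene
    (1, "mirA", "mirA"),    # right place, but filtered by the threshold
    (23, None, "mirC"),     # confident, landed outside any annotated gene
]

print(summarise(reads))
print(summarise(reads, min_mapq=30))
```

    Re-running with a different min_mapq is one way to explore whether 20 is the right threshold. Note that if an aligner reports the same MAPQ for every alignment, the threshold filters nothing.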

  • peterawe
    replied
    Originally posted by mziemann
    Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20 nt reads, Bowtie1 has a 15% error rate, compared with ~2% for Bowtie2.
    http://genomespot.blogspot.com.au/20...-compared.html
    Hi, I read the great blog regularly. I was however a little puzzled by this analysis. My concern is what is meant by incorrectly and correctly mapped reads. If I got it right, the sequences for alignment are randomly cut out from miRNA-hairpins and mapped back against the reference genome. A possible problem is that, by chance, some of these sequences may match other loci as well. For 100% specificity, the aligner can discard such multi-mappers and thus seem very accurate, while for 100% sensitivity it could just report mapping on all corresponding loci. Thus, the large differences between the aligners may be due to different ways (default parameters) of assigning/reporting multi-mappers and not to actual errors. What's your thought on this?
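
    A small sketch of the concern raised here, using a made-up toy genome and exact matching in place of a real aligner. The two reporting policies are hypothetical labels for the two extremes described above, not actual bowtie options.

```python
def find_loci(read, genome):
    """Return all 0-based start positions where the read matches exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return hits

def report(read, genome, policy):
    """Apply a reporting policy to the loci found for one read."""
    hits = find_loci(read, genome)
    if policy == "unique_only":
        # Favour specificity: discard any read with more than one locus
        return hits if len(hits) == 1 else []
    if policy == "all":
        # Favour sensitivity: report every matching locus
        return hits
    raise ValueError(f"unknown policy: {policy}")

# Toy genome in which one 8 nt sequence occurs at two loci
repeat = "ACGTACGT"
unique = "GGGCC"
genome = "TTT" + repeat + unique + repeat + "AAA"

for policy in ("unique_only", "all"):
    print(policy, report(repeat, genome, policy), report(unique, genome, policy))
```

    The same read is either dropped or reported at both loci depending on the policy alone, so a comparison of aligners has to account for how each one handles multi-mappers by default.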

  • Ahaswer
    replied
    As you suggested, I have used featureCounts to check read locations, and the result is more or less the same (4.6%). So it is a contamination issue; however, I will still try to get some valuable data. I guess I can still get an additional 1-2% through novel miRNA research.

    Thank you, yueluo and mziemann, for your interest and help.

    But I still wonder what is typical overall alignment rate or content of miRNA sequences in miRNA sequencing data. Can anyone provide such information?

  • mziemann
    replied
    One explanation could be contamination, as Yueluo suggested. Use featureCounts or HTSeq to see where the reads are landing in the reference genome. If these are mostly mRNA and rRNA, then you have a contamination issue.
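
    Once per-gene counts are available, the tally could look something like the sketch below. The gene names, counts, and the gene-to-biotype mapping are invented for illustration; in practice the mapping would come from your annotation, and the counts from whichever counting tool you use.

```python
from collections import defaultdict

def biotype_fractions(gene_counts, gene_biotype):
    """Sum read counts per biotype and return each as a fraction of the total."""
    totals = defaultdict(int)
    for gene, n in gene_counts.items():
        # Genes missing from the annotation map are grouped as 'unknown'
        totals[gene_biotype.get(gene, "unknown")] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: n / grand_total for biotype, n in totals.items()}

# Hypothetical counts and annotation
gene_counts = {"mirA": 30, "mirB": 20, "rRNA1": 800, "mRNA1": 150}
gene_biotype = {"mirA": "miRNA", "mirB": "miRNA", "rRNA1": "rRNA", "mRNA1": "mRNA"}

fractions = biotype_fractions(gene_counts, gene_biotype)
for biotype, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{biotype}\t{frac:.1%}")
```

    A profile dominated by rRNA and mRNA, as in this toy input, would point to contamination rather than an alignment problem.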

  • Ahaswer
    replied
    Still the same overall alignment rate. Does anyone have some suggestions about bowtie2 parameters for this task?

  • Ahaswer
    replied
    Yes, I've also aligned reads to mature.fa, but this gives me an overall alignment rate of only 4.39%. As you suggested, I'll try removing special characters from those files.

  • mziemann
    replied
    Regarding the original question, it's really puzzling that your alignment to miRBase gave such low alignment rates. Did you align to mature.fa or hairpin.fa? I would think that you would get higher alignment rates for hairpin.fa because it will capture most isomiRs. Removing special characters from the FASTA headers might help too.
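
    One possible way to clean the headers, as a minimal sketch. Which characters count as "special" is an assumption here (anything other than letters, digits, '_', '-' and '.'), and the example record is made up rather than taken from miRBase.

```python
import re

def sanitize_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to one safe identifier."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token of the header
            tokens = line[1:].split()
            first = tokens[0] if tokens else ""
            # Replace every remaining unsafe character with an underscore
            yield ">" + re.sub(r"[^A-Za-z0-9_.-]", "_", first)
        else:
            # Sequence lines pass through unchanged
            yield line

# Hypothetical record with a description and brackets in the header
example = [
    ">example-mir-1(a) ID0000001 Example stem-loop\n",
    "ACGUACGU\n",
]

cleaned = list(sanitize_fasta_headers(example))
print("\n".join(cleaned))
```

    Writing the cleaned lines to a new file and rebuilding the index from that file keeps the original download untouched.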
