
  • Unmapped RNA-seq reads consist of repeated nucleotides (short homopolymeric regions)

    Hello!

    I have a low mapping rate for SOLiD RNA-seq data (organism: bacteria), around 30-40%, although we usually get 70-80%. I extracted the unmapped reads and the reads with multiple hits (all of them align poorly and are discarded from further analysis), and found the following:

    1) average quality is the same as for good samples (~26 bases)
    2) unmapped reads are enriched for TTTTT, and multiple-hit reads for various other k-mers (most of them consistent between samples)
    3) GC content is higher (53-55%) for unmapped and multiple-hit reads than for mapped reads (40%)
    4) the reads themselves look like they consist of short stretches of repeated nucleotides:

    >178_1751_207_F3
    AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGT
    ACAAGGAGGGGAGAT
    >178_1751_758_F3
    CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGG
    GGGGACGGAGATGCG
    >178_1752_2_F3
    AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCT
    TGCATCTAAATTTAT
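The checks in points 2-4 (k-mer enrichment, GC content, homopolymer stretches) can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the reads are available as base-space FASTA like the records above; the function names are made up for illustration and are not part of any tool mentioned in this thread.

```python
from collections import Counter
from itertools import groupby


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA lines, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def kmer_counts(seqs, k=5):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

Running `kmer_counts` separately on the mapped and unmapped sets and comparing the `most_common()` lists is one way to see whether a k-mer such as TTTTT is really over-represented, rather than just frequent in both.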

    I also tried assembling the reads with Trinity, but all the resulting contigs map to our bacterium. Mapping against the human genome gave no hits. It does not look like biological contamination. Checking for adapters and trimming gave nothing either.
    Last edited by ritandr; 05-28-2014, 01:24 AM.

  • #2
    Just because the unmapped reads do not map to the human genome does not mean there is no contamination. I have found mouse contamination in tomato sequences.



    • #3
      SOLiD reads should be in colorspace; you can't accurately convert them to base-space without mapping them. So, how did you generate those base-space reads in your post? Multi-hit and unmapped reads are fundamentally different. Also, it's hard to correctly convert a poorly-aligned read to base-space.

      In summary, I think you need to BLAST the original colorspace reads (assuming there's a colorspace version of BLAST) to see what they are.
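To illustrate why naive colorspace conversion is fragile: each color encodes the transition between two adjacent bases, so decoding is a running chain from the known primer base, and a single miscalled color changes every base after it. The sketch below uses the standard two-bit SOLiD encoding (A=0, C=1, G=2, T=3, color = XOR of neighbouring bases). The function names are hypothetical, and missing color calls (`.`) are not handled.

```python
_BASES = "ACGT"  # index in this string is the two-bit code: A=0, C=1, G=2, T=3


def encode_colorspace(seq, primer_base="T"):
    """Encode a base-space sequence as primer base + color string."""
    prev = _BASES.index(primer_base)
    colors = []
    for base in seq:
        cur = _BASES.index(base)
        colors.append(str(prev ^ cur))  # color = XOR of adjacent base codes
        prev = cur
    return primer_base + "".join(colors)


def decode_colorspace(csread):
    """Naively decode a colorspace read (primer base + colors) to bases.

    The primer base itself is not part of the returned sequence.
    """
    prev = _BASES.index(csread[0])
    out = []
    for color in csread[1:]:
        prev ^= int(color)  # next base depends on every color seen so far
        out.append(_BASES[prev])
    return "".join(out)


good = decode_colorspace("T0123")  # "TGAT"
bad = decode_colorspace("T1123")   # first color miscalled -> "GTCG"
```

Here one wrong color at the first position changes all four decoded bases, which is why aligners that work in colorspace can correct such errors against a reference while a standalone conversion cannot.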



      • #4
        Thank you for the answers,

        I did not find any difference in mapping percentage between color-space reads with LifeScope and base-space reads with Bowtie2, so the problem is not the conversion. I BLASTed around 1300 unmapped reads against the NT nucleotide database: quite a lot of them (25%) hit rRNA genes, and 50% hit complete genome sequences of several bacteria (Bacillus and Enterococcus); these species are the same for two different 'bad' samples. But it is impossible that they contaminate our samples. If I map against Bacillus and Enterococcus species, I get a higher percentage of mapped reads (40-50%) than for our bacterium, but all of them are multiple-hit reads, with almost no unique reads. So it looks like rRNA contamination, but I do not understand where it comes from. The sample preparation also included rRNA removal...
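For anyone wanting to repeat this kind of BLAST tally, here is a rough sketch of how the best hit per read could be classified. It assumes tabular BLAST+ output produced with `-outfmt "6 qseqid sseqid bitscore stitle"` (four tab-separated columns); the keyword list used to call a hit "rRNA" is an assumption and will miss or mislabel some subject titles.

```python
import csv
from collections import Counter

# Assumption: a subject title containing one of these marks an rRNA hit.
RRNA_KEYWORDS = ("rrna", "ribosomal rna")


def best_hits(rows):
    """Keep the highest-bitscore (score, title) per query read."""
    best = {}
    for qseqid, _sseqid, bitscore, stitle in rows:
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, stitle)
    return best


def summarize(rows):
    """Count reads whose best hit looks like rRNA versus anything else."""
    counts = Counter()
    for _score, title in best_hits(rows).values():
        lowered = title.lower()
        is_rrna = any(keyword in lowered for keyword in RRNA_KEYWORDS)
        counts["rRNA" if is_rrna else "other"] += 1
    return counts


def summarize_file(path):
    """Read a tab-separated BLAST result file and summarize it."""
    with open(path, newline="") as handle:
        return summarize(csv.reader(handle, delimiter="\t"))
```

Reads with no hit at all do not appear in the BLAST output, so their count has to be taken from the input FASTA rather than from this summary.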
