  • ZoeG
    Member
    • Jun 2013
    • 31

    interesting unmapped reads

    I'm new to RNA-seq, so everything I find seems both interesting and strange to me.

    I used TopHat to map my mouse PE100 data to the reference genome and got about 85-90% mapped.
    Then I looked into the unmapped reads and found some that look like this:
    CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCACACCTCAAAAAACACCCCAAAATAAAAATAACCGATCTGATTTAAAAATTAG

    I found 20,000+ reads like the one above (>40 Cs at the head) among 30 M total reads.

    Is this usual? Or can we tell anything from these unmapped reads?
  • ZoeG
    Member
    • Jun 2013
    • 31

    #2
    A few more details:

    The 20,000+ reads with >40 Cs at the head were found only in the left reads.
    Only a few reads of this kind were found in the right reads.
    And it only happened with C, not A, T, or G.
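
A count like the one described above can be reproduced with a short script. The following is only a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function names and the default threshold of 40 are illustrative choices, not part of TopHat or any other tool.

```python
from collections import Counter


def leading_run(seq):
    """Return (base, length) of the homopolymer run at the start of seq."""
    if not seq:
        return ("", 0)
    base = seq[0]
    # Length of the run = characters removed when stripping that base from the left.
    run_length = len(seq) - len(seq.lstrip(base))
    return (base, run_length)


def count_homopolymer_heads(fastq_lines, min_run=40):
    """Count reads whose sequence starts with at least min_run copies of one base.

    fastq_lines is any iterable of lines (for example an open file handle).
    Assumes 4 lines per record, with the sequence on the second line.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of each record
            base, run_length = leading_run(line.strip().upper())
            if run_length >= min_run:
                counts[base] += 1
    return counts
```

Running it once on the left-read file and once on the right-read file (for example `count_homopolymer_heads(open("left.fastq"))`, where the file name is hypothetical) gives per-base counts that can be compared directly.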


    • rskr
      Senior Member
      • Oct 2010
      • 249

      #3
      Those look like low-complexity reads. Sometimes they get flagged as low complexity or as having too many multimaps. If they aren't flagged, you can take some of the reads and run BLAST to figure out where they map. They are probably real data; there are many regions like that in genomes.
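
One rough way to put a number on "low complexity" is the Shannon entropy of a read's base composition. This is only an illustrative sketch: it is not the filter that BLAST or the aligner applies, and the function name is an assumption made for this example.

```python
import math
from collections import Counter


def base_entropy(seq):
    """Shannon entropy (in bits) of the base composition of seq.

    A pure homopolymer scores 0; equal amounts of A, C, G and T score 2.
    """
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Reads dominated by a single base score close to 0, so sorting unmapped reads by this value is a quick way to pull out the homopolymer-like ones for a closer look.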


      • ZoeG
        Member
        • Jun 2013
        • 31

        #4
        Yes, I found that miRNAs can contain long, continuous runs of C.
        But for these 100 bp reads, I tried UCSC BLAT and NCBI BLAST, and the reads seem to match nothing.

        Another question: if these reads are real and I found "CCCC....C" in the left reads, should I find symmetrical reads in the corresponding right reads? Or is that not necessarily the case?
        Last edited by ZoeG; 07-24-2013, 10:23 AM.


        • rskr
          Senior Member
          • Oct 2010
          • 249

          #5
          Did you turn off low complexity filtering on BLAST and BLAT?


          • ZoeG
            Member
            • Jun 2013
            • 31

            #6
            After turning off low-complexity filtering, blastn found no significant similarity when searching the Mouse G+T database with Megablast. Searching the Nucleotide collection (nr/nt) database returned a list of hits, including one mouse record: Mus musculus BAC clone RP24-289J17 from chromosome 14, complete sequence (coverage 52%, score 84.2, identity 96%).

            This seems confusing to me.


            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Let's start with the obvious...what's the quality string look like? I bet it's all just noisy garbage.
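
A quick way to check is to decode the quality string into Phred scores. This is a minimal sketch that assumes Phred+33 encoding (the default offset below), under which '#' corresponds to a quality of 2; the function names and the `low` threshold are illustrative choices.

```python
def phred_scores(qual, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in qual]


def summarize_quality(qual, offset=33, low=2):
    """Return the mean Phred score and the fraction of bases at or below `low`."""
    scores = phred_scores(qual, offset)
    if not scores:
        return {"mean": 0.0, "low_fraction": 0.0}
    low_count = sum(1 for s in scores if s <= low)
    return {
        "mean": sum(scores) / len(scores),
        "low_fraction": low_count / len(scores),
    }
```

A read whose `low_fraction` is close to 1 is almost entirely lowest-quality base calls, which would support treating it as noise rather than real sequence.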


              • rskr
                Senior Member
                • Oct 2010
                • 249

                #8
                So you are saying you BLASTed it and it didn't return anything, then you turned off low-complexity filtering and BLAST did return something significant, with 96% identity. Which part matched? Maybe you found a missing chunk of the mouse genome!


                • Richard Finney
                  Senior Member
                  • Feb 2009
                  • 701

                  #9
                  Coverage was only 52%, though; most of it was probably the stretch of Cs.
                  You can BLAST against the various genomes at NCBI BLAST, and that's as good as you'll get.
                  It's likely just a junk read. 90% mapped is a good enough run. Don't worry about the junk; it's normal. Sometimes these unmapped reads come from contaminating bacteria or viruses, but in your case it's probably just junk.


                  • ZoeG
                    Member
                    • Jun 2013
                    • 31

                    #10
                    The matched part is the stretch. Those Cs were miserably thrown out.
                    Yes, it seems these reads are just junk. The quality strings of this kind of read show a lot of '#'.
                    Thanks, all.
                    It is funny that the machine loves only 'C', not A, T, or G.
