  • ZoeG
    Member
    • Jun 2013
    • 31

    interesting unmapped reads

    I'm new to RNA-seq, so everything I find seems both interesting and strange to me.

    I used TopHat to map my mouse PE100 data to the reference genome, and about 85-90% of the reads mapped.
    Then I looked at the unmapped reads and found some that look like this:
    CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCACACCTCAAAAAACACCCCAAAATAAAAATAACCGATCTGATTTAAAAATTAG

    I found more than 20,000 reads like the one above (>40 Cs at the head) among 30 M total reads.

    Is this usual? Can we tell anything from these unmapped reads?
  • ZoeG
    Member
    • Jun 2013
    • 31

    #2
    A few more details:

    The 20,000+ reads with >40 Cs at the head were found only in the left reads.
    Only a few reads of this kind were found in the right reads.
    It also happens only with C, not with A, T, or G.
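A minimal sketch of how a tally like the one above could be reproduced, assuming the reads are available as plain 4-line FASTQ records (the function names and the 40-base threshold are illustrative, not taken from any particular tool):

```python
from collections import Counter


def leading_run(seq):
    """Return (base, length) of the homopolymer run at the start of seq."""
    if not seq:
        return "", 0
    base = seq[0]
    # lstrip(base) drops every leading copy of that one character
    return base, len(seq) - len(seq.lstrip(base))


def count_leading_runs(lines, min_run=40):
    """Tally reads whose leading homopolymer run is longer than min_run.

    `lines` is any iterable of FASTQ lines; unwrapped 4-line records
    are assumed. Returns (per-base Counter, total number of reads).
    """
    counts = Counter()
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            base, length = leading_run(line.strip().upper())
            if length > min_run:
                counts[base] += 1
    return counts, total


# Usage (hypothetical file name):
# with open("left_reads.fastq") as handle:
#     counts, total = count_leading_runs(handle)
#     print(counts, total)
```

Running it separately on the left and right read files, and comparing the counts for C against A, G, and T, would give the same kind of comparison described in this post.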


    • rskr
      Senior Member
      • Oct 2010
      • 249

      #3
      Those look like low-complexity reads. Sometimes they get marked as low complexity or as having too many multimaps. If they aren't marked, you can take some of the reads and run BLAST to figure out where they map. They are probably real data; there are many regions like that in genomes.


      • ZoeG
        Member
        • Jun 2013
        • 31

        #4
        Yes, I found that miRNAs can contain long, continuous runs of C.
        But for these 100 bp reads, I tried UCSC BLAT and NCBI BLAST, and the reads seem to match nothing.

        Another question: if these reads are real and I find "CCCC....C" in the left reads, should I find symmetrical reads in the corresponding right reads? Or is that not necessarily the case?
        Last edited by ZoeG; 07-24-2013, 10:23 AM.
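One way to reason about the symmetry question, sketched under the assumption of a standard paired-end layout in which the two mates are read from opposite strands: if the right read happened to cover the same stretch, it would show it reverse-complemented, so a run of Cs at the start of the left read would appear as a run of Gs at the end of the right read. Whether the right read reaches that far depends on the fragment length, so a matching run is not guaranteed. The helper below is illustrative only:

```python
# Translation table for complementing DNA bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# A read starting with a C run, seen from the other strand,
# ends with a G run instead:
print(reverse_complement("CCCCCCCCAC"))
```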


        • rskr
          Senior Member
          • Oct 2010
          • 249

          #5
          Did you turn off low complexity filtering on BLAST and BLAT?


          • ZoeG
            Member
            • Jun 2013
            • 31

            #6
            After turning off low-complexity filtering, blastn (Megablast) found no significant similarity against the Mouse G+T database. Against the Nucleotide collection (nr/nt), it returned a list of hits, including one record for mouse: Mus musculus BAC clone RP24-289J17 from chromosome 14, complete sequence (coverage 52%, score 84.2, identity 96%).

            This seems confusing to me.


            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Let's start with the obvious: what does the quality string look like? I bet it's all just noisy garbage.


              • rskr
                Senior Member
                • Oct 2010
                • 249

                #8
                So you are saying you BLASTed it and it didn't return anything, then you turned off low-complexity filtering and BLAST did return something significant, with 96% identity? Which part matched? Maybe you found a missing chunk of the mouse genome!


                • Richard Finney
                  Senior Member
                  • Feb 2009
                  • 701

                  #9
                  Coverage was only 52%, though; most of it was probably the stretch of Cs.
                  You can BLAST against the various genomes at NCBI, and that's as good as you'll get.
                  It's likely just a junk read. With 90% mapped, the run is good enough. Don't worry about the junk; it's normal. Sometimes these unmapped reads come from contaminating bacteria or viruses, but in your case it's probably just junk.


                  • ZoeG
                    Member
                    • Jun 2013
                    • 31

                    #10
                    The matched part is the stretch. Those Cs were miserably thrown out.
                    Yes, it seems these reads are just junk. The quality strings of these reads show a lot of '#'.
                    Thanks, all.
                    It is funny that the machine loves only C, not A, T, or G.
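To put a number on "a lot of '#'", a quality string can be decoded into Phred scores. The sketch below assumes Phred+33 encoding, which the thread does not state; under that assumption '#' (ASCII 35) decodes to Q2, a very low score. The function names and the threshold are illustrative:

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def low_quality_fraction(quality, threshold=2, offset=33):
    """Fraction of bases whose Phred score is at or below threshold."""
    scores = phred_scores(quality, offset)
    if not scores:
        return 0.0
    return sum(q <= threshold for q in scores) / len(scores)


# '#' is Q2 and 'I' is Q40 under the assumed Phred+33 encoding
print(phred_scores("#I"))
print(low_quality_fraction("###I"))
```

If the data used the older Phred+64 encoding instead, passing `offset=64` would be the corresponding adjustment.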
