
  • How much does the uniquely mapping matter?

    Hi All,

    I have some RNA-seq data from Illumina (75 bp reads). I found that the percentage of uniquely mapping reads is really low (<=5%). What could be the reason?
    1. sample preparation?
    2. sequencing?
    3. mapping software?
    ....

    In addition, how much does a low percentage of uniquely mapping reads matter? Would these data be usable?

    Thanks



  • #2
    RE: How much does the uniquely mapping matter

    sdwy2008,

    That really depends on the organism you are working on and whether most of its genes are expected to be duplicated. For instance, if an organism recently went through a whole genome duplication event, I might not expect to find many uniquely mappable reads. I work on soybean, which has had several whole genome duplication events, the most recent about 13 million years ago, and most of our reads are a mixture of uniquely mappable and highly repetitive (5% would be very low in this system from what I have seen). I have several data sets of Illumina 36-bp reads obtained from RNA.
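To illustrate why duplication lowers the uniquely mappable fraction, here is a minimal Python sketch on a toy sequence. It is not a real mappability calculation: it only looks at the forward strand, requires exact matches, and the sequences and the function name `unique_kmer_fraction` are made up for illustration.

```python
from collections import Counter


def unique_kmer_fraction(reference: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once
    in `reference` (forward strand only, exact matches only)."""
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Toy "genomes" (made-up sequences): the same gene present once or twice.
gene = "ATGGCGTACGTTAGC"
single_copy = "TTTACCA" + gene + "GGACTTG"
duplicated = "TTTACCA" + gene + "GGACTTG" + gene + "CCATGAA"

print("single copy:", unique_kmer_fraction(single_copy, 8))
print("duplicated: ", unique_kmer_fraction(duplicated, 8))
```

Every read-length window that falls entirely inside the duplicated gene has two equally good placements, so the unique fraction drops as more of the genome is duplicated.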

    As for usability, it depends on what you want to determine from the data and, as I mentioned above, what kind of system you are working on. If you are doing an RNA-Seq analysis and have a good genome annotation, try aligning to just the gene models rather than the entire genome. If you don't have a good annotation, consider aligning the Solexa data to data from a 454 run on RNA.

    It is challenging to answer your question without more information. I am leaving town tomorrow for Italy and will have unknown access to the internet. Best of luck

    Andrew



    • #3
      Originally posted by sdwy2008
      Hi All,

      Would these data be usable?
      No. Even with a mammalian genome, you should be getting something like 50% uniquely mapping. 5% is so low that you don't just have less data, you almost certainly have bad data. Sorry.

      Unless you made some trivial error with the software, something is wrong with your sample.



      • #4
        Hi all,

        I have the same problem as sdwy2008, but my reads are 36 bp long. I am supposedly mapping to human mRNA. I'm using Bowtie, and I downloaded several human RefSeq references: human.rna.fna, refseqgene.genomic.fna, complete1.rna.fna, and the pre-built Bowtie indexes for human. I've tried mapping my raw reads to all of these references, but only 25-30% of them were mappable.

        What I want to know is whether this is because I chose the wrong references. I tried to look for RefSeq human_mRNA release 28 but couldn't find it. Can anyone tell me where I can get this reference?



        • #5
          Unless you know what you are doing it is probably better to use the standard approach of mapping even mRNA-Seq data to the genome, not to the transcriptome.

          This might even help you find the root of your problem: After all, if you map against an mRNA-only reference, you won't notice if most of your reads map on rRNA which would point to a problem with the rRNA-depletion step of your sample preparation.
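A rough sketch of the rRNA check, assuming you already have the mapped start position of each read and the coordinates of the annotated rRNA genes. The function name `rrna_fraction`, the reference name, and all coordinates below are hypothetical; real data would come from your aligner output and annotation file.

```python
def rrna_fraction(read_starts, rrna_regions):
    """Fraction of mapped reads whose start position lies in an rRNA region.

    read_starts:  iterable of (reference_name, position) tuples
    rrna_regions: dict mapping reference_name -> list of (start, end),
                  both ends inclusive, same coordinate system as the reads
    """
    total = 0
    in_rrna = 0
    for ref, pos in read_starts:
        total += 1
        regions = rrna_regions.get(ref, [])
        if any(start <= pos <= end for start, end in regions):
            in_rrna += 1
    return in_rrna / total if total else 0.0


# Made-up example: two annotated rRNA loci on one chromosome.
regions = {"chrom1": [(1000, 2500), (9000, 12000)]}
reads = [("chrom1", 1200), ("chrom1", 9500), ("chrom1", 500), ("chrom1", 30000)]
print(rrna_fraction(reads, regions))
```

A large fraction here would point to incomplete rRNA depletion rather than to a mapping problem.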

          Also check your sample for low base-call quality or strange base compositions (e.g., using htseq-qa).
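As a quick stand-in for the base-composition check (this is not htseq-qa and does not reproduce its output), a minimal Python sketch that tallies bases per read position. It assumes a plain 4-line-per-record FASTQ with unwrapped sequence lines; the function name `base_composition` and the two example reads are made up.

```python
from collections import Counter
import io


def base_composition(fastq_handle):
    """Return a list with one Counter per read position, counting the
    bases observed at that position across all reads."""
    per_position = []
    for line_number, line in enumerate(fastq_handle):
        # In 4-line FASTQ, the sequence is the 2nd line of each record.
        if line_number % 4 != 1:
            continue
        for i, base in enumerate(line.strip().upper()):
            if i == len(per_position):
                per_position.append(Counter())
            per_position[i][base] += 1
    return per_position


# Made-up two-read example; with real data, pass an open file handle instead.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIII\n")
for position, counts in enumerate(base_composition(example), start=1):
    total = sum(counts.values())
    print(position, {b: round(n / total, 2) for b, n in sorted(counts.items())})
```

Strong skew at particular positions, or many N calls, would be worth investigating before blaming the mapper.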

          Simon



          • #6
            Originally posted by Simon Anders
            Unless you know what you are doing it is probably better to use the standard approach of mapping even mRNA-Seq data to the genome, not to the transcriptome.

            This might even help you find the root of your problem: After all, if you map against an mRNA-only reference, you won't notice if most of your reads map on rRNA which would point to a problem with the rRNA-depletion step of your sample preparation.

            Also check your sample for low base-call quality or strange base compositions (e.g., using htseq-qa).

            Simon

            Sorry, I did not include more information about my experiment and analysis. The organism I am working on is a bacterium. I did have a (16S and 23S) rRNA-depletion step before sequencing, but I think there was still some rRNA left in the treated samples.

            I used maq map and maq pileup to map my reads to the reference genome. With the default settings in maq map, I can map about 90% of the reads to the reference genome. However, in maq pileup, when I set "-q INT Minimum mapping quality allowed for a read to be used" to 30, only <=5% of the reads were counted as UNIQUELY mapped.

            So, by setting -q to 30, I assumed the remaining reads are the UNIQUELY mapped ones. However, I just took the q value from a similar published paper (http://www.plosgenetics.org/article/...l.pgen.1000569). I do not really know whether this setting is right.
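A small sketch of what a mapping-quality cutoff does, assuming the mapping qualities are Phred-scaled (Q = -10 log10 of the probability that the placement is wrong) and have already been extracted into a list of integers. The function names and the quality values below are made up. As far as the Maq documentation describes it, a read with several equally good placements gets mapping quality 0, so a cutoff of 30 is much stricter than "has a unique best hit".

```python
def mapq_error_probability(q):
    """Probability that the placement is wrong, for a Phred-scaled quality q."""
    return 10 ** (-q / 10)


def fraction_at_or_above(mapqs, threshold):
    """Fraction of reads whose mapping quality is >= threshold."""
    mapqs = list(mapqs)
    if not mapqs:
        return 0.0
    return sum(1 for q in mapqs if q >= threshold) / len(mapqs)


# Made-up mapping qualities for ten reads.
made_up_mapqs = [0, 0, 3, 12, 25, 31, 40, 60, 0, 18]
for threshold in (1, 10, 20, 30):
    print(
        threshold,
        mapq_error_probability(threshold),
        fraction_at_or_above(made_up_mapqs, threshold),
    )
```

Tabulating the retained fraction at several cutoffs, rather than only at 30, shows whether most reads are truly ambiguous (quality 0) or merely fall below a strict threshold.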

            Could somebody who knows more about Maq give me some suggestions?

            Please also see my post at the following link:

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


            Thanks!
            Last edited by sdwy2008; 06-07-2010, 12:16 PM.



            • #7
              Originally posted by Simon Anders
              Unless you know what you are doing it is probably better to use the standard approach of mapping even mRNA-Seq data to the genome, not to the transcriptome.

              This might even help you find the root of your problem: After all, if you map against an mRNA-only reference, you won't notice if most of your reads map on rRNA which would point to a problem with the rRNA-depletion step of your sample preparation.

              Also check your sample for low base-call quality or strange base compositions (e.g., using htseq-qa).

              Simon
              I forgot to mention that I'm doing a differential gene expression analysis. I have 9 samples (tumor and control) of whole transcriptome sequencing. That's why I want to map to mRNA, to look for the genes that are expressed in this study. I also plan to map to the genome afterwards.
              Actually, I was confused between transcriptome assembly and transcriptome mapping. In my case, should I do the assembly too? What would be the recommended pipeline for this analysis? I read another thread about analysis pipelines, http://seqanswers.com/forums/showthread.php?t=5248, and I know another person who did this kind of analysis the same way. Can somebody share their experience with this?
