  • Lots of unmapped reads - SOLiD bacterial RNA-seq and bowtie mapping

    We've completed several bacterial RNA-seq experiments but have many unmapped reads left over, and I am not sure why.

    Experimental details:
    - RNA was isolated using a TRIzol protocol, and 1-2 rounds of MICROBExpress were used to deplete the rRNA
    - All samples were checked using an Agilent Bioanalyzer for quality and rRNA depletion

    Sequencing details:
    - All library prep and sequencing were performed by a core facility. Samples were individually barcoded and sequenced (50 bp single-end) on a single SOLiD 4 run

    Mapping outcome:
    - We're currently using bowtie to map reads in colorspace against a reference. For a pure culture with a complete genome to map to, we get approximately 20-50% of reads mapped.


    What I have tried:
    - Trimming reads (removing up to 15 bases): only 1-3% more reads mapped
    - SAET to correct colorspace errors: worse mapping
    - Other mappers (Cufflinks and SHRiMP, as far as I can remember): no difference in mapping, or lower mapping
    - Transcript assembly (Velvet + Oases): the N50 was ridiculously low, and most of the "large" contigs did not match anything in the NCBI nr database
    - Converting to basespace for mapping (I think I used a script that came with MAQ): miserable failure
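For reference, the N50 quoted for the Velvet + Oases attempt above is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition (the function name `n50` is just illustrative, not part of Velvet or Oases):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # this contig brings the total to at least 50%
            return length
    return 0


if __name__ == "__main__":
    # 54 bp in total; 10 + 9 + 8 = 27 reaches half, so N50 is 8
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

A low N50 on its own only says the assembly is fragmented; it does not say whether the fragments are real transcripts.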


    A side issue:
    - The bowtie SAM output prints the basespace sequence of reads that have no hits to the reference. How does bowtie get this sequence if there is no matching reference? I've examined these unmapped reads, and they don't match anything in the NCBI database.

    Could the library preps have been bad? That is the only explanation I can think of now.

    Thanks for any thoughts and input!
    Last edited by Jean; 02-10-2011, 12:08 PM.

  • #2
    Just some thoughts:

    1) Select a more aggressive trim.
    2) Try bwa.
    3) Look at the unmapped reads: do they look right?
    4) Take a random sample of the unmapped reads and BLAST them: are they hitting other organisms?
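Suggestion 4 can be done in one pass over a SAM file, without loading it into memory, by reservoir sampling the records whose FLAG has bit 0x4 (segment unmapped) set. A minimal Python sketch; the function names and the file name in the usage comment are hypothetical, and per Chipper's caveat in #3 the SEQ stored for an unmapped colorspace read may not be its true base sequence.

```python
import random


def unmapped_reads(sam_lines):
    """Yield (QNAME, SEQ) for SAM records whose FLAG has bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if int(fields[1]) & 0x4:
            yield fields[0], fields[9]


def reservoir_sample(items, k, seed=0):
    """Uniformly sample up to k items from an iterable of unknown length."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text for BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Usage (file name is hypothetical):
# with open("sample.sam") as sam:
#     print(to_fasta(reservoir_sample(unmapped_reads(sam), 1000)), end="")
```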



    • #3
      What settings did you use for bowtie? If you used the standard settings, try -n 3 -l 24 -e 200, as the defaults are meant for Illumina reads.
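To see why both numbers matter, here is a simplified Python model of bowtie 1's `-n` alignment policy as its manual describes it: at most `n` mismatches in the first `l` positions of the read (the seed), and the sum of Phred qualities at all mismatched positions, seed or not, no greater than `e` (defaults `-n 2 -l 28 -e 70`). It is a sketch under stated assumptions: it ignores the Maq-style quality rounding bowtie applies unless `--nomaqround` is given, with `-C` the positions and qualities would be per color rather than per base, the function name is hypothetical, and the example read is made up.

```python
def passes_n_policy(mismatches, quals, n=2, l=28, e=70):
    """Simplified check of bowtie 1's -n mode for one candidate alignment.

    mismatches: 0-based read positions (from the read's 5' end) that differ
    from the reference; quals: Phred quality per read position.
    Ignores bowtie's default Maq-style rounding of quality values.
    """
    seed_mismatches = sum(1 for p in mismatches if p < l)
    quality_sum = sum(quals[p] for p in mismatches)  # all mismatches, not just seed
    return seed_mismatches <= n and quality_sum <= e


if __name__ == "__main__":
    quals = [25] * 50   # made-up 50-position read, QV 25 everywhere
    mm = [3, 10, 20]    # three mismatches, inside both seed lengths
    print(passes_n_policy(mm, quals))                    # False: 3 seed mismatches > 2
    print(passes_n_policy(mm, quals, n=3, l=24, e=70))   # False: QV sum 75 > 70
    print(passes_n_policy(mm, quals, n=3, l=24, e=200))  # True
```

With noisier reads, raising `-n` alone is not enough if the quality ceiling `-e` still rejects the alignment, which is presumably why both are raised together.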

      Look at the average QV per position. If there is a pattern of low-QV positions (e.g. every 5th base), try BFAST with appropriate masks.
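A minimal Python sketch of that check, computing the mean QV at each read position and listing positions that fall below a cutoff. It assumes .qual-style input with `#` comment lines, `>` header lines, and one line of space-separated integers per read, where -1 marks a missing call; verify that layout against your own files. The cutoff of 15 and all function names are illustrative choices, not SOLiD or BFAST settings.

```python
import math


def parse_qual(text):
    """Return one list of integer QVs per read from .qual-style text."""
    reads = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ">")):  # comments and read headers
            continue
        reads.append([int(v) for v in line.split()])
    return reads


def mean_qv_per_position(reads):
    """Mean QV at each position, skipping negative (missing) calls."""
    length = max((len(q) for q in reads), default=0)
    totals = [0] * length
    counts = [0] * length
    for quals in reads:
        for i, q in enumerate(quals):
            if q >= 0:
                totals[i] += q
                counts[i] += 1
    return [t / c if c else math.nan for t, c in zip(totals, counts)]


def low_positions(means, threshold=15):
    """1-based positions whose mean QV is below threshold."""
    return [i + 1 for i, m in enumerate(means) if m < threshold]


if __name__ == "__main__":
    demo = ">r1\n30 30 30 30 5 30 30 30 30 5\n>r2\n28 28 28 28 4 28 28 28 28 6\n"
    print(low_positions(mean_qv_per_position(parse_qual(demo))))  # [5, 10]
```

Low positions spaced five apart would point at a particular ligation cycle rather than at the library itself.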

      bwa is not going to help, and neither is BLASTing the unmapped reads, since you do not have their (correct) base sequence. I would also try Velvet on the unmapped reads in case there are rRNA sequences that are not in the reference.



      • #4
        Thanks for the ideas.

        Bowtie mapping:
        Since we have also been mapping to a mixed population, I have used the following to get best-hit mappings:
        Code:
        bowtie -S -C -3 10 --threads 6 --best -M 1
        And as mentioned, for a pure culture I get these results:
        Code:
        # reads processed: 55208429
        # reads with at least one reported alignment: 13639433 (24.71%)
        # reads that failed to align: 29756435 (53.90%)
        # reads with alignments sampled due to -M: 11812561 (21.40%)
        Based on the suggestions here, I tried trimming 20 bp, and I also tried the settings suggested by Chipper:
        Code:
        bowtie -S -C -3 20 --best -M 1
        # reads processed: 55208429
        # reads with at least one reported alignment: 15156628 (27.45%)
        # reads that failed to align: 27899931 (50.54%)
        # reads with alignments sampled due to -M: 12151870 (22.01%)
        Code:
        bowtie -S -C -t -n 3 -l 24 -e 200
        # reads processed: 55208429
        # reads with at least one reported alignment: 29658102 (53.72%)
        # reads that failed to align: 25550327 (46.28%)
        As you can see, this doesn't seem to affect the unmapped portion much.
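One way to compare the three runs is to parse bowtie's end-of-run summary lines quoted above and count every read reported with an alignment, including the `-M` sampled ones, which the bowtie manual describes as still being reported with one randomly chosen alignment. A minimal Python sketch using the posted numbers; the regular expression and function names are assumptions based only on the summary format shown in this thread.

```python
import re

# Matches summary lines such as "# reads processed: 55208429".
SUMMARY_LINE = re.compile(r"#\s*(reads [^:\n]+):\s*(\d+)")

PROCESSED = "reads processed"
REPORTED = "reads with at least one reported alignment"
SAMPLED = "reads with alignments sampled due to -M"


def parse_bowtie_summary(text):
    """Map each summary label to its read count."""
    return {m.group(1): int(m.group(2)) for m in SUMMARY_LINE.finditer(text)}


def aligned_percent(counts):
    """Percent of processed reads reported with an alignment, counting -M samples."""
    aligned = counts.get(REPORTED, 0) + counts.get(SAMPLED, 0)
    return 100.0 * aligned / counts[PROCESSED]


RUN1 = """# reads processed: 55208429
# reads with at least one reported alignment: 13639433 (24.71%)
# reads that failed to align: 29756435 (53.90%)
# reads with alignments sampled due to -M: 11812561 (21.40%)"""

RUN2 = """# reads processed: 55208429
# reads with at least one reported alignment: 15156628 (27.45%)
# reads that failed to align: 27899931 (50.54%)
# reads with alignments sampled due to -M: 12151870 (22.01%)"""

RUN3 = """# reads processed: 55208429
# reads with at least one reported alignment: 29658102 (53.72%)
# reads that failed to align: 25550327 (46.28%)"""

if __name__ == "__main__":
    for label, log in [("-3 10 --best -M 1", RUN1),
                       ("-3 20 --best -M 1", RUN2),
                       ("-n 3 -l 24 -e 200", RUN3)]:
        print(f"{label}: {aligned_percent(parse_bowtie_summary(log)):.2f}% aligned")
```

On these logs the aligned fraction is about 46.10%, 49.46%, and 53.72%, i.e. the share of reads that failed to align drops from 53.90% to 46.28% across the three runs.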

        As for other suggestions:
        - Chipper is correct that the converted unmapped reads are not in the right basespace. I have BLASTed them anyway, but they do not match anything (see my "side issue" above)
        - I have tried assembling with Velvet and did not get significant contigs
        - I know that half of the mappable reads are 23S, and I have mapped against bacterial 23S and cpn60 databases; nothing extra is pulled out
        - Admittedly, I have not looked at the quality scores in depth, so I will do that. Would you suggest looking at the SOLiD qual files or at the bowtie output (unmapped reads)?



        • #5
          We had something similar: half our reads would not map. We were told the most likely cause is that emulsion PCR produces many chimeric beads instead of beads carrying a single template, and their reads just fall out of mapping. So when we do SOLiD, we just assume we are going to lose half our reads.



          • #6
            Originally posted by mnkyboy
            We had something similar and half our reads would not map and we were told it is most likely during the emulsion PCR a lot of chimeric beads are made instead of those with a single read and they just fall out of mapping. So when we do SOLiD we just assume we are going to lose half our reads.
            I suspect that is the problem here. We were supposed to get 1.4 billion reads but only got 500 million; half of those are unmappable, and half of the mappable ones are rRNA.



            • #7
              Not sure how useful it is for SOLiD data (I guess the SAM/BAM input should work fine after alignment), but I'd recommend FastQC for looking at per-base quality.



              • #8
                Something similar happened with my RNA-seq data from SOLiD 4: expecting billions of reads, I got 500 million, and 10-20% of them mapped. I have heard that mapping rates for RNA-seq are always lower than for gDNA. Perhaps Illumina gives a higher mapping rate?

                Still got good coverage/results - I love working with bacteria!



                • #9
                  A 50% hit rate with SOLiD is not bad. SOLiD machines generate an order of magnitude more reads than Illumina, but they are noisier as well.
                  http://homolog.us



                  • #10
                    I have the same problem with low mapping (SOLiD reads, bowtie). One weird thing I found is that some of the "unmapped" reads can actually be mapped. I posted a thread about it, but nobody seems to have any idea why.
                    I know this sounds ridiculous, but you can try adding 20 bp of random sequence to the start and end of your reference genomes, redoing the mapping, and filtering out reads that map to the 20 bp random sequences. You will probably see an increase in mapping.
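The padding trick above can be sketched in Python, assuming a single-sequence reference held as a string. The post does not say exactly what counts as "mapping to the random sequence", so `flank_overlap` only reports how many aligned reference positions fall in the padding and leaves the filtering threshold to you. All names here are hypothetical; note that SAM POS values refer to the padded reference, so the flank length has to be subtracted to get back to original genome coordinates.

```python
import random

BASES = "ACGT"


def pad_reference(seq, flank=20, seed=0):
    """Add `flank` random bases to both ends of a reference sequence."""
    rng = random.Random(seed)  # fixed seed so the padded reference is reproducible
    left = "".join(rng.choice(BASES) for _ in range(flank))
    right = "".join(rng.choice(BASES) for _ in range(flank))
    return left + seq + right


def flank_overlap(pos, aln_len, ref_len, flank=20):
    """Number of aligned reference positions that fall in the random flanks.

    pos: 1-based leftmost position in the padded reference (SAM POS);
    aln_len: reference bases spanned by the alignment;
    ref_len: length of the original, unpadded sequence.
    """
    start, end = pos, pos + aln_len - 1
    genome_start, genome_end = flank + 1, flank + ref_len
    left = max(0, min(end, genome_start - 1) - start + 1)
    right = max(0, end - max(start, genome_end + 1) + 1)
    return left + right


def to_original_coord(pos, flank=20):
    """Convert a 1-based padded-reference position back to the original genome."""
    return pos - flank


if __name__ == "__main__":
    genome = "ACGT" * 25  # toy 100 bp "genome"
    padded = pad_reference(genome)
    print(len(padded))                     # 140
    print(flank_overlap(81, 50, 100))      # 10 bases of a 50 bp read in the right flank
```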



                    • #11
                      I strongly suspect a sample preparation problem. We have sequenced many different bacteria on the SOLiD 3 and 4 platforms and typically achieve 75-85% mapping. However, we are very careful to reject poor-quality samples from the start (garbage in, garbage out) and to make sure that all of the QA/QC steps are correct. You should not have to trim your sequences much to get excellent mapping results.
