  • Lots of unmapped reads - SOLiD bacterial RNA-seq and bowtie mapping

    We've completed several bacterial RNA-seq experiments but have many unmapped reads left over, and I am not sure why.

    Experimental details:
    - RNA was isolated using a TRIzol protocol, and 1-2 rounds of MICROBExpress were used to deplete the rRNA
    - All samples were checked using an Agilent Bioanalyzer for quality and rRNA depletion

    Sequencing details:
    - All library prep and sequencing were performed by a core facility. Samples were individually barcoded and sequenced (50 bp single-end) on a single SOLiD 4 run.

    Mapping outcome
    - We're currently using bowtie to map reads in colorspace against a reference. For a pure culture with a complete reference genome, we get approx. 20-50% of reads mapped.


    What I have tried:
    - Trimming reads (tried removing up to 15 bases) - only 1-3% more mappings
    - SAET to correct colorspace - worse mapping
    - Other mappers (cufflinks and SHRiMP, as far as I can remember) - no difference in mapping, or lower mapping
    - Transcript assembly (Velvet + Oases) - N50 was ridiculously low, and most of the "large" contigs didn't match anything in the NCBI nr database
    - Converting to basespace for mapping (I think I tried a script that came with MAQ) - a miserable failure
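For reference, N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that standard definition; the function name is our own, and the contig lengths in the example are made up for illustration, not taken from the Velvet/Oases run above:

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the largest length L such that contigs of length >= L
    together hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:  # walk from the longest contig down
        running += length
        if running >= half:
            return length
    return lengths[-1]  # defensive; not reached for non-negative lengths


# Hypothetical contig lengths, for illustration only.
print(n50([100, 200, 300, 400, 500]))  # prints 400
```

A low N50 means most of the assembled sequence sits in short contigs, which is what you would expect if many reads are too error-laden to extend contigs.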


    A side issue:
    - The bowtie SAM output prints a basespace sequence for reads that have no hits to the reference. How does bowtie get this sequence if there is no matching reference? I've examined these unmapped reads, and they don't match anything in the NCBI database.
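Part of the answer lies in how SOLiD colorspace works. Each read starts with a known primer base, and each color encodes the transition between two adjacent bases: with A=0, C=1, G=2, T=3, the color is the XOR of the two base codes. Turning colors into bases therefore means decoding them one after another from the primer, so a single miscalled color changes every base after it. Without an alignment to catch such errors, any basespace sequence written out for an unaligned read is at best a naive decode (we have not checked exactly how bowtie produces it). A minimal Python sketch of the 2-base encoding, assuming a primer base of T; the function names are hypothetical:

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(primer: str, seq: str) -> str:
    """Encode bases as a colorspace string: primer base + one color per transition."""
    full = primer + seq
    colors = (CODE[a] ^ CODE[b] for a, b in zip(full, full[1:]))
    return primer + "".join(str(c) for c in colors)


def decode(cs: str) -> str:
    """Naively decode a colorspace string back to bases, starting from the primer."""
    current = CODE[cs[0]]
    bases = []
    for color in cs[1:]:
        current ^= int(color)  # next base = previous base XOR color
        bases.append(BASES[current])
    return "".join(bases)


cs = encode("T", "ACGTAC")
print(cs)          # T313131
print(decode(cs))  # ACGTAC

# Change a single color (cs[3]: '3' -> '0') and decode again.
bad = cs[:3] + "0" + cs[4:]
print(decode(bad))  # ACCATG: every base from the error onward is wrong
```

This is also why a naively converted read can fail to BLAST against anything even when it comes from a known organism.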

    The only explanation I can think of now is that the library preps were bad. Could that be it?

    Thanks for any thoughts and input!
    Last edited by Jean; 02-10-2011, 12:08 PM.

  • #2
    Just some thoughts:

    1) Select a more aggressive trim.
    2) Try BWA.
    3) Look at the unmapped reads: do they look right?
    4) Take a random sample of the unmapped reads and BLAST them: are they hitting other organisms?
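Point 4 can be scripted. A minimal Python sketch, assuming plain-text SAM input: it keeps a uniform random sample of k unaligned records (FLAG bit 0x4 set) using reservoir sampling, so the file never has to be held in memory, and formats them as FASTA for BLAST. The function names are hypothetical. Note that for colorspace reads the SEQ field of an unaligned record may not be a trustworthy base sequence (see reply #3).

```python
import random
from typing import Iterable, List, Tuple


def sample_unmapped(sam_lines: Iterable[str], k: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Reservoir-sample up to k (name, sequence) pairs from unaligned SAM records."""
    rng = random.Random(seed)
    sample: List[Tuple[str, str]] = []
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or not int(fields[1]) & 0x4:  # keep only unmapped (FLAG 0x4)
            continue
        seen += 1
        record = (fields[0], fields[9])  # QNAME, SEQ
        if len(sample) < k:
            sample.append(record)
        else:
            j = rng.randrange(seen)  # keep the new record with probability k/seen
            if j < k:
                sample[j] = record
    return sample


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, with a hypothetical file name: `with open("reads.sam") as fh: print(to_fasta(sample_unmapped(fh, k=1000)), end="")`. A fixed seed makes the sample reproducible between runs.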

    • #3
      What settings did you use for bowtie? If you used the standard settings, try -n 3 -l 24 -e 200, as the defaults are tuned for Illumina reads.
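As we read the bowtie 1 manual, -n caps the number of mismatches in the first -l bases of the read (the seed), while -e caps the sum of the quality values at all mismatched positions, with qualities rounded to the nearest 10 and capped at 30; the defaults are -n 2 -l 28 -e 70. The simplified Python model below (not bowtie's actual code, and ignoring colorspace-specific details) shows why -e matters: three high-quality mismatches already sum to 90 and exceed the default. The function names and the mismatch positions are hypothetical.

```python
from typing import List, Tuple


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and cap it at 30 (Maq-style)."""
    return min(30, ((q + 5) // 10) * 10)


def passes_n_mode(mismatches: List[Tuple[int, int]], n: int = 2,
                  seed_len: int = 28, e: int = 70) -> bool:
    """Simplified check of the -n / -l (seed_len) / -e limits.

    mismatches: (0-based read position, Phred quality) for each mismatch.
    """
    seed_mismatches = sum(1 for pos, _ in mismatches if pos < seed_len)
    quality_sum = sum(maq_round(q) for _, q in mismatches)
    return seed_mismatches <= n and quality_sum <= e


# Hypothetical read: three high-quality mismatches, two of them in the seed.
mm = [(5, 30), (20, 28), (40, 33)]
print(passes_n_mode(mm))                          # False: 30 + 30 + 30 = 90 > 70
print(passes_n_mode(mm, n=3, seed_len=24, e=200))  # True
```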

      Look at the average QV per position. If you see a pattern of low-QV positions (e.g. every 5th base), try BFAST with appropriate masks.
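The per-position check can be run directly on the SOLiD .qual file. A minimal Python sketch, assuming the usual layout of '#' comment lines, '>' read headers and one line of space-separated integer QVs per read, and assuming negative values mark missing calls (these are skipped). It reports the mean QV at each position and the 1-based positions whose mean falls well below the overall average, which is where a periodic pattern such as every 5th base would show up. The function names and the 5-QV threshold are arbitrary choices for illustration.

```python
import math
from typing import Iterable, List


def mean_qv_per_position(qual_lines: Iterable[str]) -> List[float]:
    """Mean QV at each read position from a SOLiD-style .qual file."""
    totals: List[int] = []
    counts: List[int] = []
    for line in qual_lines:
        line = line.strip()
        if not line or line.startswith(("#", ">")):  # skip comments and read headers
            continue
        for pos, token in enumerate(line.split()):
            qv = int(token)
            if pos >= len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            if qv < 0:  # assumed to mark a missing call
                continue
            totals[pos] += qv
            counts[pos] += 1
    return [t / c if c else math.nan for t, c in zip(totals, counts)]


def low_positions(means: List[float], drop: float = 5.0) -> List[int]:
    """1-based positions whose mean QV is at least `drop` below the overall average."""
    valid = [m for m in means if not math.isnan(m)]
    overall = sum(valid) / len(valid)
    return [i + 1 for i, m in enumerate(means)
            if not math.isnan(m) and m <= overall - drop]
```

For example, with a hypothetical file name: `with open("reads.qual") as fh: print(low_positions(mean_qv_per_position(fh)))`. If the output is something like every 5th position, that is the pattern the reply above describes.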

      BWA is not going to help, and neither is BLASTing the unmapped reads, since you do not have their (correct) sequence. I would also try Velvet on the unmapped reads in case there are rRNA sequences that are not in the reference.

      • #4
        Thanks for the ideas.

        Bowtie mapping:
        Since we have also been mapping a mixed population, I have used the following to get best-hit mappings:
        Code:
        bowtie -S -C -3 10 --threads 6 --best -M 1
        And as mentioned, for a pure culture I get these results:
        Code:
        # reads processed: 55208429
        # reads with at least one reported alignment: 13639433 (24.71%)
        # reads that failed to align: 29756435 (53.90%)
        # reads with alignments sampled due to -M: 11812561 (21.40%)
        Based on suggestions here, I tried trimming 20bp, and I also tried settings suggested by Chipper:
        Code:
        bowtie -S -C -3 20 --best -M 1
        # reads processed: 55208429
        # reads with at least one reported alignment: 15156628 (27.45%)
        # reads that failed to align: 27899931 (50.54%)
        # reads with alignments sampled due to -M: 12151870 (22.01%)
        Code:
        bowtie -S -C -t -n 3 -l 24 -e 200
        # reads processed: 55208429
        # reads with at least one reported alignment: 29658102 (53.72%)
        # reads that failed to align: 25550327 (46.28%)
        As you can see, this doesn't affect the unmapped portion much: the fraction that failed to align only drops from 53.90% to 46.28%.
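One detail when reading these summaries: with -M, reads counted under "alignments sampled due to -M" also get an alignment reported (bowtie picks one at random), so the aligned fraction is the sum of that line and "at least one reported alignment". In each block above the counts add up exactly to the reads processed, which puts the aligned fraction of the three runs at about 46.1%, 49.5% and 53.7%. A small Python helper that parses such a summary (the function names are hypothetical):

```python
import re
from typing import Dict

SUMMARY_LINE = re.compile(r"^# reads (.+?): (\d+)", re.MULTILINE)


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Map each '# reads ...' label in a bowtie 1 summary to its read count."""
    return {label: int(count) for label, count in SUMMARY_LINE.findall(text)}


def aligned_fraction(counts: Dict[str, int]) -> float:
    """Fraction of processed reads with a reported alignment, including -M sampled reads."""
    aligned = counts.get("with at least one reported alignment", 0)
    aligned += counts.get("with alignments sampled due to -M", 0)
    return aligned / counts["processed"]


run1 = """# reads processed: 55208429
# reads with at least one reported alignment: 13639433 (24.71%)
# reads that failed to align: 29756435 (53.90%)
# reads with alignments sampled due to -M: 11812561 (21.40%)"""
print(f"{aligned_fraction(parse_bowtie_summary(run1)):.2%}")  # 46.10%
```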

        As for other suggestions:
        - Chipper is correct that the converted unmapped reads are not true basespace. I have BLASTed them, but they do not match anything (see my "side issue" above)
        - I have tried assembling with Velvet and did not get significant contigs
        - I know half of the mappable reads are 23S rRNA, and I have mapped against bacterial 23S and cpn60 databases, but no additional reads map
        - Admittedly I have not looked at the quality scores in depth so I will do that. Would you suggest looking at the SOLiD qual files, or the bowtie output (unmapped reads)?
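The "half of the mappable reads are 23S" figure is the kind of number you can get from the SAM output plus the rRNA coordinates in the genome annotation. A minimal Python sketch, assuming one primary alignment line per read (as with --best -M 1) and counting a read as rRNA when its 1-based leftmost position falls inside an rRNA interval; the function name and the interval in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def rrna_fraction(sam_lines: Iterable[str],
                  rrna: Dict[str, List[Tuple[int, int]]]) -> float:
    """Fraction of mapped reads whose leftmost position lies in an rRNA interval.

    rrna maps a reference name to 1-based inclusive (start, end) intervals.
    """
    mapped = in_rrna = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & 0x104:  # skip unmapped (0x4) and secondary (0x100)
            continue
        mapped += 1
        ref, pos = fields[2], int(fields[3])  # RNAME, POS
        if any(start <= pos <= end for start, end in rrna.get(ref, [])):
            in_rrna += 1
    return in_rrna / mapped if mapped else 0.0


# Hypothetical example: one rRNA interval on reference "chr".
example = [
    "r1\t0\tchr\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr\t150\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
print(rrna_fraction(example, {"chr": [(90, 200)]}))
```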

        • #5
          We had something similar: half our reads would not map. We were told the most likely cause is that emulsion PCR produces a lot of chimeric beads (carrying more than one template) instead of beads with a single template, and their reads simply fall out of mapping. So when we use SOLiD, we just assume we are going to lose half our reads.

          • #6
            Originally posted by mnkyboy
            We had something similar: half our reads would not map. We were told the most likely cause is that emulsion PCR produces a lot of chimeric beads (carrying more than one template) instead of beads with a single template, and their reads simply fall out of mapping. So when we use SOLiD, we just assume we are going to lose half our reads.
            I suspect that is the problem here. We were supposed to get 1.4 billion reads but only got 500 million; half of those are unmappable, and half of the mappable ones are rRNA.
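Put as numbers, assuming "half of the mappable ones are rRNA" applies to the mappable half of the 500 million reads:

```latex
500\times10^{6} \times \tfrac{1}{2}\,(\text{mappable}) \times \tfrac{1}{2}\,(\text{non-rRNA})
  = 125\times10^{6}\ \text{reads},
\qquad
\frac{125\times10^{6}}{1.4\times10^{9}} \approx 8.9\%\ \text{of the expected yield}
```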

            • #7
              Not sure how useful it is for SOLiD data (I guess the SAM/BAM input option should work fine after alignment), but I'd recommend FastQC for looking at the per-base quality.

              • #8
                Something similar happened with my RNA-seq data on SOLiD 4: I was expecting billions of reads, got 500 million, and only 10-20% of them mapped. I have heard that RNA-seq mapping rates are always lower than for gDNA. Perhaps Illumina gives a higher mapping rate?

                Still got good coverage/results - I love working with bacteria!

                • #9
                  A 50% hit rate with SOLiD is not bad. SOLiD machines generate an order of magnitude more reads than Illumina, but they are noisier as well.
                  http://homolog.us

                  • #10
                    I have the same problem with low mapping (SOLiD reads, bowtie). One weird thing I found is that some of the reads reported as unmapped can actually be mapped. I posted a thread about it, but nobody seems to have any idea why.
                    I know this sounds ridiculous, but you can try adding 20 bp of random sequence to the start and end of your reference genomes, redo the mapping, and filter out reads that map to the 20 bp random sequences. You will probably see an increase in mapping.
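The padding trick can be scripted. The poster does not say why it helps, so treat it as an empirical trick. A minimal Python sketch, assuming a FASTA reference: it pads every record with 20 random bases on each side (writing each sequence on a single line), and after re-mapping it keeps only alignments that lie entirely within the original sequence, shifting their positions back. The function names, the fixed seed and the "entirely within" rule are our own illustrative choices.

```python
import random
from typing import Iterable, Iterator

FLANK = 20  # random bases added to each end, as suggested above


def pad_fasta(lines: Iterable[str], flank: int = FLANK, seed: int = 1) -> Iterator[str]:
    """Yield FASTA lines with `flank` random bases added to both ends of every record."""
    rng = random.Random(seed)

    def rand_seq() -> str:
        return "".join(rng.choice("ACGT") for _ in range(flank))

    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:  # flush the previous record
                yield from (name, rand_seq() + "".join(chunks) + rand_seq())
            name, chunks = line, []
        elif line:
            chunks.append(line)
    if name is not None:  # flush the last record
        yield from (name, rand_seq() + "".join(chunks) + rand_seq())


def keep_alignment(pos: int, read_len: int, original_len: int, flank: int = FLANK) -> bool:
    """True if a 1-based alignment on the padded reference avoids both random flanks."""
    start, end = pos, pos + read_len - 1
    return start > flank and end <= flank + original_len


def to_original(pos: int, flank: int = FLANK) -> int:
    """Convert a 1-based position on the padded reference back to original coordinates."""
    return pos - flank
```

The padded file is then indexed and mapped as usual; POS values from the SAM output of kept alignments are passed through `to_original`.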

                    • #11
                      I strongly suspect a sample preparation problem. We have sequenced many different bacteria using the SOLiD 3 and 4 platforms and typically achieve 75-85% mapping. However, we are very careful to reject poor-quality samples to start with (garbage in, garbage out) and make sure that all of the QA/QC steps are correct. You should not have to trim your sequences much to get excellent mapping results.
