  • Lots of unmapped reads - SOLiD bacterial RNA-seq and bowtie mapping

    We've completed several bacterial RNA-seq experiments but have many unmapped reads left over, and I am not sure why.

    Experimental details:
    - RNA was isolated using a TRIzol protocol, and 1-2 rounds of MICROBExpress were used to deplete the rRNA
    - All samples were checked using an Agilent Bioanalyzer for quality and rRNA depletion

    Sequencing details:
    - All library prep and sequencing was performed by a core facility. Samples were individually barcoded and sequenced (50bp single-end) on a single SOLiD 4 run

    Mapping outcome
    - We're currently using bowtie to map reads in colorspace against a reference. For a pure culture for which we have a complete genome to map to, approx. 20-50% of reads map.


    What I have tried:
    - Trimming reads (tried removing up to 15 bases) - only 1-3% more mappings
    - SAET to correct colorspace - worse mapping
    - Other mappers (cufflinks and SHRiMP that I can remember) - No difference in mapping or lower mapping
    - Transcript assembly (Velvet + Oases) - N50 was ridiculously low and most of the "large" contigs did not match anything in the NCBI nr database
    - converting to basespace for mapping (I think I tried a script that came with MAQ) - miserable failure


    A side issue:
    - The bowtie SAM output will print out the basespace of reads that have no hits to the reference. How does bowtie get this sequence if there is no matching reference? I've examined these unmapped reads and they don't match anything in the NCBI database.

    The only thing I can think of now is that the library preps were bad?

    Thanks for any thoughts and input!
    Last edited by Jean; 02-10-2011, 12:08 PM.

  • #2
    Just some thoughts:

    1) select a more aggressive trim
    2) try bwa
    3) look at the unmappeds, do they look right?
    4) take a random sampling of the unmapped reads and do a blast, are they hitting other organisms?
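A minimal sketch of point 4, assuming the mapper output is a SAM file: unmapped records are those with FLAG bit 0x4 set, and reservoir sampling draws a fixed-size random subset in one pass without loading the whole file. The function names are hypothetical. For colorspace data, the sequence column of unmapped reads may not be a true base-space sequence, so treat any BLAST result with caution.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def unmapped_records(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for SAM records with FLAG bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:          # 0x4 = segment unmapped
            yield fields[0], fields[9]


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k items chosen uniformly at random from a stream (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)         # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text for a BLAST query."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Usage would be along the lines of `to_fasta(reservoir_sample(unmapped_records(open(path)), 1000))`, where `path` is a placeholder for your own SAM file.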

    Comment


    • #3
      What settings did you use for bowtie? If you used the standard settings, try -n 3 -l 24 -e 200, as the defaults are for Illumina reads.

      Look at the average QV per position. If low-QV positions follow a pattern (e.g. every 5th base), try BFAST with appropriate masks.
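A minimal sketch of the per-position check, assuming a SOLiD-style .qual file in which header lines start with ">", comment lines start with "#", and each read's quality values are whitespace-separated integers with negative values marking missing calls. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_qv_per_position(qual_lines: Iterable[str]) -> Dict[int, float]:
    """Return {1-based position: mean quality value} over all reads."""
    totals: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for line in qual_lines:
        line = line.strip()
        if not line or line[0] in ">#":   # skip blanks, headers, comments
            continue
        for pos, token in enumerate(line.split(), start=1):
            qv = int(token)
            if qv < 0:                    # assumed marker for a missing call
                continue
            totals[pos] += qv
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(counts)}


if __name__ == "__main__":
    demo = [">read_1", "30 20 10", ">read_2", "10 20 -1"]
    for pos, mean in mean_qv_per_position(demo).items():
        print(pos, round(mean, 1))
```

A periodic dip in the means (for example at every 5th position) would be the kind of pattern that suggests a masked aligner.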

      bwa is not going to help, and neither is blasting the unmapped reads, since you do not have the (correct) sequence... I would also try velvet on the unmapped reads in case there are rRNA sequences not found in the reference.

      Comment


      • #4
        Thanks for the ideas.

        Bowtie mapping:
        Since we have also been mapping to a mixed population I have used the following to get best hit maps:
        Code:
        bowtie -S -C -3 10 --threads 6 --best -M 1
        And as mentioned, for a pure culture I get these results:
        Code:
        # reads processed: 55208429
        # reads with at least one reported alignment: 13639433 (24.71%)
        # reads that failed to align: 29756435 (53.90%)
        # reads with alignments sampled due to -M: 11812561 (21.40%)
        Based on the suggestions here, I tried trimming 20bp, and I also tried the settings suggested by Chipper:
        Code:
        bowtie -S -C -3 20 --best -M 1
        # reads processed: 55208429
        # reads with at least one reported alignment: 15156628 (27.45%)
        # reads that failed to align: 27899931 (50.54%)
        # reads with alignments sampled due to -M: 12151870 (22.01%)
        Code:
        bowtie -S -C -t -n 3 -l 24 -e 200
        # reads processed: 55208429
        # reads with at least one reported alignment: 29658102 (53.72%)
        # reads that failed to align: 25550327 (46.28%)
        As you can see, this doesn't seem to affect the unmapped portion much.
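To compare runs like these on one number, here is a minimal sketch that parses bowtie's summary lines and computes the overall mapped fraction. It treats reads "sampled due to -M" as mapped, which is an interpretation based on the three categories above summing to the total read count. The function names are hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "# reads processed: 55208429"
SUMMARY_RE = re.compile(r"#\s*(reads[^:]*):\s*(\d+)")


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Return {summary label: read count} from bowtie's summary output."""
    return {label.strip(): int(n) for label, n in SUMMARY_RE.findall(text)}


def mapped_fraction(counts: Dict[str, int]) -> float:
    """Fraction of reads with any alignment, including -M sampled reads."""
    total = counts["reads processed"]
    reported = counts.get("reads with at least one reported alignment", 0)
    sampled = counts.get("reads with alignments sampled due to -M", 0)
    return (reported + sampled) / total


if __name__ == "__main__":
    run1 = """\
# reads processed: 55208429
# reads with at least one reported alignment: 13639433 (24.71%)
# reads that failed to align: 29756435 (53.90%)
# reads with alignments sampled due to -M: 11812561 (21.40%)
"""
    print(f"{mapped_fraction(parse_bowtie_summary(run1)):.2%}")
```

On this reading, the three runs map roughly 46%, 49% and 54% of reads, so the unmapped portion only moves from about 54% to about 46%.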

        As for other suggestions:
        - Chipper is correct that the converted unmapped reads are not in the right basespace, so although I have BLASTed them, they do not match anything (see my "side issue" above)
        - I have tried assembling with Velvet and did not get significant contigs
        - I know half of the mappable reads are 23S, and I have mapped against bacterial databases of 23S and cpn60, but nothing extra is pulled out
        - Admittedly I have not looked at the quality scores in depth so I will do that. Would you suggest looking at the SOLiD qual files, or the bowtie output (unmapped reads)?

        Comment


        • #5
          We had something similar: half our reads would not map. We were told the most likely cause is that emulsion PCR produces a lot of chimeric beads instead of beads with a single read, and those reads just fall out of mapping. So when we do SOLiD, we just assume we are going to lose half our reads.

          Comment


          • #6
            Originally posted by mnkyboy
            We had something similar and half our reads would not map and we were told it is most likely during the emulsion PCR a lot of chimeric beads are made instead of those with a single read and they just fall out of mapping. So when we do SOLiD we just assume we are going to lose half our reads.
            I suspect that is the problem here. We were supposed to get 1.4 billion reads but only got 500 million; half of them are unmappable, and half of those are rRNA.

            Comment


            • #7
              Not sure how useful it is for SOLiD data (I guess the SAM/BAM input function should work fine after alignment), but I'd recommend FastQC for looking at the per-base quality.

              Comment


              • #8
                Something similar happened with my RNA-seq data from a SOLiD 4: we expected billions of reads, got 500 million, and 10-20% of them mapped. I have heard that RNA-seq mapping is always lower than gDNA. Perhaps Illumina has a higher mapping rate?

                Still got good coverage/results - I love working with bacteria!

                Comment


                • #9
                  A 50% hit rate with SOLiD is not bad. SOLiD machines generate an order of magnitude more reads than Illumina, but they have more noise as well.
                  http://homolog.us

                  Comment


                  • #10
                    I have the same problem with low mapping (SOLiD reads using bowtie). One weird thing I found is that some of the "unmapped" reads can actually be mapped. I posted a thread about it, but it seems nobody has any idea why.
                    I know this sounds ridiculous, but you can try adding 20bp random sequences to the start and end of your reference genomes. Redo the mapping, then filter out reads mapping to the 20bp random sequences. You will probably see an increase in mapping.
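A minimal sketch of that padding trick, assuming a single-sequence reference held as a string and 1-based leftmost alignment positions as reported in SAM. The function names are hypothetical, and whether padding helps at all is the poster's observation, not something verified here.

```python
import random


def pad_reference(seq: str, flank: int = 20, seed: int = 0) -> str:
    """Return seq with `flank` random bases added to each end."""
    rng = random.Random(seed)
    left = "".join(rng.choice("ACGT") for _ in range(flank))
    right = "".join(rng.choice("ACGT") for _ in range(flank))
    return left + seq + right


def overlaps_flank(pos: int, read_len: int, ref_len: int, flank: int = 20) -> bool:
    """True if an alignment on the padded reference touches a random flank.

    pos is the 1-based leftmost coordinate on the padded reference and
    ref_len is the length of the original, unpadded sequence.
    """
    end = pos + read_len - 1
    return pos <= flank or end > flank + ref_len


def to_original_coordinate(pos: int, flank: int = 20) -> int:
    """Convert a padded-reference position back to the original reference."""
    return pos - flank
```

Alignments for which `overlaps_flank` is true would be discarded, and the remaining positions shifted back with `to_original_coordinate`.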

                    Comment


                    • #11
                      I strongly suspect a sample preparation problem. We have sequenced many different bacteria using the SOLiD 3 and 4 platforms and typically achieve 75-85% mapping. However, we are very careful to reject poor-quality samples to start with (garbage in, garbage out) and to make sure that all of the QA/QC steps are correct. You should not have to trim your sequences much to get excellent mapping results.

                      Comment
