  • Bowtie alignments not matching 100%

    I'm trying to decide if our data shows that we require the depth of HiSeq or if we could do MiSeq.

    What I have done is concatenated all the contigs into one fasta file which I used to create an index in Bowtie.

    Then, I tried aligning the raw reads to this index using Bowtie.

    Unfortunately, I found that only 75% of my "genome" was covered. Is there something I'm missing?

  • #2
    How many times do you need to roll a pair of dice to be guaranteed to see every combination at least once? When you know why the answer is "no finite number of rolls guarantees it", you'll know why you shouldn't expect 100% coverage (this greatly oversimplifies things, of course).

    The goal isn't 100% coverage, but an average coverage of some fold (10x, 4x, whatever).
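To put a rough number on the dice analogy: under the idealized Lander-Waterman model, the expected fraction of bases hit by at least one read at mean depth c is 1 - e^(-c). The model is an assumption brought in here, not something stated in the thread. It takes reads to land uniformly at random and to be short relative to the genome, and the function name below is made up for illustration.

```python
import math


def expected_fraction_covered(mean_depth: float) -> float:
    """Expected fraction of bases covered at least once.

    Assumes the idealized Lander-Waterman model: reads start uniformly at
    random, so per-base depth is roughly Poisson with the given mean.
    """
    if mean_depth < 0:
        raise ValueError("mean_depth must be non-negative")
    return 1.0 - math.exp(-mean_depth)


if __name__ == "__main__":
    for depth in (1, 4, 10, 30):
        breadth = expected_fraction_covered(depth)
        print(f"{depth:>3}x mean depth -> expected breadth {breadth:.6f}")
```

The value approaches 1 but never reaches it at any finite depth, which is the point of the analogy. Under this model even 4x already gives roughly 98% breadth, so a much larger gap is probably not explained by sampling chance alone.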



    • #3
      I'm not sure I understand. You're saying if I map back the raw reads to the sequence created from those same raw reads, I should not expect that they will cover the entire sequence?



      • #4
        Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
        Last edited by sewellh; 06-13-2014, 03:25 PM.



        • #5
          It's important to place ambiguously-mapped reads randomly or to all possible locations if you want to analyze coverage. What's your mapping command line?

          But as dpryan said, what's most important is that you get high enough coverage for whatever your purpose is. You can estimate coverage with a k-mer counter, without even assembling. What are you trying to do, and how were the contigs generated?
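A minimal sketch of the k-mer idea, for illustration only. It counts k-mers directly from the reads, takes the most common multiplicity as the k-mer coverage peak, and converts it to base coverage with the usual relation k-mer coverage = base coverage * (L - k + 1) / L, where L is the read length. The assumptions are loud ones: reads of roughly equal length, no reverse-complement canonicalization, few repeats, and a hypothetical `min_count` cutoff to ignore low-count k-mers that are likely sequencing errors. A dedicated k-mer counter would be the practical choice for real data; the function name here is hypothetical.

```python
from collections import Counter


def estimate_base_coverage(reads, k=21, min_count=3):
    """Estimate mean base coverage from the k-mer multiplicity peak.

    reads: iterable of read sequences (strings), assumed similar in length.
    min_count: multiplicities below this are ignored as probable errors
               (an illustrative cutoff, not a recommended value).
    """
    kmer_counts = Counter()
    total_len = 0
    n_reads = 0
    for read in reads:
        read = read.upper()
        n_reads += 1
        total_len += len(read)
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    if n_reads == 0:
        raise ValueError("no reads given")

    mean_read_len = total_len / n_reads
    if mean_read_len < k:
        raise ValueError("reads are shorter than k")

    # Histogram: multiplicity -> number of distinct k-mers seen that often.
    histogram = Counter(c for c in kmer_counts.values() if c >= min_count)
    if not histogram:
        raise ValueError("no k-mers at or above min_count")

    # The peak is the multiplicity shared by the most distinct k-mers.
    peak = max(histogram, key=lambda m: (histogram[m], m))

    # Invert: kmer_cov = base_cov * (L - k + 1) / L
    return peak * mean_read_len / (mean_read_len - k + 1)
```

Usage would look like `estimate_base_coverage(list_of_read_strings, k=21)`. Because nothing is assembled or mapped, the estimate is unaffected by how ambiguous reads are placed.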



          • #6
            I didn't do the original assembly, but the contigs were generated via the SPAdes assembler.

            To map the raw reads back to the assembled sequence I used the following:

            bowtie2 -p 2 -x DscP-kaster -1 KM01_R1.fastq -2 KM01_R2.fastq -S KM01_bowtie.sam
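For reference, one way the "percent of sequence covered" figure could be computed from a SAM file like the one this command writes. This is a sketch under assumptions, not the method used in the thread: it expects `@SQ` header lines with contig lengths, counts a base as covered only for CIGAR operations M, = and X (deletions and skips advance the position without counting), and keeps one byte per reference base in memory, so it only suits small genomes. The function name is hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def breadth_of_coverage(sam_lines):
    """Return {contig: fraction of bases covered by at least one read}."""
    lengths = {}
    covered = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                name = tags["SN"]
                lengths[name] = int(tags["LN"])
                covered[name] = bytearray(lengths[name])  # all zeros
            continue

        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue  # unmapped read

        ref_pos = pos - 1  # SAM positions are 1-based
        mask = covered[rname]
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":
                end = min(ref_pos + length, len(mask))
                mask[ref_pos:end] = b"\x01" * (end - ref_pos)
                ref_pos += length
            elif op in "DN":
                ref_pos += length  # consumes reference, not counted
            # I, S, H, P do not consume reference bases

    return {n: sum(covered[n]) / lengths[n] for n in lengths if lengths[n] > 0}
```

With the file from the command above this would be called as `breadth_of_coverage(open("KM01_bowtie.sam"))`. For anything large, an established tool such as samtools is the better route.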



            • #7
              Originally posted by sewellh
              Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
              I suspect that this means that 25% of your contigs (which, from what I gather, you generated via a de novo SPAdes assembly of your reads) are incorrect or at least a poorer representation of the reads than the other contigs. While the 25% number is high, I am not surprised that some of your contigs are not the best ones to use for back-mapping of reads.

              If you have not already done so, then I suggest only looking at the long contigs. 500+ bases is my usual cutoff. That will get rid of the outliers and make your back-mapping better.
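A length cutoff like this can be applied with a few lines of Python. The sketch below is illustrative: the function names are hypothetical, it reads plain FASTA text, and it keeps contigs whose sequence is at least `min_length` bases (500 by default, matching the cutoff mentioned above).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=500):
    """Return the (header, sequence) pairs with len(sequence) >= min_length."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


def write_fasta(records, handle, width=60):
    """Write records to an open text handle, wrapping sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

The Bowtie 2 index would then need to be rebuilt from the filtered FASTA before mapping again, since the original index still contains the short contigs.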


              Looking at one of my recent bacterial projects, I am able to find reads mapping to 100% of the 500+ bp contigs. Some of the contigs have a very low number of reads back-mapping, but at least all were found. This is at around 200x coverage.

              Looking at an avian project (where my cutoff was 200bp contigs) with about 15x coverage I am able to get around 98% of the contigs to have reads back-mapped to them.

              This was using Bowtie2. BWA would be similar.
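The "fraction of contigs with reads back-mapped" figures above can be tallied from a SAM file roughly as follows. This is a sketch under assumptions, not the procedure actually used for those projects: it takes the contig list from `@SQ` header lines, counts only primary alignments (skipping records flagged unmapped, secondary, or supplementary), and uses a hypothetical function name.

```python
from collections import Counter


def contig_hit_summary(sam_lines):
    """Return (reads_per_contig, fraction_of_contigs_with_reads)."""
    contigs = []
    hits = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
            contigs.append(tags["SN"])
        elif line and not line.startswith("@"):
            fields = line.split("\t")
            flag, rname = int(fields[1]), fields[2]
            # 0x4 unmapped, 0x100 secondary, 0x800 supplementary
            if flag & 0x904 or rname == "*":
                continue
            hits[rname] += 1

    with_reads = sum(1 for c in contigs if hits[c] > 0)
    fraction = with_reads / len(contigs) if contigs else 0.0
    return hits, fraction
```

Contigs with a count of zero, or with only a handful of reads, are the ones worth inspecting first.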

