  • Bowtie alignments not matching 100%

    I'm trying to decide if our data shows that we require the depth of HiSeq or if we could do MiSeq.

    I concatenated all the contigs into one FASTA file, which I used to build a Bowtie index.

    Then, I tried aligning the raw reads to this index using Bowtie.

    Unfortunately, I found that only 75% of my "genome" was covered. Is there something I'm missing?

  • #2
    How many times do you need to roll a pair of dice to be guaranteed to see every combination at least once? When you know why the answer is "infinite times", you'll know why you shouldn't expect 100% coverage (this greatly oversimplifies things, of course).

    The goal isn't 100% coverage, but an average coverage of some fold (10x, 4x, whatever).
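To put rough numbers on the dice analogy, here is a minimal sketch that assumes reads start at uniformly random positions (a Poisson, Lander-Waterman-style model). Real libraries have GC and other biases, so treat the output as an optimistic estimate only. The function names are made up for this illustration.

```python
import math
import random


def expected_fraction_covered(mean_depth):
    """Expected fraction of bases hit by at least one read.

    Assumes read starts are uniformly random, so per-base depth is
    roughly Poisson with the given mean; P(depth = 0) = exp(-mean).
    """
    return 1.0 - math.exp(-mean_depth)


def simulate_fraction_covered(genome_len, read_len, n_reads, seed=0):
    """Drop n_reads reads at random start positions and measure breadth."""
    rng = random.Random(seed)
    covered = bytearray(genome_len)  # one flag per base, 0 = never covered
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        covered[start:start + read_len] = b"\x01" * read_len
    return sum(covered) / genome_len
```

Under this model, 4x average depth leaves about 1.8% of bases uncovered (1 - e^-4 is about 0.982), while 10x leaves well under 0.01%. Breadth approaches 100% quickly but never reaches it with certainty.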


    • #3
      I'm not sure I understand. You're saying if I map back the raw reads to the sequence created from those same raw reads, I should not expect that they will cover the entire sequence?


      • #4
        Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
        Last edited by sewellh; 06-13-2014, 03:25 PM.


        • #5
          It's important to place ambiguously-mapped reads randomly or to all possible locations if you want to analyze coverage. What's your mapping command line?

          But as dpryan said, what's most important is that you get high enough coverage for whatever your purpose is. You can estimate coverage with a k-mer counter, without even assembling. What are you trying to do, and how were the contigs generated?
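As an illustration of the k-mer idea, here is a toy sketch, not a replacement for a dedicated k-mer counter. It assumes reads are plain strings, does not merge reverse-complement k-mers, and takes the histogram peak (ignoring low-multiplicity k-mers, which are mostly sequencing errors) as the k-mer coverage. The conversion to base coverage uses the usual approximation C = C_k * L / (L - k + 1). All function and parameter names are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())


def estimate_base_coverage(hist, read_len, k, min_multiplicity=2):
    """Estimate base coverage from the peak of a k-mer histogram.

    K-mers seen fewer than min_multiplicity times are ignored because
    they are dominated by sequencing errors (an assumption; tune it).
    """
    candidates = {m: n for m, n in hist.items() if m >= min_multiplicity}
    if not candidates:
        raise ValueError("no k-mers at or above min_multiplicity")
    # Peak = multiplicity shared by the largest number of distinct k-mers.
    peak = max(candidates, key=lambda m: (candidates[m], m))
    return peak * read_len / (read_len - k + 1)
```

For example, a histogram peaking at multiplicity 20 with 100 bp reads and k = 21 corresponds to roughly 25x base coverage (20 * 100 / 80).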


          • #6
            I didn't do the original assembly, but the contigs were generated via the SPAdes assembler.

            To map the raw reads back to the assembled sequence I used the following:

            bowtie2 -p 2 -x DscP-kaster -1 KM01_R1.fastq -2 KM01_R2.fastq -S KM01_bowtie.sam
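One way to check where the 75% figure comes from is to compute breadth of coverage per contig directly from the SAM output. The following is a rough, standard-library-only sketch. It assumes the SAM file carries `@SQ` header lines (bowtie2 writes them by default), counts only aligned bases (CIGAR `M`, `=`, `X`) as covered, and keeps one byte per reference base in memory, which is fine for a bacterial-sized assembly. The function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def breadth_of_coverage(sam_lines):
    """Return {contig: fraction of bases covered by at least one read}."""
    lengths = {}
    covered = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like SN:<name> and LN:<length>.
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
                covered[tags["SN"]] = bytearray(int(tags["LN"]))
            continue
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        # Flag bit 0x4 means the read is unmapped.
        if flag & 4 or rname not in covered or cigar == "*":
            continue
        ref_pos = pos - 1  # SAM positions are 1-based
        mask = covered[rname]
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # aligned bases: mark as covered
                end = min(ref_pos + n, len(mask))
                mask[ref_pos:end] = b"\x01" * (end - ref_pos)
                ref_pos += n
            elif op in "DN":  # consumes reference without covering it
                ref_pos += n
            # I, S, H, P do not consume reference positions
    return {name: sum(covered[name]) / lengths[name] for name in lengths}
```

It could be called as `with open("KM01_bowtie.sam") as fh: result = breadth_of_coverage(fh)`. Sorting the result by fraction would show whether the missing 25% is spread evenly or concentrated in a subset of contigs.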


            • #7
              Originally posted by sewellh
              Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
              I suspect this means that 25% of your contigs (which, from what I gather, you generated via a de novo SPAdes assembly of your reads) are incorrect, or at least represent the reads more poorly than the other contigs do. While the 25% figure is high, I am not surprised that some of your contigs are not the best ones to use for back-mapping reads.

              If you have not already done so, I suggest looking only at the long contigs. 500+ bases is my usual cutoff. That will get rid of the outliers and make your back-mapping better.
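A length filter like this can be sketched in a few lines. This is only an illustration with hypothetical function names; it assumes a plain multi-line FASTA and keeps contigs of at least `min_len` bases. The filtered contigs would then need to be written out and a fresh index built with `bowtie2-build` before re-mapping.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(lines, min_len=500):
    """Keep only contigs with at least min_len bases."""
    return [(n, s) for n, s in read_fasta(lines) if len(s) >= min_len]
```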


              Looking at one of my recent bacterial projects, I am able to find reads mapping to 100% of the 500+ bp contigs. Some of the contigs have a very low number of reads back-mapping, but at least all were found. This is at around 200x coverage.

              Looking at an avian project (where my cutoff was 200bp contigs) with about 15x coverage I am able to get around 98% of the contigs to have reads back-mapped to them.

              This was using Bowtie2. BWA would be similar.
