**dpryan** · 06-13-2014, 03:06 PM

How many times do you need to roll a pair of dice to be guaranteed to see every combination at least once? When you know why the answer is "infinite times", you'll know why you shouldn't expect 100% coverage (this greatly oversimplifies things, of course).

The goal isn't 100% coverage, but an average coverage of some fold (10x, 4x, whatever).
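To put a number on the dice analogy, here is a minimal sketch of the usual idealization. It assumes read start positions are uniform and independent, so per-base depth is roughly Poisson; real libraries do not behave this way exactly. The function name `expected_breadth` is made up for illustration.

```python
import math


def expected_breadth(mean_depth: float) -> float:
    """Expected fraction of positions covered by at least one read.

    Assumes per-base depth is Poisson-distributed with the given mean,
    so P(depth == 0) = exp(-mean_depth). This is an idealization.
    """
    return 1.0 - math.exp(-mean_depth)


if __name__ == "__main__":
    for depth in (1, 4, 10, 30):
        print(f"{depth:>3}x average depth -> {expected_breadth(depth):.6f} expected breadth")
```

Under this model the expected breadth approaches 1 but never reaches it. The gap is already very small at 10x, though, so a large shortfall at high depth probably points to something other than sampling chance.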

**sewellh** · 06-13-2014, 03:13 PM

I'm not sure I understand. You're saying if I map back the raw reads to the sequence created from those same raw reads, I should not expect that they will cover the entire sequence?

**sewellh** · 06-13-2014, 03:17 PM

Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.

**Brian Bushnell** · 06-13-2014, 04:15 PM

It's important to place ambiguously-mapped reads randomly or to all possible locations if you want to analyze coverage. What's your mapping command line?

But as dpryan said, what's most important is that you get high enough coverage for whatever your purpose is. You can estimate coverage with a kmer-counter, without even assembling. What are you trying to do, and how were the contigs generated?
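As a rough illustration of the k-mer idea, the toy sketch below counts k-mers in a list of read strings, takes the most common k-mer multiplicity as the k-mer depth, and converts it to base coverage. The function names and the `min_count` cutoff are assumptions made for this example. It ignores reverse complements and `N` bases, and a dedicated k-mer counter would be the practical choice on real data.

```python
from collections import Counter


def kmer_depth_peak(reads, k, min_count=2):
    """Return the most common k-mer multiplicity, ignoring rare k-mers.

    k-mers seen fewer than min_count times are skipped as a crude way
    to drop sequencing-error k-mers (an assumption, not a rule).
    Returns None if no k-mer reaches min_count.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    histogram = Counter(c for c in counts.values() if c >= min_count)
    if not histogram:
        return None
    # Pick the multiplicity shared by the most distinct k-mers;
    # ties are broken in favor of the higher multiplicity.
    return max(histogram, key=lambda c: (histogram[c], c))


def base_coverage_from_kmer_depth(kmer_depth, read_length, k):
    """Convert k-mer depth to base coverage for randomly placed reads.

    A read of length L contains L - k + 1 k-mers, so
    base coverage = k-mer depth * L / (L - k + 1).
    """
    return kmer_depth * read_length / (read_length - k + 1)
```

For example, a k-mer depth peak of 7 with 10 bp reads and k = 4 corresponds to `base_coverage_from_kmer_depth(7, 10, 4)`, which is 10.0.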

**sewellh** · 06-13-2014, 04:21 PM

I didn't do the original assembly, but the contigs were generated via the SPAdes assembler.

To map the raw reads back to the assembled sequence I used the following:

bowtie2 -p 2 -x DscP-kaster -1 KM01_R1.fastq -2 KM01_R2.fastq -S KM01_bowtie.sam

**westerman** · 06-16-2014, 09:32 AM

Originally posted by sewellh:

> Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.

I suspect this means that 25% of your contigs (which, from what I gather, came from a de novo SPAdes assembly of your reads) are incorrect, or at least represent the reads more poorly than the other contigs do. The 25% figure is high, but I am not surprised that some of your contigs are poor targets for back-mapping reads.

If you have not already done so, I suggest looking only at the long contigs. My usual cutoff is 500+ bases. That will get rid of the outliers and improve your back-mapping.

Looking at one of my recent bacterial projects, I find reads mapping to 100% of the 500+ bp contigs. Some of the contigs have a very low number of reads back-mapping, but at least all were found. This is at around 200x coverage.

Looking at an avian project (where my cutoff was 200bp contigs) with about 15x coverage I am able to get around 98% of the contigs to have reads back-mapped to them.

This was using Bowtie2. BWA would be similar.
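One way to sketch that per-contig check: the function below reads lines in the tab-separated layout that `samtools idxstats` prints for a sorted, indexed BAM (reference name, sequence length, mapped read-segments, unmapped read-segments). It reports what fraction of contigs at or above a length cutoff have at least one mapped read. The function name and the default cutoff of 500 are illustrative assumptions.

```python
def contig_mapping_summary(idxstats_lines, min_length=500):
    """Summarize how many long contigs have any reads mapped to them.

    idxstats_lines: iterable of tab-separated lines with columns
    name, length, mapped, unmapped (the idxstats layout).
    Returns (contigs_kept, contigs_with_reads, fraction_with_reads).
    """
    kept = 0
    with_reads = 0
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the trailing "*" row for unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        length, mapped = int(fields[1]), int(fields[2])
        if length < min_length:
            continue
        kept += 1
        if mapped > 0:
            with_reads += 1
    fraction = with_reads / kept if kept else 0.0
    return kept, with_reads, fraction
```

Note that this counts contigs with any mapped reads, which is a different measure from the fraction of bases covered.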


Bowtie alignments not matching 100%
