
**GenoMax** · 12-06-2013, 04:06 AM

While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point for experimenting: http://bedtools.readthedocs.org/en/l...genomecov.html

**jgibbons1** · 12-06-2013, 08:07 AM

If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

You may consider using the samtools depth function. It reports the depth at each reference position covered by at least one read. You can then sum the depth values and divide by the number of non-N base pairs in hg19.

This will give you a quick and dirty estimate.

samtools depth your.file.bam | awk '{sum=sum+$3} END {print sum}'

(this will give you the sum of depth values)
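A minimal sketch of the full "quick and dirty" estimate, assuming you compute the non-N length yourself from your own reference FASTA. The helper names `count_non_n_bases` and `mean_depth` and the file names are hypothetical, not part of samtools:

```shell
# Count non-N bases in a FASTA read from stdin (header lines start with '>').
# printf "%.0f" avoids scientific notation for large totals in some awks.
count_non_n_bases() {
    awk '!/^>/ { gsub(/[Nn]/, ""); total += length($0) } END { printf "%.0f\n", total }'
}

# Mean depth = sum of the depth column (column 3 of `samtools depth` output)
# divided by the non-N reference length passed as $1.
mean_depth() {
    awk -v g="$1" '{ sum += $3 } END { printf "%.2f\n", sum / g }'
}

# Example (placeholder file names):
#   g=$(count_non_n_bases < hg19.fa)
#   samtools depth your.file.bam | mean_depth "$g"
```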

**jgibbons1** · 12-06-2013, 08:09 AM

If you just want to know how many bases are covered, you can use the same tool: it does not report positions with '0' depth, so the number of lines it outputs is the number of covered bases. You can then simply subtract the "covered" bases from the total bases.
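A sketch of that subtraction in one pass, assuming you already know the total reference length to compare against (for example the non-N length). The helper name `breadth_from_depth` is hypothetical:

```shell
# Breadth of coverage from `samtools depth` output (chrom, position, depth).
# Positions with depth > 0 count as covered; any zero rows are ignored.
# $1 = total reference length to compare against.
breadth_from_depth() {
    awk -v g="$1" '
        $3 > 0 { cov++ }
        END {
            printf "%.0f covered, %.0f not covered (%.2f%%)\n", cov, g - cov, 100 * cov / g
        }'
}

# Example (placeholders):
#   samtools depth your.file.bam | breadth_from_depth "$g"
```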

**papori** · 01-19-2014, 02:32 AM

In reply to jgibbons1:

I want to know how many bases of the reference are hit by my reads.
I don't care about the depth.
Is that clearer?
Thanks

**papori** · 01-19-2014, 02:36 AM

In reply to GenoMax:

This function is great, but the output is huge: more than 2 TB per BAM file.
That is too much for my storage.

Any solutions?

**dpryan** · 01-19-2014, 03:58 AM

In reply to papori:

Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

Code:

bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'

That will occupy no space and be faster since you don't have to write anything to disk.
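The same awk idea, wrapped as a hypothetical helper (`summarize_genomecov_d`) that also prints the covered percentage. It assumes the three-column `-d` output (chrom, position, depth). Note that with `-d` every reference position is reported, so N positions count as "no hits" and the percentage is relative to the full reference length:

```shell
# Summarize `bedtools genomecov -d` output as it streams; nothing is written to disk.
# A base counts once whether one read or a million reads cover it.
summarize_genomecov_d() {
    awk '
        $3 == 0 { nocov++ }
        $3 > 0  { cov++ }
        END {
            total = cov + nocov
            pct = (total > 0) ? 100 * cov / total : 0
            printf "%.0f have hits and %.0f do not (%.2f%% covered)\n", cov, nocov, pct
        }'
}

# Example (arguments elided, as in the post above):
#   bedtools genomecov -d ...stuff.. | summarize_genomecov_d
```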

**papori** · 01-19-2014, 01:14 PM

In reply to dpryan:

Does it count a base only once even when it has more than one hit?
I guess the answer is yes; I just want to be sure.

Many thanks!

**dpryan** · 01-19-2014, 01:17 PM

It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

You'll find awk and the other standard unix tools extremely useful.
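Another option that avoids the per-base output altogether is genomecov's default histogram mode (no `-d`), whose output has one row per depth value per chromosome plus rows labelled "genome" for the whole reference. A sketch, assuming the documented columns (chrom, depth, bases at that depth, chrom size, fraction at that depth) and a position-sorted BAM; the helper name `genome_fraction_covered` is hypothetical:

```shell
# Genome-wide covered fraction = 1 - fraction of bases at depth 0,
# read from the "genome" summary rows of the histogram output.
genome_fraction_covered() {
    awk '$1 == "genome" && $2 == 0 { zero = $5 } END { printf "%.4f\n", 1 - zero }'
}

# Example (placeholder file name):
#   bedtools genomecov -ibam your.file.bam | genome_fraction_covered
```

If no "genome" row at depth 0 is present, the sketch reports full coverage (1.0000).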

**papori** · 01-19-2014, 01:19 PM

In reply to dpryan:

Thanks!
Very helpful.

**SeqMlife** · 02-18-2014, 09:35 PM

Hi papori,
I am currently doing something similar.
Could you share an example of the final output from the code, i.e. the cov and nocov values?
The numbers I got are extremely large:
879370384 have hits and 1984311365 do not. I'm not sure I did it right.
Thanks for your reply!
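Assuming those two numbers are the cov and nocov counts from the `-d` pipeline, they convert directly into a covered fraction:

```latex
\frac{879370384}{879370384 + 1984311365} = \frac{879370384}{2863681749} \approx 0.307
```

That is roughly 30.7% of all reported positions. Because the denominator includes every position genomecov reported, any N gaps in the reference count as uncovered. Dividing by the non-N length instead, as suggested earlier in the thread, would give a somewhat higher figure.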

coverage percentage of hg19 using bowtie2