**gringer** · 06-04-2013, 03:45 PM

Bedtools genomecov should do this.

genomecov — bedtools 2.31.0 documentation

http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

The default options produce the following histogram output / fields, which should be easily modifiable for your purposes:

1. chromosome (or entire genome)
2. depth of coverage from features in input file
3. number of bases on chromosome (or genome) with depth equal to column 2.
4. size of chromosome (or entire genome) in base pairs
5. fraction of bases on chromosome (or entire genome) with depth equal to column 2.

If you set the maximum coverage to 1 (-max 1), then you can easily get the coverage fraction for all bases covered at any depth. If you set the max coverage to 20 (or something more suitable for your needs), then you get the covered fraction for well-covered bases.
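As a rough sketch of how the histogram could be boiled down to one number per chromosome, the helper below sums column 3 for every depth at or above a threshold and divides by the chromosome size in column 4. The function name `covered_fraction` and the file name in the usage comment are made up for illustration; it assumes the default tab-separated histogram layout listed above.

```shell
# Hypothetical helper: summarise `bedtools genomecov` histogram output from stdin.
# Prints, per chromosome (and for the "genome" rows, if present), the fraction
# of bases with depth >= the given threshold (default 1).
# Assumed usage, with a position-sorted BAM:
#   bedtools genomecov -ibam aligned.bam | covered_fraction 20
covered_fraction() {
    min_depth="${1:-1}"
    awk -F'\t' -v min="$min_depth" '
        # columns: 1=chromosome, 2=depth, 3=bases at that depth, 4=chromosome size
        { size[$1] = $4; if (!($1 in covered)) covered[$1] = 0 }
        $2 >= min { covered[$1] += $3 }
        END {
            for (c in size)
                printf "%s\t%.4f\n", c, (size[c] > 0 ? covered[c] / size[c] : 0)
        }
    ' | sort
}
```

Doing the thresholding in awk means a single genomecov run can be reused for several depth cut-offs, instead of re-running with a different -max each time.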

**sdriscoll** · 06-04-2013, 04:59 PM

I've used 'bedtools coverage' for this in the past but it's a little tricky if you're working with an alternatively spliced transcriptome. For example you can compare your alignments to a GTF with this command and for every feature you get 4 values:

1) The number of features in A that overlapped the B interval.
2) The number of bases in B that had non-zero coverage.
3) The length of the entry in B.
4) The fraction of bases in B that had non-zero coverage.

Where A is your alignments file and B is the GTF file. The thing is, you have to parse this output: fetch all of the rows corresponding to 'exon' features from the GTF, group them by the 'transcript_id' GTF field (which is included in the output), and finally use the values reported for each exon feature to compute the total transcript coverage ratio. If you can write perl or python you can do this. Even if you get that far it's still a little deceptive because you don't know which isoform(s) are actually expressed, so therein lies the real issue. It does, however, produce a number telling you how well covered the isoforms are.
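The parsing step described above does not strictly need perl or python; a minimal awk sketch in the shell could look like the following. It assumes the coverage output is the 9 GTF columns followed by the 4 values listed above (so covered bases in column 11 and feature length in column 12), and the function name `transcript_coverage` is hypothetical.

```shell
# Hypothetical helper: reads 'bedtools coverage' output for a GTF from stdin.
# Assumed layout: 9 GTF columns, then count, covered bases, length, fraction
# (columns 10-13). Only rows whose feature type (column 3) is "exon" are used.
transcript_coverage() {
    awk -F'\t' '
        $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
            # strip the leading: transcript_id "  (15 chars) and the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            covered[id] += $11
            total[id]   += $12
        }
        END {
            # output: transcript_id, covered exon bases, total exon bases, ratio
            for (id in total)
                printf "%s\t%d\t%d\t%.4f\n", id, covered[id], total[id], covered[id] / total[id]
        }
    ' | sort
}
```

The ratio is computed from the summed base counts rather than by averaging the per-exon fractions, so long and short exons are weighted by their length.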

There's another kinda wacky idea you can try. The above method isn't much different from looking at the coverage of alignments to a transcriptome (rather than a genome) with all possible alignments reported, since the above case takes a single read aligned to the genome and counts it towards every feature it overlaps. Alternatively, you can build an aligner index for the transcriptome reference, align with bowtie using the -a option, make a small BED file that describes each transcript in the transcriptome reference, and then use 'bedtools coverage' to assess the coverage of all features that the reads may align to. This is getting a little messy, but for rough estimates it should be OK, especially if you can group the output into genes by joining in some gene names and sorting on that column.

Here's kinda how it would work, provided you have a genome fasta file (genome.fa), a GTF gene annotation (genes.gtf) and some reads in a fastq file (reads.fq). You also need gffread (it ships with Cufflinks rather than Tophat), bowtie1, samtools and bedtools installed on your computer.

Code:

# make the transcriptome sequence file
gffread -w transcriptome.fa -g genome.fa genes.gtf

# index with samtools
samtools faidx transcriptome.fa

# build bowtie index for the transcriptome
# (bowtie-build takes the reference fasta first, then the index basename)
bowtie-build transcriptome.fa transcriptome_ref

# align your reads with bowtie
bowtie -a --best -S transcriptome_ref reads.fq | samtools view -bS -o aligned.bam - 

# parse the fasta index to make a bed file describing the transcripts
cut -f1,2 transcriptome.fa.fai | perl -slane 'print join("\t", $F[0], 0, $F[1]);' > transcriptome_fa.bed

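# use bedtools to compute coverage of features
# (syntax for older bedtools releases; from v2.24.0 on, coverage is reported
# for the -a file, so use: bedtools coverage -a transcriptome_fa.bed -b aligned.bam)
bedtools coverage -abam aligned.bam -b transcriptome_fa.bed > feature_coverages.bed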

As long as it's understood that this process in no way whatsoever produces correct counts at the isoform level, you can use it to get an idea of how well the isoforms are potentially covered by your reads. Call me crazy, but it does work.
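The "join in some gene names and sort on that column" step mentioned earlier could be sketched like this. The two-column map file (transcript ID, then gene name) is an assumption; it is not produced by any command above and would have to be derived from the GTF. The function name `annotate_with_genes` is likewise hypothetical.

```shell
# Hypothetical helper: prefix each transcript's coverage with its gene name.
#   $1 = two-column TSV map: transcript_id <tab> gene_name (assumed to exist)
#   $2 = 'bedtools coverage' output, e.g. feature_coverages.bed
# Output: gene, transcript, covered fraction (last column), sorted by gene.
annotate_with_genes() {
    map_file="$1"
    cov_file="$2"
    awk -F'\t' -v OFS='\t' '
        NR == FNR { gene[$1] = $2; next }          # first file: load the map
        { g = ($1 in gene) ? gene[$1] : "NA"       # second file: look up the gene
          print g, $1, $NF }
    ' "$map_file" "$cov_file" | sort -t "$(printf '\t')" -k1,1 -k2,2
}
```

Transcripts missing from the map are kept and labelled "NA" rather than dropped, so gaps in the annotation stay visible.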

**sdriscoll** · 06-04-2013, 09:39 PM

While I'm at it, you could even incorporate the results of eXpress or RSEM into this transcript coverage pipeline, since those seek to unambiguously assign each read. For eXpress you could use the same bowtie index built in my previous example and then run these commands:

Code:

# express needs alignments with all possible (and reasonable) 
# mapping locations
bowtie -aS -n 2 -e 999 transcriptome_ref reads.fq | samtools view -bS -o aligned.bam - 

# run express making a BAM file with all of the uniquely mapped reads
express --output-align-samp -B 1 transcriptome.fa aligned.bam

The '--output-align-samp' option instructs eXpress to produce a BAM file named 'hits.1.samp.bam', which contains only a single alignment for each read, sampled according to the alignment probabilities estimated by their EM algorithm. In other words, the alignments in this file should produce counts from the 'bedtools coverage' command that match up with the 'est_counts' column in the 'results.xprs' file produced by eXpress. You can use this BAM file in the same 'bedtools coverage' command I put into my previous post. Now you'll have coverages without any redundant alignments.
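One way to check that the counts really line up is to put the 'bedtools coverage' read count next to 'est_counts' for each transcript. This is only a sketch: it assumes the coverage file has the 3 BED columns followed by the read count in column 4, and it looks up the 'target_id' and 'est_counts' columns by their header names rather than by position. The function name `compare_counts` is made up.

```shell
# Hypothetical helper: list bedtools read counts beside eXpress est_counts.
#   $1 = results.xprs from eXpress (tab-separated, with a header line)
#   $2 = 'bedtools coverage' output for hits.1.samp.bam
# Output: transcript, bedtools count (column 4), est_counts
compare_counts() {
    xprs_file="$1"
    cov_file="$2"
    awk -F'\t' -v OFS='\t' '
        NR == FNR {
            # header line: remember the position of every column name
            if (FNR == 1) { for (i = 1; i <= NF; i++) col[$i] = i; next }
            est[$(col["target_id"])] = $(col["est_counts"])
            next
        }
        ($1 in est) { print $1, $4, est[$1] }
    ' "$xprs_file" "$cov_file"
}
```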

**migs54** · 06-05-2013, 04:47 PM

Thanks for the suggestions. I ended up using the following.

coverageBed -abam CCE.bam -s -b mm9.refSeq.Wholegene.bed > CCEcoverage.txt

The final column gave percentage coverage and it matches well with the bedgraphs I have on UCSC.

Alternative splicing is not a primary concern at this point, so this should be fine for now. If that changes I'll be sure to have a headache trying to figure it out! I'll come back to this post for sure.

Thanks!

**pengchy** · 04-21-2015, 03:37 PM

Just a reminder that this method does not give the coverage of a gene's exons.

coverageBed -abam CCE.bam -s -b mm9.refSeq.Wholegene.bed > CCEcoverage.txt

The percentage coverage given by this command is gene-body coverage, including introns. Furthermore, if you don't use the "-split" parameter, each alignment in "-abam" is counted over its whole span, including the gaps marked "N" (the spliced-out part of a read). The detailed explanation can be found at section 1.3.19.
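To make the "-split" point concrete, the sketch below takes a CIGAR string and prints the reference span of the alignment next to the number of reference bases left once the "N" gaps are excluded. It is an illustration only, not how bedtools is implemented, and it assumes that only "N" operations count as splits (deletions stay inside a block). The function name `cigar_span` is hypothetical.

```shell
# Hypothetical illustration: reference span of an alignment with and without
# the "N" (skipped region) parts of its CIGAR string.
# Output: full span <tab> span excluding N gaps
cigar_span() {
    printf '%s\n' "$1" | awk '
        {
            cigar = $0; span = 0; gaps = 0
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                n  = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MDN=X]/) span += n   # operations that consume reference bases
                if (op == "N") gaps += n        # spliced-out region
                cigar = substr(cigar, RLENGTH + 1)
            }
            printf "%d\t%d\n", span, span - gaps
        }'
}
```

For example, `cigar_span 50M1000N50M` reports a span of 1100 bases, of which only 100 are actually aligned; without "-split" the whole 1100 would be treated as covered.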

**migs54** · 04-21-2015, 03:44 PM

Hi Pengchy,

Yes, you're right. This experiment was chromatin RNA-seq, so I included introns when calculating coverage.

Read coverage for each gene
