I'm trying to make a list of gaps in coverage in Illumina sequence data? IGV displays the number of reads covering each base, but is there a way to output this count data to a text file? This will save me having to manually scroll through the data in IGV and annotate by hand. Many thanks in advance.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
You could do this in SeqMonk. The process would be:- Import your raw data
- Do a contig probe generation with a depth of 1 read
- Do a read count quantitation
- Go back to probe generation and do an interstitial probe generation (so you put probes between the existing probes, which will be all the gaps in your assembly) - make sure not to exclude probes with no data!
- Do a read count quantitation (even through they will all be 0)
- Create an annotated probe report and you'll have a list of all of the gaps in your assembly
- Alternatively you can convert your probe set into an annotation track so you can see where the gaps are when performing further analyses
-
I have a python program that can do that. Please contact me at [email protected] if you are interested in using it. It can do sums, averages and determine streches of coverages based on thresholds using aligned bam files as input.
Comment
-
Originally posted by Scotch View PostI'm trying to make a list of gaps in coverage in Illumina sequence data? IGV displays the number of reads covering each base, but is there a way to output this count data to a text file? This will save me having to manually scroll through the data in IGV and annotate by hand. Many thanks in advance.
The "-d" option will report depth at every base (perhaps excessive). The -bga option will create a BEDGRAPH file documenting intervals with common depth. You could use awk on the output to report solely those intervals with 0 or <X coverage. Something like this:
Code:genomeCoverageBed -ibam aln.possorted.bam -bga | awk '$4 == 0' > intervals.with.0.cov.bedg
Check out the BEDTools manual or see the examples here:
Comment
-
I also go the Bedtools route, but I wrote a quick perlscript to output some coverage statistics that can then be imported into Excel. I've been working with alignments against multiple fragments, but it works for single sequence alignments as well.
Here's my method:
Calculating alignment coverage
Code:genomecoveragebed –ibam alignment_sorted.bam –g reference.fa.fai –bga > alignment.coverage
Code:perl path/to/get_fragment_coverage.pl –c alignment.coverage –o alignment_coverage.txt –d 1
By the way, if you want you can pipe genomecoveragebed and get_fragment_coverage.pl together if you want, like so:
Code:genomecoveragebed –ibam alignment_sorted.bam –g reference.fa.fai –bga | perl path/to/get_fragment_coverage.pl –c STDIN –o alignment_coverage.txt –d 1
Attached Files
Comment
-
I just tried bedtools genomeCoverage and it gave an error and exit. The error is Chromosome chr_random10 found in your bed file but not in your genome file . So my hg18.fai only has regular chromosomes and it is these type of issues that led me to write my own program. I agree that bedtools is very fast and efficient but it doesn't fit every requirement easily at least for me. Do you know any workaround to this (i.e a flag to tell bedtools to ignore such errors) other than creating another fai with random chromosome positions.
Comment
-
I'm not sure why you wouldn't be using the same .fai as the header in your BAM, but in this case you could just make a new "genome" file from your BAM header.
Code:samtools view -H [BAM] | grep "@SQ" | sed -e "s/SN://" -e "s/LN://" | cut -f 2,3 > chroms.txt
Comment
-
Originally posted by quinlana View PostI'm not sure why you wouldn't be using the same .fai as the header in your BAM, but in this case you could just make a new "genome" file from your BAM header.
Code:samtools view -H [BAM] | grep "@SQ" | sed -e "s/SN://" -e "s/LN://" | cut -f 2,3 > chroms.txt
Last edited by msincan; 02-09-2011, 01:23 PM.
Comment
-
At the risk of this being a stupid question: What exactly are these "random" chromosomes? I stumbled over this when converting a soap alignment to bam. I think alignments to these random chromosome s were just skipped in the conversion and I would like to know if I'm now missing important data in my bamfile.
Comment
-
They are unassembled supercontigs. They can be found in hg19 (at least, in 1kg.37) but not in hg18.
Can someone confirm if I got this right?
Anyway, we do use them to align to, because there may be reads that should be aligned there instead of at the regular chromosomes. If the supercontigs are missing, those reads may be forced to align in a place where they don't belong. In short: reduce the number of false positives.
Sorry for being slightly off topic.
Comment
-
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Today, 10:49 AM
|
0 responses
12 views
0 likes
|
Last Post
by seqadmin
Today, 10:49 AM
|
||
Started by seqadmin, 04-25-2024, 11:49 AM
|
0 responses
22 views
0 likes
|
Last Post
by seqadmin
04-25-2024, 11:49 AM
|
||
Started by seqadmin, 04-24-2024, 08:47 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
04-24-2024, 08:47 AM
|
||
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
62 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
Comment