
**quinlana** · 07-05-2010, 04:53 PM

Hi all,

I just updated the latest release to include a new tool called bedToIgv that will create an IGV (v1.5 and later) "batch" script from any BED/GFF/VCF file or stream. The resulting batch scripts can be run using IGV to automagically take "snapshots" of as many loci as are in your input file (time permitting). Batch files are run in IGV via File->"Run Batch Script".

Download: http://bedtools.googlecode.com/files....v2.8.1.tar.gz

Best,
Aaron
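
For readers who want to see what such a batch script looks like, here is a minimal Python sketch that writes one from BED lines. It is not bedToIgv's actual output or implementation: the function name `bed_to_igv_batch`, the `snapshot_dir` and `flank` parameters, and the choice to convert BED's 0-based start into a 1-based locus are all assumptions made for illustration. The IGV batch commands it emits (`snapshotDirectory`, `goto`, `snapshot`) are standard ones.

```python
def bed_to_igv_batch(bed_lines, snapshot_dir="./snapshots", flank=0):
    """Build an IGV batch script that snapshots each BED interval (illustrative only)."""
    commands = [f"snapshotDirectory {snapshot_dir}"]
    for line in bed_lines:
        line = line.strip()
        # Skip blanks, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Assumption: shift the 0-based BED start to a 1-based locus for IGV.
        left = max(1, start + 1 - flank)
        # Keep zero-length features (start == end) from producing an inverted locus.
        right = max(left, end + flank)
        name = fields[3] if len(fields) > 3 else f"{chrom}_{left}_{right}"
        commands.append(f"goto {chrom}:{left}-{right}")
        commands.append(f"snapshot {name}.png")
    return "\n".join(commands) + "\n"


print(bed_to_igv_batch(["chr1\t10490\t10520\tGene_with_3SNPs"], snapshot_dir="shots"))
```

The printed text can be saved to a file and loaded through File->"Run Batch Script".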

**Margarida** · 07-06-2010, 01:36 PM

bedToIgv looks like an outstanding tool!

Many thanks!

**epigen** · 07-07-2010, 06:26 AM

Detecting insertions from dbSNP

Hi Aaron,

BEDTools is just the thing I need for my research. After trying to develop my own overlap script and dealing with the UCSC database and tools locally, I really appreciate your work. However, I have to bug you because insertions from dbSNP are giving me trouble: They seem to be ignored.

Example SNPs:

Code:

#chrom	chromStart	chromEnd	name
chr1	10430	10431	fake_SNP
chr1	10433	10433	rs56289060_insertion
chr1	10491	10492	rs55998931
chr1	10518	10519	rs62636508
chr1	10518	10519	same_coordinate

Example genes:

Code:

# genes
chr1	10430	10440	Gene_with_Insertion_and_fake_SNP
chr1	10490	10520	Gene_with_3SNPs

Running intersectBed -a SNPs.txt -b genes.txt -wo does not report the insertion:

Code:

chr1	10430	10431	fake_SNP	chr1	10430	10440	Gene_with_Insertion_and_fake_SNP	1
chr1	10491	10492	rs55998931	chr1	10490	10520	Gene_with_3SNPs	1
chr1	10518	10519	rs62636508	chr1	10490	10520	Gene_with_3SNPs	1
chr1	10518	10519	same_coordinate	chr1	10490	10520	Gene_with_3SNPs	1

It seems that the problem is that for insertions, the start coordinate is the same as the end coordinate. Is there any trick you could recommend or should I just go through my SNP file and subtract 1 from all the insertion starts?

Thank you

Barbara

**idot** · 07-08-2010, 07:35 AM

split working correctly?

Hi,
I am wondering whether coverage with -split works correctly when B also contains splits.
bed12b.bed contains a split feature (1000-1010 and 1090-1100), yet it is reported as being covered by 30 base pairs and as having a length of 100.

Also, a -bbam option (not only -abam) would be nice to have for coverage.

best,
ido

Code:

coverageBed -split -a bed12a.bed -b bed12b.bed
chr1    1000    1100    test    0       +       1000    1100    255,0,0 2       10,10   0,90    1       30      100     0.3000000
coverageBed -split -b bed12a.bed -a bed12b.bed
chr1    1040    1070    test    0       +       1040    1070    0,0,255 1       30      0       0       0       30      0.0000000
cat bed12a.bed
chr1    1040    1070    test    0       +       1040    1070    0,0,255 1       30      0
cat bed12b.bed
chr1    1000    1100    test    0       +       1000    1100    255,0,0 2       10,10   0,90

**adamdeluca** · 07-09-2010, 04:37 AM

Originally posted by epigen

Is there any trick you could recommend or should I just go through my SNP file and subtract 1 from all the insertion starts?

awk '/insertion/{print $1"\t"($2-1)"\t"$3"\t"$4};!/insertion/' test.bed

will subtract 1 from the start position of any SNP listed as insertion and leave other lines alone.

awk '$2==$3{print $1"\t"($2-1)"\t"$3"\t"$4};$2!=$3' test.bed

Will subtract 1 from the start position of any line where the start and end positions match.
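
To see why the workaround helps, here is a small Python sketch of the overlap arithmetic on 0-based, half-open BED intervals. It assumes intersectBed reports a pair only when at least one base overlaps, which is consistent with the -wo output earlier in the thread but is not taken from the BEDTools source. The function names `overlap_bp` and `widen_zero_length` are made up for this example.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Overlapping bases between two 0-based, half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def widen_zero_length(start, end):
    """Give a zero-length feature one base by moving its start back by 1."""
    return (start - 1, end) if start == end else (start, end)


gene = (10430, 10440)       # Gene_with_Insertion_and_fake_SNP
insertion = (10433, 10433)  # rs56289060_insertion

print(overlap_bp(*insertion, *gene))                      # 0
print(overlap_bp(*widen_zero_length(*insertion), *gene))  # 1
```

A feature whose start equals its end spans zero bases, so its overlap with anything is zero. Moving the start back by one gives it a single base to overlap with.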

**quinlana** · 07-10-2010, 10:33 AM

Originally posted by idot

Hi,
I am wondering whether coverage with -split works correctly when B also contains splits.
bed12b.bed contains a split feature (1000-1010 and 1090-1100), yet it is reported as being covered by 30 base pairs and as having a length of 100.

Also, a -bbam option (not only -abam) would be nice to have for coverage.

best,
ido

Code:

coverageBed -split -a bed12a.bed -b bed12b.bed
chr1    1000    1100    test    0       +       1000    1100    255,0,0 2       10,10   0,90    1       30      100     0.3000000
coverageBed -split -b bed12a.bed -a bed12b.bed
chr1    1040    1070    test    0       +       1040    1070    0,0,255 1       30      0       0       0       30      0.0000000
cat bed12a.bed
chr1    1040    1070    test    0       +       1040    1070    0,0,255 1       30      0
cat bed12b.bed
chr1    1000    1100    test    0       +       1000    1100    255,0,0 2       10,10   0,90

Currently, -split applies to only the "A" file. Adding both is possible but I have not yet implemented it. It is on my mind, however...

Supporting -bbam in addition to -abam will be much more difficult, as BEDTools loads the "B" file into memory, which could be a bad idea for many BAM files.

Thanks for your suggestions.
Aaron
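
The numbers in idot's example can be reproduced with a short Python sketch. It decodes the BED12 block columns (blockSizes and blockStarts, columns 11 and 12) and compares treating the B feature as one span with treating it as separate blocks. This is only an illustration of the arithmetic, not the coverageBed implementation, and the helper names `bed12_blocks` and `covered_bp` are made up here.

```python
def bed12_blocks(line):
    """Return the (start, end) blocks encoded in one BED12 line."""
    fields = line.split()
    chrom_start = int(fields[1])
    sizes = [int(s) for s in fields[10].split(",") if s]
    offsets = [int(s) for s in fields[11].split(",") if s]
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


def covered_bp(a_blocks, b_intervals):
    """Bases of b_intervals overlapped by a_blocks (assumes neither list self-overlaps)."""
    return sum(max(0, min(a_end, b_end) - max(a_start, b_start))
               for a_start, a_end in a_blocks
               for b_start, b_end in b_intervals)


bed12a = "chr1 1040 1070 test 0 + 1040 1070 0,0,255 1 30 0"
bed12b = "chr1 1000 1100 test 0 + 1000 1100 255,0,0 2 10,10 0,90"

a_blocks = bed12_blocks(bed12a)   # [(1040, 1070)]
b_whole = [(1000, 1100)]          # B taken as one span, as -split currently does
b_blocks = bed12_blocks(bed12b)   # [(1000, 1010), (1090, 1100)]

print(covered_bp(a_blocks, b_whole))   # 30
print(covered_bp(a_blocks, b_blocks))  # 0
```

With B taken as one span, 30 of its 100 bases are covered, which matches the 0.3 reported above. If the blocks in B were honored, the A feature would fall entirely in the gap between them.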

**quinlana** · 07-18-2010, 10:27 AM

Hi all,

I just posted version 2.8.2, which addresses the bugs that have been reported since the release of 2.8.0. I've also updated the manual to reflect the new VCF support, new tools, and the new support for "spliced / split" alignments.

Source: http://bedtools.googlecode.com/files....v2.8.2.tar.gz
Manual: http://bedtools.googlecode.com/files...-Manual.v3.pdf

Best,
Aaron

==================
Bug fixes (the dirty laundry):
==================
1. Fixed a klutzy bug in bedFile.h preventing GFF strands from being read properly.
2. Fixed a bug in intersectBed that occasionally caused spurious overlaps between BAM alignments and BED features.
3. Fixed bug in intersectBed causing -r to not report the same result when files are swapped.
4. Added checks to groupBy to prevent the selection of improper opCols and groups.
5. Fixed various compilation issues for newer GCC versions, esp. for groupBy, bedToBam, and bedToIgv.
6. Updated the usage statements to reflect bed/gff/vcf support.
7. Added new fileType functions for auto-detecting gzipped or regular files. Thanks to Assaf Gordon.

**quinlana** · 07-18-2010, 12:46 PM

I should also mention that in addition to the usage examples in the documentation and on the Google Code site, there is a nascent collection of usage examples posted by users at:

http://groups.google.com/group/bedtools-discuss/web/community-usage-examples

You may find these to be useful.
Aaron

**thinkRNA** · 08-05-2010, 04:26 PM

Hi Aaron, I love bedtools and thanks for developing it. I want to see the per base coverage profile of a few housekeeping genes between my replicates as I suspect an experimental error. I was wondering if there is a way to normalize the coverage using bedtools? Right now, I cannot compare coverages between samples, because they are sequenced to different depths. Please let me know if you know of any other way I can answer this simple but critical question.

Thanks!

**quinlana** · 08-06-2010, 04:07 AM

Originally posted by thinkRNA

Hi Aaron, I love bedtools and thanks for developing it. I want to see the per base coverage profile of a few housekeeping genes between my replicates as I suspect an experimental error. I was wondering if there is a way to normalize the coverage using bedtools? Right now, I cannot compare coverages between samples, because they are sequenced to different depths. Please let me know if you know of any other way I can answer this simple but critical question.

Thanks!

Hi,
There is no tool for directly normalizing coverage. However, assuming I correctly understand your issue, one thing I could think of would be to use genomeCoverageBed with the per-base coverage option (-d) for each replicate. You could then compute the mean or median per-base coverage for the replicate with the deeper coverage and normalize the depth in the shallower replicate accordingly. For example, given housekeeping gene A, you might find the median coverage in the deeper replicate is 10, while the analogous value in the shallower replicate is 1. You could then multiply the depths in the shallower sample by 10 and do your comparison.

This approach will be slow because you must use genomeCoverageBed and then cull out the relevant regions. I am hoping to include a "per base" depth option for coverageBed in the next release. This will allow you to quickly grab per base coverage profiles for subsets of the genome.

Best,
Aaron
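
The scaling described above can be sketched in a few lines of Python. The depth lists are hypothetical values standing in for the per-base depths of one housekeeping gene taken from `genomeCoverageBed -d` output, and `scale_to_deeper` is a name invented for this example.

```python
from statistics import median


def scale_to_deeper(deep_depths, shallow_depths):
    """Scale the shallower replicate so its median depth matches the deeper one."""
    shallow_median = median(shallow_depths)
    if shallow_median == 0:
        raise ValueError("median depth of the shallower replicate is zero")
    factor = median(deep_depths) / shallow_median
    return factor, [d * factor for d in shallow_depths]


# Hypothetical per-base depths over one housekeeping gene.
deep = [8, 10, 10, 12, 10]
shallow = [1, 1, 2, 1, 0]
factor, scaled = scale_to_deeper(deep, shallow)
print(factor)  # 10.0
print(scaled)  # [10.0, 10.0, 20.0, 10.0, 0.0]
```

Using the median rather than the mean makes the factor less sensitive to a few unusually deep positions. Either choice fits the suggestion above.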

**thinkRNA** · 08-06-2010, 09:19 AM

Alternate way to get normalized per base coverage

Originally posted by quinlana

Hi,
There is no tool for directly normalizing coverage. However, assuming I correctly understand your issue, one thing I could think of would be to use genomeCoverageBed with the per-base coverage option (-d) for each replicate. You could then compute the mean or median per-base coverage for the replicate with the deeper coverage and normalize the depth in the shallower replicate accordingly. For example, given housekeeping gene A, you might find the median coverage in the deeper replicate is 10, while the analogous value in the shallower replicate is 1. You could then multiply the depths in the shallower sample by 10 and do your comparison.

This approach will be slow because you must use genomeCoverageBed and then cull out the relevant regions. I am hoping to include a "per base" depth option for coverageBed in the next release. This will allow you to quickly grab per base coverage profiles for subsets of the genome.

Best,
Aaron

Hi Aaron,
Thanks for your prompt reply. Unfortunately, the genomeCoverageBed -d option generates too big a file for the mouse genome. I came up with a roundabout way; do you think it will work?
genomeCoverageBed -bg -ibam B1.bam -g mm9.genome > genomeCoverage.out

Then use intersectBed to get the housekeeping gene's coverage, where GAPDH.bed contains the start and end positions of the gene:
intersectBed -wb -a genomeCoverage.out -b GAPDH.bed > onlyGAPDH.txt

[Do I need to use -split here?]

Loop through onlyGAPDH.txt to break each interval into per-base coverage, and multiply each base's coverage by (1000000 / (total bases covered)).

I'll have to write another script to get total bases covered for each sample.

Thanks so much,
Priyam
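
The last step of this plan could look roughly like the Python sketch below. It assumes the first four columns of onlyGAPDH.txt are the clipped bedGraph interval (chrom, start, end, depth), as `intersectBed -wb` would write them, and it takes the per-sample total as a parameter because the post does not say how it is computed. The function name `per_base_normalized` and the example rows are hypothetical.

```python
def per_base_normalized(bedgraph_lines, total_bases_covered, scale=1_000_000):
    """Expand bedGraph intervals to per-base depth, scaled by scale / total_bases_covered."""
    factor = scale / total_bases_covered
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, depth = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        # bedGraph intervals are 0-based and half-open: [start, end).
        for pos in range(start, end):
            yield chrom, pos, depth * factor


# Hypothetical rows shaped like the first four columns of onlyGAPDH.txt.
rows = ["chr6 100 102 4", "chr6 102 103 6"]
for record in per_base_normalized(rows, total_bases_covered=2_000_000):
    print(*record, sep="\t")
```

Because only the gene's intervals are expanded, the per-base output stays small even though the genome-wide bedGraph is large.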

**dsobral** · 07-25-2012, 08:40 AM

Hi all,

I know this is an old thread, but I thought my question was appropriate here.

I'm using bedtools coverage and bedtools multicov to get read counts in intervals.
I'm using alignments from tophat (containing split alignments) and bwa (no split).

1) Does "bedtools multicov" handle -split like bedtools coverage?

It doesn't seem so, as read counts from tophat seem quite exaggerated. I also tried another splice-aware aligner (osa) and saw the same thing when comparing to bwa alignments.

Then I tried using "bedtools coverage -split -counts", which should handle this properly, and still bwa counts are much more in agreement with what I see in the IGV browser, even if I look at the tophat alignments! Again, when I use edgeR or baySeq differential expression, bwa results make much more sense...

2) Does anyone know of issues with split alignments in bedtools?

Any thoughts welcome, or suggestions of other tools to do the job?
I tried htseq-counts, but I have to convert the bam to a sam, and my bed to a gff, and still I get errors. Finally, the R package easyrnaseq is virtually useless, as it starts exploding the memory (and I have a bit of memory available!).

Thanks,
Daniel

PS: Thanks for bedtools! It's really a great tool!


BEDTools v2.8: VCF, split-alignments, new tools
