BEDTools: new tools / support for paired-end features.

  • BEDTools: new tools / support for paired-end features.

    Hello all,
    I just posted version 2.3.0 of BEDTools (http://code.google.com/p/bedtools/) which includes several new and useful updates.

    (1) I added four new tools:

    (a) shuffleBed. Randomly permutes the locations of the features in a BED file within a genome. Useful for testing for significant enrichment of, say, an experimental observation with a genome feature. It also allows one to define a separate BED file of genomic regions that should be _excluded_ from random placement (e.g. genome gaps).
    (b) slopBed. Adds a requested number of base pairs to each end of a BED feature. Smarter than a plain awk one-liner on a BED file, because the extended coordinates are constrained by the size of each chromosome.
    (c) maskFastaFromBed. Masks a FASTA file based on BED coordinates. Useful for making custom genome files for, as an example, targeted capture experiments.
    (d) pairToPair. Returns overlaps between two paired-end BED files. This is great for finding structural variants that are private or shared among samples. Specifically, pairToPair will find paired-end alignments or variants that have the same orientation on both ends and overlapping alignments on both ends. I've found this to be very useful for classifying structural variation detected by paired-end mapping.
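
    To make two of the behaviours above concrete, here is a minimal Python sketch (not the BEDTools source). It assumes 0-based, half-open BED coordinates; the names slop, End, ends_overlap and pairs_match are hypothetical, and "same orientation on both ends" is interpreted here as matching strands end by end. The chromosome size and coordinates are made up for illustration.

```python
from collections import namedtuple

# One end of a paired-end feature (hypothetical helper type, not a BEDTools structure).
End = namedtuple("End", "chrom start end strand")


def slop(start, end, chrom_size, pad):
    """Extend an interval by `pad` bp on each side, clamped to [0, chrom_size]."""
    return max(0, start - pad), min(chrom_size, end + pad)


def ends_overlap(x, y):
    """True if two ends are on the same chromosome and share at least one base."""
    return x.chrom == y.chrom and x.start < y.end and y.start < x.end


def pairs_match(p, q):
    """True if both ends of pair p overlap the corresponding ends of pair q
    and have the same strand. Assumption: end 1 is compared with end 1 and
    end 2 with end 2; the real tool may handle more cases."""
    return all(ends_overlap(x, y) and x.strand == y.strand for x, y in zip(p, q))


# slopBed-style padding: a naive "start - 100, end + 100" would run off the chromosome.
print(slop(50, 100, chrom_size=1000, pad=100))
print(slop(950, 990, chrom_size=1000, pad=100))

# pairToPair-style comparison of three made-up pairs.
pair_a = (End("chr1", 100, 150, "+"), End("chr1", 5000, 5050, "-"))
pair_b = (End("chr1", 120, 170, "+"), End("chr1", 5020, 5070, "-"))
pair_c = (End("chr1", 120, 170, "+"), End("chr1", 5020, 5070, "+"))
print(pairs_match(pair_a, pair_b))  # both ends overlap, strands agree
print(pairs_match(pair_a, pair_c))  # second end has a different strand
```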

    (2) I increased the speed of intersectBed by nearly 50%.
    (3) I improved / corrected some of the help messages.
    (4) I improved sanity checking for BED entries.

    (5) I added two new scripts. The first, samToBed, will convert alignments in SAM format to BED format. It also accepts input from standard input so as to play nicely with the "samtools view" command. The second, gffToBed, converts GFF annotations to BED.
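
    For readers unfamiliar with the coordinate shift involved in a SAM-to-BED conversion, the sketch below shows the general idea in Python. It is not the samToBed script itself: the function name sam_to_bed, the choice of output columns (read name as the BED name, MAPQ as the score) and the example read are assumptions for illustration. What it relies on from the SAM format is that POS is 1-based, that FLAG bit 0x10 marks a reverse-strand alignment, that bit 0x4 marks an unmapped read, and that only the M, D, N, = and X CIGAR operations consume reference bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_to_bed(line):
    """Convert one SAM alignment line to a BED6-like tuple, or None if unmapped."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 0x4:  # read is unmapped, nothing to report
        return None
    start = int(pos) - 1  # SAM is 1-based, BED is 0-based
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    strand = "-" if flag & 0x10 else "+"
    return (rname, start, start + ref_len, qname, int(mapq), strand)


# A made-up reverse-strand read: 10M2I5M3D20M spans 10 + 5 + 3 + 20 = 38 reference bases.
example = "\t".join(
    ["read1", "16", "chr1", "101", "30", "10M2I5M3D20M", "*", "0", "0", "A" * 37, "I" * 37]
)
print(sam_to_bed(example))
```

    Reading lines from sys.stdin instead of a hard-coded string would give the same "pipe from samtools view" behaviour described above.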

    I hope you find these useful.
    Aaron

  • #2
    Hi Aaron,
    I'd like to use this thread to ask you how closestBed works... I really don't get what a tie is.
    For example:

    Code:
    $ closestBed   -a mysplit/merged_IRR1.bed -b mm9.refseq.tss.bed6 | head
    chr1	4172972	4173006	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	4562824	4562858	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4562824	4562858	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5120005	5120039	1	-	chr1	5073253	5152630	NM_133826	0	+
    chr1	5493224	5493258	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	5493224	5493258	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5493224	5493258	1	+	chr1	4764014	4775768	NM_025300	0	-
    chr1	5493224	5493258	1	+	chr1	4797973	4836816	NM_008866	0	+
    I expected closestBed to report the closest feature upstream or downstream; instead I get a list of features ordered from the farthest to the closest (in abs(dist)). I'm a bit puzzled :-)

    • #3
      Hi Dawe,
      You are rightfully puzzled... I was too. Your expectation of how it should behave is correct. Unfortunately, I introduced a typo while modifying an unrelated piece of code prior to this release. A new version (2.3.1) has been posted which behaves as you would expect. I tested it with your sample data and all appears well.

      As for ties, these occur in two ways:

      1) When two or more features in B _overlap_ the same fraction of the feature in A, all of them are reported by default. By using -t first or -t last, you can choose just one.

      2) When two or more features in B do not overlap the feature in A but are exactly the same distance from it (say 1Mb), all of them are reported.
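
      The two kinds of tie can be illustrated with a small Python sketch. This is not the closestBed implementation: the function closest, its ties argument and the coordinates are hypothetical, everything is assumed to sit on one chromosome, and the tie mode is applied to both kinds of tie for simplicity. Overlapping features rank ahead of non-overlapping ones, larger overlaps rank ahead of smaller ones, and otherwise the smaller gap wins.

```python
def closest(a, bs, ties="all"):
    """Return the feature(s) in `bs` closest to interval `a`.

    a  : (start, end) on a single chromosome, 0-based half-open
    bs : list of (start, end) intervals on the same chromosome
    ties: "all", "first" or "last" (loosely mirrors the -t option)
    """
    a_start, a_end = a

    def rank(b):
        b_start, b_end = b
        overlap = min(a_end, b_end) - max(a_start, b_start)
        if overlap > 0:
            return (0, -overlap)  # overlapping: distance 0, bigger overlap is better
        return (-overlap, 0)      # not overlapping: -overlap is the gap in bp

    best = min(rank(b) for b in bs)
    hits = [b for b in bs if rank(b) == best]
    if ties == "first":
        return hits[:1]
    if ties == "last":
        return hits[-1:]
    return hits


a = (100, 200)

# Tie of type 1: two B features each overlap 50 bp of A.
overlapping = [(50, 150), (150, 250), (400, 500)]
print(closest(a, overlapping))
print(closest(a, overlapping, ties="first"))

# Tie of type 2: no overlap, but two B features are both 50 bp away from A.
flanking = [(0, 50), (250, 300), (500, 600)]
print(closest(a, flanking))
```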

      Sorry for the confusion.
      Aaron

      • #4
        I've got it (both the tie definition and the new version tarball!).
        As you said, it works!
        Thanks

        d
