  • BEDTools: new tools / support for paired-end features.

    Hello all,
    I just posted version 2.3.0 of BEDTools (http://code.google.com/p/bedtools/), which includes several new and useful updates.

    (1) I added four new tools:

    (a) shuffleBed. Randomly permutes the locations of the features in a BED file across a genome. Useful for testing for significant enrichment of, say, an experimental observation with a genome feature. It also lets one define a separate BED file of genomic regions that should be _excluded_ from random placement (e.g., genome gaps).
    (b) slopBed. Adds a requested number of base pairs to each end of a BED feature. Smarter than running awk on a BED file, because the extended coordinates are constrained by the size of each chromosome.
    (c) maskFastaFromBed. Masks a FASTA file based on BED coordinates. Useful for making custom genome files for, as an example, targeted capture experiments.
    (d) pairToPair. Returns overlaps between two paired-end BED files. This is great for finding structural variants that are private to, or shared among, samples. Specifically, pairToPair finds paired-end alignments or variants that have the same orientation on both ends and overlapping alignments on both ends. I've found this very useful for classifying structural variation detected by paired-end mapping.
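The coordinate logic behind three of the new tools can be sketched in a few lines of Python. This is a minimal illustration, not BEDTools' actual implementation. It assumes half-open BED coordinates, a hypothetical `GENOME` dictionary of chromosome lengths, a hypothetical `Pair` record for paired-end features, and, as a simplification, that shuffled features stay on their original chromosome. `slop` clamps extended features to the chromosome bounds, `shuffle` retries random placements that overlap excluded regions, and `pair_to_pair` requires a same-strand overlap on each end.

```python
import random
from dataclasses import dataclass

# Hypothetical genome: chromosome name -> length in bp (illustrative values only).
GENOME = {"chr1": 1000, "chr2": 500}

def overlaps(s1, e1, s2, e2):
    """True if half-open intervals [s1, e1) and [s2, e2) share at least one base."""
    return s1 < e2 and s2 < e1

def slop(chrom, start, end, bp, genome):
    """Extend a feature by `bp` on each side, clamped to [0, chromosome length]."""
    return chrom, max(0, start - bp), min(genome[chrom], end + bp)

def shuffle(chrom, start, end, genome, excluded, rng, max_tries=1000):
    """Move a feature to a random position on the SAME chromosome (a simplification),
    rejecting placements that overlap any excluded (chrom, start, end) region."""
    length = end - start
    for _ in range(max_tries):
        new_start = rng.randint(0, genome[chrom] - length)  # randint is inclusive
        new_end = new_start + length
        if not any(c == chrom and overlaps(new_start, new_end, s, e)
                   for c, s, e in excluded):
            return chrom, new_start, new_end
    raise RuntimeError("no valid placement found")

@dataclass
class Pair:
    """Hypothetical paired-end record: one interval and strand per end."""
    chrom1: str
    start1: int
    end1: int
    strand1: str
    chrom2: str
    start2: int
    end2: int
    strand2: str

def pair_to_pair(a, b):
    """Both ends overlap with the same orientation (end 1 vs end 1, end 2 vs end 2)."""
    end1_ok = (a.chrom1 == b.chrom1 and a.strand1 == b.strand1
               and overlaps(a.start1, a.end1, b.start1, b.end1))
    end2_ok = (a.chrom2 == b.chrom2 and a.strand2 == b.strand2
               and overlaps(a.start2, a.end2, b.start2, b.end2))
    return end1_ok and end2_ok
```

For example, `slop("chr2", 10, 490, 50, GENOME)` gives `("chr2", 0, 500)` rather than a negative start or an end past the chromosome. Rejection sampling keeps the placement uniform over the allowed start positions; the `max_tries` cap guards against genomes that are almost entirely excluded.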

    (2) I increased the speed of intersectBed by nearly 50%.
    (3) I improved / corrected some of the help messages.
    (4) I improved sanity checking for BED entries.

    (5) I added two new scripts. The first, samToBed, will convert alignments in SAM format to BED format. It also accepts input from standard input so as to play nicely with the "samtools view" command. The second, gffToBed, converts GFF annotations to BED.
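The coordinate conversions these two scripts need can be sketched as follows. This is an illustration under stated assumptions, not the scripts' actual code. It follows the SAM specification (1-based POS, FLAG bit 0x4 for unmapped and 0x10 for reverse strand, CIGAR operations M/D/N/=/X consuming reference bases) and GFF's 1-based inclusive coordinates. Using QNAME/MAPQ, and the GFF feature type, as the BED name and score columns is a hypothetical choice.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")

def sam_to_bed(line):
    """Convert one SAM line to a BED6 tuple; return None for headers and unmapped reads."""
    if line.startswith("@"):
        return None                                  # header line
    f = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = f[0], int(f[1]), f[2], int(f[3]), f[4], f[5]
    if flag & 0x4 or cigar == "*":
        return None                                  # unmapped / no alignment
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in REF_CONSUMING)
    start = pos - 1                                  # SAM POS is 1-based; BED start is 0-based
    end = start + ref_len                            # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"
    return rname, start, end, qname, mapq, strand    # name=QNAME, score=MAPQ (hypothetical choice)

def gff_to_bed(line):
    """Convert one GFF line (1-based, inclusive) to a BED6 tuple (0-based, half-open)."""
    seqid, _source, ftype, start, end, score, strand = line.rstrip("\n").split("\t")[:7]
    return seqid, int(start) - 1, int(end), ftype, score, strand  # name=feature type (hypothetical)
```

To play nicely with `samtools view`, one could loop over `sys.stdin`, call `sam_to_bed` on each line, and print the tab-joined tuples that are not `None`.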

    I hope you find these useful.
    Aaron

  • #2
    Hi Aaron,
    I'll take advantage of this post to ask you how closestBed works... I really don't understand what a tie is.
    For example:

    Code:
    $ closestBed   -a mysplit/merged_IRR1.bed -b mm9.refseq.tss.bed6 | head
    chr1	4172972	4173006	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	4562824	4562858	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4562824	4562858	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5120005	5120039	1	-	chr1	5073253	5152630	NM_133826	0	+
    chr1	5493224	5493258	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	5493224	5493258	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5493224	5493258	1	+	chr1	4764014	4775768	NM_025300	0	-
    chr1	5493224	5493258	1	+	chr1	4797973	4836816	NM_008866	0	+
    I expected closestBed to report the closest feature upstream or downstream; instead I get a list of features ordered from the farthest to the closest (by absolute distance). I'm a bit puzzled :-)



    • #3
      Hi Dawe,
      You are rightfully puzzled...I was too. Your expectation of how it should behave is correct. Unfortunately, I injected a typo while modifying an unrelated piece of code prior to this release. A new version (2.3.1) has been posted which behaves as you would expect. I tested it with your sample data below and all appears well.

      As for ties, these occur in two ways:

      1) When two or more features in B _overlap_ the same fraction of a feature in A, all of them are reported by default. By using -t first or -t last, you can choose just one.

      2) When two or more features in B do not overlap a feature in A but are exactly the same distance from it (say, 1 Mb), all of them are reported.
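A minimal sketch of the two kinds of tie, assuming half-open BED coordinates and features given as (chrom, start, end, name) tuples. It is not closestBed's code. It interprets the first case as several B features sharing the largest overlap with A (the same number of bases means the same fraction of A), and it applies the hypothetical `tie` argument, loosely modelled on `-t first` / `-t last`, to both cases. The distance convention (bases between the intervals) is also an assumption, so the exact numbers BEDTools reports may differ.

```python
def closest(a, b_features, tie="all"):
    """Return the B feature(s) closest to A = (chrom, start, end).

    tie: "all" reports every tied feature; "first"/"last" keep only the first/last
    tied feature in B's input order (loosely modelled on -t first / -t last).
    """
    chrom, a_start, a_end = a
    same_chrom = [b for b in b_features if b[0] == chrom]
    if not same_chrom:
        return []

    def overlap(b):
        return min(a_end, b[2]) - max(a_start, b[1])   # > 0 means shared bases

    def gap(b):
        # Bases between two non-overlapping intervals (an assumed convention).
        return b[1] - a_end if b[1] >= a_end else a_start - b[2]

    overlapping = [b for b in same_chrom if overlap(b) > 0]
    if overlapping:
        # Tie type 1: several B features overlap A by the same (largest) amount.
        best = max(overlap(b) for b in overlapping)
        tied = [b for b in overlapping if overlap(b) == best]
    else:
        # Tie type 2: several non-overlapping B features at the same distance.
        best = min(gap(b) for b in same_chrom)
        tied = [b for b in same_chrom if gap(b) == best]

    if tie == "first":
        return tied[:1]
    if tie == "last":
        return tied[-1:]
    return tied
```

For instance, with A = `("chr1", 1000, 1100)` and B features at 500-900 and 1200-1300, both lie 100 bp away, so both are reported unless `tie` selects one.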

      Sorry for the confusion.
      Aaron



      • #4
        I've got it (both the tie definition and the new version tarball!).
        As you said, it works!
        Thanks

        d
