
  • BEDTools: new tools / support for paired-end features.

    Hello all,
    I just posted version 2.3.0 of BEDTools (http://code.google.com/p/bedtools/) which includes several new and useful updates.

    (1) I added four new tools:

    (a) shuffleBed. Randomly permutes the locations of the features in a BED file across a genome. Useful for testing for significant enrichment of, say, an experimental observation with a genome feature. It also allows one to define a separate BED file of genomic regions that should be _excluded_ from random placement (e.g. genome gaps).
    (b) slopBed. Adds a requested number of base pairs to each end of a BED feature. More clever than an awk on a BED file, as it is constrained by the size of each chromosome.
    (c) maskFastaFromBed. Masks a FASTA file based on BED coordinates. Useful for making custom genome files for, as an example, targeted capture experiments.
    (d) pairToPair. Returns overlaps between two paired-end BED files. This is great for finding structural variants that are private or shared among samples. Specifically, pairToPair will find paired-end alignments or variants that have the same orientation on both ends and overlapping alignments on both ends. I've found this to be very useful for classifying structural variation detected by paired-end mapping.

    (2) I increased the speed of intersectBed by nearly 50%.
    (3) I improved / corrected some of the help messages.
    (4) I improved sanity checking for BED entries.

    (5) I added two new scripts. The first, samToBed, will convert alignments in SAM format to BED format. It also accepts input from standard input so as to play nicely with the "samtools view" command. The second, gffToBed, converts GFF annotations to BED.
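
The chromosome-size constraint mentioned for slopBed in (1)(b) can be illustrated with a minimal Python sketch. This is not the BEDTools implementation; the function name `slop`, the `chrom_sizes` dictionary, and the example coordinates are all hypothetical, and intervals are assumed to be 0-based, half-open BED coordinates.

```python
def slop(start, end, size, chrom_size):
    """Extend a 0-based, half-open interval by `size` bp on each side,
    clamping the result to [0, chrom_size]."""
    new_start = max(0, start - size)          # never run off the left end
    new_end = min(chrom_size, end + size)     # never run past the chromosome end
    return new_start, new_end


# Hypothetical chromosome size and features, for illustration only.
chrom_sizes = {"chr1": 1000}
features = [("chr1", 50, 100), ("chr1", 900, 980)]

slopped = [(chrom, *slop(start, end, 100, chrom_sizes[chrom]))
           for chrom, start, end in features]
print(slopped)  # [('chr1', 0, 200), ('chr1', 800, 1000)]
```

A plain awk one-liner that only adds and subtracts a constant would instead produce a negative start for the first feature and an end beyond the chromosome for the second.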

    I hope you find these useful.
    Aaron

  • #2
    Hi Aaron,
    I'd like to use this post to ask how closestBed works... I really don't get what a tie is.
    As an example:

    Code:
    $ closestBed   -a mysplit/merged_IRR1.bed -b mm9.refseq.tss.bed6 | head
    chr1	4172972	4173006	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4557081	4557115	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	4562824	4562858	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	4562824	4562858	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5120005	5120039	1	-	chr1	5073253	5152630	NM_133826	0	+
    chr1	5493224	5493258	1	+	chr1	4334223	4350473	NM_011283	0	-
    chr1	5493224	5493258	1	+	chr1	4481008	4486494	NM_011441	0	-
    chr1	5493224	5493258	1	+	chr1	4764014	4775768	NM_025300	0	-
    chr1	5493224	5493258	1	+	chr1	4797973	4836816	NM_008866	0	+
    I expected closestBed to search for the closest feature upstream or downstream; instead I get a list of features ordered from the farthest to the closest (in abs(dist)). I'm a bit puzzled :-)



    • #3
      Hi Dawe,
      You are rightfully puzzled... I was too. Your expectation of how it should behave is correct. Unfortunately, I introduced a typo while modifying an unrelated piece of code prior to this release. A new version (2.3.1) has been posted which behaves as you would expect. I tested it with your sample data and all appears well.

      As for ties, these occur in two ways:

      1) When there are two or more features in B that _overlap_ the same fraction of a feature in A, by default all of them are reported. By using -t first or -t last, you can choose just one.

      2) When there are two or more features in B that, while not overlapping a feature in A, are exactly the same distance from it (say 1Mb), all of them will be reported.
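
The second kind of tie can be illustrated with a small Python sketch. It is a simplification and not the closestBed code: the names `gap` and `closest` and the `ties` parameter are hypothetical, all features are assumed to lie on the same chromosome, and every overlapping feature is treated as distance 0 rather than being ranked by overlap fraction as in case 1.

```python
def gap(a, b):
    """Number of bases separating two half-open (start, end) intervals;
    0 if they overlap or touch (a convention chosen for this sketch)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def closest(a, b_features, ties="all"):
    """Return the feature(s) in b_features nearest to a.

    ties="all" keeps every equally close feature; "first" and "last"
    keep only one, in the order the features were given.
    """
    if not b_features:
        return []
    best = min(gap(a, b) for b in b_features)
    hits = [b for b in b_features if gap(a, b) == best]
    if ties == "first":
        return hits[:1]
    if ties == "last":
        return hits[-1:]
    return hits


# Hypothetical coordinates: two B features are each 100 bp away from A.
a = (1000, 1100)
b_features = [(100, 200), (500, 900), (1200, 1300), (5000, 6000)]

print(closest(a, b_features))           # [(500, 900), (1200, 1300)]
print(closest(a, b_features, "first"))  # [(500, 900)]
print(closest(a, b_features, "last"))   # [(1200, 1300)]
```

Here the upstream feature (500, 900) and the downstream feature (1200, 1300) are equidistant from A, so both are returned unless one of them is explicitly selected.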

      Sorry for the confusion.
      Aaron



      • #4
        I've got it (both the tie definition and the new version tarball!).
        As you said, it works!
        Thanks

        d
