  • BEDTools Version 2.0

    Hi all,
    Version 2 of BEDTools has been released. These tools allow one to answer common questions of genomic features in BED format. Version 2 has two major improvements:

    1. Enforcing "strandedness". The previous version of BEDTools reported overlaps between BED features regardless of the strand of the two features. Now, with the "-s" option, all relevant utilities (e.g. intersectBed, mergeBed, windowBed, closestBed) report overlaps ONLY if the features are on the same strand. By default, strand is ignored.

    2. Intersecting paired-end reads/SV calls to regular BED files. There is now a program called peIntersectBed that compares features (e.g. paired-end reads, SV calls, etc.) to a regular BED file (e.g. RefSeq genes). In order to do such comparisons, I have defined a new BEDPE format that is very similar to traditional BED formats. The new utility allows one to ask for:

    1. All cases where _either_ end of a BEDPE entry overlaps a BED file.
    2. All cases where _both_ ends of a BEDPE entry overlap a BED file.
    3. All cases where _neither_ end of a BEDPE entry overlaps a BED file.
    4. All cases where _one and only one_ (i.e. xor) end of a BEDPE entry overlaps a BED file.
    5. All cases where the "inner span" of a BEDPE entry overlaps a BED file.
    6. All cases where the "outer span" of a BEDPE entry overlaps a BED file.

    peIntersectBed is really useful for screening paired-end sequencing reads against genomic annotations.
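The overlap modes listed above can be sketched in a few lines of Python. This is only an illustration of the logic, not the peIntersectBed implementation: the `Bedpe` record, the `classify` function, and the exact definitions of "inner span" (the gap between the two ends) and "outer span" (from the leftmost start to the rightmost end) are assumptions made for this example, as is treating "both" as "each end overlaps some feature".

```python
from typing import Dict, List, NamedTuple, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


class Bedpe(NamedTuple):
    """Hypothetical minimal paired-end record: one interval per end."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def hits_any(query: Interval, features: List[Interval]) -> bool:
    """True if the query overlaps at least one feature."""
    return any(overlaps(query, f) for f in features)


def classify(pe: Bedpe, features: List[Interval]) -> Dict[str, bool]:
    """Report which of the overlap modes hold for one paired-end record."""
    end1 = hits_any((pe.chrom1, pe.start1, pe.end1), features)
    end2 = hits_any((pe.chrom2, pe.start2, pe.end2), features)
    result = {
        "either": end1 or end2,
        "both": end1 and end2,
        "neither": not (end1 or end2),
        "xor": end1 != end2,
    }
    # Assumption: spans are only meaningful when both ends share a chromosome.
    if pe.chrom1 == pe.chrom2:
        outer = (pe.chrom1, min(pe.start1, pe.start2), max(pe.end1, pe.end2))
        gap_start = min(pe.end1, pe.end2)
        gap_end = max(pe.start1, pe.start2)
        result["outer_span"] = hits_any(outer, features)
        # The inner span is empty when the two ends touch or overlap.
        result["inner_span"] = gap_start < gap_end and hits_any(
            (pe.chrom1, gap_start, gap_end), features
        )
    return result
```

For example, a pair with ends at chr22:1000-1100 and chr22:4900-5000 compared against a single feature at chr22:1050-1200 would count as "either" and "xor", but not "both" or "neither".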

    The source code for BEDTools Version 2.0 is posted on sourceforge at:
    Download BEDTools for free. BEDTools is a suite of utilities for comparing genomic features in BED format. These utilities allow one to quickly address tasks such as: 1.


    Examples and high-level descriptions can be found here:


    The USAGE_EXAMPLES document in the BEDTools package contains more detailed examples of common usage. If you have used Galaxy, many of the concepts should be familiar.

    All the best,
    Aaron
    Last edited by quinlana; 05-12-2009, 06:07 PM. Reason: typos

  • #2
    Hi Aaron,
    I am trying to compare two BED files. I started with a small example, shown below, to test the usage of the tool.
    Code:
    track name=pairedReads2 description="Clone Paired Reads2" useScore=1
    chr22   1000    5000    cloneA  960     +       1000    5000    0       2       567,488,        0,3512
    chr22   2000    6000    cloneB  900     -       2000    6000    0       2       433,399,        0,3601
    But I get the error as below:

    HTML Code:
     ./mergeBed -i ../../chr22_data/test2.bed 
    Only one BED field detected: 1.  Verify that your files are TAB-delimited.  Exiting... 
    
    or 
    
     ./mergeBed -i ../../chr22_data/test1.bed 
    Unexpected number of fields: 1.  Verify that your files are TAB-delimited and that your BED file has 3,4,5 or 6 fields.  Exiting...
    How do I proceed? My BED file has 12 columns because each line contains 2 blocks of sequence. Is it possible to use the tool for this kind of analysis? Please advise. Thanks.



    • #3
      Hi,
      BEDTools only supports tab-delimited BED files with a minimum of 3 (chrom, start and end) fields and a maximum of 6 (optionally adding name, score and strand).

      For example, if you extracted the first 6 columns of your example file, it could be merged as follows:
      PHP Code:
      cut -f 1-6 test.bed | mergeBed -i stdin
      chr22    1000    6000 
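As a rough Python counterpart to the `cut -f 1-6` step, the sketch below keeps the first six tab-separated fields of each record and fails early when a line does not split into at least three fields, which is what one would expect if the file were space-delimited. The function name `first_six_columns` is hypothetical, and skipping "track", "browser" and "#" lines is an assumption of this example, not a statement about how BEDTools treats them.

```python
from typing import Iterable, Iterator, List


def first_six_columns(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the first six tab-separated fields of each BED record."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Assumption: blank, header and comment lines are skipped, not parsed.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            # A single field usually means spaces were used instead of tabs.
            raise ValueError(
                f"line {line_no}: {len(fields)} field(s) found; "
                "is the file TAB-delimited?"
            )
        yield fields[:6]
```

Calling `list(first_six_columns(open("test.bed")))` on a 12-column file would return only chrom, start, end, name, score and strand for each record.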
      I also note that you seem to be dealing with paired sequences. BEDTools has a utility (peIntersectBed) that will intersect paired-end features with normal BED files. The file format for paired-end BED entries can be found by using the "-h" option with peIntersectBed.

      Lastly, if you are using exactly version 2.0.0, there is a much newer version available here:
      http://code.google.com/p/bedtools.

      All the best,
      Aaron



      • #4
        I should also note that one can track the names of which entries were merged (separated by a semicolon) by using the "-names" option.

        From your example:

        PHP Code:
        cut -f 1-6 test.bed | mergeBed -i stdin -names
        chr22    1000    6000    cloneA;cloneB
        This is undocumented in the help and I am changing this as we "speak".
        --Aaron



        • #5
          Hi Aaron,

          Thanks for your response. I have downloaded the latest version and started using it.

          Code:
          ./mergeBed -n -i ../newdata/full.bed > /../newdata/merged.bed
          The above command works.

          When I add the -s option to enforce strandedness, I don't get any output.

          Code:
          ./mergeBed -n -s -i ../newdata/full.bed > /../newdata/merged.bed
          Without strand, it works fine. Even in the example you gave above, no strand information is printed in the output. Why is that?

          Basically I am trying to remove duplicate records and merge them as 1 record.

          Thanks and Regards
          Last edited by seq_GA; 10-28-2009, 03:02 AM.



          • #6
            Hmm, it works as expected for me using Version 2.2.4. test.bed below is the same as your file above.

            __without__ strand, thus ignores the fact that the two entries are on different strands and combines them:
            PHP Code:
            cut -f 1-6 test.bed | mergeBed -i stdin -names
            chr22    1000    6000    cloneA;cloneB

            __with__ strand, thus observes the fact that the two entries are on different strands and does not combine them:
            PHP Code:
            cut -f 1-6 test.bed | mergeBed -i stdin -s
            chr22    1000    5000    +
            chr22    2000    6000    -
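The difference between the two runs can be reproduced with a small Python sketch of a merge that optionally groups by strand and records the names of the merged entries, joined by semicolons. It is a simplified illustration, not the mergeBed code: the `merge` function, its `stranded` parameter and the tuple layout are hypothetical, and the choice to merge only records that share at least one base (leaving book-ended records separate) is an assumption.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int, int, str, str]            # (chrom, start, end, name, strand)
Merged = Tuple[str, int, int, str, Optional[str]]  # names are joined with ";"


def _finish(current: list, stranded: bool) -> Merged:
    """Turn the working state into an output tuple."""
    key, start, end, names = current
    return (key[0], start, end, ";".join(names), key[1] if stranded else None)


def merge(records: List[Record], stranded: bool = False) -> List[Merged]:
    """Merge overlapping records; with stranded=True, only within one strand."""

    def key(rec: Record) -> tuple:
        # Group by chromosome alone, or by chromosome plus strand.
        return (rec[0], rec[4]) if stranded else (rec[0],)

    merged: List[Merged] = []
    current = None  # working state: [group key, start, end, list of names]
    for rec in sorted(records, key=lambda r: (key(r), r[1], r[2])):
        k = key(rec)
        # Assumption: records must share at least one base to be merged.
        if current is not None and current[0] == k and rec[1] < current[2]:
            current[2] = max(current[2], rec[2])
            current[3].append(rec[3])
        else:
            if current is not None:
                merged.append(_finish(current, stranded))
            current = [k, rec[1], rec[2], [rec[3]]]
    if current is not None:
        merged.append(_finish(current, stranded))
    return merged
```

With the two clone records from this thread, `merge(records)` yields a single chr22:1000-6000 interval named "cloneA;cloneB", while `merge(records, stranded=True)` keeps the "+" and "-" records apart.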



            • #7
              Hi Aaron,

              How would you like me to cite your tools if we use them in a publication?

              Thanks!
              Lizzy



              • #8
                Hi Lizzy,
                We are working on the manuscript, but until then, please cite it as (Aaron R. Quinlan and Ira M. Hall, unpublished: http://code.google.com/p/bedtools/).
                Thanks for asking and good luck with your manuscript.
                Aaron
