  • quinlana
    Senior Member
    • Sep 2008
    • 119

    BEDTools Version 2.1

    Hi all,

    I updated the BEDTools utility "coverageBed" so that it now reports the density/breadth of coverage for a given interval. Specifically, for each interval in B, it reports:
    1) the number of features in A that overlap it.
    2) the number of bases in B that had non-zero coverage from A.
    3) the fraction (density) of bases in B that had non-zero coverage from A.


    An example (note that start coordinates follow the UCSC 0-based convention, so a start of 10 refers to the 11th base):

    > cat A.bed
    chr1 10 20
    chr1 11 21
    chr1 12 25

    > cat B.bed
    chr1 0 50

    > coverageBed -a A.bed -b B.bed
    chr1 0 50 3 15 50 0.3

    where:
    column 4 is the number of intervals in A that overlap with B.
    column 5 is the number of bases in B with non-zero "coverage" from A.
    column 6 is the length of the interval in B.
    column 7 is the fraction of bases in B that have non-zero coverage from A.

    In essence, column 7 describes the "breadth" of coverage, whereas column 4 describes the "depth". In this case, B is overlapped by 3 features in A, and together these features cover 15 bases, or 30%, of the 50bp interval in B.
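
For readers who want to check the numbers, here is a minimal Python sketch of how columns 4 through 7 could be computed for one interval in B. It is not the coverageBed implementation; the function name `coverage` and the tuple layout are assumptions made for illustration, and coordinates are treated as 0-based, half-open.

```python
def coverage(a_intervals, b_interval):
    """Return (count, covered_bases, b_length, fraction) for one B interval.

    Intervals are (chrom, start, end) tuples with 0-based, half-open
    coordinates. Illustrative sketch only, not the BEDTools code.
    """
    b_chrom, b_start, b_end = b_interval

    # Keep only A features that overlap B, clipped to B's bounds.
    clipped = []
    for chrom, start, end in a_intervals:
        if chrom == b_chrom and start < b_end and end > b_start:
            clipped.append((max(start, b_start), min(end, b_end)))

    # Merge the clipped pieces so that each base is counted only once.
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(clipped):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    length = b_end - b_start
    fraction = covered / length if length else 0.0
    return len(clipped), covered, length, fraction


a_features = [("chr1", 10, 20), ("chr1", 11, 21), ("chr1", 12, 25)]
print(coverage(a_features, ("chr1", 0, 50)))  # (3, 15, 50, 0.3)
```

The three features in A merge into a single covered stretch from 10 to 25, which is where the 15 bases in column 5 come from.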

    The newest version is here:
    http://people.virginia.edu/~arq5x/bedtools.html OR,
    Download BEDTools for free. BEDTools is a suite of utilities for comparing genomic features in BED format. These utilities allow one to quickly address tasks such as: 1.


    All the best,
    Aaron
  • pko
    Junior Member
    • Apr 2009
    • 2

    #2
    BEDTools 2.1 fails to compile under Linux because of the following line in bedFile.cpp:

    bedEntry.minOverlapStart = INT_MAX;

    • quinlana
      Senior Member
      • Sep 2008
      • 119

      #3
      Thanks for finding this, pko. Apparently my system and those of other users let me get away with omitting limits.h from that source file. I'll post a new version as soon as I get back from vacation. In the interim, if others face this problem, add the following to bedFile.h on line 12:

      #include <limits.h>

      Save, re-make and you should be good to go.

      Apologies and thanks much for pointing this out.

      Best,
      Aaron

      • dlepp
        Junior Member
        • Mar 2009
        • 5

        #4
        I'm having some strange issues with complementBed: it appears to be highly sensitive to the convention used in the chromosome field. For example, this works:

        Bed file:

        chr21 32345 65443

        genome file:

        chr21 48099781

        but this gives no output:

        Bed file:

        hr21 32345 65443

        genome file:

        hr21 48099781


        Maybe there are some restrictions in the bed format that I'm unaware of? Haven't tested any of the other tools.

        Thanks,

        Dion

        • quinlana
          Senior Member
          • Sep 2008
          • 119

          #5
          complementBed

          Hi dlepp,
          Thanks for your post, this is a strange problem. I was able to recreate it as well. There is nothing that explicitly limits what can be used for the "chrom" field. The intent is that any string could be used. Oddly, it seems to be a problem with the C++ string tokenizing function I wrote, which is basically just lifted from a "best practices" book. To make things more odd, the following works (note h21 instead of hr21):

          Bed file:

          h21 32345 65443

          genome file:

          h21 48099781

          I tried using other tokenizing methods and the problem persists. I am on vacation until early August and will fix it when I return. In the meantime, if you just use chr22 or 22, all should be well.

          Thanks for pointing this out as it is a strange error that needs to be addressed.

          Best,
          Aaron

          • quinlana
            Senior Member
            • Sep 2008
            • 119

            #6
            BEDTools v2.1.1

            Hi,
            I have posted a new version (2.1.1) that addresses the issues that dlepp and pko have so kindly pointed out.

            I've posted it to http://people.virginia.edu/~arq5x/bedtools.html and will update sourceforge soon.

            Thanks again for letting me know of these problems.

            Best,
            Aaron

            • ohofmann
              Member
              • Jan 2009
              • 37

              #7
              Aaron,


              I've been trying BEDTools for mapping SNPs to genomic features -- which often overlap. How does 'closestBed' handle these cases? E.g., two genes that overlap, and a SNP in the overlap region -- does it pick one gene at random? The amount of overlap is going to be identical in these cases.

              Thanks!

              • quinlana
                Senior Member
                • Sep 2008
                • 119

                #8
                closestBed

                Hi ohofmann,

                Currently, in such situations, closestBed will return the first feature that occurs in the feature file. This works well for larger intervals (e.g. genes, not SNPs), but in the case you describe, it really isn't ideal.

                My guess is that in this case, you'd prefer more control. For example:
                a) return _all_ features that overlap with the SNP.
                b) return the largest feature that overlaps with the SNP.
                c) return the smallest feature that overlaps with the SNP.
                d) randomly select a feature.

                All of these options are quite easy to implement. I can likely implement them this week or early next week if it helps you. To be precise, cases a-d will only be invoked when there are multiple features in B that have 100% overlap with the interval in A (in your case, a SNP). Otherwise, only the closest (i.e. closest non-overlapping or most overlapping) feature will be reported.
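
To make options a) through d) concrete, here is a minimal Python sketch of tie-breaking among features that all fully overlap the query. It is an illustration under stated assumptions, not closestBed code: the function `resolve_ties`, the strategy names, and the example genes are all hypothetical. The "first" strategy mirrors the current behavior described above (the first feature in file order wins).

```python
import random


def resolve_ties(features, strategy="all", rng=None):
    """Choose among tied features, given in file order.

    Each feature is a (chrom, start, end, name) tuple. The strategy names
    are hypothetical labels for the options discussed in the post.
    """
    if not features:
        return []
    if strategy == "first":      # current behavior: first feature in the file
        return [features[0]]
    if strategy == "all":        # option a
        return list(features)
    if strategy == "largest":    # option b
        return [max(features, key=lambda f: f[2] - f[1])]
    if strategy == "smallest":   # option c
        return [min(features, key=lambda f: f[2] - f[1])]
    if strategy == "random":     # option d
        return [(rng or random).choice(features)]
    raise ValueError("unknown strategy: " + strategy)


# Two hypothetical overlapping genes that both contain the same SNP.
genes = [("chr1", 100, 500, "geneA"), ("chr1", 200, 900, "geneB")]
print(resolve_ties(genes, "first"))    # [('chr1', 100, 500, 'geneA')]
print(resolve_ties(genes, "largest"))  # [('chr1', 200, 900, 'geneB')]
```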

                Thanks for pointing this out.
                Aaron

                • ohofmann
                  Member
                  • Jan 2009
                  • 37

                  #9
                  Aaron,


                  not sure it's worth the hassle -- just adding the information to the man page should be more than enough. My current workflow, using the mapping of SNPs to genes within a 25kb window as an example:

                  * Run windowBed on all SNPs (streamed) vs a gene file, +/- 25kb, printing out all hits
                  * Cut the overlapping gene regions out of the result file
                  * Sort/unique to remove duplicate genes (not sure how closestBed handles those, just in case), and likewise for SNPs (this ensures that SNPs without a gene within 25kb are removed; otherwise they might end up mapped to genes a few megabases away)

                  Take those files as input for closestBed. If an SNP actually overlaps more than one gene it probably makes sense to return all since closest really isn't defined. Closest to .. the start of a gene (depends on strand)? The UTR? Etc.

                  All features is quite likely the only alternative that makes sense in this context.
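
The windowing and de-duplication steps of this workflow can be sketched in a few lines of Python. This is only an illustration of the idea, not how windowBed works internally: the function `window_hits`, the tuple layout, and the example SNPs and genes are hypothetical, and the nested loop is quadratic, so it is suitable for small inputs only.

```python
def window_hits(snps, genes, window=25_000):
    """Return (snp, gene) pairs where the gene overlaps the SNP +/- window.

    Features are (chrom, start, end, name) tuples with 0-based, half-open
    coordinates. Naive all-against-all comparison, for illustration only.
    """
    hits = []
    for snp in snps:
        chrom, start, end, _ = snp
        w_start = max(0, start - window)
        w_end = end + window
        for gene in genes:
            g_chrom, g_start, g_end, _ = gene
            if g_chrom == chrom and g_start < w_end and g_end > w_start:
                hits.append((snp, gene))
    return hits


# Hypothetical inputs: rs2 has no gene within 25kb and should drop out.
snps = [("chr1", 1_000, 1_001, "rs1"), ("chr1", 5_000_000, 5_000_001, "rs2")]
genes = [
    ("chr1", 20_000, 30_000, "geneA"),
    ("chr1", 26_500, 40_000, "geneB"),
    ("chr2", 500, 900, "geneC"),
]

hits = window_hits(snps, genes)
# The sort/unique step: keep each gene and each SNP once.
genes_near = sorted({gene for _, gene in hits})
snps_near = sorted({snp for snp, _ in hits})
print(snps_near)   # [('chr1', 1000, 1001, 'rs1')]
print(genes_near)  # [('chr1', 20000, 30000, 'geneA')]
```

Only the SNPs and genes that survive this filter would then be passed on to the closest-feature step, so a SNP can no longer be matched to a gene megabases away.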

                  Best, Oliver

                  • quinlana
                    Senior Member
                    • Sep 2008
                    • 119

                    #10
                    Hi Oliver,
                    I agree that returning all features either optionally or by default is best in this case. Such behavior would allow the user to "pipe" to a downstream Perl/AWK/Python/Ruby/VogueLanguageOfTheMonth in order to choose max, min, random, etc.

                    I'll try to knock this out in the next couple of weeks. Not hard, just difficult to find time at the moment.

                    Aaron

                    • ohofmann
                      Member
                      • Jan 2009
                      • 37

                      #11
                      No rush at all, and thanks!

                      -- Oliver
