  • daler
    Junior Member
    • Feb 2011
    • 8

    pybedtools - a Python wrapper for BEDTools

    [cross-posted to BEDTools mailing list]

    I use both BEDTools and Python a lot. So I've posted a new Python package, pybedtools, which wraps BEDTools programs in an easy-to-use Pythonic interface. The GitHub repo is here:

    Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic") - daler/pybedtools


    and you can view the docs here:



    Note this is different from the recently-released "bedtools-python", which is an API for low-level access to interval files. In contrast, "pybedtools" tries to give you all the functionality of BEDTools from your Python code.

    This is still very much in development so not all parts of BEDTools are fully supported yet. I've tried to focus on documentation and doctests for this release; future releases will be more code-centric.

    For bug reports/feature requests, please use either this thread or the BEDTools mailing list.

    -Ryan
  • daler
    Junior Member
    • Feb 2011
    • 8

    #2
    Version 0.5 of pybedtools has now been released. There have been many, many improvements since the last version, thanks to collaboration with Brent Pedersen and Aaron Quinlan.

    In addition to wrapping all the BEDTools programs (including the latest multiBamCov, tagBam, and nucBed programs) and making them accessible from within Python, pybedtools now extends BEDTools by allowing feature-by-feature manipulation of BED/GFF/GTF/BAM/SAM/VCF files.

    There's lots more that pybedtools provides . . . as a brief example, here's the complete code that identifies genes that are <5kb from intergenic SNPs, given a file of genes and a file of SNPs:

    Code:
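    from pybedtools import BedTool
    snps = BedTool('snps.bed.gz')
    genes = BedTool('hg19.gff')
    # SNPs that do not overlap any gene
    intergenic_snps = (snps - genes)
    # For each gene, report the closest intergenic SNP; d=True appends the distance as the last field
    nearby = genes.closest(intergenic_snps, d=True, stream=True)
    for gene in nearby:
        if int(gene[-1]) < 5000:
            print(gene.name)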
    In contrast, here's how to do the same analysis in a shell script, which requires knowledge of bash, awk, and Perl:

    Code:
    snps=../test/data/snps.bed.gz
    genes=../test/data/hg19.gff
    intergenic_snps=/tmp/intergenic_snps
    
    snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
    gene_fields=9
    distance_field=$(($gene_fields + $snp_fields + 1))
    
    intersectBed -a $snps -b $genes -v > $intergenic_snps
    
    closestBed -a $genes -b $intergenic_snps -d \
    | awk '($'$distance_field' < 5000){print $9;}' \
    | perl -ne 'm/(?:ID|Name|gene_id)=(.*?);/; print "$1\n"'
    
    rm $intergenic_snps
    (You can read notes on this comparison at http://packages.python.org/pybedtool...omparison.html)
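
To make the comparison concrete: with -d, closestBed appends the distance as the final column, which is why the Python version can read it as gene[-1] while the shell version has to work out the column index first. The sketch below mimics the awk and Perl steps in plain Python on a single made-up output line. The sample line, the function name nearby_gene, and the layout (9 GFF columns, then a 3-column SNP record, then the distance) are assumptions for illustration only; this is not pybedtools code.

```python
import re

# Matches the first ID=, Name=, or gene_id= entry in a GFF attributes string.
ATTR_RE = re.compile(r"(?:ID|Name|gene_id)=(.*?);")

def nearby_gene(line, max_distance=5000, attr_column=9):
    """Return the gene identifier if the reported distance is under max_distance, else None."""
    fields = line.rstrip("\n").split("\t")
    distance = int(fields[-1])  # -d puts the distance in the last column
    if distance >= max_distance:
        return None
    match = ATTR_RE.search(fields[attr_column - 1])
    return match.group(1) if match else None

# Hypothetical closestBed -d output: 9 GFF columns, 3 BED columns, then the distance.
example = "\t".join([
    "chr1", "src", "gene", "1000", "2000", ".", "+", ".", "ID=geneA;Name=A;",
    "chr1", "2500", "2501",
    "500",
])
print(nearby_gene(example))
```

Reading the distance from the last field means the function does not need to know how many columns the SNP file has, which is the part the shell script computes by hand.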


    Full documentation: http://packages.python.org/pybedtools/
    Source (contributions welcome): https://github.com/daler/pybedtools

    Or you can get a brief overview of pybedtools in:

    Pybedtools: a flexible Python library for manipulating genomic datasets and annotations
    Ryan K. Dale, Brent S. Pedersen, and Aaron R. Quinlan
    Bioinformatics (2011) first published online September 23, 2011
    doi:10.1093/bioinformatics/btr539


    • allenma
      Junior Member
      • Mar 2011
      • 5

      #3
      ftp

      Thanks for pybedtools. It has been very helpful. Do you know if I can use remote BAM files (FTP) with pybedtools yet? I know this feature was just added to BEDTools, but I can't seem to figure out how to use it.

      Thanks,
      Mary


      • daler
        Junior Member
        • Feb 2011
        • 8

        #4
        Thanks for pointing this out. I've just added support for remote BAMs in the development version (https://github.com/daler/pybedtools).

        When creating a new BedTool object, you can now use the remote=True keyword argument. Here's an example that grabs the first 9 lines of a remote BAM and converts them to BED:

        Code:
        import pybedtools
        url = 'ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00096/exome_alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam'
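        # remote=True tells pybedtools that the BAM lives at a URL rather than on disk
        x = pybedtools.BedTool(url, remote=True)
        y = x.bam_to_bed(stream=True)
        # Stop early instead of reading the whole remote file
        for i, feature in enumerate(y):
            if i == 9:
                break
            print(feature, end='')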

        This prints:

        Code:
         11	60636	60736	SRR081241.13799221/1	0	+
         11	60674	60774	SRR077487.5548889/1	0	+
         11	60684	60784	SRR077487.12853301/1	0	+
         11	60789	60889	SRR077487.5548889/2	0	-
         11	60950	61050	SRR077487.13826494/1	0	+
         11	60959	61059	SRR081241.13799221/2	0	-
         11	61052	61152	SRR077487.12853301/2	0	-
         11	61548	61648	SRR081241.16743804/2	0	+
         11	61665	61765	SRR081241.16743804/1	0	-

        Keep in mind that doing more extensive work that may need to read large parts of the remote file (like intersections) may take a long time.
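
The enumerate-and-break loop above is one way to stop early on a streamed result. Another option is itertools.islice, which stops pulling from the iterator after a fixed number of items. The sketch below uses a stand-in generator (fake_stream, a hypothetical name) instead of a real remote BAM, so it only illustrates the iteration pattern and does not touch pybedtools or the network.

```python
from itertools import islice

def fake_stream():
    """Stand-in for a streamed result: yields BED-like lines lazily, without end."""
    start = 60000
    while True:
        yield "11\t%d\t%d\tread\t0\t+\n" % (start, start + 100)
        start += 50

def head(iterable, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(iterable, n))

# Only three items are ever generated, even though the stream is endless.
for line in head(fake_stream(), 3):
    print(line, end="")
```

Assuming the streamed BedTool can be iterated the same way the loop above iterates it, the same helper should apply to it; because only the requested items are pulled, the rest of the remote file is never read.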


        • agroff11
          Junior Member
          • Sep 2014
          • 2

          #5
          implementation question

          UPDATE: Please ignore! I had a problem with my bedtools installation, fixed that and now this is no longer an issue! -_-

          Hello, pybedtools looks absolutely perfect for what I need to do, but I have a newbie implementation question.

          I'm trying to get the simple examples working, but I'm having trouble with the subtract function.
          I've just installed pybedtools-0.6.7 on a Mac and I'm running:

          Code:
          import pybedtools
          
          # set up 3 different bedtools
          a = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          b = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          c = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          print a.count()
          print a
          print a.subtract(b)
          the result is:
          Code:
          4
          chr1	1	100	feature1	0	+
          chr1	100	200	feature2	0	+
          chr1	150	500	feature3	0	-
          chr1	900	950	feature4	0	+
          
          Traceback (most recent call last):
            File "test.py", line 9, in <module>
              print a.subtract(b)
            File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 664, in decorated
              result = method(self, *args, **kwargs)
            File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 146, in not_implemented_func
              raise NotImplementedError(help_str)
          NotImplementedError: "subtractBed" does not appear to be installed or on the path, so this method is disabled.  Please install a more recent version of BEDTools and re-import to use this method.
          I have the same issue when I try version 0.6.6 on a Mac or 0.6.7 on Linux. Have I missed something basic? Neither (a-b-c).count() nor (a-b).count() works for me either.

          Thanks!!
          Last edited by agroff11; 09-30-2014, 11:04 AM. Reason: answered own question


          • daler
            Junior Member
            • Feb 2011
            • 8

            #6
            pybedtools wraps and extends BEDTools, so you need to make sure BEDTools is available on your path.

            What version of BEDTools are you running? The easiest way to find out is by typing

            Code:
            bedtools --version
            in the terminal.
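
The same check can be run from inside Python before importing pybedtools, by looking the executables up on the PATH. This is a minimal sketch using only the standard library (Python 3.3 or later for shutil.which). The helper name find_missing and the list of program names are illustrative assumptions, not part of pybedtools.

```python
import shutil

def find_missing(programs):
    """Return the programs from `programs` that are not found on PATH."""
    return [p for p in programs if shutil.which(p) is None]

if __name__ == "__main__":
    # `bedtools` is the executable used in the version check above;
    # `subtractBed` is the per-tool name from the error message earlier in the thread.
    missing = find_missing(["bedtools", "subtractBed"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("BEDTools executables found.")
```

If anything is reported missing, the usual cause is the one in this thread: BEDTools is installed, but its directory has not been added to the PATH of the shell that launches Python.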


            • agroff11
              Junior Member
              • Sep 2014
              • 2

              #7
              thanks

              Thank you! I had not updated my PATH after installation.


              • daler
                Junior Member
                • Feb 2011
                • 8

                #8
                OK, good -- glad it works.
