  • finding common genomic regions from multiple (>2) BED files

    Hi all,

    I have 6 BED files and I am looking for the genomic regions common to all 6 files.
    Is there a tool that can do this? Bedtools only takes 2 files at a time. Is there any way to do this in one go? I am guessing Bioconductor GRanges can achieve this, but I am not sure.

    At present I am doing it pairwise using bedtools, which is really tedious. To begin with there will be 10 comparisons.

    Any suggestions?

    Thanks all.
    Last edited by a_mt; 12-05-2012, 05:24 AM. Reason: Solved : just found multiIntersectBed option :)
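For anyone wondering what multiIntersectBed (bedtools multiinter) is computing, here is a minimal Python sketch of the idea, not bedtools' implementation or its exact output format. It assumes each file is already loaded as a list of (chrom, start, end) tuples with BED half-open coordinates; the function name multi_intersect is hypothetical. A sweep over the start/end positions splits each chromosome into segments and records which files cover each one, so the regions common to all files are the segments covered by every file.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


def multi_intersect(files: List[List[Interval]]) -> List[Tuple[str, int, int, Tuple[int, ...]]]:
    """Split covered positions into segments and report which files cover each one.

    Coordinates are BED-style half-open [start, end).
    """
    events: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for idx, intervals in enumerate(files):
        for chrom, start, end in intervals:
            events[chrom].append((start, idx, +1))
            events[chrom].append((end, idx, -1))

    result = []
    for chrom in sorted(events):
        depth: Dict[int, int] = defaultdict(int)  # open intervals per file
        ev = sorted(events[chrom])
        prev = None
        i = 0
        while i < len(ev):
            pos = ev[i][0]
            # Files covering the segment [prev, pos), i.e. before applying events at pos
            covering = tuple(sorted(f for f, d in depth.items() if d > 0))
            if prev is not None and covering:
                last = result[-1] if result else None
                if last and last[0] == chrom and last[2] == prev and last[3] == covering:
                    result[-1] = (chrom, last[1], pos, covering)  # join touching segments
                else:
                    result.append((chrom, prev, pos, covering))
            # Apply every start/end event at this position before moving on
            while i < len(ev) and ev[i][0] == pos:
                _, f, delta = ev[i]
                depth[f] += delta
                i += 1
            prev = pos
    return result


if __name__ == "__main__":
    a = [("chr1", 100, 200)]
    b = [("chr1", 150, 250)]
    c = [("chr1", 180, 300)]
    segments = multi_intersect([a, b, c])
    common = [s[:3] for s in segments if len(s[3]) == 3]
    print(common)  # [('chr1', 180, 200)]
```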

  • #2
    You can do it using piping.

    intersectBed -a 1.bed -b 2.bed | intersectBed -a stdin -b 3.bed | ... and so on.
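Each stage of that pipe intersects the running result with one more file, i.e. a left fold of a pairwise intersection. A minimal Python sketch of the same idea, assuming every file is loaded as a list of (chrom, start, end) tuples sorted by chromosome name and start, with no overlaps inside a single file (intersectBed itself has no such restriction); intersect_two and intersect_all are hypothetical names.

```python
from functools import reduce
from typing import List, Tuple

Interval = Tuple[str, int, int]


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping portions of two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Skip ahead on whichever list is on the (textually) smaller chromosome
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:  # half-open coordinates: touching intervals do not overlap
            out.append((ca, start, end))
        # The interval that ends first cannot overlap anything further on the other list
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


def intersect_all(files: List[List[Interval]]) -> List[Interval]:
    """Same shape as the pipe: ((f1 & f2) & f3) & ..."""
    return reduce(intersect_two, files)


if __name__ == "__main__":
    f1 = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
    f2 = [("chr1", 50, 250), ("chr2", 10, 20)]
    f3 = [("chr1", 60, 220), ("chr2", 0, 100)]
    print(intersect_all([f1, f2, f3]))  # [('chr1', 60, 100), ('chr1', 200, 220), ('chr2', 10, 20)]
```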


    • #3
      How long are the files?
      How are they separated?
      How much memory does the computer have?
      Can I just count common 15-substrings?


      • #4
        If you're only looking for the intersection of all 6, then you just go:

        bedtools intersect -a 1.bed -b 2.bed | bedtools intersect -a stdin -b 3.bed | bedtools intersect -a stdin -b 4.bed | bedtools intersect -a stdin -b 5.bed | bedtools intersect -a stdin -b 6.bed > out.bed

        It can't really get easier (or faster).

        If you want a "Venn diagram" of all 6, then you have a lot of comparisons to do.

        It's probably easier to combine everything, remove duplicates, and record for each region which files it is present in; then you can query it in R, awk, or something else.

        cat *.bed | sort -k1,1 -k2,2n | uniq > all.bed

        Then run bedtools intersect -loj -a all.bed -b file1.bed, do this for all 6 files, and keep that information (-loj = left outer join): if a region in all.bed overlaps a feature in the file, that feature is appended; otherwise a null placeholder (-1) is appended.

        Then you must remove some unwanted columns, etc., but it's a start.
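A minimal Python sketch of this combine-and-annotate idea, assuming small in-memory lists of (chrom, start, end) tuples; the overlap test is a plain nested loop for clarity, and the function names are hypothetical. Each region of the combined set gets a 0/1 flag per file (the same information the -loj runs encode with their placeholders), and counting the flag patterns gives the cells of the Venn diagram.

```python
from collections import Counter
from typing import List, Tuple

Interval = Tuple[str, int, int]


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def annotate_membership(files: List[List[Interval]]) -> List[Tuple[Interval, Tuple[int, ...]]]:
    # Roughly what `cat *.bed | sort -k1,1 -k2,2n | uniq` produces: every distinct region, sorted
    combined = sorted(set(iv for f in files for iv in f))
    table = []
    for region in combined:
        # One flag per file: 1 if the file has any feature overlapping this region
        flags = tuple(int(any(overlaps(region, iv) for iv in f)) for f in files)
        table.append((region, flags))
    return table


def venn_counts(table: List[Tuple[Interval, Tuple[int, ...]]]) -> Counter:
    """How many combined regions fall into each file-membership pattern."""
    return Counter(flags for _, flags in table)


if __name__ == "__main__":
    f1 = [("chr1", 0, 10)]
    f2 = [("chr1", 5, 15)]
    f3 = [("chr1", 100, 110)]
    table = annotate_membership([f1, f2, f3])
    for region, flags in table:
        print(region, flags)
    print(venn_counts(table))
```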


        • #5
          BEDOPS works directly with any number of files:

          bedops --intersect f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > answer.bed

          (or even more concisely: bedops --intersect *.bed > final-answer)

          As you can see, this usage is more concise than anything else you could do. It also turns out to be more efficient, in both time and memory, than any other approach out there.

          You can pass any number of files to the bedops program directly. It doesn't read everything into memory, unlike other tool suites (those other suites actually require 2x their usual memory overhead too once you start using pipes as suggested above). Memory overhead is almost nothing for bedops (say < 20 MB), no matter how many or how big your input files get. And the program will run significantly faster than anything else out there right now.

          The only requirement is that each of your input files is pre-sorted. In turn, every output produced by bedops is guaranteed to be sorted, so results can be fed back in later without ever sorting them again.
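To illustrate why pre-sorted input allows a low-memory, single-pass approach over any number of files, here is a minimal Python sketch (not how BEDOPS is implemented). It assumes every input yields (chrom, start, end) tuples already sorted by chromosome name, start and end; streaming_intersect is a hypothetical name. heapq.merge pulls one record at a time from each input, and the sweep keeps only the intervals that are currently open, reporting stretches covered by every input.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Interval = Tuple[str, int, int]


def _tag(intervals: Iterable[Interval], idx: int) -> Iterator[Tuple[str, int, int, int]]:
    """Attach the input index to each (chrom, start, end) record, lazily."""
    for chrom, start, end in intervals:
        yield (chrom, start, end, idx)


def streaming_intersect(*inputs: Iterable[Interval]) -> Iterator[Interval]:
    """Yield regions covered by every input; each input must be sorted by (chrom, start, end)."""
    n = len(inputs)
    starts = heapq.merge(*(_tag(it, i) for i, it in enumerate(inputs)))
    depth = [0] * n                   # open intervals per input at the sweep position
    covered = 0                       # number of inputs with depth > 0
    ends: List[Tuple[int, int]] = []  # min-heap of (end, input index) on the current chromosome
    chrom: Optional[str] = None
    run_start = 0                     # where the current "all inputs covered" run began
    last: Optional[Interval] = None   # held back so touching pieces can be joined
    nxt = next(starts, None)

    while nxt is not None or ends:
        # Process an end first if it is due at or before the next start
        # (BED intervals are half-open) or the next start is on another chromosome.
        if ends and (nxt is None or nxt[0] != chrom or ends[0][0] <= nxt[1]):
            pos, i = heapq.heappop(ends)
            depth[i] -= 1
            if depth[i] == 0:
                if covered == n and run_start < pos:
                    if last is not None and last[0] == chrom and last[2] == run_start:
                        last = (chrom, last[1], pos)
                    else:
                        if last is not None:
                            yield last
                        last = (chrom, run_start, pos)
                covered -= 1
        else:
            chrom, pos, end, i = nxt
            nxt = next(starts, None)
            heapq.heappush(ends, (end, i))
            depth[i] += 1
            if depth[i] == 1:
                covered += 1
                if covered == n:
                    run_start = pos

    if last is not None:
        yield last


if __name__ == "__main__":
    a = [("chr1", 0, 100), ("chr2", 0, 100)]
    b = [("chr1", 50, 150), ("chr2", 10, 20), ("chr2", 20, 30)]
    print(list(streaming_intersect(a, b)))  # [('chr1', 50, 100), ('chr2', 10, 30)]
```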

          Pre-built binaries and source for the BEDOPS suite are available at http://code.google.com/p/bedops/ .

          To sort files, run them through the sort-bed program:
          sort-bed file1.bed > f1.bed

          You'll find that sort-bed happens to sort files faster than any other BED sorting program out there, as well. Our motto is simple: sort (at most) one time and run efficiently forever afterwards. Alternative suites do the equivalent of sorting every BED file every single time you call a program.
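A minimal Python sketch of that sort order, assuming sort-bed orders records by chromosome name as text and then by start and end as integers (check the BEDOPS documentation for how further ties are broken). It works in memory and says nothing about how sort-bed is implemented; the function names are hypothetical. Note that text order puts chr10 before chr2, while numeric order puts a start of 20 before 100.

```python
from typing import List, Tuple


def bed_sort_key(line: str) -> Tuple[str, int, int]:
    """Chromosome compared as text, start and end compared as integers."""
    chrom, start, end = line.split("\t")[:3]
    return (chrom, int(start), int(end))  # int() tolerates a trailing newline


def sort_bed_lines(lines: List[str]) -> List[str]:
    # Assumes plain tab-separated BED records (no header/track/browser lines).
    return sorted(lines, key=bed_sort_key)


if __name__ == "__main__":
    records = ["chr2\t5\t10\n", "chr10\t1\t2\n", "chr1\t100\t200\n", "chr1\t20\t30\n"]
    print("".join(sort_bed_lines(records)), end="")
```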

          As a final remark, doing the intersection between various sets is pretty easy, and you can do it in a pairwise fashion with pipes as shown in other posts above, which seems kind of cute. While that approach is not as efficient in memory or time as a simple bedops call, it still seems nice on the surface.

          No such cute solution exists with pipes if you change the problem very slightly: instead, give me all regions specific to exactly 1 file. Try to build up a solution from pairwise set-difference operations with no (or few) intermediate files or FIFOs. See what happens when you go from 2 BED files to 3. Now go to 4 and beyond (hint: it ain't good).

          However, this symmetric difference problem is easy for bedops. It's 1 command, regardless of the number of input files, just as in the intersection case.

          bedops --symmdiff f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > symmdiff-answer.bed

          This is concise and just as efficient as the intersection case. The bedops program was built from the ground up to work efficiently, both in time and memory, with any number of sorted input files at once.
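Both questions fit in one pass because they differ only in how many files must cover a position: all of them for the intersection, exactly one for the "specific to exactly 1 file" question. A minimal Python sketch of that idea, assuming in-memory lists of (chrom, start, end) tuples with half-open coordinates; it implements the definition given in the post, not BEDOPS's code, and regions_by_file_count, symmdiff and common_to_all are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Interval = Tuple[str, int, int]


def regions_by_file_count(files: List[List[Interval]], keep: Callable[[int], bool]) -> List[Interval]:
    """One sweep over all files; keep regions whose number of covering files passes keep()."""
    events: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for idx, intervals in enumerate(files):
        for chrom, start, end in intervals:
            events[chrom].append((start, +1, idx))
            events[chrom].append((end, -1, idx))

    out: List[Interval] = []
    for chrom in sorted(events):
        depth: Dict[int, int] = defaultdict(int)  # open intervals per file
        n_files = 0                               # distinct files covering the sweep position
        prev = None
        # Segments are only emitted between distinct positions, so the order of
        # events that share a position does not affect the result.
        for pos, delta, idx in sorted(events[chrom]):
            if prev is not None and prev < pos and n_files > 0 and keep(n_files):
                if out and out[-1][0] == chrom and out[-1][2] == prev:
                    out[-1] = (chrom, out[-1][1], pos)  # extend a touching region
                else:
                    out.append((chrom, prev, pos))
            before = depth[idx]
            depth[idx] += delta
            if before == 0 and depth[idx] > 0:
                n_files += 1
            elif before > 0 and depth[idx] == 0:
                n_files -= 1
            prev = pos
    return out


def symmdiff(files: List[List[Interval]]) -> List[Interval]:
    """Regions covered by exactly one input file."""
    return regions_by_file_count(files, lambda k: k == 1)


def common_to_all(files: List[List[Interval]]) -> List[Interval]:
    """Regions covered by every input file."""
    return regions_by_file_count(files, lambda k: k == len(files))


if __name__ == "__main__":
    a = [("chr1", 100, 200)]
    b = [("chr1", 150, 250)]
    c = [("chr1", 180, 300)]
    print(symmdiff([a, b, c]))       # [('chr1', 100, 150), ('chr1', 250, 300)]
    print(common_to_all([a, b, c]))  # [('chr1', 180, 200)]
```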
          Last edited by sjneph; 01-31-2013, 11:59 PM.
