  • How to transform BAM format to .TXT or .BED?

    Dear all,

    I downloaded a file in .BAM format and want to transform it into .BED format. What can I do? Thanks a lot!

    Zhen

  • #2
    Hi,
    I just finished a new version of BEDTools which has a C++ utility called bamToBed. This tool converts BAM alignments to BED or BEDPE format (see the BEDTools documentation). For example:

    1. Convert BAM alignments to BED format.
    Code:
    $ bamToBed -i reads.bam > reads.bed
    2. Convert BAM alignments to BED format using edit distance (NM) as the BED “score”. Default is mapping quality.
    Code:
    $ bamToBed -i reads.bam -ed > reads.bed
    3. Convert BAM alignments to BEDPE format.
    Code:
    $ bamToBed -i reads.bam -bedpe > reads.bedpe
    Heng Li also posted a nice example of how to create a BAMToBED utility using the SamTools code base.


    You might also be interested in two other utilities in BEDTools that now support BAM input and output. intersectBed now accepts BAM files as input and compares each alignment (each end separately if paired-end) to a BED file. You can create a new BAM file from those alignments that do or do not overlap the BED features in question. pairToBed does the same thing, but requires that the BAM file be paired. This tool is a bit more sophisticated in that you can require the "span" of the aligned pair to overlap, as well as either/both/neither/xor/notboth ends of the pair.

    For example:

    1. Retain only paired-end BAM alignments where neither end overlaps simple sequence repeats.
    Code:
    $ pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam
    2. Retain only paired-end BAM alignments where both ends overlap segmental duplications.
    Code:
    $ pairToBed -abam reads.bam -b segdups.bed -type both > reads.SSRs.bam
    3. Retain only paired-end BAM alignments where neither or one and only one end overlaps segmental duplications.
    Code:
    $ pairToBed -abam reads.bam -b segdups.bed -type notboth > reads.notbothSSRs.bam
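The end-based -type values used above can be read as a rule on how many ends of a pair overlap a BED feature. The following is only a sketch of that rule as described in this post, not BEDTools code; `pair_passes` is a hypothetical helper name, and its second and third arguments are assumed to be 1 when the corresponding end overlaps and 0 otherwise.

```shell
# Hypothetical helper (not part of BEDTools): does a pair pass a given -type?
# $1 = type, $2 = 1 if end 1 overlaps a BED feature else 0, $3 = same for end 2.
pair_passes() {
  hits=$(( $2 + $3 ))            # number of ends that overlap (0, 1 or 2)
  case "$1" in
    either)  [ "$hits" -ge 1 ] ;;
    both)    [ "$hits" -eq 2 ] ;;
    neither) [ "$hits" -eq 0 ] ;;
    xor)     [ "$hits" -eq 1 ] ;;
    notboth) [ "$hits" -le 1 ] ;;
    *)       return 2 ;;          # unknown type
  esac
}

# A pair with exactly one overlapping end passes "notboth" but not "both".
pair_passes notboth 1 0 && echo "kept by notboth"
pair_passes both 1 0    || echo "dropped by both"
```

So "notboth" keeps the union of what "neither" and "xor" keep, which is why example 3 describes it as "neither or one and only one end".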

    The BAM support is built upon Derek Barnett's nice C++ BAM API called BAMTools (http://sourceforge.net/projects/bamtools/). I'd encourage you to take a look at the new BEDTools manual for more details if you are interested.

    Best,
    Aaron



    • #3
      Hi,

      Originally posted by zhenshao
      Dear all,

      I downloaded a file in .BAM format and want to transform it into .BED format. What can I do? Thanks a lot!

      Zhen
      I'm happy that BEDTools now includes a bam->bed conversion utility... btw you can still try this (at least for Illumina reads):

      Code:
      samtools view -F 0x0004 $filein | awk '{OFS="\t"; if (and($2, 16)) print $3,$4,$4+length($10),$1,$5,"-"; else print $3,$4,$4+length($10),$1,$5,"+" }'
      d
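The one-liner above keeps the 1-based SAM start, takes the end from the read length, and relies on `and()`, which is a gawk extension. Here is a hedged sketch of a CIGAR-aware variant, assuming tab-separated SAM text on stdin; `sam_to_bed6` is a hypothetical function name, and the demo line is the spliced alignment quoted later in this thread with its sequence and qualities replaced by `*`.

```shell
# Reads SAM text on stdin and writes BED6 (chrom, start, end, name, score, strand).
# The end coordinate is computed from the CIGAR string instead of the read length.
sam_to_bed6() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^@/ { next }                                  # skip SAM header lines
  int($2 / 4) % 2 == 1 { next }                  # flag bit 0x4: read is unmapped
  {
    cigar = $6; span = 0
    # Peel one length/operation pair at a time off the front of the CIGAR string.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0
      op  = substr(cigar, RLENGTH, 1)
      if (op ~ /[MDN=X]/) span += len            # operations that consume the reference
      cigar = substr(cigar, RLENGTH + 1)
    }
    strand = (int($2 / 16) % 2 == 1) ? "-" : "+" # flag bit 0x10: reverse strand
    print $3, $4 - 1, $4 - 1 + span, $1, $5, strand
  }'
}

# Demo on a single alignment; with a real file: samtools view reads.bam | sam_to_bed6
printf 'IP6FNQC01CAO42\t0\tgi|2281652|gb|AF004394.1|\t18\t40\t5S304M7841N12M1D38M3S\t*\t0\t0\t*\t*\n' | sam_to_bed6
```

For the demo line this gives the interval 17 to 8213, the same interval bamToBed reports for that read further down the thread.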



      • #4
        Here is the sample code from Heng Li.

        Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.

        Protocol #4 describes his cut at bamToBed.

        Aaron



        • #5
          bedtools bamtobed when large indels are present

          I really like the bedtools bamtobed command, although there are some instances where reads skip very large indels and you don't want the bed file to include those indels. The only information you need is contained in columns 3 (chromosome), 4 (base pair start), and 6 (CIGAR) of the BAM file. Here is a simple awk script that should work (it really should be an option of bedtools bamtobed):
          Code:
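          samtools view in.bam |
            awk '{
              n_ops = split($6, len, "[MIDNSHP=X]");  # lengths of the CIGAR operations
              bp = $4 - 1; n = 0;                      # BED starts are 0-based
              for (i = 1; i < n_ops; i++) {
                n += 1 + length(len[i]);               # position of the operation letter
                op = substr($6, n, 1);
                # aligned blocks are printed; deletions and skipped regions only advance bp
                if (op == "M" || op == "=" || op == "X") { start = bp; bp += len[i]; print $3 "\t" start "\t" bp; }
                if (op == "D" || op == "N") bp += len[i];
              }
            }' > out.bed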



          • #6
            --Hi,

            I get a strange result when comparing bamToBed with the awk command line:

            samtools view -F 0x0004 464_J3_D1.bam | head -1
            IP6FNQC01CAO42 0 gi|2281652|gb|AF004394.1| 18 40 5S304M7841N12M1D38M3S * 0 0 TTAACTCCCAGAAAAGACAAGATATCCTTGATCTGTGGGTCTACCACACGCAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTGGAGCCAGAGAAGGTAGAAGAGGCCAATGAAGGAGAGAACAACAGCCTGTTACACCCTATGAGCCTGCATGGGATGGAGGACCCGGAGAAGGAAGTGTTAATGTGGCGGTTTGACAGCAGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGAGCACTACAAGAACCAACAAGAAAGAATGAACAAGAATTATTAGAATTGGATAAATGGGACA 433146444?8.//153FFFFFFIIIIGGIIIIIII:::=IIIGGIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIGIIIIIIIGGG888GGI666GIIIIIIIIIIIIIGFEEGGIIIII===GGGGGGGGGGGGGGGGGGGGGGGGGGGGG@@@@GGGDDE@>>AACCGDDDDDDDDE<<<>DCDFIIIIEDFFDDDDFDDFFDDDFFECC221;C>>>B?>888EGC>>>@CGGGC>>>BBBBB>>>::333<>;;;>>BBBGDDDDCCCDDDDDCCBAAAABCDBBBBBBAAA4444@@BB?==A???A?<<444;40..../588633579<<../0009<<<<<::=988:////25 MD:Z:17G17T8A45G46T9C2A14A14T21A6A5T3A6GA8GCA4AA10C41T13G4^A27A5G4 NH:i:1 HI:i:1 NM:i:26 SM:i:40 XQ:i:40 X2:i:0 XS:A:?

            bamToBed gives me this result:
            gi|2281652|gb|AF004394.1| 17 8213 IP6FNQC01CAO42 40 +

            and
            awk '{OFS="\t"; if (and($2, 16)) print $3,$4,$4+length($10),$1,$5,"-"; else print $3,$4,$4+length($10),$1,$5,"+" }'

            gives me:
            gi|2281652|gb|AF004394.1| 18 380 IP6FNQC01CAO42 40 +


            Why is the end coordinate 8213 in the bamToBed result?

            thank you --



            • #7
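              The read is spliced (note 7841N in the CIGAR string), so bamToBed is correct. Its end coordinate covers every operation that consumes the reference: 17 + 304 + 7841 + 12 + 1 + 38 = 8213. The awk one-liner only adds the read length (362 bases) to the 1-based start, hence 18 + 362 = 380.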



              • #8
                Yes, but what does the second coordinate, 8213, mean?



                • #9
                  The second value is the end of the alignment on the reference, so it includes the spliced gap.



                  • #10
                    Okay, I understand.
                    Is there a way to calculate the length of the spliced region?



                    • #11
                      Have a read through the SAM specification.
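In the SAM specification, the N operation in the CIGAR string marks a region skipped on the reference, so the spliced length can be read directly from field 6. A minimal sketch, assuming tab-separated SAM text on stdin; `spliced_length` is a hypothetical function name, and the demo line is the alignment from post #6 with its sequence and qualities replaced by `*`.

```shell
# Reads SAM text on stdin; for each alignment prints the read name, the number of
# N (skipped region) operations in its CIGAR string, and their total length in bases.
spliced_length() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^@/ { next }                                  # skip SAM header lines
  {
    cigar = $6; gaps = 0; total = 0
    # Walk the CIGAR string one length/operation pair at a time.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      if (substr(cigar, RLENGTH, 1) == "N") {
        gaps++
        total += substr(cigar, 1, RLENGTH - 1)
      }
      cigar = substr(cigar, RLENGTH + 1)
    }
    print $1, gaps, total
  }'
}

# Demo on a single alignment; with a real file: samtools view in.bam | spliced_length
printf 'IP6FNQC01CAO42\t0\tgi|2281652|gb|AF004394.1|\t18\t40\t5S304M7841N12M1D38M3S\t*\t0\t0\t*\t*\n' | spliced_length
```

For the read in post #6 this reports one skipped region of 7841 bases.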



                      • #12
                        Is there a way I could extract a range, say [chr3,a,b] to a BED format from a BAM file?



                        • #13
                          How about samtools view to select the region of interest, and then bedtools bamtobed to convert the BAM output to BED format?

                          You can pipe the output of samtools view directly to bamtobed.
                          Last edited by blancha; 11-14-2015, 06:33 PM.
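The suggestion above could be sketched as follows, assuming samtools and bedtools are on the PATH and the BAM file is coordinate-sorted and indexed (region queries need the index). `region_string` and `bam_region_to_bed` are hypothetical helper names, and the file name and coordinates in the usage comment are placeholders.

```shell
# Build a samtools region string such as "chr3:1000-2000" (1-based, inclusive).
region_string() {
  printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# Extract one region from an indexed BAM and convert the alignments to BED.
# Usage: bam_region_to_bed reads.bam chr3 1000 2000 > region.bed
bam_region_to_bed() {
  samtools view -b "$1" "$(region_string "$2" "$3" "$4")" | bedtools bamtobed -i stdin
}

# Show the region string that would be passed to samtools view.
region_string chr3 1000 2000
```

samtools view returns every alignment that overlaps the region, so some BED intervals may extend past a or b.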
