Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • gmarco
    Member
    • Oct 2012
    • 36

    Convert pileup to bed

    Hello,

    I'm currently using samtools mpileup like this:

    Code:
    samtools mpileup -l target_regions.bed file1.bam file2.bam file3.bam > mpileup-all
    The output format is pileup here's a short example:

    Code:
    chr1	6384820	N	0	*	*	1	G	E	0	*	*	0	*	*	0	*	*	1	G	A0	*	*
    chr1	6384821	N	0	*	*	1	G	H	0	*	*	0	*	*	0	*	*	1	G	H0	*	*
    chr1	6384822	N	0	*	*	1	C	I	0	*	*	0	*	*	0	*	*	1	C	G0	*	*
    chr1	6384825	N	0	*	*	1	C	I	0	*	*	0	*	*	0	*	*	1	C	G0	*	*
    chr1	6384826	N	0	*	*	1	C	I	0	*	*	0	*	*	0	*	*	1	C	G0	*	*
    chr1	6384827	N	0	*	*	1	C	I	0	*	*	0	*	*	0	*	*	1	C	G0	*	*
    chr1	6384828	N	0	*	*	1	C	I	0	*	*	0	*	*	0	*	*	1	C	G0	*	*
    Is there any way to convert his pileup format to bed. For this short example the desired output would be:

    Code:
    chr1	6384820	6384822
    chr1	6384825	6384828
    Thanks.
  • AlexReynolds
    Member
    • Feb 2013
    • 45

    #2
    Perhaps a basic awk script would work:

    $ awk 'BEGIN{\
    first_start=$2;\
    old_start=first_start;\
    }\
    {\
    chr=$1;\
    current_start=$2;\
    if (current_start > (old_start + 1)) {\
    print chr"\t"first_start"\t"old_start;\
    first_start=current_start;\
    old_start=first_start;\
    }\
    else {\
    old_start=current_start;\
    }\
    }\
    END {\
    print chr"\t"first_start"\t"old_start;\
    }' pileup.txt

    Comment

    • dgscofield
      Member
      • Nov 2010
      • 28

      #3
      A bed for your example would be

      Code:
      chr1	6384819	6384822
      chr1	6384824	6384828
      because bed is 0-based, right-open.

      You could use my intervalBed script from here in a little pipeline:

      Code:
      samtools mpileup ... \
      | cut -f1,2 \
      | intervalBed header=0 val_col=0 trackname="pileupIntervals" > pileup.bed
      The repository as well as the intervalBed script (written in awk) include more documentation.
      Last edited by dgscofield; 06-05-2013, 03:10 PM. Reason: correct typo in trackname label

      Comment

      • AlexReynolds
        Member
        • Feb 2013
        • 45

        #4
        I think dgscofield is right — while BAM is 0-based, the output from SAM is still 1-based. Easy to screw up. Subtracting one from the start coordinate in my awk script should fix the indexing.

        Comment

        Latest Articles

        Collapse

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by SEQadmin2, 06-09-2026, 11:58 AM
        0 responses
        24 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-05-2026, 10:09 AM
        0 responses
        29 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-04-2026, 08:59 AM
        0 responses
        39 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-02-2026, 12:03 PM
        0 responses
        61 views
        0 reactions
        Last Post SEQadmin2  
        Working...