Welcome to the New Seqanswers!

Welcome to the new Seqanswers! We'd love your feedback, please post any you have to this topic: New Seqanswers Feedback.
See more
See less

insert PG and CL tags into BAM header with BWA

  • Filter
  • Time
  • Show
Clear All
new posts

  • insert PG and CL tags into BAM header with BWA

    What do you recommend I use to insert full command line into the BAM header created with BWA. BWA already creates this:

    @PG ID:bwa PN:bwa VN:0.5.9-r16

    But I'd like to capture the whole process (aln and sampe) and insert them as CL tags.

    Tophat does it nicely like this for example:

    @PG ID:TopHat VN:1.2.0 CL:/packages/tophat-1.2.0/tophat -r 43 -p 8 -o /some/path/s_1_topHat -G /some/path/Homo_sapiens.NCBI36.54.noMT.gtf /some/path/human_b36_both_noM /some/path/s_1_1.fastq /some/path/s_1_3.fastq

  • #2
    One solution is a script in your language of choice (Perl, Python, etc) to edit the current header, combined with samtools reheader to update the BAM file.


    • #3
      First way:

      samtools view -H file.bam > header.sam

      edit header.sam file

      samtools reheader header.sam file.bam > file_new_head.bam

      OR as a one liner, something like this:

      samtools view -h file.bam | perl -pe 's/^\@PG\told line$/@PG\tnew line/' | samtools view -Sb -o file_new_head.bam -



      • #4
        adding @PG @RG and more with BamUtil polishBam

        I disc-overed this great 'Umich' tooolbox after searching for quite some time (

        I first stored my full bwa command in a variable 'cmd' then looked for my bwa version using 'expr' and finally built the full @PG string taking care of keeping tabs (that was hard! ).

        The result was accepted by polishBam and creates a true @PG line.
        I guess similar approach will work for other header tags.

        ... more parameters not replicated here (obvious)
        cmd="bwa mem -R \"${rgstring}\" \
        	-M \
        	-t ${nthr} \
        	${refgen} \
        	${fq1} ${fq2} > ${outfolder}/${bwape}.sam \
        # execute the command
        eval ${cmd}
        #### add @PG group to results with **bamUtil**
        # important formatting issues here
        # $'\t' is adding a 'tab'
        # \' surrounding ${cmd} \' make sure the long text is not interpreted
        # finally, $pgstring should be quoted
        bwaver=$(expr "$(bwa 2>&1)" : '.*Version:\(.*\)Contact.*')
        [email protected]$'\t'ID:BWA$'\t'VN:${bwaver}$'\t'CL:\'${cmd}\'
        bam polishBam --PG "${pgstring}" \
        	-i ${outfolder}/${bwape}.sam \
        	-o ${outfolder}/${bwape}pg.sam \
        	-l ${outfolder}/polishBam_${infile%%.bam}.log
        Hope this will help others!
        thanks to the BamUtil team for these great tools

        PS: the quick and dirty method consisting in adding the @PG line with the @RG (-R) option seems to work too.

        # substitute the following string in the upper command
        rgstring='@RG\tID:NA18507\tLB:lib-NA18507\tPU:unkn-0.0\tPL:ILLUMINA\tSM:GAIIx-chr21-BWA.mem\[email protected]\tID:BWA\tVN:...'

        still, I would prefer an additional BWA parameter to do this and more about headers.