Hi,
I have some human genome data in BAM format that I want to upload to Congenica to perform variant filtering.
The problem is that the data has been mapped to a reference genome containing alternate contigs, but the software does not accept them; it only accepts the standard chromosomes chr1-22, chrX, chrY and chrM.
I therefore want to remove the alternate contigs and then update the header to include only the standard chromosomes. It sounds simple, but nothing I have tried has given me a usable BAM file to upload.
I started with a BAM file whose header looks like this:
There are thousands of alt contigs, followed by some @RG and @PG lines.
I first used the following samtools command:
samtools view -L bam_contigs_to_keep.bed -O BAM -o Foo_edit_1.bam Foo.bam
where the BED file looks like this:
This reduced the BAM file size from ~12 GB to ~9 GB, which I assume is because the reads on the alternate contigs were removed.
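If useful, a whole-chromosome BED of the standard contigs can be generated from the BAM's own header with a small awk filter, sketched below. The function name `header_to_bed` is hypothetical, and it assumes the UCSC-style `chr` names listed above. BED is 0-based and half-open, so `0` to the `LN:` value covers each whole chromosome.

```shell
#!/bin/sh
# header_to_bed (hypothetical helper): read SAM header text on stdin and print
# one BED line (name, 0, length) per standard chromosome found in the @SQ lines.
header_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@SQ/ {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4)
      }
      # Keep chr1-chr22, chrX, chrY and chrM only; skip alt/random/decoy contigs
      if (name ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ && len != "") print name, 0, len
    }'
}

# Example:
#   samtools view -H Foo.bam | header_to_bed > bam_contigs_to_keep.bed
```

Note that `-L` keeps only reads that overlap these regions, so unmapped reads without a position are dropped as well.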
However, the alternate contigs are still listed in the header, so I next used the following command to replace it:
samtools reheader reordered_head_GRCh38.dict Foo_edit_1.bam > Foo_edit_2.bam
The .dict file looks like this:
This reduced the file size by only ~100 KB, which I assumed was due to the alternate contigs being removed from the header.
My problem is that when I run samtools flagstat, the original BAM and edit_1.bam both report >1 billion reads, but edit_2.bam reports only ~600 and is truncated.
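A plausible explanation for the truncation, offered as an assumption rather than a diagnosis: `samtools reheader` replaces the header but copies the alignment records unchanged, and each BAM record stores its own reference and its mate's reference as integer indices into the header's @SQ list, not as names. If the .dict lists fewer @SQ lines than the original header, or lists them in a different order, those stored indices no longer line up. Because `-L` filters on each read's own position, Foo_edit_1.bam can still contain reads whose mate is on an alt contig, and since alt contigs usually come after the primary chromosomes, their mate index would then point past the end of the new @SQ list. A rough check is to count such records; the helper name below is hypothetical, and it reads SAM text on stdin.

```shell
#!/bin/sh
# count_alt_mates (hypothetical helper): read SAM text on stdin and print how
# many alignment records have a mate mapped to a non-standard contig.
count_alt_mates() {
  awk 'BEGIN { FS = "\t"; n = 0 }
    # Skip header lines; $7 is RNEXT ("=" = same contig as the read, "*" = none)
    !/^@/ && $7 != "=" && $7 != "*" && $7 !~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ { n++ }
    END { print n }'
}

# Example:
#   samtools view Foo_edit_1.bam | count_alt_mates
```

A non-zero count would be consistent with this explanation.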
I have also tried the Picard ReplaceSamHeader command, as well as this example from another forum:
samtools view Foo.bam chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM | samtools view -bo Foo_edit_2.bam -t corrected_bam_head.sam -
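A likely reason this pipe fails (again an assumption): the first `samtools view` runs without `-h`, so it emits records with no header, and `-t` expects a tab-delimited list of reference names and lengths (for example a .fai index), not a SAM header. Even with a valid header, records whose mate lies on an alt contig would still refer to a contig missing from the new header. One alternative, sketched below, is to stay in SAM text, where contigs are referenced by name rather than by index, drop the alt @SQ lines and every record that touches an alt contig, and let samtools re-encode the BAM. The function name is hypothetical, and dropping both mates of such pairs (rather than rewriting their mate fields) is a simplifying choice that discards some reads.

```shell
#!/bin/sh
# keep_primary_contigs (hypothetical helper): filter SAM text (header + records)
# on stdin so that only chr1-22, chrX, chrY and chrM remain.
keep_primary_contigs() {
  awk 'BEGIN { FS = "\t" }
    function is_std(c) { return c ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ }
    /^@SQ/ {
      # Keep an @SQ line only if its SN: tag names a standard chromosome
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) { if (is_std(substr($i, 4))) print; break }
      next
    }
    # Pass other header lines (@HD, @RG, @PG, @CO) through unchanged
    /^@/ { print; next }
    {
      # Drop records placed on an alt contig ($3 = RNAME) ...
      if ($3 != "*" && !is_std($3)) next
      # ... and records whose mate is on one ($7 = RNEXT), so both mates go
      if ($7 != "*" && $7 != "=" && !is_std($7)) next
      print
    }'
}

# Example (not tested on real data):
#   samtools view -h Foo.bam | keep_primary_contigs | samtools view -b -o Foo_primary.bam -
```

Because lines are only removed, never reordered, a coordinate-sorted input stays sorted. On a ~12 GB BAM the round trip through text will be slow, and the output is worth checking with `samtools quickcheck` and `samtools flagstat` before upload.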
Nothing has worked so far; none of these attempts has given me a usable edited BAM file.
I would really appreciate any help or advice someone could give about this.
Many thanks
Hywel