changing chromosome notation in .BAM file


  • changing chromosome notation in .BAM file

    Hi everyone,

    I have .bam files in which the chromosomes are notated as '1', '2', '3', ... 'X', 'Y'.
    However, for further analyses I need a .bam file in which the chromosomes are notated as 'chr1', 'chr2', 'chr3', ... 'chrX'. Does someone know a way to do this?

    I thought I might need the substitute (s) command ...

  • #2
    I think it might be easiest to change the reference fasta file you are using to match the BAM?



    • #3
      This is of course one option. I'm remapping my data for the moment, but since I have 5 .bam files from exome data, it takes quite some time, so I was looking for a faster option.



      • #4
        Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
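The header-editing step could be sketched as follows. This is a minimal illustration, not a tested recipe for any particular data set: the two-contig header written by `printf` is a hypothetical stand-in for the output of `samtools view -H file.bam`, and the output name `header.chr.sam` is likewise an assumption.

```shell
# Hypothetical header standing in for: samtools view -H file.bam > header.sam
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:X\tLN:155270560\n@PG\tID:bwa\tPN:bwa\n' > header.sam

# Add "chr" to the SN: value on @SQ lines only; all other header lines
# pass through unchanged. Only the first "SN:" on each @SQ line is touched.
sed '/^@SQ/ s/SN:/SN:chr/' header.sam > header.chr.sam

cat header.chr.sam

# With samtools installed, the edited header would then be applied with:
#   samtools reheader header.chr.sam file.bam > file.chr.bam
```

Restricting the substitution to `@SQ` lines keeps the edit from touching program or read-group records that happen to contain similar text.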



        • #5
          That would be a way to change the header, but if I'm not mistaken the chromosome numbers are also in the actual read part of the file. So how do I change them there?



          • #6
            You are mistaken. The reads only contain a number that tells which entry from the header to pick.



            • #7
              Originally posted by ddaneels
              That would be a way to change the header, but if I'm not mistaken the chromosome numbers are also in the actual read part of the file. So how do I change them there?
              As was mentioned by ffinkernagel, this is incorrect. I should note that you should avoid swapping the order of chromosomes or any other major edits. Just adding or removing "chr" won't break anything, but changing the order of things in the header or removing chromosomes could cause issues.



              • #8
                Originally posted by ffinkernagel
                You are mistaken. The reads only contain a number that tells which entry from the header to pick.
                In the SAM spec v1.4 document, column 3 (RNAME) is of type String and holds the reference sequence name of the alignment.
                Last edited by xied75; 08-14-2012, 08:14 AM. Reason: My mistake.



                • #9
                  Thanks for the info.

                  Everything will work fine then. I was confused by the "1" in the 7th column of the read part of the .bam file.

                  Example:

                  HWI-ST571_103:4:1302:9610:62449 99 1 604269 254 100M = 60324 152 GGAA...

                  I thought that the highlighted 1 also had to be changed to chr1.



                  • #10
                    1. This big, bold 1 is column 3, not column 7. It is a string, not a number.
                    2. You don't need to change this 1 to chr1, because programs like BWA and Samtools will read both formats; but that doesn't mean it is a numeric reference to the header lines. Your understanding is more correct.



                    • #11
                      Originally posted by ffinkernagel
                      You are mistaken. The reads only contain a number that tells which entry from the header to pick.
                      Or to be more precise, in BAM the reads just store an integer to say which reference it mapped to (referencing a table at the start of the BAM file, which is separate to any embedded SAM header), but in SAM the reads store the reference sequence's name.
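The distinction can be mimicked with a toy model. This is only a sketch under loud assumptions: real BAM is a binary format, and the files `refs.txt` and `reads.txt`, their layout, and the `resolve` helper are all hypothetical, invented here to show why renaming the reference table is enough when reads store an integer index.

```shell
# Toy reference table: one name per line; (line number - 1) plays the refID.
printf '1\n2\nX\n' > refs.txt

# Toy read records: read name, integer refID, position (made-up values).
printf 'readA\t0\t604269\nreadB\t2\t1500\n' > reads.txt

# Resolve each integer refID to a name through the table given as $1.
resolve() {
  awk -F '\t' 'NR == FNR { name[FNR - 1] = $0; next }
               { print $1 "\t" name[$2] "\t" $3 }' "$1" reads.txt
}

resolve refs.txt

# Rename entries in the table only; reads.txt is never modified.
sed 's/^/chr/' refs.txt > refs.chr.txt
resolve refs.chr.txt
```

In SAM text, by contrast, the name itself sits in each record, so a text-level rename has to touch every alignment line.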



                      • #12
                        Originally posted by dpryan
                        Look at the samtools reheader command. Presumably you would just "samtools view -H file.bam > header.sam", edit the header, and then use that with reheader.
                        I don't think that would work. Using 'samtools reheader' would only edit the SAM header embedded in a BAM file; it would not, IIRC, update the separate BAM-specific header table containing the list of references (their names and lengths).

                        You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
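The SAM-text route might look like the sketch below. It uses `awk` rather than `sed` so that only the reference-name columns are rewritten; the function name `add_chr`, the file `tiny.sam`, and its records are hypothetical, and optional tags that carry reference names (for example CC) are deliberately left alone here.

```shell
# Tiny hand-written SAM (hypothetical values) so the filter runs without samtools.
printf '@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\nread1\t99\t1\t604269\t254\t4M\t=\t604324\t152\tGGAA\tIIII\nread2\t97\t1\t700\t60\t4M\t2\t900\t0\tACGT\tIIII\n' > tiny.sam

# Prefix "chr" in @SQ SN: fields, RNAME (column 3) and RNEXT (column 7).
# "*" (unmapped) and "=" (same as RNAME) are kept as they are.
add_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@SQ/ { sub(/SN:/, "SN:chr") }
       !/^@/  { if ($3 != "*") $3 = "chr" $3
                if ($7 != "*" && $7 != "=") $7 = "chr" $7 }
       { print }'
}

add_chr < tiny.sam > tiny.chr.sam
cat tiny.chr.sam

# With samtools installed, the same filter could sit in a one-line pipe:
#   samtools view -h in.bam | add_chr | samtools view -b -o out.bam -
```

A plain global `sed` substitution on the whole stream would be shorter, but it risks rewriting matching text in read names or sequences, which is why this sketch edits by column.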



                        • #13
                          Originally posted by maubp
                          I don't think that would work. Using 'samtools reheader' would only edit the SAM header embedded in a BAM file; it would not, IIRC, update the separate BAM-specific header table containing the list of references (their names and lengths).

                          You could turn the BAM file into SAM (e.g. with samtools view -h), do the replacement (e.g. with sed), and then optionally convert back to BAM (again with samtools view). That can be done as one line by piping the output from one tool to the next.
                          Actually, bam_reheader runs the full bam_header_write using only the new header, so it seems it does both (I haven't bothered looking into the source of bam_header_write, I should note). I decided to run a quick test, since I can't say I've ever actually run the reheader command. For that, I took the header of a sorted alignment (written to a file called header.sam), and changed "chr1" to "chr100".
                          Code:
                          samtools view accepted_hits.bam | head -n 2
                          HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr1	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
                          HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr1	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
                          Code:
                          samtools reheader header.sam accepted_hits.bam | samtools view - | head -n 2
                          HWI-ST143:530:C102UACXX:5:1101:3568:162900	272	chr100	3005607	0	51M	*	0	0	CATAAATTCATTTTTTAATAGCTGAGTAGTATTCCATTGTGTAAATGTACC	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:=	CP:i:105668244	HI:i:0
                          HWI-ST143:530:C102UACXX:5:1308:5464:137163	272	chr100	3006556	0	51M	*	0	0	TTAGCTCCCTTGTCAAAGATCAGGTGACCATAGGTGTGTGGATTCATCTCT	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:51	YT:Z:UU	NH:i:20	CC:Z:chr15	CP:i:16439024	HI:i:0
                          So, it seems to work.



                          • #14
                            OK - I may have been worrying over nothing then.



                            • #15
                              My header.sam file looks OK: "chr" has been added.

                              But when I run the samtools reheader command, nothing changes in the original .bam file...

                              Code:
                              samtools reheader header.sam sample1.bam | samtools view -H sample1.bam

                              Sample1.bam is my original file, so I was hoping that the header in sample1.bam would have changed, but it didn't.

                              I'm a programming newbie ... so maybe there's a mistake in my code?
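A likely explanation, offered as a hedged note: `samtools reheader` writes the re-headered BAM to standard output and leaves sample1.bam as it is, and in the pipe above the second samtools is told to read sample1.bam by name, so it never looks at the piped data. The sketch below reproduces that behavior with `sed` and `cat` standing in for the two samtools calls (the stand-ins and the file name `refs.txt` are assumptions, chosen so it runs without samtools); the corresponding samtools commands appear as comments.

```shell
# Stand-in for the original file: reference names without "chr".
printf '1\n2\nX\n' > refs.txt

# Same shape as the command in the post: the right-hand command is given
# the original file by name, so it ignores the pipe and prints the old names.
sed 's/^/chr/' refs.txt | cat refs.txt

# Reading standard input ("-") shows the modified stream instead.
sed 's/^/chr/' refs.txt | cat -

# Or keep the result by redirecting it to a new file; refs.txt is unchanged.
sed 's/^/chr/' refs.txt > refs.chr.txt

# samtools equivalents (not run here):
#   samtools reheader header.sam sample1.bam | samtools view -H -
#   samtools reheader header.sam sample1.bam > sample1.chr.bam
```

Writing to a new file and checking its header afterwards keeps the original BAM available in case the edited header turns out to be wrong.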
