Announcement

Collapse
No announcement yet.

BAM quality how to convert illumina 1.5 to 1.9?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BAM quality how to convert illumina 1.5 to 1.9?

    Welcoming 2014 the best way I know...

    I've aligned a good deal of illumina data I have accumulated in order to call SNPs with GATK. I had over looked the fact that some of the older runs are with the illumina 1.5 (phred+64) quality encoding. GATK does not like this. I would like to convert the BAM files to 1.9/sanger/phred+33 quality. Can anyone recommend a tool that would do this? If I have to do it to fastq files so be it... I will happily take recommendations for that conversion although there seems to be more options available.

    Thank you and happy new year.

    p.s. if there is a way to inform GATK that specific files have different encoding I am all ears.

  • #2
    I think you could convert BAM to FastQ using Samtools and then use FASTQ Groomer to convert FastQsanger format on the Galaxy.

    http://www.biostars.org/p/770/ and https://usegalaxy.org/u/dan/p/fastq

    I hope this helps.

    Comment


    • #3
      Thanks Zapages,
      Looks like a do-able work around BUT i would like to avoid uploading data to galaxy if possible...

      Comment


      • #4
        Given that you have multiple files and they're probably not terribly small, a C-based solution is likely preferred:

        Code:
        #include "bam.h"
        
        int main(int argc, char *argv[]) {
            int i;
            uint8_t *qual;
            bam_header_t *header;
            char *oname, *p;
            bam1_t *read;
            bamFile f, of;
        
            if(argc != 2 || strcmp(argv[1], "-h") == 0) {
                printf("Usage: %s file.bam\n", argv[0]);
                printf("Converts a BAM file with Phred scores encoded as QUAL+64 into the standard QUAL+33 format\n");
                return -1;
            }
        
            //Open the input
            f = bam_open(argv[1], "rb");
            header = bam_header_read(f);
            read = bam_init1();
        
            //Create the output file
            oname = strdup(argv[1]);
            oname = realloc(oname, sizeof(char) * (strlen(argv[1]) + strlen(".recalibrated ")));
            p = strrchr(oname, '.');
            *p = '\0';
            sprintf(oname, "%s.recalibrated.bam", oname);
            of = bam_open(oname, "wb");
            bgzf_mt(of, 4, 256); //Use 4 compression threads
        
            //Write the header
            bam_header_write(of, header);
        
            //Iterate over the reads
            while(bam_read1(f, read) > 1) {
                qual = bam1_qual(read);
                for(i=0; i<read->core.l_qseq; i++) *(qual+i) -= 31;
                bam_write1(of, read);
            }
        
            bam_close(of);
            bam_close(f);
            bam_destroy1(read);
            free(oname);
            return 0;
        }
        This uses the samtools C API. If you download the samtools source code and compile it in "/home/your_user_name/Downloads/samtools", then compilation would be something of the form:

        Code:
        gcc -Wall -o convert convert.c -lbam -lz -lpthread -I/home/your_user_name/Downloads/samtools -L/home/your_user_name/Downloads/samtools
        If you saved the above code as convert.c, at least. The you could just:

        Code:
        convert somefile.bam
        Assuming the program is in your PATH, this will produce "somefile.recalibrated.bam".

        Comment


        • #5
          Wow, awesome. This is exactly what I was hoping for. I will give it a try right away and let you know how it goes.

          Comment


          • #6
            dpryan, I successfully compiled and ran your program. Unfortunately, as far as I can tell the resulting bam files are no longer indexed and I think have also been stripped of the read group header info. Is it possible to use this and still retain that information?
            Thanks!

            Comment


            • #7
              The indexing will be different every time, so I doubt there's a good way to maintain that. The read groups should actually be maintained, I'm surprised they've not been. Can you post the header and a couple example alignments from a file before/after conversion where the read group wasn't kept? I'm not sure why that would happen, but that'd give me something to use to find the cause.

              Comment


              • #8
                In getting examples ready for you I realized I made an embarrassingly simple error. Your script works great. Thanks again! Happy new year.

                Comment


                • #9
                  Cool, glad it worked!

                  Comment

                  Working...
                  X