Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BAM quality how to convert illumina 1.5 to 1.9?

    Welcoming 2014 the best way I know...

    I've aligned a good deal of illumina data I have accumulated in order to call SNPs with GATK. I had over looked the fact that some of the older runs are with the illumina 1.5 (phred+64) quality encoding. GATK does not like this. I would like to convert the BAM files to 1.9/sanger/phred+33 quality. Can anyone recommend a tool that would do this? If I have to do it to fastq files so be it... I will happily take recommendations for that conversion although there seems to be more options available.

    Thank you and happy new year.

    p.s. if there is a way to inform GATK that specific files have different encoding I am all ears.

  • #2
    I think you could convert BAM to FastQ using Samtools and then use FASTQ Groomer to convert FastQsanger format on the Galaxy.

    http://www.biostars.org/p/770/ and https://usegalaxy.org/u/dan/p/fastq

    I hope this helps.

    Comment


    • #3
      Thanks Zapages,
      Looks like a do-able work around BUT i would like to avoid uploading data to galaxy if possible...

      Comment


      • #4
        Given that you have multiple files and they're probably not terribly small, a C-based solution is likely preferred:

        Code:
        #include "bam.h"
        
        int main(int argc, char *argv[]) {
            int i;
            uint8_t *qual;
            bam_header_t *header;
            char *oname, *p;
            bam1_t *read;
            bamFile f, of;
        
            if(argc != 2 || strcmp(argv[1], "-h") == 0) {
                printf("Usage: %s file.bam\n", argv[0]);
                printf("Converts a BAM file with Phred scores encoded as QUAL+64 into the standard QUAL+33 format\n");
                return -1;
            }
        
            //Open the input
            f = bam_open(argv[1], "rb");
            header = bam_header_read(f);
            read = bam_init1();
        
            //Create the output file
            oname = strdup(argv[1]);
            oname = realloc(oname, sizeof(char) * (strlen(argv[1]) + strlen(".recalibrated ")));
            p = strrchr(oname, '.');
            *p = '\0';
            sprintf(oname, "%s.recalibrated.bam", oname);
            of = bam_open(oname, "wb");
            bgzf_mt(of, 4, 256); //Use 4 compression threads
        
            //Write the header
            bam_header_write(of, header);
        
            //Iterate over the reads
            while(bam_read1(f, read) > 1) {
                qual = bam1_qual(read);
                for(i=0; i<read->core.l_qseq; i++) *(qual+i) -= 31;
                bam_write1(of, read);
            }
        
            bam_close(of);
            bam_close(f);
            bam_destroy1(read);
            free(oname);
            return 0;
        }
        This uses the samtools C API. If you download the samtools source code and compile it in "/home/your_user_name/Downloads/samtools", then compilation would be something of the form:

        Code:
        gcc -Wall -o convert convert.c -lbam -lz -lpthread -I/home/your_user_name/Downloads/samtools -L/home/your_user_name/Downloads/samtools
        If you saved the above code as convert.c, at least. The you could just:

        Code:
        convert somefile.bam
        Assuming the program is in your PATH, this will produce "somefile.recalibrated.bam".

        Comment


        • #5
          Wow, awesome. This is exactly what I was hoping for. I will give it a try right away and let you know how it goes.

          Comment


          • #6
            dpryan, I successfully compiled and ran your program. Unfortunately, as far as I can tell the resulting bam files are no longer indexed and I think have also been stripped of the read group header info. Is it possible to use this and still retain that information?
            Thanks!

            Comment


            • #7
              The indexing will be different every time, so I doubt there's a good way to maintain that. The read groups should actually be maintained, I'm surprised they've not been. Can you post the header and a couple example alignments from a file before/after conversion where the read group wasn't kept? I'm not sure why that would happen, but that'd give me something to use to find the cause.

              Comment


              • #8
                In getting examples ready for you I realized I made an embarrassingly simple error. Your script works great. Thanks again! Happy new year.

                Comment


                • #9
                  Cool, glad it worked!

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 03-27-2024, 06:37 PM
                  0 responses
                  12 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-27-2024, 06:07 PM
                  0 responses
                  11 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  52 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  68 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X