No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • bowtie vs bwa + samtools == confusion

    I have a dataset that was generated using the GAIIx.

    I took the fastq, ran it through bowtie with --best and -k1, transformed it in a sam format, pileuped it, varFiltered it with -Q30 and obtained 9309 SNPs.

    I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly that's why I used bowtie first.

    In any case, I used default params for bwa, I used -q20 when creating the bam file, ran pileup, ran varFIlter -Q30. I got 56k SNPs.

    After falling from my chair and getting coffee, I started looking at the data.

    One thing that struck me, was that the samtools spec says that base quality is phred33, but when I check the pileup files from my bwa run, the characters are in the phred64 range. Can this affect the consensus/SNP calling from pileup?

    Should I convert the original fastq files from phred64 to phred33 (bwa doesn't use qualities so it doesn't matter for it)?

    Another thing, why would the SNP score for these be 1
    chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
    chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`

    especially for the first one, it looks good to me...but what do I know :-)

    I am at the point of comparing the alignment of individual reads between both to see what's going on, but before doing this I was wondering if any of you had any clues.

    BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.
    Last edited by lletourn; 04-29-2010, 06:25 AM.

  • #2
    If somebody want's to see the pipelined script.
    The name of the files have been changed to preserve the identity of those files...ahem... ;-)

    $BWA aln -t 8 <GENOME> <FASTQ> > bwa.sai
    $BWA samse <GENOME> bwa.sai <FASTQ> > sam.sam

    $SAMTOOLS_HOME/samtools view -bS -q20 -o bam.bam sam.sam
    $SAMTOOLS_HOME/samtools sort bam.bam bam.sorted
    $SAMTOOLS_HOME/samtools index bam.sorted.bam

    mkdir -p wig
    java -Xmx15G -Xms15G -jar ~/fp4/FindPeaks.jar -input bam.sorted.bam -aligner sam -output wig/ -dist_type 3 -name bam

    $SAMTOOLS_HOME/samtools pileup -vcf <GENOME> bam.sorted.bam > bam.sorted.pileup
    $SAMTOOLS_HOME/ varFilter -Q 30 bam.sorted.pileup > bam.sorted.coveraged.pileup


    • #3
      For bowtie it's the same except for the aligner call:
      $BOWTIE -t --chunkmbs 1024 -p 14 -k 1 --best -y -S --solexa1.3-quals -S <GENOME> <FASTQ> > sam.sam


      • #4
        bwa does not use base quality, but samtools cares. you should convert the quality.


        • #5
          Hi Ih3.

          I'm having the same problem.

          (I have aligned paired end reads from Illumina GAIIx with Bowtie and BWA. I called SNVs with VarScan, but I would like to call SNVs also with samtools varfilter so that I can compare the resulting SNVs).

          What is the best way to convert these qualities for samtools? Is there maybe some software already available?



          • #6
            I used embosses seqret. lh3 was right, it solved all my problems.
            $EMBOSS_HOME/bin/seqret fastq-illumina::q64.fastq fastq::q33.fastq

            I use the CSHL fastx tool first to trim. It reads phred-64 by default, but you can run it on phred33 data by using the (hidden) -Q33 parameter


            • #7

              I'm new to these analyses, so I don't really understand it yet.
              Do you use this seqret on the original fastq files? And only when aligning with BWA, or also when aligning with Bowtie?
              Do you trim before or after the seqret, and what do you trim for? Only when you use barcodes, or also for noise?

              Thanks a lot!


              • #8

                I'm new to these analyses, so I don't understand everything yet.

                Do you use the seqret on original fastq files? Do you only have to convert the qualities when aligning with BWA, or also when aligning with Bowtie?
                Do you trim before or after seqret, and what do you trim for? (is it for barcodes, or also for noise?)

                Thanks a lot!


                • #9
                  You may use MAQ to convert illumina fastq into sanger fastq and proceed with BWA. While using Bowtie, you have to specify --solexa1.3-quals for pipeline 1.3+ reads.


                  • #10
                    The thing with maq is that you need a patch to make it work. It doesn't convert solexa >1.3 only <1.3.
                    The patch is here:

                    As seq_GA said bowtie can use phred33 or phred64 formats.

                    BWA doesn't use quality to align so you don't have to convert the qualities, but if you use samtools afterwards, the qualities in your sam file *must* be phred33.

                    If you don't it might call snps where bases are of low quality in phred 64 but look of high quality when you interpret them as phred33.

                    Instead of patching and compiling I prefer to use emboss. Hence, my suggestion.


                    • #11

                      I used the Emboss seqret and I think it worked alright.
                      Thanks for the help!


                      Latest Articles


                      • seqadmin
                        Advanced Methods for the Detection of Infectious Disease
                        by seqadmin

                        The recent pandemic caused worldwide health, economic, and social disruptions with its reverberations still felt today. A key takeaway from this event is the need for accurate and accessible tools for detecting and tracking infectious diseases. Timely identification is essential for early intervention, managing outbreaks, and preventing their spread. This article reviews several valuable tools employed in the detection and surveillance of infectious diseases.
                        11-27-2023, 01:15 PM
                      • seqadmin
                        Strategies for Investigating the Microbiome
                        by seqadmin

                        Microbiome research has led to the discovery of important connections to human and environmental health. Sequencing has become a core investigational tool in microbiome research, a subject that we covered during a recent webinar. Our expert speakers shared a number of advancements including improved experimental workflows, research involving transmission dynamics, and invaluable analysis resources. This article recaps their informative presentations, offering insights...
                        11-09-2023, 07:02 AM





                      Topics Statistics Last Post
                      Started by seqadmin, 12-01-2023, 09:55 AM
                      0 responses
                      Last Post seqadmin  
                      Started by seqadmin, 11-30-2023, 10:48 AM
                      0 responses
                      Last Post seqadmin  
                      Started by seqadmin, 11-29-2023, 08:26 AM
                      0 responses
                      Last Post seqadmin  
                      Started by seqadmin, 11-29-2023, 08:12 AM
                      0 responses
                      Last Post seqadmin