Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lletourn
    Member
    • Oct 2009
    • 63

    bowtie vs bwa + samtools == confusion

    I have a dataset that was generated using the GAIIx.

    I took the fastq, ran it through bowtie with --best and -k1, transformed it in a sam format, pileuped it, varFiltered it with -Q30 and obtained 9309 SNPs.

    I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly that's why I used bowtie first.

    In any case, I used default params for bwa, I used -q20 when creating the bam file, ran pileup, ran varFIlter -Q30. I got 56k SNPs.

    After falling from my chair and getting coffee, I started looking at the data.

    One thing that struck me, was that the samtools spec says that base quality is phred33, but when I check the pileup files from my bwa run, the characters are in the phred64 range. Can this affect the consensus/SNP calling from pileup?

    Should I convert the original fastq files from phred64 to phred33 (bwa doesn't use qualities so it doesn't matter for it)?

    Another thing, why would the SNP score for these be 1
    chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
    chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`

    especially for the first one, it looks good to me...but what do I know :-)

    I am at the point of comparing the alignment of individual reads between both to see what's going on, but before doing this I was wondering if any of you had any clues.

    BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.
    Last edited by lletourn; 04-29-2010, 06:25 AM.
  • lletourn
    Member
    • Oct 2009
    • 63

    #2
    If somebody want's to see the pipelined script.
    The name of the files have been changed to preserve the identity of those files...ahem... ;-)

    $BWA aln -t 8 <GENOME> <FASTQ> > bwa.sai
    $BWA samse <GENOME> bwa.sai <FASTQ> > sam.sam

    $SAMTOOLS_HOME/samtools view -bS -q20 -o bam.bam sam.sam
    $SAMTOOLS_HOME/samtools sort bam.bam bam.sorted
    $SAMTOOLS_HOME/samtools index bam.sorted.bam

    mkdir -p wig
    java -Xmx15G -Xms15G -jar ~/fp4/FindPeaks.jar -input bam.sorted.bam -aligner sam -output wig/ -dist_type 3 -name bam

    $SAMTOOLS_HOME/samtools pileup -vcf <GENOME> bam.sorted.bam > bam.sorted.pileup
    $SAMTOOLS_HOME/samtools.pl varFilter -Q 30 bam.sorted.pileup > bam.sorted.coveraged.pileup

    Comment

    • lletourn
      Member
      • Oct 2009
      • 63

      #3
      For bowtie it's the same except for the aligner call:
      $BOWTIE -t --chunkmbs 1024 -p 14 -k 1 --best -y -S --solexa1.3-quals -S <GENOME> <FASTQ> > sam.sam

      Comment

      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        bwa does not use base quality, but samtools cares. you should convert the quality.

        Comment

        • Lien
          Member
          • Dec 2009
          • 47

          #5
          Hi Ih3.

          I'm having the same problem.

          (I have aligned paired end reads from Illumina GAIIx with Bowtie and BWA. I called SNVs with VarScan, but I would like to call SNVs also with samtools varfilter so that I can compare the resulting SNVs).

          What is the best way to convert these qualities for samtools? Is there maybe some software already available?

          Thanks!

          Comment

          • lletourn
            Member
            • Oct 2009
            • 63

            #6
            I used embosses seqret. lh3 was right, it solved all my problems.
            $EMBOSS_HOME/bin/seqret fastq-illumina::q64.fastq fastq::q33.fastq

            I use the CSHL fastx tool first to trim. It reads phred-64 by default, but you can run it on phred33 data by using the (hidden) -Q33 parameter

            Comment

            • Lien
              Member
              • Dec 2009
              • 47

              #7
              Hi,

              I'm new to these analyses, so I don't really understand it yet.
              Do you use this seqret on the original fastq files? And only when aligning with BWA, or also when aligning with Bowtie?
              Do you trim before or after the seqret, and what do you trim for? Only when you use barcodes, or also for noise?

              Thanks a lot!

              Comment

              • Lien
                Member
                • Dec 2009
                • 47

                #8
                Hi

                I'm new to these analyses, so I don't understand everything yet.

                Do you use the seqret on original fastq files? Do you only have to convert the qualities when aligning with BWA, or also when aligning with Bowtie?
                Do you trim before or after seqret, and what do you trim for? (is it for barcodes, or also for noise?)

                Thanks a lot!

                Comment

                • seq_GA
                  Senior Member
                  • Feb 2009
                  • 124

                  #9
                  You may use MAQ to convert illumina fastq into sanger fastq and proceed with BWA. While using Bowtie, you have to specify --solexa1.3-quals for pipeline 1.3+ reads.

                  Comment

                  • lletourn
                    Member
                    • Oct 2009
                    • 63

                    #10
                    The thing with maq is that you need a patch to make it work. It doesn't convert solexa >1.3 only <1.3.
                    The patch is here:


                    As seq_GA said bowtie can use phred33 or phred64 formats.

                    BWA doesn't use quality to align so you don't have to convert the qualities, but if you use samtools afterwards, the qualities in your sam file *must* be phred33.

                    If you don't it might call snps where bases are of low quality in phred 64 but look of high quality when you interpret them as phred33.


                    Instead of patching and compiling I prefer to use emboss. Hence, my suggestion.

                    Comment

                    • Lien
                      Member
                      • Dec 2009
                      • 47

                      #11
                      Hi,

                      I used the Emboss seqret and I think it worked alright.
                      Thanks for the help!

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        New Genomics Tools and Methods Shared at AGBT 2025
                        by seqadmin


                        This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                        The Headliner
                        The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                        03-03-2025, 01:39 PM
                      • seqadmin
                        Investigating the Gut Microbiome Through Diet and Spatial Biology
                        by seqadmin




                        The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                        02-24-2025, 06:31 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 05:03 AM
                      0 responses
                      16 views
                      0 reactions
                      Last Post seqadmin  
                      Started by seqadmin, 03-19-2025, 07:27 AM
                      0 responses
                      16 views
                      0 reactions
                      Last Post seqadmin  
                      Started by seqadmin, 03-18-2025, 12:50 PM
                      0 responses
                      17 views
                      0 reactions
                      Last Post seqadmin  
                      Started by seqadmin, 03-03-2025, 01:15 PM
                      0 responses
                      185 views
                      0 reactions
                      Last Post seqadmin  
                      Working...