Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bowtie vs bwa + samtools == confusion

    I have a dataset that was generated using the GAIIx.

    I took the fastq, ran it through bowtie with --best and -k1, transformed it in a sam format, pileuped it, varFiltered it with -Q30 and obtained 9309 SNPs.

    I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly that's why I used bowtie first.

    In any case, I used default params for bwa, I used -q20 when creating the bam file, ran pileup, ran varFIlter -Q30. I got 56k SNPs.

    After falling from my chair and getting coffee, I started looking at the data.

    One thing that struck me, was that the samtools spec says that base quality is phred33, but when I check the pileup files from my bwa run, the characters are in the phred64 range. Can this affect the consensus/SNP calling from pileup?

    Should I convert the original fastq files from phred64 to phred33 (bwa doesn't use qualities so it doesn't matter for it)?

    Another thing, why would the SNP score for these be 1
    chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
    chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`

    especially for the first one, it looks good to me...but what do I know :-)

    I am at the point of comparing the alignment of individual reads between both to see what's going on, but before doing this I was wondering if any of you had any clues.

    BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.
    Last edited by lletourn; 04-29-2010, 06:25 AM.

  • #2
    If somebody want's to see the pipelined script.
    The name of the files have been changed to preserve the identity of those files...ahem... ;-)

    $BWA aln -t 8 <GENOME> <FASTQ> > bwa.sai
    $BWA samse <GENOME> bwa.sai <FASTQ> > sam.sam

    $SAMTOOLS_HOME/samtools view -bS -q20 -o bam.bam sam.sam
    $SAMTOOLS_HOME/samtools sort bam.bam bam.sorted
    $SAMTOOLS_HOME/samtools index bam.sorted.bam

    mkdir -p wig
    java -Xmx15G -Xms15G -jar ~/fp4/FindPeaks.jar -input bam.sorted.bam -aligner sam -output wig/ -dist_type 3 -name bam

    $SAMTOOLS_HOME/samtools pileup -vcf <GENOME> bam.sorted.bam > bam.sorted.pileup
    $SAMTOOLS_HOME/samtools.pl varFilter -Q 30 bam.sorted.pileup > bam.sorted.coveraged.pileup

    Comment


    • #3
      For bowtie it's the same except for the aligner call:
      $BOWTIE -t --chunkmbs 1024 -p 14 -k 1 --best -y -S --solexa1.3-quals -S <GENOME> <FASTQ> > sam.sam

      Comment


      • #4
        bwa does not use base quality, but samtools cares. you should convert the quality.

        Comment


        • #5
          Hi Ih3.

          I'm having the same problem.

          (I have aligned paired end reads from Illumina GAIIx with Bowtie and BWA. I called SNVs with VarScan, but I would like to call SNVs also with samtools varfilter so that I can compare the resulting SNVs).

          What is the best way to convert these qualities for samtools? Is there maybe some software already available?

          Thanks!

          Comment


          • #6
            I used embosses seqret. lh3 was right, it solved all my problems.
            $EMBOSS_HOME/bin/seqret fastq-illumina::q64.fastq fastq::q33.fastq

            I use the CSHL fastx tool first to trim. It reads phred-64 by default, but you can run it on phred33 data by using the (hidden) -Q33 parameter

            Comment


            • #7
              Hi,

              I'm new to these analyses, so I don't really understand it yet.
              Do you use this seqret on the original fastq files? And only when aligning with BWA, or also when aligning with Bowtie?
              Do you trim before or after the seqret, and what do you trim for? Only when you use barcodes, or also for noise?

              Thanks a lot!

              Comment


              • #8
                Hi

                I'm new to these analyses, so I don't understand everything yet.

                Do you use the seqret on original fastq files? Do you only have to convert the qualities when aligning with BWA, or also when aligning with Bowtie?
                Do you trim before or after seqret, and what do you trim for? (is it for barcodes, or also for noise?)

                Thanks a lot!

                Comment


                • #9
                  You may use MAQ to convert illumina fastq into sanger fastq and proceed with BWA. While using Bowtie, you have to specify --solexa1.3-quals for pipeline 1.3+ reads.

                  Comment


                  • #10
                    The thing with maq is that you need a patch to make it work. It doesn't convert solexa >1.3 only <1.3.
                    The patch is here:


                    As seq_GA said bowtie can use phred33 or phred64 formats.

                    BWA doesn't use quality to align so you don't have to convert the qualities, but if you use samtools afterwards, the qualities in your sam file *must* be phred33.

                    If you don't it might call snps where bases are of low quality in phred 64 but look of high quality when you interpret them as phred33.


                    Instead of patching and compiling I prefer to use emboss. Hence, my suggestion.

                    Comment


                    • #11
                      Hi,

                      I used the Emboss seqret and I think it worked alright.
                      Thanks for the help!

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Understanding Genetic Influence on Infectious Disease
                        by seqadmin




                        During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

                        Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
                        09-09-2024, 10:59 AM
                      • seqadmin
                        Addressing Off-Target Effects in CRISPR Technologies
                        by seqadmin






                        The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
                        08-27-2024, 04:44 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Today, 01:02 PM
                      0 responses
                      8 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, Yesterday, 06:39 AM
                      0 responses
                      10 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 09-11-2024, 02:44 PM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 09-06-2024, 08:02 AM
                      0 responses
                      148 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X