
  • Questions on BWA

    My data was generated with Solexa 1.3, paired-end, 100 bp. When I use BWA as my aligner, I should trim <INT> bases from the 3' end, because read quality gets worse closer to the 3' end. My question is: how should I set <INT>? Can somebody who is an expert, or who has encountered this problem, tell me?

    You can also e-mail me: [email protected]

  • #2
    Originally posted by zeam View Post
    My data was generated with Solexa 1.3, paired-end, 100 bp. When I use BWA as my aligner, I should trim <INT> bases from the 3' end, because read quality gets worse closer to the 3' end. My question is: how should I set <INT>? Can somebody who is an expert, or who has encountered this problem, tell me?

    You can also e-mail me: [email protected]
    bwa aln -q 20 is a reasonable filter.
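
For paired-end data, the same -q value would be passed to both `bwa aln` runs before pairing them with `bwa sampe`. A minimal sketch that only prints the commands for review; the function name `plan_pe`, the file names, and the `.sai`/`.pe.sam` naming are placeholders, and paths are assumed to contain no spaces:

```shell
#!/bin/sh
# plan_pe REF READS_1 READS_2 [Q]
# Prints (does not run) the bwa commands for a paired-end alignment
# with 3'-end quality trimming. All names here are illustrative.
plan_pe() {
    ref=$1; r1=$2; r2=$3; q=${4:-20}
    sai1=${r1%.*}.sai    # e.g. reads_1.fastq -> reads_1.sai
    sai2=${r2%.*}.sai
    echo "bwa aln -q $q $ref $r1 > $sai1"
    echo "bwa aln -q $q $ref $r2 > $sai2"
    echo "bwa sampe $ref $sai1 $sai2 $r1 $r2 > ${r1%.*}.pe.sam"
}

plan_pe ref.fa reads_1.fastq reads_2.fastq 20
```

Once the printed commands look right, piping the output to `sh` would run them.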

    d

    • #3
      Hi, thanks for your attention!
      Your answer is helpful. But if you use the -q option in a paired-end alignment and the read quality differs between the two paired-end read files, then the two mates of some pairs will end up with different numbers of bases. Does that matter much?
      Additionally, the text below (shown in red in my original post) is from the FAQ on the BWA website. I would like your recommendations on this issue.
      What is the tolerance of sequencing errors?
      Bwa-short is mainly designed for sequencing error rates below 2%. Although users can ask it to tolerate more errors by tuning command-line options, its performance is quickly degraded. Note that for Illumina reads, bwa-short may optionally trim low-quality bases from the 3'-end before alignment and thus is able to align more reads with high error rate in the tail, which is typical to Illumina data.

      Last edited by zeam; 10-13-2010, 04:12 PM.

      • #4
        Originally posted by zeam View Post
        Hi, thanks for your attention!
        Your answer is helpful. But if you use the -q option in a paired-end alignment and the read quality differs between the two paired-end read files, then the two mates of some pairs will end up with different numbers of bases. Does that matter much?
        Trimming doesn't reduce the number of reads in a file, and it may help align more reads. If you align 100 bp reads without trimming, you'll likely get "monopairs", i.e. pairs in which one mate can't be aligned.

        Originally posted by zeam View Post
        Additionally, the text below (shown in red in my original post) is from the FAQ on the BWA website. I would like your recommendations on this issue.
        What is the tolerance of sequencing errors?
        Bwa-short is mainly designed for sequencing error rates below 2%. Although users can ask it to tolerate more errors by tuning command-line options, its performance is quickly degraded. Note that for Illumina reads, bwa-short may optionally trim low-quality bases from the 3'-end before alignment and thus is able to align more reads with high error rate in the tail, which is typical to Illumina data.

        In my experience, trimming recovers more reads than increasing the alignment tolerance, and it's probably more precise. Note that bwa does not "hard trim" your reads (i.e. at a fixed position): if you have a 100 bp read that is good from the 5' end to the 3' end, it won't be trimmed.
        BTW, chemistry and flowcell version do matter: I've seen that the latest versions do not suffer from 3' degradation; we usually have qualities higher than 20 up to the 76th position. Which versions are you using?
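
The quality-dependent trimming can be illustrated from the rule given in the bwa manual page for `-q`: a read of length l is trimmed down to the x that maximizes the sum of (INT - q_i) over i = x+1..l. The following is only a sketch of that rule, not bwa's own code. The function name `trim_len` is made up, qualities are assumed to be Phred+33 unless another offset is passed, any minimum read length bwa may enforce is ignored, and stopping as soon as the running sum turns negative is an assumption made here:

```shell
#!/bin/sh
# trim_len QUALITY_STRING THRESHOLD [OFFSET]
# Prints how many bases of the read would be kept under the sketched rule.
trim_len() {
    printf '%s\n' "$1" | awk -v thr="$2" -v off="${3:-33}" '
    BEGIN {
        # Lookup table: printable ASCII character -> code point
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    {
        n = length($0); keep = n; sum = 0; best = 0
        # Walk from the 3-prime end towards the 5-prime end
        for (l = n; l >= 1; l--) {
            sum += thr - (ord[substr($0, l, 1)] - off)
            if (sum < 0) break                    # assumed early stop
            if (sum > best) { best = sum; keep = l - 1 }
        }
        print keep
    }'
}

trim_len 'IIIIII####' 20    # good 5-prime part, poor tail: prints 6
trim_len 'IIIIIIIIII' 20    # good throughout, nothing trimmed: prints 10
```

Passing 64 as the third argument applies the same rule to Illumina 1.3+ (Phred+64) quality strings.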

        d

        • #5
          And BTW, don't forget to convert your reads to the Sanger scale, otherwise trimming won't work as expected.
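
Illumina pipeline 1.3+ FASTQ encodes quality as Phred+64, while the Sanger scale is Phred+33, so converting means shifting every quality character down by 31. A rough sketch, assuming the input really is Phred+64 and uses strict four-line FASTQ records (no wrapped sequence lines); the function name `illumina_to_sanger` is made up for illustration:

```shell
#!/bin/sh
# illumina_to_sanger: reads Phred+64 FASTQ on stdin, writes Phred+33 FASTQ.
illumina_to_sanger() {
    awk '
    BEGIN {
        # Map each Phred+64 character (ASCII 64..126) to the one 31 below it
        for (i = 64; i <= 126; i++) map[sprintf("%c", i)] = sprintf("%c", i - 31)
    }
    NR % 4 == 0 {
        # Every fourth line is a quality string: shift it character by character
        out = ""
        for (j = 1; j <= length($0); j++) {
            c = substr($0, j, 1)
            # Characters outside the expected range are passed through unchanged
            out = out ((c in map) ? map[c] : c)
        }
        print out
        next
    }
    { print }'
}

# The quality line "hhBB" (Q40, Q40, Q2, Q2) becomes "II##"
printf '@read1\nACGT\n+\nhhBB\n' | illumina_to_sanger
```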

          d

          • #6
            Originally posted by dawe View Post
            And BTW, don't forget to convert your reads to the Sanger scale, otherwise trimming won't work as expected.

            d
            Thanks very much! I really mean it!
            I'm new to bioinformatics.
            (1) In your reply, you mentioned converting reads to the Sanger scale. As I understand it, if I would use the option -q 15 for Sanger FASTQ, then I would use -q 46 on my data to get the equivalent result. Am I right? Can you explain it to me explicitly?

            (2) The text below (shown in red in my original post) is your reply to another person. How do I use the patch you mentioned, and how do I set the '-I' option?
            As pointed out by lh3, you should always have your scores in Sanger format, and then you may apply a filter of 15-20 (which corresponds to an error probability of ~0.03-0.01).
            BTW, if you have your FASTQ in Illumina (Pipeline 1.3+) format, you may try this patch I've written. It enables a '-I' option in bwa aln so that you can use Illumina reads and trim (and output) as if they were in Sanger scale.

            (3) Does it matter that I put the data files in different directories, or should I copy them to one directory? I ask because I saw a run_bwa.sh from a Cornell University workshop.
            The run_bwa.sh file is as follows:

            #set path for nextgen software
            export PATH=$PATH:/home/gfs08/qs24/session2/bwa-0.5.7:/opt/nextgen/bin
            export PERL5LIB=/opt/nextgen/lib/perl5


            # delete all data from previous session, create a new working directory on local drive /tmp
            rm -rf "/tmp/${USER:?}"
            mkdir -p "/tmp/$USER"
            cd /tmp/$USER


            #copy data files to the working directory
            cp $HOME/session2/chr21.fa /tmp/$USER/
            cp $HOME/session2/na18507.chr21.fastq /tmp/$USER/

            #run software:
            #1) index the reference database with bwa index tool. For each reference, you only need to do it once. Next time you align to the same reference, you can simply copy the indexed database
            bwa index -p chr21.fa -a bwtsw chr21.fa
            #2) align reads using the bwa alignment tool
            bwa aln chr21.fa na18507.chr21.fastq > na18507.chr21.sai
            #3) generate SAM output
            bwa samse -n 3 chr21.fa na18507.chr21.sai na18507.chr21.fastq > na18507.chr21.sam

            #4) convert to BAM.
            #samtools import function requires a file with a list of chromosome
            #if it is supplied with a non-exist file, in this case in.reflist, it will retrieve the information from the SAM file
            samtools import in.reflist na18507.chr21.sam na18507.chr21.bam

            #5) sort the BAM file
            samtools sort na18507.chr21.bam na18507.chr21.sorted

            #6) index the sorted BAM file
            samtools index na18507.chr21.sorted.bam

            #7) build a pileup file with variant calls
            samtools pileup -vcf chr21.fa na18507.chr21.sorted.bam > raw.pileup

            #8) filter variant calls using default filters
            samtools.pl varFilter raw.pileup | awk '$6>=20' > na18507.chr21.SNP.pileup

            #move result files from the working directory to my home directory
            cp na18507.chr21.sam $HOME/session2/
            cp na18507.chr21.sorted.* $HOME/session2/
            cp na18507.chr21.SNP.pileup $HOME/session2/

            #clean up the working directory
            cd $HOME
            rm -rf "/tmp/${USER:?}"
            Last edited by zeam; 10-14-2010, 07:31 AM. Reason: Add a query

            • #7
              Originally posted by zeam View Post
              Can you explain it to me explicitly.
              Mmm... Take a look at this thread

              Originally posted by zeam View Post
              (2) The text below (shown in red in my original post) is your reply to another person. How do I use the patch you mentioned, and how do I set the '-I' option?
              Just download it and use the patch command...

              Code:
              cd bwa-source-directory
              patch -p1 < patch.file
              make
              But take a look at

              Code:
              man patch

              Originally posted by zeam View Post
              (3) Does it matter that I put the data files in different directories, or should I copy them to one directory?
              As long as you specify a path to an existing file, everything will work fine.
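
Since only valid paths are needed, the inputs can stay where they are. One way to catch a mistyped path before a long run is to check each file first. A small sketch, where `check_inputs` and the example paths are hypothetical:

```shell
#!/bin/sh
# check_inputs FILE...
# Returns 0 if every argument is a readable regular file; otherwise
# reports each problem path on stderr and returns 1.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with files kept in different directories (paths are placeholders)
if check_inputs "$HOME/refs/chr21.fa" "/data/run1/reads.fastq"; then
    echo "inputs OK"
else
    echo "fix the paths above before running bwa"
fi
```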

              HTH

              d
              Last edited by dawe; 10-14-2010, 09:26 AM. Reason: Typo

              • #8
                Hello,

                Does anyone know what happened to the "hard trim" option (-B) in BWA?

                -B INT Length of barcode starting from the 5’-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcode from both ends are concatenated. [0]
