
**dawe** · 10-13-2010, 11:46 AM

Originally posted by zeam:

My data was generated with Solexa pipeline 1.3, paired-end, 100 bp. When I use BWA as my aligner, I should trim the reads by <INT> toward the 3' end, because read quality gets worse closer to the 3' end. My question is: how should I set <INT>? Can somebody tell me, if you are an expert or have encountered this problem?

You can also e-mail me: [email protected]

bwa aln -q 20 is a reasonable filter.

d

**zeam** · 10-13-2010, 04:09 PM

Hi, thanks for your attention!
Your answer is helpful. But if you use the -q option when aligning paired-end reads, and read quality differs between the two paired-end read files, then the two reads of some pairs will end up with different numbers of bases. Does that matter much?
Additionally, the text below is quoted from the FAQ on the BWA website. I would like your recommendations on this issue.
What is the tolerance of sequencing errors?
Bwa-short is mainly designed for sequencing error rates below 2%. Although users can ask it to tolerate more errors by tuning command-line options, its performance is quickly degraded. Note that for Illumina reads, bwa-short may optionally trim low-quality bases from the 3'-end before alignment and thus is able to align more reads with high error rate in the tail, which is typical to Illumina data.

**dawe** · 10-13-2010, 09:05 PM

Originally posted by zeam:

Hi, thanks for your attention!
Your answer is helpful. But if you use the -q option when aligning paired-end reads, and read quality differs between the two paired-end read files, then the two reads of some pairs will end up with different numbers of bases. Does that matter much?

Trimming doesn't reduce the number of reads in a file, and it may help align more reads. If you align 100 bp reads without trimming, you'll likely get "monopairs", i.e. pairs in which one of the two reads can't be aligned.

Originally posted by zeam:

Additionally, the text below is quoted from the FAQ on the BWA website. I would like your recommendations on this issue.
What is the tolerance of sequencing errors?
Bwa-short is mainly designed for sequencing error rates below 2%. Although users can ask it to tolerate more errors by tuning command-line options, its performance is quickly degraded. Note that for Illumina reads, bwa-short may optionally trim low-quality bases from the 3'-end before alignment and thus is able to align more reads with high error rate in the tail, which is typical to Illumina data.

In my experience, trimming recovers more aligned reads than increasing the alignment tolerance does, and it's probably more precise. Note that bwa does not "hard trim" your reads (i.e. at a fixed position): if you have a 100 bp read that is good from the 5' end to the 3' end, it won't be trimmed.
BTW, chemistry and flowcell versions do matter: I've seen that the latest versions do not suffer from 3' degradation; we usually have qualities higher than 20 up to the 76th position. Which versions are you using?
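
To make the "no fixed position" point concrete, here is a minimal Python sketch of the trimming rule as the bwa manual states it for `-q INT`: if the last base has quality below INT, the read is trimmed down to the length x that maximises the sum of (INT - q_i) over the bases after x. The function name `trimmed_length` is made up for this example, the scores are assumed to be Sanger-scale integers, and this is an illustration of the formula rather than bwa's actual C code.

```python
def trimmed_length(quals, threshold):
    """Return the read length kept by the argmax rule from the bwa manual.

    quals: Phred scores (Sanger scale) ordered 5' to 3'.
    threshold: the value that would be passed as -q.
    """
    length = len(quals)
    if length == 0 or quals[-1] >= threshold:
        return length  # the 3'-most base is good enough: nothing is trimmed

    best_len, best_sum, running = length, 0, 0
    # Walk from the 3' end toward the 5' end, accumulating (threshold - q).
    for x in range(length - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, best_len = running, x  # keep the first x bases
    return best_len


print(trimmed_length([30, 30, 30, 30, 30, 10, 5, 2], 20))  # 5
print(trimmed_length([30] * 8, 20))                        # 8
print(trimmed_length([30, 30, 5, 30], 20))                 # 4
```

The first read loses its three poor 3' bases, the second is untouched, and the third is also untouched because its last base passes the threshold even though a middle base does not.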

d

**dawe** · 10-14-2010, 12:12 AM

And BTW, don't forget to convert your reads to the Sanger scale, otherwise trimming won't work as expected.
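
For reference, the conversion itself is just an offset change: Illumina pipeline 1.3+ stores each score as the ASCII character with code Q+64, while the Sanger scale uses Q+33. A minimal Python sketch follows; the function names are invented for this example, and it assumes plain 4-line FASTQ records whose sequences are not wrapped across lines.

```python
def illumina13_to_sanger(qual):
    """Re-encode one Phred+64 quality string as Phred+33 (a shift of -31)."""
    converted = []
    for ch in qual:
        score = ord(ch) - 64  # decode with the Illumina 1.3+ offset
        if score < 0:
            raise ValueError(f"{ch!r} is below the Phred+64 range; "
                             "is the input already Sanger-encoded?")
        converted.append(chr(score + 33))  # encode with the Sanger offset
    return "".join(converted)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every 4th line (the qualities) re-encoded."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        yield illumina13_to_sanger(text) if number % 4 == 0 else text


record = ["@read1", "ACGT", "+", "hTB@"]
print(list(convert_fastq_lines(record)))
# ['@read1', 'ACGT', '+', 'I5#!']
```

The `ValueError` is a small safety net: characters below `@` cannot occur in Phred+64 data, so hitting one suggests the file is already in Sanger format.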

d

**zeam** · 10-14-2010, 07:08 AM

Originally posted by dawe:

And BTW, don't forget to convert your reads to the Sanger scale, otherwise trimming won't work as expected.

d

Thanks very much! I really mean it!
I'm new to bioinformatics.
(1) In your reply, you mentioned "don't forget to convert your reads to the Sanger scale". As I understand it, if I would use the option -q 15 for Sanger FASTQ, then I should use -q 46 on my data to get the equivalent result. Am I right? Can you explain it to me explicitly?

(2) The text below is from your reply to another person. How do I use the patch you mentioned, and how do I set the '-I' option?
As pointed out by lh3, you should always have your scores in Sanger format, and then you may apply a filter of 15-20 (which corresponds to an error probability of ~0.03-0.01).
BTW, if you have your FASTQ in Illumina (Pipeline 1.3+) format, you may try this patch I've written. It enables a '-I' option in bwa aln so that you can use Illumina reads and trim (and output) as if they were in Sanger scale.
(3) Does it matter if I put the data files in different directories, or should I copy them all to one directory? I ask because I saw a run_bwa.sh from a Cornell University workshop.
The run_bwa.sh file is as follows:

#set path for nextgen software
export PATH=$PATH:/home/gfs08/qs24/session2/bwa-0.5.7:/opt/nextgen/bin
export PERL5LIB=/opt/nextgen/lib/perl5

# delete all data from previous session, create a new working directory on local drive /tmp
rm -rf /tmp/$USER
mkdir /tmp/$USER
cd /tmp/$USER

#copy data files to the working directory
cp $HOME/session2/chr21.fa /tmp/$USER/
cp $HOME/session2/na18507.chr21.fastq /tmp/$USER/

#run software:
#1) index the reference database with bwa index tool. For each reference, you only need to do it once. Next time you align to the same reference, you can simply copy the indexed database
bwa index -p chr21.fa -a bwtsw chr21.fa
#2) align reads using the bwa alignment tool
bwa aln chr21.fa na18507.chr21.fastq > na18507.chr21.sai
#3) generate SAM output
bwa samse -n 3 chr21.fa na18507.chr21.sai na18507.chr21.fastq > na18507.chr21.sam

#4) convert to BAM.
#the samtools import function requires a file listing the chromosomes
#if it is given a nonexistent file (in.reflist in this case), it will retrieve the information from the SAM file
samtools import in.reflist na18507.chr21.sam na18507.chr21.bam

#5) sort the BAM file
samtools sort na18507.chr21.bam na18507.chr21.sorted

#6) index the sorted BAM file
samtools index na18507.chr21.sorted.bam

#7) build a pileup file with variant calls
samtools pileup -vcf chr21.fa na18507.chr21.sorted.bam > raw.pileup

#8) filter variant calls using default filters
samtools.pl varFilter raw.pileup | awk '$6>=20' > na18507.chr21.SNP.pileup

#move result files from the working directory to my home directory
cp na18507.chr21.sam $HOME/session2/
cp na18507.chr21.sorted.* $HOME/session2/
cp na18507.chr21.SNP.pileup $HOME/session2/

#clean up the working directory
cd $HOME
rm -rf /tmp/$USER

**dawe** · 10-14-2010, 09:20 AM

Originally posted by zeam:

Can you explain it to me explicitly?

Mmm... Take a look at this thread
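
The arithmetic behind question (1) can be sketched in a few lines of Python. The Phred definition Q = -10·log10(P) is standard; the second function only shows what score a decoder would see if it assumed offset 33 on a file written with offset 64. Both function names are made up for this example, and whether shifting `-q` by 31 gives fully equivalent results in bwa is an assumption this sketch does not verify, which is why converting the reads first is the safer route.

```python
def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def apparent_score(true_q, file_offset=64, assumed_offset=33):
    """Score a decoder sees when it assumes the wrong ASCII offset."""
    return true_q + file_offset - assumed_offset


for q in (15, 20):
    print(q, round(error_probability(q), 4), apparent_score(q))
# 15 0.0316 46
# 20 0.01 51
```

So a true Q15 in a Phred+64 file reads as 46 when decoded as Phred+33, which is where the 46 in the question comes from, and Q15 to Q20 corresponds to error probabilities of roughly 0.03 to 0.01.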

Originally posted by zeam:

(2) The text below is from your reply to another person. How do I use the patch you mentioned, and how do I set the '-I' option?

Just download it and use the patch command...

Code:

cd bwa-source-directory
patch -p1 < patch.file
make

But take a look at

Code:

man patch

Originally posted by zeam:

(3) Does it matter if I put the data files in different directories, or should I copy them all to one directory?

As long as you specify a path to an existing file, everything will work fine.

HTH

d

**jsp** · 04-11-2011, 10:03 AM

Hello,

Does anyone know what happened to the "hard trim" option (-B) in BWA?

bwa.1

http://bio-bwa.sourceforge.net/bwa.shtml#12

-B INT Length of barcode starting from the 5’-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcode from both ends are concatenated. [0]
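
For readers unfamiliar with the option, the behaviour the manual text describes (cut a fixed-length barcode off the 5' end, keep it for the BC tag, and concatenate the barcodes of both mates) can be sketched in Python as below. The function names are hypothetical and this only mirrors the quoted description; it is not bwa's code and does not answer whether the option is still available.

```python
def split_barcode(seq, qual, barcode_len):
    """Split a fixed-length 5' barcode off a read and its quality string."""
    if barcode_len <= 0:
        return "", seq, qual  # non-positive length: nothing is trimmed
    return seq[:barcode_len], seq[barcode_len:], qual[barcode_len:]


def paired_barcode(seq1, seq2, barcode_len):
    """Concatenate the barcodes of both mates, as described for paired ends."""
    bc1, _, _ = split_barcode(seq1, "", barcode_len)
    bc2, _, _ = split_barcode(seq2, "", barcode_len)
    return bc1 + bc2


print(split_barcode("ACGTTTGGA", "IIIIHHHHH", 4))
# ('ACGT', 'TTGGA', 'HHHHH')
print(paired_barcode("ACGTTTGGA", "TTAACCGGA", 4))
# ACGTTTAA
```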


Questions on BWA
