**kmcarr** · 10-24-2011, 05:39 AM

Originally posted by skruglyak

We are planning a minor release of CASAVA in October that is primarily intended to handle an improvement to the number of supported index sequences. In the same release, we plan to change the default behavior and omit reads that do not pass filter from the FASTQ files. In general, we do not recommend the use of non-PF reads. Users that want to retain the non-PF reads will be able to do so by adding the following parameter to the configureBcltoFastq.pl:

--with-failed-reads

A read is classified as non-PF when more than one cycle in the first 25 cycles has a poor ratio (<0.6) of the brightest intensity to the sum of the brightest and second brightest.
Our variant calling software ignores non-PF reads, but there are many alternate methods that use all data, disregarding the non-PF flag. The inclusion of non-PF reads increases time to align, increases the data footprint, increases the measured error rate, and can lead to variant calling errors. As a result we have decided to exclude such reads as the default behavior. As a consequence of being excluded from the FASTQ files, the reads will also be excluded from all downstream processing and output including BAM files – archival and standard.
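
As a rough illustration of the pass-filter rule described above, here is a minimal Python sketch. It assumes each cycle is given as a tuple of non-negative per-channel intensities; the function names, the input layout, and the `max_failures` parameter are hypothetical and are not taken from RTA or CASAVA.

```python
def chastity(intensities):
    """Brightest intensity divided by (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Treat an all-zero cycle as a failing cycle (assumption of this sketch).
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """Return True if the read is PF under the rule quoted in the post.

    A read is non-PF when more than `max_failures` of the first `window`
    cycles have a chastity ratio below `threshold`.
    """
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Hypothetical intensities: one clean cycle and one ambiguous cycle.
clean_cycle = (100.0, 10.0, 5.0, 5.0)   # ratio is about 0.91
mixed_cycle = (50.0, 50.0, 0.0, 0.0)    # ratio is 0.5, below the threshold

read_ok = [mixed_cycle] + [clean_cycle] * 29       # one poor cycle: still PF
read_bad = [mixed_cycle] * 2 + [clean_cycle] * 28  # two poor cycles: non-PF
print(passes_filter(read_ok), passes_filter(read_bad))
```

Only the first 25 cycles are examined, so poor cycles later in the read do not change the PF status in this sketch.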

Please let me know if you have questions or concerns.

Thank you,
Semyon

Semyon,

I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noted what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both raw and PF counts. Other files (e.g. BustardSummary.xml) appear to correctly report raw and PF.

Thanks again.

**selen** · 11-14-2011, 08:44 AM

A single bam file as alignment output

Dear Semyon,

Is there a way to get alignments in a single file per sample in bam format as alignment output?

As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

Thanks

**skruglyak** · 11-14-2011, 10:41 AM

Originally posted by kmcarr

Semyon,

I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noted what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both raw and PF counts. Other files (e.g. BustardSummary.xml) appear to correctly report raw and PF.

Thanks again.

Sorry for the late reply. I somehow missed the notification of your post. You are correct. The stats are computed after the FASTQ file is made, so the raw counts only see reads that are already PF, which leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

Thanks for your feedback.

Semyon

**skruglyak** · 11-14-2011, 10:45 AM

Originally posted by selen

Dear Semyon,

Is there a way to get alignments in a single file per sample in bam format as alignment output?

As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

Thanks

Hi selen,

You are correct to use configureBuild to generate the single BAM file. I spoke with a member of my team and he provided the following example.

Thanks,
Semyon

$CASAVA_PATH/bin/configureBuild.pl \
--outDir ./outdir \
--inSampleDir /path/to/eland_alignment/Sample_exampleSample \
--samtoolsRefFile genome.fa \
--targets sort bam \
--sortKeepAllReads

**kmcarr** · 11-15-2011, 11:06 AM

Originally posted by skruglyak

Sorry for the late reply. I somehow missed the notification of your post. You are correct. The stats are computed after the FASTQ file is made, so the raw counts only see reads that are already PF, which leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

Thanks for your feedback.

Semyon

Semyon,

Yes, it's true that SAV presents some of that data, but I need the data in a format that I can parse to generate reports. That means the .xml files produced by CASAVA. Those files really should report the correct number of raw and PF clusters, regardless of what is output to the FASTQ files.

**Jon_Keats** · 11-19-2012, 01:51 PM

Has anyone figured out how to get the %PF per lane/barcode in the Demultiplex_Stats.htm file? Before version 1.8.2 this listed the %PF for each sample; now it just lists 100% for all samples.

From reading above it seems like you might need to use:

--with-failed-reads

Then filter out the PF-failing reads separately to get the stats to show.

My argument would be that if you have the field in the QC file, you might as well show the real result, as listing 100% across the board is not very useful.
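
For anyone trying the workaround above, here is a minimal Python sketch that counts raw and PF reads per lane and index from a FASTQ written with --with-failed-reads. It assumes the CASAVA 1.8 header layout (`@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index`), where the filter field is Y for a read that failed the filter and N otherwise, and it assumes strict four-line records. The function name and the sample reads are made up for illustration.

```python
from collections import defaultdict


def pf_stats(fastq_lines):
    """Return {(lane, index): [raw_count, pf_count]} from FASTQ lines."""
    stats = defaultdict(lambda: [0, 0])
    for i, line in enumerate(fastq_lines):
        # Only the first line of each four-line record is a header.
        if i % 4 or not line.strip():
            continue
        name, comment = line.split()[:2]
        lane = name.split(":")[3]
        _read, is_filtered, _control, index = comment.split(":")[:4]
        counts = stats[(lane, index)]
        counts[0] += 1
        if is_filtered == "N":  # N means the read was not filtered out, i.e. PF
            counts[1] += 1
    return dict(stats)


# Two made-up reads: the first is PF, the second failed the filter.
SAMPLE = """\
@HWI-ST1:1:C0XXXACXX:1:1101:1234:2000 1:N:0:ATCACG
ACGT
+
IIII
@HWI-ST1:1:C0XXXACXX:1:1101:1235:2001 1:Y:0:ATCACG
ACGT
+
####
"""

for (lane, index), (raw, pf) in sorted(pf_stats(SAMPLE.splitlines()).items()):
    print(f"lane {lane} index {index}: {pf}/{raw} PF ({100.0 * pf / raw:.1f}%)")
```

For a real run, the same function can be fed an open file handle, for example one returned by `gzip.open(path, "rt")`, since it only iterates over lines.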
