Default Change in CASAVA / BCL->FASTQ


  • #16
    Originally posted by skruglyak
    We are planning a minor release of CASAVA in October that is primarily intended to handle an improvement to the number of supported index sequences. In the same release, we plan to change the default behavior and omit reads that do not pass filter from the FASTQ files. In general, we do not recommend the use of non-PF reads. Users who want to retain the non-PF reads will be able to do so by adding the following parameter to configureBclToFastq.pl:

    --with-failed-reads

    A read is classified as non-PF when more than one cycle in the first 25 cycles has a poor ratio (<0.6) of the brightest intensity to the sum of the brightest and second brightest.
    Our variant calling software ignores non-PF reads, but there are many alternate methods that use all data, disregarding the non-PF flag. The inclusion of non-PF reads increases time to align, increases the data footprint, increases the measured error rate, and can lead to variant calling errors. As a result we have decided to exclude such reads as the default behavior. As a consequence of being excluded from the FASTQ files, the reads will also be excluded from all downstream processing and output including BAM files – archival and standard.
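The non-PF rule quoted above can be sketched in a few lines. This is only an illustration of the rule as stated in the post (more than one cycle among the first 25 with a ratio below 0.6), not Illumina's actual implementation; the function names `chastity` and `passes_filter`, and the assumption that each cycle is given as four non-negative channel intensities, are hypothetical.

```python
def chastity(intensities):
    """Ratio of the brightest intensity to the sum of the two brightest.

    `intensities` is assumed to hold the four base-channel values for one cycle.
    """
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Guard against an all-zero cycle; treat it as a failing ratio.
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """Return True if at most `max_failures` of the first `window` cycles
    have a chastity ratio below `threshold`, as described in the post."""
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Example with made-up intensities: one poor cycle is tolerated, two are not.
good_cycle = (100, 10, 5, 5)   # ratio 100 / 110, above 0.6
poor_cycle = (50, 50, 0, 0)    # ratio 0.5, below 0.6
one_poor = [poor_cycle] + [good_cycle] * 24
two_poor = [poor_cycle] * 2 + [good_cycle] * 23
print(passes_filter(one_poor), passes_filter(two_poor))
```

Only the first 25 cycles are inspected, so poor ratios later in the read do not change the PF flag under this rule.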

    Please let me know if you have questions or concerns.

    Thank you,
    Semyon
    Semyon,

    I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

    I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noted what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both the raw and the PF counts. Other files (e.g. BustardSummary.xml) appear to report raw and PF correctly.

    Thanks again.



    • #17
      A single bam file as alignment output

      Dear Semyon,

      Is there a way to get alignments in a single file per sample in bam format as alignment output?

      As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

      Thanks



      • #18
        Originally posted by kmcarr
        Semyon,

        I really appreciate that Illumina has been so responsive to customer feedback with regard to refinement of the CASAVA pipeline and I really hate to keep coming up with more things to tweak/change, but...

        I just ran my first data set through the new 1.8.2 pipeline and truly appreciate the PF-only default and the --fastq-cluster-count 0 option. However, I noted what I consider a bug in some of the summary files produced by CASAVA. Some summary files (e.g. Flowcell_demux_summary.xml) report the number of PF clusters/bases for both the raw and the PF counts. Other files (e.g. BustardSummary.xml) appear to report raw and PF correctly.

        Thanks again.
        Sorry for the late reply. I somehow missed the notification of the post. You are correct. The stats are computed after the FASTQ file is made, and by then the non-PF reads have already been dropped, which leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

        Thanks for your feedback.

        Semyon



        • #19
          Originally posted by selen
          Dear Semyon,

          Is there a way to get alignments in a single file per sample in bam format as alignment output?

          As far as I know we need an additional "configurebuild --targets sort bam " step to achieve it right now.

          Thanks
          Hi selen,

          You are correct to use configureBuild to generate the single BAM file. I spoke with a member of my team and he provided the following example.

          Thanks,
          Semyon

          $CASAVA_PATH/bin/configureBuild.pl \
          --outDir ./outdir \
          --inSampleDir /path/to/eland_alignment/Sample_exampleSample \
          --samtoolsRefFile genome.fa \
          --targets sort bam \
          --sortKeepAllReads



          • #20
            Originally posted by skruglyak
            Sorry for the late reply. I somehow missed the notification of the post. You are correct. The stats are computed after the FASTQ file is made, so this leads to the issue that you observe. Have you tried using SAV (Sequencing Analysis Viewer)? It reports a lot of valuable statistics created by RTA, including %PF.

            Thanks for your feedback.

            Semyon
            Semyon,

            Yes, it's true SAV presents some of that data, but I need the data in a format that I can parse to generate reports. This means the .xml files produced by CASAVA. The files produced by CASAVA really should properly report the number of Raw and PF clusters generated regardless of what is output to the FASTQ files.



            • #21
              Has anyone figured out how to get the %PF per lane/barcode in the Demultiplex_Stats.htm file? Before version 1.8.2 this file listed the %PF for each sample; now it just lists 100% for all samples.

              From reading above it seems like you might need to use:

              --with-failed-reads

              Then filter out the reads that fail PF separately to get the stats to show.

              My argument would be that if the field is in the QC file, it might as well show the real result; listing 100% across the board is not very useful.
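A rough sketch of that workaround, for FASTQ files generated with --with-failed-reads: count the PF flag in each read header to recover %PF per file. It assumes the CASAVA 1.8 header layout, `@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index`, where the is_filtered field is `Y` for reads that fail the filter and `N` otherwise, and it assumes unwrapped four-line FASTQ records. The names `is_pf` and `pf_stats` and the example file name are hypothetical.

```python
import gzip


def is_pf(header):
    """True if a CASAVA 1.8-style header marks the read as passing filter.

    The second colon-separated field after the space is assumed to be the
    is_filtered flag: 'N' means not filtered (PF), 'Y' means failed.
    """
    comment = header.rstrip().split(" ", 1)[1]
    return comment.split(":")[1] == "N"


def pf_stats(lines):
    """Return (pf_reads, total_reads, percent_pf) for four-line FASTQ records."""
    pf = total = 0
    for i, line in enumerate(lines):
        if i % 4 == 0:          # header line of each record
            total += 1
            if is_pf(line):
                pf += 1
    percent = 100.0 * pf / total if total else 0.0
    return pf, total, percent


def pf_stats_for_file(path):
    """Convenience wrapper for a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return pf_stats(handle)


# Hypothetical usage:
# pf, total, percent = pf_stats_for_file("Sample_example_R1_001.fastq.gz")
```

Summing the counts over all files for a lane and barcode would give the per-sample %PF that Demultiplex_Stats.htm used to show.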
