  • error in running casava

    I am working with MiSeq BCL files and want to convert them to FASTQ files using bcl2fastq.

    Below is a sample of my data:

    [Header]
    IEMFileVersion 4
    Investigator Name XXXXX
    Experiment Name XXXX_plate01_1pool
    Date xx/xx/xxxx
    Workflow GenerateFASTQ
    Application FASTQ Only
    Assay TruSeq LT
    Description Test
    Chemistry Default

    [Reads]
    250
    250

    [Settings]
    ReverseComplement 0
    Adapter TTTTTTTTTTTTTTT
    AdapterRead2 AAAAAAAAAAAA

    [Data]
    FCID Lane SampleID Sample_Ref index Description Control Recipe Operator Sample_Project
    073388Sm XXXXXXXX
    073389Sm XXXXXXXX
    073390Sm XXXXXXXX
    073391Sm XXXXXXXX
    073392Sm XXXXXXXX
    073393Sm XXXXXXXX
    073394Sm XXXXXXXX
    cont1 XXXXXXXX


    bash-4.1$ /home/Downloads/CASAVA/bin/configureBclToFastq.pl --output-dir /home/Projects/Data/unaligned --input-dir /home/Projects/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/Projects/Data/Intensities/BaseCalls/SampleSheet.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
    could not find ParserDetails.ini in /home/Downloads/localperl/lib/site_perl/5.20.2/XML/SAX
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Basecalling software: RTA
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
    [2015-02-24 17:18:02] [configureBclToFastq.pl] WARNING: Couldn't find run info in /home/Projects/Data/Intensities/BaseCalls/../../../RunInfo.xml
    [2015-02-24 17:18:02] [configureBclToFastq.pl] WARNING: Couldn't find RunInfo.xml for /home/Projects/Data/Intensities/BaseCalls
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Original use-bases mask: n
    [2015-02-24 17:18:02] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n
    ERROR: Wrong number of fields in sample sheet (expected: 10, got 8: IEMFileVersion,4,,,,,,)
    at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 531

    I am running CASAVA for the first time, so any help would be appreciated.

    Thank you

  • #2
    You can use a simplified samplesheet like the example here: http://seqanswers.com/forums/showpos...4&postcount=14

    See the entire thread for additional information.


    • #3
      Hi

      I have seen this thread, but it is still not clear to me. By "simplified", do you mean removing the run information at the top and keeping only the values from the FCID column onwards?

      Also, in the FCID column I have IDs and then cont1, cont2, and so on. Would that be the reason for the inconsistent flowcell ID?

      Thank you


      • #4
        You can manually create a Samplesheet.csv file (you can name the file anything, but it has to be in comma-separated value (CSV) format) that looks exactly like the example I linked above.

        That example contains the minimum information you need to convert BCL files to fastq when de-multiplexing your samples. You will need to grab the flowcell ID, which is the last part of the run folder name (e.g. 000000000-ADB2U).
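
A minimal Python sketch of pulling the flowcell ID out of the run folder name. It assumes the MiSeq run folder is named as date, instrument, run number and flowcell ID joined by underscores. The function name `flowcell_id_from_run_folder` and the example folder name are made up for illustration; only the flowcell ID itself comes from this thread.

```python
import os


def flowcell_id_from_run_folder(run_folder):
    """Return the last underscore-separated field of the run folder name.

    Assumes a folder named <date>_<instrument>_<run number>_<flowcell ID>.
    """
    # normpath drops a trailing slash so basename returns the folder itself.
    name = os.path.basename(os.path.normpath(run_folder))
    fields = name.split("_")
    if len(fields) < 4:
        raise ValueError("unexpected run folder name: %r" % name)
    return fields[-1]


# Hypothetical run folder path, used only to show the call.
print(flowcell_id_from_run_folder("/data/150224_M00000_0001_000000000-ADB2U"))
```

The returned value is what would go in the FCID column of every line of the sample sheet.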
        Last edited by GenoMax; 02-25-2015, 04:46 AM.


        • #5
          My FCIDs look like this:
          073388Sm
          073389Sm
          073390Sm
          073391Sm
          073392Sm
          073393Sm
          073394Sm
          cont1

          If this is not correct, where should I look for it?

          Thanks


          • #6
            Those must be your sample IDs. The Samplesheet.csv file in the raw data folder does not contain the flowcell ID.

            Did you get the complete raw data folder from your sequence provider? Its name should start with a date stamp (http://support.illumina.com/help/Seq...FileNaming.htm).


            • #7
              This is how I have it now:

              [Header]
              IEMFileVersion 4
              Investigator Name
              Experiment Name
              Date 0/00/2015
              Workflow GenerateFASTQ
              Application FASTQ Only
              Assay TruSeq LT
              Description Test
              Chemistry Default

              [Reads]
              250
              250

              [Settings]
              ReverseComplement 0
              Adapter
              AdapterRead2

              [Data]
              FCID Lane Sample_ID SampleRef index Description Control Recipe Operator SampleProject
              000000000-ADBFK 070008Sm xxxxxxxx
              000000000-ADBFK 070009Sm xxxxxxxx
              000000000-ADBFK 070010Sm xxxxxxxx
              000000000-ADBFK 070011Sm xxxxxxxx
              000000000-ADBFK 070012Sm xxxxxxxx
              000000000-ADBFK 070013Sm xxxxxxxx
              000000000-ADBFK 070014Sm xxxxxxxx
              000000000-ADBFK cont1 xxxxxxxx
              000000000-ADBFK 070016Sm xxxxxxxx
              000000000-ADBFK 070017Sm xxxxxxxx

              I am still getting this error:

              /home/CASAVA/bin/configureBclToFastq.pl --output-dir /home/unaligned --input-dir /home/DevelopmentRun1/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/DevelopmentRun1/SampleSheet1.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
              could not find ParserDetails.ini in /home/localperl/lib/site_perl/5.20.2/XML/SAX
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Basecalling software: RTA
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Original use-bases mask: n
              [2015-02-25 10:38:57] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n,IIIIIIIn,yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
              ERROR: FlowCell ID is inconsistent across Sample Sheet lines. Expected: 'Casava::Demultiplex::SampleSheet::Csv=HASH(0x2f16e80)->flowCellId()', got Investigator Name
              at /home/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 531

              Thank you for the kind help


              • #8
                Is this a 1D or 2D barcode run?


                • #9
                  Your samplesheet (if this is a 1D run) needs to look like this:

                  Code:
                  FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
                  000000000-ADBFK,1,073388Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073389Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073390Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  000000000-ADBFK,1,073391Sm,no_ref,PUT_TAG_SEQ_HERE,NA,N,NA,NA,
                  and so on
                  The sample sheet that you are using is the one needed if you were using MiSeq Reporter to do the analysis.
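
A rough Python sketch of turning the MiSeq Reporter style sheet into the 10-column layout shown above. It assumes the `[Data]` section of the original file has columns named `Sample_ID`, `index`, `index2` and `Sample_Project`; those column names, the function name `iem_to_casava` and the tag sequences in the example are assumptions for illustration, so check them against your own file.

```python
import csv
import io

CASAVA_HEADER = ["FCID", "Lane", "Sample_ID", "SampleRef", "index",
                 "Description", "Control", "Recipe", "Operator", "SampleProject"]


def iem_to_casava(iem_text, fcid, lane="1"):
    """Build a CASAVA-style sample sheet from the [Data] section of an IEM sheet."""
    rows = list(csv.reader(io.StringIO(iem_text)))
    # Everything above the [Data] marker is run information and is dropped.
    starts = [i for i, r in enumerate(rows) if r and r[0].strip() == "[Data]"]
    if not starts:
        raise ValueError("no [Data] section found")
    header = [h.strip() for h in rows[starts[0] + 1]]
    out = [CASAVA_HEADER]
    for row in rows[starts[0] + 2:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        rec = dict(zip(header, (c.strip() for c in row)))
        # For a dual-index run the two tags are joined with "-".
        tags = [t for t in (rec.get("index", ""), rec.get("index2", "")) if t]
        out.append([fcid, lane, rec["Sample_ID"], "", "-".join(tags),
                    "", "N", "", "", rec.get("Sample_Project", "")])
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(out)
    return buf.getvalue()


# Made-up sample names and tags, only to show the shape of the input.
example = """[Header]
IEMFileVersion,4

[Data]
Sample_ID,index,index2,Sample_Project
sampleA,ACGTACGT,TTGCAAGC,
sampleB,GGATCCAA,CCTAGGTT,
"""
print(iem_to_casava(example, "000000000-ADBFK"))
```

Each output line has exactly 10 comma-separated fields, which is what the "Wrong number of fields" error earlier in the thread is checking for.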
                  Last edited by GenoMax; 02-25-2015, 09:09 AM.


                  • #10
                    It's a 2D run, and now my sample sheet looks like this:

                    FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
                    000000000-ADBFK,1,070008Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070009Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070000Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070011Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070012Sm,,CGCCCGCC-AAAAAAAA,,N,,,
                    000000000-ADBFK,1,070013Sm,,CGCCCGCC-AAAAAAAA,,N,,,

                    /home/CASAVA/bin/configureBclToFastq.pl --output-dir /home/DevelopmentRun1/unaligned --input-dir /home/DevelopmentRun1/Data/Intensities/BaseCalls --fastq-cluster-count 0 --sample-sheet /home/DevelopmentRun1/SampleSheet1.csv --tiles s_1_* --force --mismatches 1 --ignore-missing-bcl --ignore-missing-stats --use-bases-mask n
                    could not find ParserDetails.ini in /home/Downloads/localperl/lib/site_perl/5.20.2/XML/SAX
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Basecalling software: RTA
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: version: 1.18 (build 54)
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Original use-bases mask: n
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] INFO: Guessed use-bases mask: n,IIIIIIIn,yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] ERROR: barcode ACGCATGGATGACTGG for lane 1 has length 16: expected barcode lenth (including delimiters) is 7
                    [2015-02-25 15:00:14] [configureBclToFastq.pl] BACKTRACE: at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Demultiplex.pm line 553
                    Casava::Demultiplex::loadSampleSheet('Casava::Demultiplex=HASH(0x20f1498)') called at /home/Downloads/CASAVA/bin/configureBclToFastq.pl line 427
                    Died at /home/Downloads/CASAVA/lib/bcl2fastq-1.8.4/perl/Casava/Common/Log.pm line 310

                    I have also tried changing --use-bases-mask to different I settings, but none of them seem to work.


                    • #11
                      Use this base mask (if you have 8 bp tags).
                      Code:
                      --use-bases-mask Y*,I8,I8,Y*
                      Or you could omit that option completely, and bcl2fastq will guess the correct values from the RunInfo.xml file.

                      I hope you have a distinct tag for each sample, since giving identical tags to all samples will not separate them.
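
A small Python sketch for checking the tags before running the conversion, assuming a sample sheet in the 10-column layout used in this thread and 8 bp tags as in the I8 mask above. The function name `check_tags` is made up for the example; the two rows are taken from the sheet posted earlier.

```python
import csv
import io
from collections import Counter


def check_tags(sheet_text, tag_length=8):
    """Return a list of problems found in the index column of the sample sheet."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    # A tag shared by several samples in one lane cannot separate them.
    counts = Counter((r["Lane"], r["index"]) for r in rows)
    for (lane, tag), n in sorted(counts.items()):
        if n > 1:
            problems.append("lane %s: tag %s is used by %d samples" % (lane, tag, n))
    # Each part of a dual tag should be as long as the I cycles in the mask.
    for r in rows:
        for part in r["index"].split("-"):
            if len(part) != tag_length:
                problems.append("%s: tag part %r is not %d bp"
                                % (r["Sample_ID"], part, tag_length))
    return problems


sheet = """FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,SampleProject
000000000-ADBFK,1,070008Sm,,CGCCCGCC-AAAAAAAA,,N,,,
000000000-ADBFK,1,070009Sm,,CGCCCGCC-AAAAAAAA,,N,,,
"""
for problem in check_tags(sheet):
    print(problem)
```

With the two rows above it reports that the tag CGCCCGCC-AAAAAAAA is used by 2 samples in lane 1. An empty list means no duplicate or wrong-length tags were found.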


                      • #12
                        It worked, but I don't see a Demultiplex_Stats file or DemultiplexedBustardSummary.xml. It looks like it is putting everything in Undetermined_indices.


                        • #13
                          I have it working now. Thank you for all the guidance.


                          • #14
                            Great. All of your output files should be in the "Unaligned/Basecall_Stats*" and "Unaligned/Project_*" directories. The pile of undetermined sequences goes into the "Undetermined_indices" directory.


                            • #15
                              Yes, that's where they are. Thank you!
