  • dejavu2010
    Member
    • Jan 2012
    • 21

    fastqc read limit?

    I have a question about FastQC.

    I have 2×100 bp paired-end reads from a HiSeq 2000: 166,867,542 PE reads in total. When I open the data in FastQC, it reports only 16,000,000 total sequences, which I believe is the per-file read limit CASAVA uses for ELAND. If that is the case, where does this read limit come from, and how can I get around it? Thanks.
  • BAMseek
    Senior Member
    • Apr 2011
    • 124

    #2
    My guess is that CASAVA split the reads into multiple FASTQ files, with the maximum number of reads per file set to 16 million, so your sample may be spread across several files. You can always count the lines in a file (wc -l) and divide by 4 to get the number of reads. FastQC (beginning in version 0.10.0) has a CASAVA mode that handles the multiple FASTQ files produced by CASAVA.
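A rough Python equivalent of the "wc -l, then divide by 4" check, offered as a sketch. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and, unlike plain `wc -l`, decompresses `.gz` input first; in the shell you would need `zcat file.fastq.gz | wc -l` for a gzipped file. The function names are hypothetical and not part of FastQC or CASAVA.

```python
import gzip

def count_reads_in_lines(lines):
    """Count FASTQ records in an iterable of lines (4 lines per record assumed)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        # A remainder usually means a truncated file or wrapped (multi-line) records.
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4

def count_fastq_reads(path):
    """Like `wc -l` divided by 4, but reads .gz files through the gzip module."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_reads_in_lines(fh)
```

Python's `gzip` module reads every member of a concatenated `.gz` file, so this count reflects all the data in the file rather than only the first compressed member.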

    Justin

    • dejavu2010
      Member
      • Jan 2012
      • 21

      #3
      Actually, I had already concatenated those 12 files into one big file and loaded it into FastQC, but it still showed only 16M reads.
      michael

      • BAMseek
        Senior Member
        • Apr 2011
        • 124

        #4
        Maybe try counting the number of lines in the fastq file, using something like "wc -l", to see if the file has the number of reads you are expecting.

        • mgogol
          Senior Member
          • Mar 2008
          • 197

          #5
          Did you ever figure this out? I have a file with > 24 million reads and the fastqc report is saying 4000000 exactly... It also appears to be bailing out early.

          I'll try upgrading to the latest version.
          Last edited by mgogol; 04-12-2012, 01:43 PM.

          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            #6
            This will be because your file was created by concatenating multiple gzipped files. This places gzip headers throughout the file rather than a single header for all of the data at the top. The core Java gzip decompressor doesn't account for multiple headers within the file, so it reports that the file has ended when it reaches the end of the first compressed block (i.e. the end of the first file in the set). This problem will affect all programs written in Java that use these classes to read gzipped data.
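To see the effect described above without Java, here is a small Python sketch (the function names are illustrative only). It gzips two tiny FASTQ fragments separately, joins the compressed bytes the way `cat` would, then compares a reader that stops at the end of the first gzip member (an approximation of the behaviour described, not Java's actual code) with Python's `gzip.decompress`, which reads every member.

```python
import gzip
import zlib

def count_gzip_members(data: bytes) -> int:
    """Count gzip members (header + deflate stream + trailer) in a byte string."""
    members = 0
    while data:
        d = zlib.decompressobj(wbits=31)  # wbits=31: expect gzip header/trailer
        d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        members += 1
        data = d.unused_data  # bytes that follow this member's trailer
    return members

def read_first_member_only(data: bytes) -> bytes:
    """Treat the end of the first gzip member as the end of the file."""
    return zlib.decompressobj(wbits=31).decompress(data)

# Two FASTQ fragments gzipped separately, then joined byte-for-byte,
# like `cat a.fastq.gz b.fastq.gz > all.fastq.gz`.
part_a = b"@r1\nACGT\n+\nIIII\n"
part_b = b"@r2\nTTGA\n+\nIIII\n"
joined = gzip.compress(part_a) + gzip.compress(part_b)

print(count_gzip_members(joined))                  # 2
print(read_first_member_only(joined) == part_a)    # True: second file silently lost
print(gzip.decompress(joined) == part_a + part_b)  # True: multi-member reader sees all
```

A member count above 1 is a quick sign that an existing `.fastq.gz` was built with plain `cat`.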

            There are a few solutions:
            1. Instead of doing cat *fastq.gz > allfiles.fastq.gz to join your files, do zcat *fastq.gz | gzip -c > allfiles.fastq.gz. This will decompress and recompress the data, so you'll end up with a single compressed block.
            2. Don't join the files together; leave them separate, pass them all to FastQC, and add the --casava option when starting FastQC. This will recombine the files into a single report for you.
            3. Use the development version of FastQC, where I've added a workaround for this. The fix will be in the next release.


            The development version is here and the new release should be out very soon now.
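The first solution (decompress, then recompress into one block) can also be sketched in Python; `merge_fastq_gz` is a hypothetical helper name. Each input is fully decompressed (all of its members, if it was itself built by `cat`) and streamed into a single `gzip.open(..., "wb")` writer, which emits one gzip member, the same end result as `zcat *fastq.gz | gzip -c > allfiles.fastq.gz`.

```python
import gzip
import shutil

def merge_fastq_gz(inputs, output):
    """Decompress each input and recompress everything into one gzip member.

    Roughly equivalent to: zcat *fastq.gz | gzip -c > allfiles.fastq.gz
    """
    with gzip.open(output, "wb") as out:        # one writer -> one gzip member
        for path in inputs:
            with gzip.open(path, "rb") as src:  # reads every member of src
                shutil.copyfileobj(src, out)    # streams in chunks, not all in memory
```

Pass the input paths in the order you want the reads to appear in the merged file.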
