
**BAMseek** · 02-24-2012, 06:32 PM

My guess is that CASAVA divided the reads into multiple fastq files, with the maximum number of reads per file set to 16 million, so your sample might be spread across multiple files. You can always do a line count (wc -l) on the uncompressed file and divide by 4 to get the number of reads. There is a CASAVA mode added to FastQC (beginning in version 0.10.0) that handles the multiple fastq files produced by CASAVA.

Justin

**dejavu2010** · 02-24-2012, 07:33 PM

Actually, I concatenated those 12 files into one big file and then uploaded it to FastQC, but it still only showed 16 million reads.
michael

**BAMseek** · 02-24-2012, 08:56 PM

Maybe try counting the number of lines in the fastq file, using something like "wc -l", to see if the file has the number of reads you are expecting.
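The line-count check suggested above can be sketched as a small shell function. This is only an illustration: the name count_reads is made up here, and it assumes standard 4-line FASTQ records with no wrapped sequence lines. Gzipped input is decompressed with command-line gzip first, since counting lines in the compressed bytes would be meaningless.

```shell
# count_reads FILE: print the number of FASTQ records in FILE.
# Hypothetical helper; assumes exactly 4 lines per read.
count_reads() {
    case "$1" in
        # Compressed input: stream the decompressed text into wc.
        *.gz) lines=$(gzip -dc -- "$1" | wc -l) ;;
        # Plain text input: count lines directly.
        *)    lines=$(wc -l < "$1") ;;
    esac
    # Four lines (header, sequence, '+', qualities) make one read.
    echo $(( lines / 4 ))
}
```

Calling count_reads on the file given to FastQC and comparing the result with the total in the FastQC report shows whether the report covers the whole file.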

**mgogol** · 04-12-2012, 01:41 PM

Did you ever figure this out? I have a file with > 24 million reads and the fastqc report is saying 4000000 exactly... It also appears to be bailing out early.

I'll try upgrading to the latest version.

**simonandrews** · 04-12-2012, 11:23 PM

This will be because your original file will have been created by concatenating multiple gzipped files. This places gzip headers throughout the file rather than having a single header for all of the data at the top. The core Java gzip decompressor doesn't account for multiple headers within the file, so it reports that the file has finished when the end of the first compressed block is reached (i.e. the end of the first file in the set). This problem will affect all programs written in Java which use these classes to read gzipped data.
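A toy reproduction of the situation described above, as a rough sketch: two tiny gzipped FASTQ files are joined with plain cat. All file names and the demo_dir variable are invented for illustration. It shows that cat copies each gzip stream byte for byte, headers included, and that command-line gzip still reads through every stream, which is why the line count and the FastQC report can disagree.

```shell
# Build two tiny gzipped FASTQ files in a scratch directory (hypothetical names).
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/part1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo_dir/part2.fastq.gz"

# Plain cat keeps both gzip streams, each with its own header.
cat "$demo_dir/part1.fastq.gz" "$demo_dir/part2.fastq.gz" > "$demo_dir/joined.fastq.gz"

# The joined file is exactly as large as its parts combined...
part_bytes=$(( $(wc -c < "$demo_dir/part1.fastq.gz") + $(wc -c < "$demo_dir/part2.fastq.gz") ))
joined_bytes=$(( $(wc -c < "$demo_dir/joined.fastq.gz") ))

# ...and command-line gzip decompresses both streams, not just the first.
joined_lines=$(( $(gzip -dc "$demo_dir/joined.fastq.gz" | wc -l) ))

echo "bytes: parts=$part_bytes joined=$joined_bytes; lines after decompression: $joined_lines"
```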

There are a few solutions:

- Instead of doing cat *fastq.gz > allfiles.fastq.gz to join your files, do zcat *fastq.gz | gzip -c > allfiles.fastq.gz. This will decompress and recompress the data, so you'll end up with a single compressed block.
- Don't join the files together, but leave them separate, pass them all to fastqc and add the --casava option when starting fastqc. This will recombine the files into a single report for you.
- Use the development version of fastqc, where I've added a workaround for this. The fix will be in the next release.

The development version is here and the new release should be out very soon now.
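The first option in the list above (decompress everything, then recompress once) might be wrapped up like this. It is a minimal sketch, not part of FastQC: the function name merge_gz is hypothetical, and it uses gzip -dc in place of zcat because zcat behaves differently on some systems.

```shell
# merge_gz OUT IN...: write all gzipped inputs to OUT as ONE gzip stream.
# Hypothetical helper; OUT must not be one of the inputs.
merge_gz() {
    out=$1
    shift
    # Decompress every input in order, then compress the combined text once,
    # so the result has a single gzip header at the top.
    gzip -dc -- "$@" | gzip -c > "$out"
}
```

Usage would look like merge_gz allfiles.fastq.gz *fastq.gz, run from a directory where allfiles.fastq.gz does not already exist, so the glob does not pick up the output file.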

fastqc read limit?