That sounds great, thanks - getting an estimate from a subset of reads will be good enough for most of my analyses. I take out duplicates anyway (with prinseq), so losing that information is okay.
-
Hi Simon,
I am trying to use FastQC as part of a pipeline and I use /dev/stdin as my input (I have to unzip my files before passing them to FastQC). I redirect my report using '-o', but there doesn't appear to be any way to give the report a name. The problem is that I will be processing multiple files, so each report needs a unique name containing the sample name, and I haven't found a way to do that when using stdin. Any thoughts? My command is as follows:
gunzip *.gz -c | fastqc -f fastq /dev/stdin -o /Volumes/Storage_1/Sequencing_1/Reports/
Thanks very much
-
FastQC doesn't support reading from stdin in its current incarnation. If you're doing this to merge together the multiple files generated by the Illumina pipeline then you can use the --casava option and pass in all of the fastq.gz files, and FastQC will merge them together appropriately and write out a combined analysis report for each lane.
-
Hey Simon,
I have tried that, but the casava option doesn't appear to work correctly on my files. I get the following:
fastqc --casava Sample_O_215-1_225-2_225TGACCAreads0*.gz
File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads002.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads003.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads004.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads005.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads006.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads007.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads008.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads009.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads010.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads011.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads012.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads013.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads014.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads015.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads016.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads017.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads018.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads019.gz' didn't look like part of a CASAVA group
I have tried to add the files individually as well, but I got the same error. Any thoughts?
-
Originally posted by kga1978: File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
Those names don't look like the names generated by Casava. According to the docs I've got, the fastq file names should follow the pattern:
<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz
which is what FastQC looks for. The end of your file names seems to have been changed, so FastQC isn't able to group them together. I deliberately stuck quite closely to the official spec as I didn't want to end up merging together files which shouldn't be. I assumed that no one would bother going through and changing the names of all of the individual files, but it looks like I was wrong :-)
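A quick way to check file names against the pattern quoted above before running with --casava is a regular expression. The sketch below is written only from that quoted pattern; it is not FastQC's own matching code, which may differ in detail, and the function name `parse_casava_name` and the example name `SampleA_TGACCA_L001_R1_001.fastq.gz` are made up for illustration.

```python
import re

# Built from the naming pattern quoted in the post above (an assumption:
# FastQC's real check may be stricter or looser than this).
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Return the name fields as a dict, or None if the name does not fit."""
    match = CASAVA_NAME.match(filename)
    return match.groupdict() if match else None


# A name from the error output above does not fit the pattern.
print(parse_casava_name("Sample_O_215-1_225-2_225TGACCAreads001.gz"))
# A hypothetical name that follows the pattern is split into its fields.
print(parse_casava_name("SampleA_TGACCA_L001_R1_001.fastq.gz"))
```

Renamed files would need to be brought back to this shape before FastQC can group them.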
-
Single quality score number?
Hey Simon,
Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.
Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?
-
Originally posted by kga1978: Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.
The problem with that kind of measure is where to draw the line. Q30 might be a good number for current Illumina reads, but may not be appropriate for Ion Torrent, PacBio or 454. We agonised for long enough about where to put the colour boundaries on the per-base quality plot :-)
If you really want this number you could easily extract it from the text output of the per sequence quality plot.
Originally posted by kga1978: Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?
I'm sure I'm missing the point but FastQC supports analysing as many files as you like. Just put multiple file names on the command line, or open more than one file in the interactive application.
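For the single-number request above, here is a minimal sketch of pulling such a figure out of the text output. It assumes the report's text file (fastqc_data.txt) lays out the module as a `>>Per sequence quality scores` header, tab-separated quality/count rows, and a closing `>>END_MODULE` line; check this against your own output. The function name and the default threshold of 30 are illustrative choices. Note that the result is the percentage of reads whose mean quality reaches the threshold, which is not the same thing as the percentage of bases above Q30.

```python
def percent_reads_at_or_above(data_text, threshold=30):
    """Percentage of reads whose mean quality is >= threshold.

    Assumed layout (verify against your own fastqc_data.txt):
    '>>Per sequence quality scores' header, 'quality<TAB>count' rows,
    then '>>END_MODULE'.
    """
    in_module = False
    total = 0.0
    passing = 0.0
    for line in data_text.splitlines():
        if line.startswith(">>Per sequence quality scores"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break  # finished the module we care about
        elif in_module and line and not line.startswith("#"):
            quality, count = line.split("\t")[:2]
            total += float(count)
            if float(quality) >= threshold:
                passing += float(count)
    if total == 0:
        raise ValueError("no 'Per sequence quality scores' data found")
    return 100.0 * passing / total
```

To use it, read the text file from an unpacked report into a string and pass it in, choosing whatever threshold suits the sequencing platform.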
-
Ah, I see your concern - it is a sticky issue.
Since the text file outputs the mean Q value per base - maybe you could just output that instead? Obviously I can do it from the text file itself, but that adds an extra step.
As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.
-
Originally posted by frewise: Hi Simon, with the FastQC output, does it mean that the data is bad for use if any module reports a failure?
Absolutely NOT. FastQC can't tell you if your data is any good or not since it doesn't know what your data is supposed to look like. What it can do is to run a series of tests and point out where your data looks different to what most people's data looks like. The results shouldn't necessarily indicate your data is bad, but they should be a prompt to look at that aspect of your data and try to understand why the test failed.
Some of the tests are more predictive of bad data than others. The quality plots are most likely to indicate poor data, but even there we've seen libraries where a failed quality plot actually showed a problem in the Illumina pipeline, and not an actual problem in the data. All of the other tests can be failed by perfectly good data because of the type of library they came from, or for perfectly valid (and interesting) biological reasons.
-
Originally posted by kga1978: As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.
I've been thinking about how best to handle paired end data. Paired end BAM files currently lump everything into one report (but reverse complementing the second read), which isn't ideal. I could see the benefit to separating out the two reads but combining them in a single report where there was one summary, but two graphs for all other sections.
It's kind of on my 'to think about' list, but unfortunately there's a lot of other stuff on there as well.
-
Thanks for your help!
-
Originally posted by simonandrews: It's kind of on my 'to think about' list, but unfortunately there's a lot of other stuff on there as well
Haha, I can imagine - but thanks for thinking about it though! For now I just analyze one and double up - I find very little difference between the two mates.
-
Hi Simon,
I am using FastQC as part of a workflow analysis pipeline, running it from the command line. A single workflow can produce numerous fastq files. I note from the documentation that FastQC accepts several file names as arguments and processes them in a single run:
fastqc filename1.fq filename2.fq filename3.fq
How does the above command scale to a large number of files? Is it better to run the analysis for each file separately?
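The thread does not answer this question, but one option worth trying is FastQC's --threads option, which sets how many files are processed at the same time within a single invocation. The sketch below only assembles such a command line from a pipeline script; the helper name `fastqc_command`, the `reports` output directory and the thread count are illustrative assumptions, and how well this scales on a given machine would need to be measured.

```python
import shlex


def fastqc_command(fastq_files, outdir, threads=1):
    """Build one FastQC invocation covering every file in fastq_files."""
    if not fastq_files:
        raise ValueError("at least one input file is required")
    return [
        "fastqc",
        "--outdir", str(outdir),     # directory that receives the reports
        "--threads", str(threads),   # number of files processed in parallel
        *map(str, fastq_files),
    ]


cmd = fastqc_command(
    ["filename1.fq", "filename2.fq", "filename3.fq"], "reports", threads=3
)
print(shlex.join(cmd))
# fastqc --outdir reports --threads 3 filename1.fq filename2.fq filename3.fq
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once FastQC is on the PATH and the output directory exists.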