That sounds great, thanks - getting an estimate from a subset of reads will be good enough for most of my analyses. I take out duplicates anyway (with prinseq), so losing that information is okay.
This topic is closed.
-
Hi Simon,
I am trying to use FastQC as part of a pipeline, with /dev/stdin as my input (I have to unzip my files before passing them to FastQC). I redirect my report using '-o', but there doesn't appear to be any way to give the report a name. The problem is that I will be processing multiple files, so each report needs a unique name containing the sample name, and I haven't found a way to do that when using stdin. Any thoughts? My command is as follows:
gunzip *.gz -c | fastqc -f fastq /dev/stdin -o /Volumes/Storage_1/Sequencing_1/Reports/
Thanks very much
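One possible workaround, sketched here under assumptions rather than taken from the thread: run FastQC once per file instead of streaming everything through /dev/stdin, so that each report is named after its own input file. This assumes your FastQC build can read the gzip-compressed files directly. The function name run_fastqc_each and the FASTQC variable are made up for illustration; only the -f and -o options come from the command above.

```sh
# Hypothetical helper: one FastQC run per input file, so every report
# takes its name from the file it was generated from.
# FASTQC can be overridden to point at a different executable.
FASTQC="${FASTQC:-fastqc}"

run_fastqc_each() {
    outdir="$1"
    shift
    # Make sure the report directory exists before FastQC writes to it.
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # Assumption: the compressed file can be passed as-is.
        "$FASTQC" -f fastq -o "$outdir" "$f" || return 1
    done
}

# Example call (the path is a placeholder):
# run_fastqc_each /path/to/Reports *.gz
```

Because each file is handled separately, nothing is merged across samples, which is the opposite of what the gunzip pipe does.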
-
Hey Simon,
I have tried that, but the casava option doesn't appear to work correctly on my files. I get the following:
Code:
fastqc --casava Sample_O_215-1_225-2_225TGACCAreads0*.gz
File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads002.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads003.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads004.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads005.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads006.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads007.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads008.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads009.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads010.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads011.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads012.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads013.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads014.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads015.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads016.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads017.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads018.gz' didn't look like part of a CASAVA group
File 'Sample_O_215-1_225-2_225TGACCAreads019.gz' didn't look like part of a CASAVA group
-
Originally posted by kga1978:
File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz
Which is what FastQC looks for. The end of your file names seems to have been changed so that FastQC isn't able to group them together. I deliberately stuck quite closely to the official spec as I didn't want to end up merging together files which shouldn't be. I assumed that no one would bother going through and changing the names of all of the individual files, but it looks like I was wrong :-)
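To check file names before running with --casava, the naming format quoted above can be turned into a rough pattern. This is only a sketch: the regular expression is derived from that format line, not from FastQC's own code, so FastQC's actual matching may be stricter or looser. The function name looks_like_casava is hypothetical, and the barcode is assumed to consist of letters only.

```sh
# Hypothetical check: does a file name follow
# <sample>_<barcode>_L<3 digits>_R<read>_<3 digits>.fastq.gz ?
# The pattern is an approximation built from the format description.
looks_like_casava() {
    printf '%s\n' "$1" |
        grep -Eq '^.+_[A-Za-z]+_L[0-9]{3}_R[0-9]+_[0-9]{3}\.fastq\.gz$'
}

# Example: report which files in the current directory would not match.
# for f in *.gz; do
#     looks_like_casava "$f" || echo "not CASAVA-style: $f"
# done
```

A name such as Sample_O_215-1_225-2_225TGACCAreads001.gz fails this check because the lane, read and set fields and the .fastq.gz ending are missing.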
-
Single quality score number?
Hey Simon,
Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.
Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?
-
Originally posted by kga1978:
Hey Simon,
Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.
If you really want this number you could easily extract it from the text output of the per sequence quality plot.
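As a rough illustration of that extraction, the sketch below reads the "Per sequence quality scores" section of the text report and prints the percentage of reads whose mean quality is at or above a threshold. Note that this is a per-read figure, not the "% bp > Q30" asked about. It assumes the report is the tab-separated fastqc_data.txt file with sections opened by ">>Module name" and closed by ">>END_MODULE"; the function name pct_reads_at_least_q is made up here.

```sh
# Hypothetical helper: percentage of reads with mean quality >= threshold.
# $1 = quality threshold, $2 = path to fastqc_data.txt
pct_reads_at_least_q() {
    awk -F '\t' -v q="$1" '
        /^>>Per sequence quality scores/ { in_mod = 1; next }
        /^>>END_MODULE/                  { in_mod = 0 }
        # Data rows are "<mean quality><TAB><read count>"; skip the # header.
        in_mod && !/^#/ {
            total += $2
            if ($1 + 0 >= q + 0) good += $2
        }
        END {
            if (total > 0) printf "%.2f\n", 100 * good / total
            else print "NA"
        }
    ' "$2"
}

# Example call (the path is a placeholder):
# pct_reads_at_least_q 30 sample_fastqc/fastqc_data.txt
```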
Originally posted by kga1978:
Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?
-
Ah, I see your concern - it is a sticky issue.
Since the text file outputs the mean Q value per base - maybe you could just output that instead? Obviously I can do it from the text file itself, but that adds an extra step.
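For reference, the "extra step" could look something like the sketch below, which averages the per-base mean qualities into one number. It assumes the "Per base sequence quality" section of fastqc_data.txt is tab-separated with the position (or a range such as 10-14) in the first column and the mean in the second. Ranges are weighted by their width, which is only a fair approximation when all reads have the same length. The function name mean_of_per_base_means is hypothetical.

```sh
# Hypothetical helper: average of the per-base mean quality values.
# $1 = path to fastqc_data.txt
mean_of_per_base_means() {
    awk -F '\t' '
        /^>>Per base sequence quality/ { in_mod = 1; next }
        /^>>END_MODULE/                { in_mod = 0 }
        in_mod && !/^#/ {
            # A position may be a single base ("7") or a range ("10-14").
            n = split($1, r, "-")
            width = (n == 2) ? r[2] - r[1] + 1 : 1
            sum += $2 * width
            bases += width
        }
        END {
            if (bases > 0) printf "%.2f\n", sum / bases
            else print "NA"
        }
    ' "$1"
}

# Example call (the path is a placeholder):
# mean_of_per_base_means sample_fastqc/fastqc_data.txt
```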
As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.
-
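Originally posted by frewise:
Hi Simon, with the FastQC output, does it mean that the data is bad for use if any module reports a failure?
Absolutely NOT. FastQC can't tell you if your data is any good or not since it doesn't know what your data is supposed to look like. What it can do is to run a series of tests and point out where your data looks different to what most people's data looks like. The results shouldn't necessarily indicate your data is bad, but they should be a prompt to look at that aspect of your data and try to understand why the test failed.
Some of the tests are more predictive of bad data than others. The quality plots are most likely to indicate poor data, but even there we've seen libraries where a failed quality plot actually showed a problem in the Illumina pipeline, and not an actual problem in the data. All of the other tests can be failed by perfectly good data because of the type of library they came from, or for perfectly valid (and interesting) biological reasons.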
-
Originally posted by kga1978:
As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.
It's kind of on my 'to think about' list, but unfortunately there's a lot of other stuff on there as well.
-
Originally posted by simonandrews:
Absolutely NOT. FastQC can't tell you if your data is any good or not since it doesn't know what your data is supposed to look like. What it can do is to run a series of tests and point out where your data looks different to what most people's data looks like. The results shouldn't necessarily indicate your data is bad, but they should be a prompt to look at that aspect of your data and try to understand why the test failed.
Some of the tests are more predictive of bad data than others. The quality plots are most likely to indicate poor data, but even there we've seen libraries where a failed quality plot actually showed a problem in the Illumina pipeline, and not an actual problem in the data. All of the other tests can be failed by perfectly good data because of the type of library they came from, or for perfectly valid (and interesting) biological reasons.
-
Hi Simon,
I am using FastQC as part of a workflow analysis pipeline, running it from the command line. A single workflow can produce numerous fastq files. I note from the documentation that FastQC takes several filenames as arguments and processes them in a single run.
cmd:- fastqc filename1.fq filename2.fq filename3.fq
How does the above command scale to a large number of files? Is it better to run the analysis for each file separately?
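One thing that may be worth trying for many files, offered as a sketch rather than a tested answer: FastQC's command-line help describes a -t/--threads option that sets how many files are processed at the same time. Check fastqc --help on your version before relying on it, and bear in mind that each extra thread needs its own memory. The function name run_fastqc_batch and the FASTQC variable are made up for illustration.

```sh
# Hypothetical helper: a single FastQC invocation over many files,
# processing several of them at once.
# FASTQC can be overridden to point at a different executable.
FASTQC="${FASTQC:-fastqc}"

run_fastqc_batch() {
    threads="$1"
    outdir="$2"
    shift 2
    # Make sure the report directory exists before FastQC writes to it.
    mkdir -p "$outdir" || return 1
    # Assumption: -t is available in the installed FastQC version.
    "$FASTQC" -t "$threads" -o "$outdir" "$@"
}

# Example call (the path is a placeholder):
# run_fastqc_batch 4 /path/to/Reports *.fq
```

Each file still gets its own report, so this changes only how many files are worked on in parallel.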