This topic is closed.
  • kga1978
    Senior Member
    • Nov 2010
    • 100

    That sounds great, thanks - getting an estimate from a subset of reads will be good enough for most of my analyses. I take out duplicates anyway (with prinseq), so losing that information is okay.


    • kga1978
      Senior Member
      • Nov 2010
      • 100

      Hi Simon,
      I am trying to use FastQC as part of a pipeline and I use /dev/stdin as my input (I have to unzip my files before passing them to FastQC). I redirect my report using '-o', but there doesn't appear to be any way to give the report a name. The problem is that I will be processing multiple files, so each report needs a unique name containing the sample name - I haven't found a way to do that when using stdin. Any thoughts? My command is as follows:

      gunzip *.gz -c | fastqc -f fastq /dev/stdin -o /Volumes/Storage_1/Sequencing_1/Reports/

      Thanks very much


      • simonandrews
        Simon Andrews
        • May 2009
        • 870

        Originally posted by kga1978 View Post
        Hi Simon,
        I am trying to use FastQC as part of a pipeline and I use /dev/stdin as my input (I have to unzip my files before passing them to FastQC). I redirect my report using '-o', but there doesn't appear to be any way to give the report a name. The problem is that I will be processing multiple files, so each report needs a unique name containing the sample name - I haven't found a way to do that when using stdin. Any thoughts? My command is as follows:

        gunzip *.gz -c | fastqc -f fastq /dev/stdin -o /Volumes/Storage_1/Sequencing_1/Reports/

        Thanks very much
        FastQC doesn't support reading from stdin in its current incarnation. If you're doing this to merge the multiple files generated by the Illumina pipeline, you can use the --casava option and pass in all of the fastq.gz files. FastQC will then merge them appropriately and write out a combined analysis report for each lane.


        • kga1978
          Senior Member
          • Nov 2010
          • 100

          Hey Simon,

          I have tried that, but the casava option doesn't appear to work correctly on my files. I get the following:

          Code:
          fastqc --casava Sample_O_215-1_225-2_225TGACCAreads0*.gz
          File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads002.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads003.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads004.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads005.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads006.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads007.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads008.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads009.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads010.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads011.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads012.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads013.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads014.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads015.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads016.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads017.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads018.gz' didn't look like part of a CASAVA group
          File 'Sample_O_215-1_225-2_225TGACCAreads019.gz' didn't look like part of a CASAVA group
          I have tried to add the files individually as well, but I got the same error. Any thoughts?


          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            Originally posted by kga1978 View Post
            File 'Sample_O_215-1_225-2_225TGACCAreads001.gz' didn't look like part of a CASAVA group
            Those names don't look like the names generated by Casava. According to the docs I've got the fastq file names should follow the pattern:

            <sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz

            Which is what FastQC looks for. The end of your file names seems to have been changed so that FastQC isn't able to group them together. I deliberately stuck quite closely to the official spec as I didn't want to end up merging together files which shouldn't be. I assumed that no one would bother going through and changing the names of all of the individual files, but it looks like I was wrong :-)
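The naming pattern quoted above can be checked before running `--casava`. The following is a minimal sketch and not FastQC's own matching code: the regular expression is an approximation written from the pattern in this post, and the function names `parse_casava_name` and `group_casava_files` are hypothetical.

```python
import re
from collections import defaultdict

# Approximation of the naming pattern quoted in the post above.
# This is an assumption for illustration, NOT FastQC's actual implementation.
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Return the name fields as a dict, or None if the name does not fit."""
    match = CASAVA_NAME.match(filename)
    return match.groupdict() if match else None


def group_casava_files(filenames):
    """Group files that differ only in their set number."""
    groups = defaultdict(list)
    for name in filenames:
        fields = parse_casava_name(name)
        if fields is None:
            continue  # not CASAVA-style, so it joins no group
        key = (fields["sample"], fields["barcode"], fields["lane"], fields["read"])
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    # A name that follows the pattern, and one of the renamed files from the thread.
    print(parse_casava_name("SampleA_TGACCA_L001_R1_002.fastq.gz"))
    print(parse_casava_name("Sample_O_215-1_225-2_225TGACCAreads001.gz"))  # None
```

A check like this only reports whether a name fits the quoted pattern. It says nothing about whether the file contents belong together.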


            • kga1978
              Senior Member
              • Nov 2010
              • 100

              Hah, no idea how our core is naming the files - they recently changed their pipeline, so maybe they threw in a renaming step as well....

              Would be good to be able to analyze all these as a single entity though.
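One possible workaround for renamed chunks, sketched under assumptions: decompress the pieces in name order into a single FASTQ file named after the sample, then run FastQC on that one file so it is analysed as a single entity. The helper name `merge_gz_chunks` and the file names in the comments are hypothetical, and the sketch assumes the zero-padded chunk numbers sort correctly as plain strings.

```python
import gzip
import shutil
from pathlib import Path


def merge_gz_chunks(chunk_paths, merged_path):
    """Decompress each chunk in name order and append it to merged_path."""
    merged_path = Path(merged_path)
    with open(merged_path, "wb") as merged:
        # sorted() relies on zero-padded numbering (reads001, reads002, ...).
        for chunk in sorted(chunk_paths):
            with gzip.open(chunk, "rb") as handle:
                shutil.copyfileobj(handle, merged)
    return merged_path


# Hypothetical usage:
#   chunks = Path(".").glob("Sample_O_*reads0*.gz")
#   merge_gz_chunks(chunks, "Sample_O.fastq")
# followed on the command line by:
#   fastqc Sample_O.fastq -o Reports/
```

The merged file is uncompressed, so it needs enough disk space for the whole sample.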


              • kga1978
                Senior Member
                • Nov 2010
                • 100

                Single quality score number?

                Hey Simon,

                Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.

                Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?


                • simonandrews
                  Simon Andrews
                  • May 2009
                  • 870

                  Originally posted by kga1978 View Post
                  Hey Simon,

                  Any chance it would be possible to include a single-point measure of analyzed quality scores? More specifically, if I have analyzed my data and see the bp vs quality plots, it would be nice to have a single number here - e.g. % bp > Q30, etc.
                  The problem with that kind of measure is where to draw the line. Q30 might be a good number for current Illumina reads, but may not be appropriate for Ion Torrent, PacBio or 454. We agonised for long enough about where to put the colour boundaries on the per-base quality plot :-)

                  If you really want this number you could easily extract it from the text output of the per sequence quality plot.
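As a rough sketch of that extraction: the function below reads the per sequence quality module from FastQC's text output and reports the percentage of reads whose mean quality is at or above a chosen threshold. It assumes the text file marks the module with a `>>Per sequence quality scores` line, lists tab-separated quality and count rows, and ends the module with `>>END_MODULE`. The function name is hypothetical. The result is a per-read figure, not the percentage of bases above Q30.

```python
def percent_reads_at_or_above(data_text, threshold=30):
    """Percentage of reads whose mean quality is >= threshold.

    data_text is the content of FastQC's text report. The module layout
    parsed here is an assumption; check it against your own output file.
    """
    in_module = False
    total = 0.0
    passing = 0.0
    for line in data_text.splitlines():
        if line.startswith(">>Per sequence quality scores"):
            in_module = True
            continue
        if not in_module:
            continue
        if line.startswith(">>END_MODULE"):
            break
        if line.startswith("#") or not line.strip():
            continue  # skip the column header and blank lines
        quality, count = line.split("\t")[:2]
        count = float(count)
        total += count
        if int(quality) >= threshold:
            passing += count
    if total == 0:
        raise ValueError("per sequence quality module not found or empty")
    return 100.0 * passing / total
```

Leaving the threshold as a parameter sidesteps the question of where to draw the line: each platform can be given its own cut-off.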

                  Originally posted by kga1978 View Post
                  Also, for paired-end data - is there any easy way to analyze two fastq files at the same time? (i.e. mate1.fastq and mate2.fastq)?
                  I'm sure I'm missing the point but FastQC supports analysing as many files as you like. Just put multiple file names on the command line, or open more than one file in the interactive application.


                  • kga1978
                    Senior Member
                    • Nov 2010
                    • 100

                    Ah, I see your concern - it is a sticky issue.

                    Since the text file outputs the mean Q value per base - maybe you could just output that instead? Obviously I can do it from the text file itself, but that adds an extra step.

                    As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.


                    • frewise
                      Member
                      • Jun 2011
                      • 13

                      Is the data usable if a check fails?

                      Hi Simon, with the FastQC output, does it mean the data is unfit for use if any module reports a failure?
                      Thanks!


                      • simonandrews
                        Simon Andrews
                        • May 2009
                        • 870

                        Originally posted by frewise View Post
                        Hi Simon, with the FastQC output, does it mean the data is unfit for use if any module reports a failure?
                        Absolutely NOT. FastQC can't tell you if your data is any good or not since it doesn't know what your data is supposed to look like. What it can do is to run a series of tests and point out where your data looks different to what most people's data looks like. The results shouldn't necessarily indicate your data is bad, but they should be a prompt to look at that aspect of your data and try to understand why the test failed.

                        Some of the tests are more predictive of bad data than others. The quality plots are most likely to indicate poor data, but even there we've seen libraries where a failed quality plot actually showed a problem in the Illumina pipeline, and not an actual problem in the data. All of the other tests can be failed by perfectly good data because of the type of library they came from, or for perfectly valid (and interesting) biological reasons.
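In a pipeline, this advice can be applied by collecting WARN and FAIL results as a list of things to look at, not as a reason to reject the data. The sketch below assumes a tab-separated summary with the status in the first column and the module name in the second. The function name `modules_to_inspect` is hypothetical.

```python
def modules_to_inspect(summary_text):
    """Return (status, module) pairs for every WARN or FAIL line.

    Assumes each line is tab-separated as: status, module name, file name.
    A flagged module is a prompt to look closer, not a verdict on the data.
    """
    flagged = []
    for line in summary_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status = parts[0].strip().upper()
        module = parts[1].strip()
        if status in ("WARN", "FAIL"):
            flagged.append((status, module))
    return flagged
```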


                        • simonandrews
                          Simon Andrews
                          • May 2009
                          • 870

                          Originally posted by kga1978 View Post
                          As for the paired-end - sorry, I should have been more clear. I would like to analyze them together (preferably the way it's done if they are together in a BAM file - one stuck onto the other). When I add more than one file after the other, I get a single analysis for each - not for the two combined.
                          I've been thinking about how best to handle paired end data. Paired end BAM files currently lump everything into one report (but reverse complementing the second read), which isn't ideal. I could see the benefit to separating out the two reads but combining them in a single report where there was one summary, but two graphs for all other sections.

                          It's kind of on my 'to think about' list, but unfortunately there's a lot of other stuff on there as well.


                          • frewise
                            Member
                            • Jun 2011
                            • 13

                            Originally posted by simonandrews View Post
                            Absolutely NOT. FastQC can't tell you if your data is any good or not since it doesn't know what your data is supposed to look like. What it can do is to run a series of tests and point out where your data looks different to what most people's data looks like. The results shouldn't necessarily indicate your data is bad, but they should be a prompt to look at that aspect of your data and try to understand why the test failed.

                            Some of the tests are more predictive of bad data than others. The quality plots are most likely to indicate poor data, but even there we've seen libraries where a failed quality plot actually showed a problem in the Illumina pipeline, and not an actual problem in the data. All of the other tests can be failed by perfectly good data because of the type of library they came from, or for perfectly valid (and interesting) biological reasons.
                            Thanks for your help!


                            • kga1978
                              Senior Member
                              • Nov 2010
                              • 100

                              Originally posted by simonandrews View Post
                              It's kind of on my 'to think about' list, but unfortunately there's a lot of other stuff on there as well.
                              Haha, I can imagine - but thanks for thinking about it though! For now I just analyze one and double up - I find very little difference between the two mates.


                              • ganygan25
                                Junior Member
                                • Dec 2011
                                • 9

                                Hi Simon,
                                I am using FastQC as part of a workflow analysis pipeline, running it from the command line. A single workflow can produce numerous fastq files. I note from the documentation that FastQC accepts several filenames as arguments and processes them in a single run.

                                cmd:- fastqc filename1.fq filename2.fq filename3.fq

                                How does the above command scale to a large number of files? Is it better than running the analysis on each file separately?
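One way to keep a many-file run manageable, sketched under assumptions: split the file list into batches and build one `fastqc` command per batch, using the `--threads` option to set how many files are processed at once and `--outdir` for the report directory. The function name, batch size, and thread count are hypothetical choices, so check `fastqc --help` for the options in your version. The sketch only builds the argument lists and does not run anything.

```python
def build_fastqc_commands(fastq_files, outdir, threads=4, batch_size=50):
    """Return one fastqc argument list per batch of input files."""
    if threads < 1 or batch_size < 1:
        raise ValueError("threads and batch_size must be at least 1")
    files = list(fastq_files)
    commands = []
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        # No point asking for more threads than there are files in the batch.
        batch_threads = min(threads, len(batch))
        commands.append(
            ["fastqc", "--threads", str(batch_threads), "--outdir", outdir, *batch]
        )
    return commands


# Hypothetical usage: pass each list to subprocess.run(cmd, check=True).
```

Batching also keeps each command line well below the shell's argument-length limit when a workflow produces thousands of files.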
