  • fastq.gz VS fastq

    Hi, I have a question about fastq and fastq.gz files. I understand that fastq.gz is the compressed version of a fastq file. Can I combine all the R1.fastq.gz files, and separately all the R2.fastq.gz files, before feeding them to FastQC? Thanks.
    Last edited by lala2013; 10-15-2013, 09:50 AM. Reason: Clarify question

  • #2
    FastQC can use .gz.

    You probably want to run R1 and R2 separately for understanding the quality of your sequencing run/data, as they have distinctly different error profiles.


    • #3
      Sorry, I didn't make it clear. Is it possible to do the following and then feed R1.fastq.gz to FastQC?
      cat L001_R1.fastq.gz L002_R1.fastq.gz L003_R1.fastq.gz L004_R1.fastq.gz > R1.fastq.gz


      • #4
        Yes, you can, but if you can distribute jobs on a cluster, running each file separately will be much faster.


        • #5
          Looks like FastQC will use one thread per file. The --help output for fastqc describes --threads as "Specifies the number of files which can be processed simultaneously". Perhaps setting --threads equal to the number of fastq files you have, then giving it all of them as inputs, is the way to go? Divide-and-conquer is what you need here.

          The problem with cat'ing all your fastq files is that the statistics would be calculated over all reads together: you might not catch an anomaly in one subset of reads, especially if it gets averaged out by the other three sets of reads.
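
A minimal sketch of the per-file run suggested above, assuming the four lane files from post #3 are in the current directory and `fastqc` is on the PATH. The output directory name `fastqc_out` and the variable `n_files` are made up for illustration; `--threads` and `--outdir` are FastQC options. If FastQC or the files are missing, the script only prints the command it would have run.

```shell
# Lane files named as in the thread; adjust to your own naming.
set -- L001_R1.fastq.gz L002_R1.fastq.gz L003_R1.fastq.gz L004_R1.fastq.gz

# One thread per input file, following the --threads help text quoted above.
n_files=$#

if command -v fastqc >/dev/null 2>&1 && [ -e "$1" ]; then
    # FastQC expects the output directory to exist already.
    mkdir -p fastqc_out
    fastqc --threads "$n_files" --outdir fastqc_out "$@"
else
    # Dry run: show the command instead of executing it.
    echo "dry run: fastqc --threads $n_files --outdir fastqc_out $*"
fi
```

Each lane then gets its own report, so a problem confined to one lane stays visible instead of being averaged away.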
          Last edited by winsettz; 10-15-2013, 12:00 PM.


          • #6
            Originally posted by lala2013:
            Sorry I didn't make it clear. Is it possible to do the following and feed R1.fastq.gz to fastQC?
            cat L001_R1.fastq.gz L002_R1.fastq.gz L003_R1.fastq.gz L004_R1.fastq.gz > R1.fastq.gz
            I agree that you should probably not merge the files before feeding them to FastQC, because you run the risk of masking any lane-specific effects. Apart from that, using cat doesn't work for gzipped files, but you can use zcat to do the same thing. The command should look something like this:

            Code:
            zcat L001_R1.fastq.gz L002_R1.fastq.gz L003_R1.fastq.gz L004_R1.fastq.gz | gzip -c - > R1.fastq.gz


            • #7
              I think cat does concatenate gzipped files. You don't need to gzip again either.
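
A quick way to convince yourself of this, as a sketch on two tiny made-up FASTQ records in a scratch directory (the file names and reads are invented for the test). It uses `gzip -dc` rather than `zcat` only because `zcat` behaves differently on some systems.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Two tiny, made-up FASTQ records standing in for two lanes.
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/lane1.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$tmp/lane2.fastq"
gzip "$tmp/lane1.fastq" "$tmp/lane2.fastq"

# Plain cat: the result is a gzip file with two compressed members.
cat "$tmp/lane1.fastq.gz" "$tmp/lane2.fastq.gz" > "$tmp/merged.fastq.gz"

# gzip -t checks integrity; gzip -dc decompresses every member in order.
gzip -t "$tmp/merged.fastq.gz"
merged_lines=$(gzip -dc "$tmp/merged.fastq.gz" | wc -l | tr -d ' ')
echo "lines in merged file: $merged_lines"

rm -rf "$tmp"
```

The merged file decompresses to 8 lines, i.e. both records, so gzip itself treats the concatenation as valid. Whether a downstream tool reads past the first member is a separate question.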


              • #8
                Oh, I didn't know that. Cheers for that!


                • #9
                  Originally posted by vivek_:
                  I think cat does concatenate gzipped files. You don't need to gzip again either.
                  I recall that FastQC didn't properly process the resulting file when I simply cat'ed the gzipped files. The file you get by concatenating gzipped files still contains some sort of division between the original compressed parts, and FastQC would stop when it reached the end of the first block of reads. This was in an earlier version of FastQC, and Simon may have addressed this in a later release.

                  O.K., I just went to the FastQC site, and the Change Log says:

                  3-5-12: Version 0.10.1 released
                  Added a workround to allow the analysis of concatenated gzipped files (emphasis mine)
                  Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
                  Added an option to specify the location of the java interpreter on the command line
                  This was added in the latest version (0.10.1), so make sure you're using at least that version if you are concatenating gzipped files.
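
To check both points on your own setup, something like the following might help. It is only a sketch: `count_reads` is a made-up helper that assumes four lines per FASTQ record, and R1.fastq.gz is the merged file name from post #3. The count it prints can be compared with the "Total Sequences" value in the FastQC report for the same file; a much smaller number there would suggest the tool stopped after the first compressed block.

```shell
# Report the installed FastQC version, if there is one.
if command -v fastqc >/dev/null 2>&1; then
    fastqc --version
else
    echo "fastqc not found on PATH"
fi

# Hypothetical helper: number of reads in a (possibly concatenated) .gz file,
# assuming every FASTQ record spans exactly four lines.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Only runs if the merged file from the thread is present.
if [ -e R1.fastq.gz ]; then
    echo "expected reads in R1.fastq.gz: $(count_reads R1.fastq.gz)"
fi
```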
                  Last edited by kmcarr; 10-15-2013, 01:33 PM. Reason: Should have read the release notes before posting
