  • arcolombo698
    Senior Member
    • Nov 2013
    • 142

    Sufficient Reads

    Hello.

    I am running FastQC on a batch of samples and am wondering how many million reads are sufficient.

    One sample passed QC with roughly 13 million (13 x 10^6) reads.

    More than 20 million good-quality reads would be ideal, but what about 13 million?
  • Bukowski
    Senior Member
    • Jan 2010
    • 388

    #2
    Whether you have enough reads really depends on what you're doing with them. More information is required.

    • arcolombo698
      Senior Member
      • Nov 2013
      • 142

      #3
      I am doing variant calling: I will call SNPs from these reads and produce VCF files for downstream analysis.

      • arcolombo698
        Senior Member
        • Nov 2013
        • 142

        #4
        I am not doing any differential expression analysis. Once I have the VCF files, I expect to do a pathway analysis.

        • EpiBrass
          Member
          • Nov 2013
          • 16

          #5
          It depends on which genome you're using, how long your reads are, whether they're paired-end or single-end, and how even the coverage is. If you're dealing with, say, a wheat genome (17 Gb), then I'd say the number of reads you have is too low. A good rule of thumb is that any variant should be supported by a minimum of 10 reads (preferably on both the forward and reverse strands).

          Further information is required, but I hope this helps.
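To make the dependence on genome size concrete, here is a rough sketch of the usual average-depth estimate (total sequenced bases divided by target size). The 100 bp read length, the approximate human genome size, and the 50 Mb "small target" are illustrative assumptions, not values from this thread, and the calculation assumes perfectly even coverage, which real data never has.

```python
def mean_depth(read_count, read_length_bp, target_size_bp):
    """Average depth = total sequenced bases / size of the target."""
    return read_count * read_length_bp / target_size_bp


READS = 13_000_000      # read count from the question (assumed to be single reads)
READ_LENGTH_BP = 100    # assumed read length, for illustration only

# Target sizes in base pairs. Only the wheat figure comes from the post above;
# the other two are assumptions added for comparison.
TARGETS = {
    "wheat genome (17 Gb)": 17e9,
    "human genome (~3.1 Gb, approximate)": 3.1e9,
    "hypothetical 50 Mb target": 50e6,
}

MIN_SUPPORTING_READS = 10  # rule of thumb from the post above

for name, size_bp in TARGETS.items():
    depth = mean_depth(READS, READ_LENGTH_BP, size_bp)
    verdict = "meets" if depth >= MIN_SUPPORTING_READS else "falls short of"
    print(f"{name}: {depth:.2f}x average, {verdict} {MIN_SUPPORTING_READS}x")
```

Under these assumptions, the same 13 million reads are far below 10x on a whole genome but comfortably above it on a small target, which is why the type of experiment matters so much.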

          • arcolombo698
            Senior Member
            • Nov 2013
            • 142

            #6
            I am working with the human genome (Homo sapiens), using the hg19 reference genome from UCSC.

            Some samples have 13 million reads, some have 15 million, and others have more than 20 million.

            If 13 million is too low, do I sacrifice quality?

            • arcolombo698
              Senior Member
              • Nov 2013
              • 142

              #7
              Sorry, I should have added: the reads are paired-end, and most are 30-128 bp long.

              • EpiBrass
                Member
                • Nov 2013
                • 16

                #8
                What is your current quality cutoff? I wouldn't go below Phred 20.

                I'm going to assume your "13/15/20 million" figures are different samples that can't be pooled?
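For context on the Phred 20 figure: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10). A minimal sketch of the conversion (the function name is just illustrative):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f} (about 1 wrong call in {round(1 / p)})")
```

So Q20 means roughly a 1% chance that a given base is wrong, while Q10 means roughly 10%.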

                • arcolombo698
                  Senior Member
                  • Nov 2013
                  • 142

                  #9
                  Looking through my QC, most samples have around 17 million reads. Does this suffice, or should I relax my Trimmomatic parameters?

                  • arcolombo698
                    Senior Member
                    • Nov 2013
                    • 142

                    #10
                    1) Yes, they are different samples; we cannot combine the reads.
                    2) I chose a Phred score of 33.
                    3) I am using ILLUMINACLIP, and this is an RNA-seq experiment on bone marrow using a TruSeq prep kit.

                    Here are my parameters:

                    java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE \
                      -threads 16 -phred33 \
                      931269_R1.fastq.gz 931269_R2.fastq.gz \
                      paired_trimmed_931269_R1.fastq.gz unpaired_trimmed_931269_R1.fastq.gz \
                      paired_trimmed_931269_R2.fastq.gz unpaired_trimmed_931269_R2.fastq.gz \
                      ILLUMINACLIP:/auto/rcf-proj/sa1/acolombo/Target_2013_229/BoneMarrows_PolyA/Sample_931269/TruSeq2-PE.fa:2:30:10 \
                      LEADING:3 TRAILING:3 HEADCROP:15 SLIDINGWINDOW:4:10 MINLEN:30
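To illustrate what a setting like SLIDINGWINDOW:4:10 does, here is a simplified sketch of sliding-window quality trimming: scan from the 5' end and cut the read at the first window whose average quality drops below the threshold. This is an illustration of the idea only, not Trimmomatic's actual implementation, and the quality values are made up.

```python
def sliding_window_trim(quals, window=4, min_avg=10):
    """Return how many bases to keep from the 5' end.

    The read is cut at the start of the first window whose mean quality
    is below min_avg. Simplified illustration, not Trimmomatic's code.
    """
    for start in range(len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg:
            return start
    return len(quals)


# Made-up per-base Phred scores for a read whose quality collapses at the 3' end.
example_quals = [30, 30, 30, 30, 30, 8, 6, 5, 4, 2]

print(sliding_window_trim(example_quals, window=4, min_avg=10))  # threshold used in the command
print(sliding_window_trim(example_quals, window=4, min_avg=20))  # stricter threshold
```

On this toy read the threshold of 10 keeps 5 bases and a threshold of 20 keeps 3, so a stricter threshold trims earlier and, combined with MINLEN, will discard more reads.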

                    • arcolombo698
                      Senior Member
                      • Nov 2013
                      • 142

                      #11
                      I am also trimming the adapters, using a custom-made TruSeq2-PE.fa file.

                      • Bukowski
                        Senior Member
                        • Jan 2010
                        • 388

                        #12
                        Originally posted by arcolombo698:
                        I am working with the human genome (Homo sapiens), using the hg19 reference genome from UCSC.

                        Some samples have 13 million reads, some have 15 million, and others have more than 20 million.

                        If 13 million is too low, do I sacrifice quality?

                        Is it exome, whole genome, RNA-Seq, or a smaller targeted capture?

                        Dan

                        • arcolombo698
                          Senior Member
                          • Nov 2013
                          • 142

                          #13
                          Thank you very much for your response.

                          It is an RNA-seq experiment.

                          • SNPsaurus
                            Registered Vendor
                            • May 2013
                            • 525

                            #14
                            If you are doing variant calling from RNA-seq data, 13M reads is enough to get sufficient read depth on a subset of the genes. Because the number of transcripts per gene varies 1000-fold, it is very difficult to get high depth for genes that are poorly expressed (and impossible for genes that are not expressed). So for any given number of reads, you will be able to make SNP calls for a particular number of genes, and as the number of reads increases, you'll be able to call SNPs in more genes.
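A back-of-the-envelope sketch of that point, with made-up numbers: if a gene attracts some fraction of all reads, its expected depth is roughly (total reads x fraction x read length) / transcript length. The fractions, the 100 bp read length, and the 2 kb transcript length below are hypothetical and chosen only to show a 1000-fold expression difference.

```python
def expected_depth(total_reads, read_fraction, read_length_bp, transcript_length_bp):
    """Rough expected depth over one transcript, assuming even coverage along it."""
    return total_reads * read_fraction * read_length_bp / transcript_length_bp


def reads_needed(target_depth, read_fraction, read_length_bp, transcript_length_bp):
    """Total reads needed to reach target_depth on a transcript under the same model."""
    return target_depth * transcript_length_bp / (read_fraction * read_length_bp)


TOTAL_READS = 13_000_000
READ_LENGTH_BP = 100          # assumed
TRANSCRIPT_LENGTH_BP = 2_000  # assumed

# Hypothetical genes whose share of the reads differs 1000-fold.
genes = {"highly expressed": 1e-3, "poorly expressed": 1e-6}

for label, fraction in genes.items():
    depth = expected_depth(TOTAL_READS, fraction, READ_LENGTH_BP, TRANSCRIPT_LENGTH_BP)
    needed = reads_needed(10, fraction, READ_LENGTH_BP, TRANSCRIPT_LENGTH_BP)
    print(f"{label}: ~{depth:.2f}x at 13M reads; ~{needed:,.0f} reads for 10x")
```

With these assumed values the highly expressed gene is covered hundreds of times over, while the poorly expressed one would need far more reads than any of the samples have to reach 10x.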

                            Edit: removed the Phred bit. I thought it was about the parameters for cutting poor-quality bases, not the encoding!
                            Last edited by SNPsaurus; 12-24-2013, 12:17 PM.
                            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

                            • mastal
                              Senior Member
                              • Mar 2009
                              • 666

                              #15
                              Note that -phred33 in the Trimmomatic parameters refers to the encoding of the base qualities (Phred scores stored with an ASCII offset of 33), not to the quality cutoff value.
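As a small illustration of what the encoding means: in a Phred+33 FASTQ file, each quality character stores the score as its ASCII code minus 33. The quality string below is made up for the example.

```python
def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33 for Phred+33)."""
    return [ord(char) - offset for char in quality_string]


# Made-up quality string: '!' is ASCII 33, '5' is 53, '?' is 63, 'I' is 73.
print(decode_qualities("!5?I"))  # → [0, 20, 30, 40]
```

The offset only says how to read the scores; the actual quality cutoffs in the command above are the numbers given to LEADING, TRAILING and SLIDINGWINDOW.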
                              Last edited by mastal; 12-24-2013, 11:44 AM.
