  • Sufficient Reads

    Hello.

    I am running FastQC on a batch of samples and am wondering how many million reads are sufficient.

    One sample passed QC with 13 x 10^6 reads (roughly 13 million).

    Over 20 million good-quality reads is ideal, but what about 13 million?

  • #2
    Whether there are enough reads really depends on what you're doing with them. More information is required.


    • #3
      I am doing variant calling: I will convert these files into VCF files and call SNPs for analysis.


      • #4
        I am not doing any differential expression analysis. I imagine after I get the VCF files, I will then do a pathway analysis.


        • #5
          It depends on what genome you're using, how long your reads are, whether they're paired-end or single-end, and how even the coverage is. If you're dealing with, say, a wheat genome (17 Gb), then I'd say the number of reads you have is too low. A good guide is that any variant should be supported by a minimum of 10 reads (preferably on both the forward and reverse strands).

          Further information is required, but I hope this helps.
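As a rough illustration of why genome size and read length matter, here is a minimal sketch of the usual back-of-the-envelope estimate, average depth = number of reads x read length / target size. The 17 Gb wheat figure and the 13 million reads come from this thread; the 100 bp read length is a hypothetical value chosen only for illustration, and the estimate assumes perfectly even coverage, which real data never has.

```python
def expected_coverage(n_reads: int, read_length_bp: int, target_size_bp: float) -> float:
    """Average depth if reads were spread evenly over the target: C = N * L / G."""
    return n_reads * read_length_bp / target_size_bp


WHEAT_GENOME_BP = 17e9  # 17 Gb, as quoted in the post above
READ_LENGTH_BP = 100    # hypothetical read length, for illustration only

for n_reads in (13_000_000, 20_000_000):
    depth = expected_coverage(n_reads, READ_LENGTH_BP, WHEAT_GENOME_BP)
    print(f"{n_reads:,} reads of {READ_LENGTH_BP} bp: ~{depth:.3f}x average depth")
```

Under these assumptions both read counts give well under 1x average depth on a 17 Gb genome, far short of the 10 supporting reads suggested above. For paired-end data, count both mates as reads.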


          • #6
            I am dealing with a human genome (Homo sapiens), using the reference genome hg19 from UCSC.

            Some samples have 13 million reads, some 15 million, and others above 20 million.

            If 13 million is too low, do I then sacrifice quality?


            • #7
              Sorry, I should add: these are paired-end reads, and most sequence lengths are 30-128 bp.


              • #8
                What is your current quality cutoff? I wouldn't go below Phred 20.

                I'm going to assume your "13/15/20 millions" are different samples which can't be pooled?
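For context on the Phred 20 suggestion: a Phred quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The small sketch below only illustrates that standard conversion; the function names are made up for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Phred score for an error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

So a Phred 20 cutoff corresponds to roughly a 1 in 100 chance of a wrong base call, and Phred 10 to roughly 1 in 10.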


                • #9
                  Looking through my QC, most samples have around 17 million reads. Does this suffice, or should I lower my Trimmomatic thresholds?


                  • #10
                    1) Yes, they are different samples; we cannot combine the reads.
                    2) I chose a Phred score of 33.
                    3) I am using ILLUMINACLIP, and this is for an RNA-seq experiment on bone marrow using a TruSeq prep kit.

                    Here are my parameters:

                    java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar \
                        org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 \
                        931269_R1.fastq.gz 931269_R2.fastq.gz \
                        paired_trimmed_931269_R1.fastq.gz unpaired_trimmed_931269_R1.fastq.gz \
                        paired_trimmed_931269_R2.fastq.gz unpaired_trimmed_931269_R2.fastq.gz \
                        ILLUMINACLIP:/auto/rcf-proj/sa1/acolombo/Target_2013_229/BoneMarrows_PolyA/Sample_931269/TruSeq2-PE.fa:2:30:10 \
                        LEADING:3 TRAILING:3 HEADCROP:15 SLIDINGWINDOW:4:10 MINLEN:30
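To make the effect of SLIDINGWINDOW:4:10 and MINLEN:30 in the command above more concrete, here is a simplified sketch of the idea behind sliding-window quality trimming. It is not Trimmomatic's actual implementation (edge cases such as reads shorter than the window may be handled differently there), and the function names and example qualities are invented for illustration.

```python
def sliding_window_trim(quals, window=4, min_avg=10):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose average quality falls below min_avg.
    """
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return start
    return len(quals)  # no window failed, keep the whole read


def passes_minlen(kept_length, minlen=30):
    """Mimic a minimum-length filter applied after trimming."""
    return kept_length >= minlen


# Hypothetical read: 40 good bases (Q30) followed by 10 poor ones (Q2).
example_quals = [30] * 40 + [2] * 10
kept = sliding_window_trim(example_quals)
print(kept, passes_minlen(kept))
```

Raising the window threshold trims more aggressively and leaves shorter reads, so more of them fall under the minimum length and are dropped; lowering it keeps more reads at the cost of lower-quality bases.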


                    • #11
                      I am also trimming the adapters using a custom-made TruSeq2-PE.fa file.


                      • #12
                        Originally posted by arcolombo698
                        I am dealing with a human genome (Homo sapiens), using the reference genome hg19 from UCSC.

                        Some samples have 13 million reads, some 15 million, and others above 20 million.

                        If 13 million is too low, do I then sacrifice quality?

                        Is it exome, whole genome, RNA-Seq, or a smaller targeted capture?

                        Dan


                        • #13
                          Thank you very much for your response.

                          It is an RNA-seq experiment.


                          • #14
                            If you are doing variant calling from RNA-seq data, 13M reads is enough to get sufficient read depth on a subset of the genes. Because the number of transcripts per gene varies 1000-fold, it is very difficult to get high depth for genes that are poorly expressed (and impossible for genes that are not expressed). So for any particular number of reads, you will be able to make SNP calls for a particular number of genes, and as the number of reads increases, you'll be able to call SNPs in more genes.

                            Edit: removed the Phred bit. I thought the question was about the cutoff for trimming poor-quality bases, not the encoding!
                            Last edited by SNPsaurus; 12-24-2013, 12:17 PM.
                            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
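The point about expression level can be sketched with a toy model: the expected depth over a transcript is roughly total reads x the fraction of reads coming from that gene x read length / transcript length. Everything except the 13 million read count is a hypothetical value picked for illustration (read length, transcript length, expression fractions spanning 1000-fold, and the minimum depth for calling), so treat the numbers as an illustration and not a prediction.

```python
def expected_gene_depth(total_reads, expression_fraction, read_length_bp, transcript_length_bp):
    """Rough average depth over one transcript, assuming even coverage along it."""
    return total_reads * expression_fraction * read_length_bp / transcript_length_bp


TOTAL_READS = 13_000_000      # read count discussed in this thread
READ_LENGTH_BP = 100          # hypothetical
TRANSCRIPT_LENGTH_BP = 2_000  # hypothetical
MIN_DEPTH = 10                # hypothetical minimum depth for calling a SNP

# Hypothetical fractions of all reads that come from a single gene (1000-fold range).
for fraction in (1e-3, 1e-4, 1e-5, 1e-6):
    depth = expected_gene_depth(TOTAL_READS, fraction, READ_LENGTH_BP, TRANSCRIPT_LENGTH_BP)
    status = "callable" if depth >= MIN_DEPTH else "too shallow"
    print(f"fraction {fraction:g}: ~{depth:.2f}x ({status})")
```

In this toy setup only the more highly expressed genes clear the depth threshold, and adding reads moves the threshold down the expression range, which is the behaviour described above.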


                            • #15
                              Note that -phred33 in the Trimmomatic parameters refers to the encoding of the base qualities (ASCII offset 33), not to the quality cutoff value.
                              Last edited by mastal; 12-24-2013, 11:44 AM.
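To illustrate the distinction: with Phred+33 encoding, each character in a FASTQ quality line stores a quality score as its ASCII code minus 33. The sketch below decodes a made-up quality string; the function name is hypothetical. The actual quality thresholds in the command posted earlier are the values given to LEADING:3, TRAILING:3 and SLIDINGWINDOW:4:10.

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 encoded FASTQ quality string to a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


# Made-up quality string: '!' is Q0, '5' is Q20, '?' is Q30, 'I' is Q40.
print(decode_phred33("!5?I"))  # [0, 20, 30, 40]
```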
