**Bukowski** · 12-23-2013, 02:36 PM

Whether there are enough reads really depends on what you're doing with them... More information is required.

**arcolombo698** · 12-23-2013, 02:45 PM

I am doing variant calling: I will convert these files into VCF files and call SNPs for downstream analysis.

**arcolombo698** · 12-23-2013, 02:46 PM

I am not doing any differential expression analysis. I imagine after I get the VCF files, I will then do a pathway analysis.

**EpiBrass** · 12-23-2013, 04:56 PM

It depends on what genome you're using, how long your reads are, whether they're paired-end or single-end, and how even the coverage is. If you're dealing with, say, a wheat genome (17 Gb), then I'd say the number of reads you have is too low. A good guide is that any variant should be supported by a minimum of 10 reads (preferably in both forward and reverse orientation).

Further information is required, but I hope this helps.
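As a rough illustration of the coverage reasoning above, here is a minimal Python sketch that estimates mean depth as total sequenced bases divided by target size. The read count, the 100 bp read length, and the helper names are assumptions chosen for illustration, not values confirmed in the thread, and real coverage is never perfectly even.

```python
def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth = total sequenced bases / size of the target in bases."""
    return n_reads * read_length / target_size


def meets_min_support(depth: float, min_reads: int = 10) -> bool:
    """Check a depth against the 10-read rule of thumb mentioned above."""
    return depth >= min_reads


# Hypothetical numbers: 13 million reads of 100 bp against a 17 Gb genome.
wheat_depth = mean_coverage(13_000_000, 100, 17_000_000_000)
print(f"mean depth: {wheat_depth:.3f}x")
print("enough for the 10-read guide:", meets_min_support(wheat_depth))
```

Mean depth is only a first check; because coverage is uneven, individual sites can fall well below the average.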

**arcolombo698** · 12-23-2013, 04:58 PM

I am dealing with a human genome (Homo sapiens) using the reference genome hg19 from UCSC.

My samples start at 13 million reads, with some at 15 million and others above 20 million.

If 13 million is too low, do I then sacrifice quality?

**arcolombo698** · 12-23-2013, 04:59 PM

Sorry, I am using paired-end reads, and most sequence lengths range from 30 to 128 bp.

**EpiBrass** · 12-23-2013, 05:00 PM

What is your current quality cut off? I wouldn't go below Phred20.

I'm going to assume your "13/15/20 millions" are different samples which can't be pooled?
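For context on the Phred20 suggestion, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below shows that conversion; the function name is just illustrative.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Compare a few commonly discussed cutoffs.
for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

So a cutoff of Phred20 tolerates roughly one wrong base call in a hundred, while Phred10 tolerates one in ten.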

**arcolombo698** · 12-23-2013, 05:02 PM

Looking through my QC, most samples have around 17 million reads. Does this suffice, or should I relax my parameters in Trimmomatic?

**arcolombo698** · 12-23-2013, 05:11 PM

1) Yes, they are different samples; we cannot combine the reads.
2) I chose a Phred score of 33.
3) I am using ILLUMINACLIP, and this is for an RNA-seq experiment on bone marrow using a TruSeq prep kit.

Here are my parameters:

java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 931269_R1.fastq.gz 931269_R2.fastq.gz paired_trimmed_931269_R1.fastq.gz unpaired_trimmed_931269_R1.fastq.gz paired_trimmed_931269_R2.fastq.gz unpaired_trimmed_931269_R2.fastq.gz ILLUMINACLIP:/auto/rcf-proj/sa1/acolombo/Target_2013_229/BoneMarrows_PolyA/Sample_931269/TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 HEADCROP:15 SLIDINGWINDOW:4:10 MINLEN:30

**arcolombo698** · 12-23-2013, 05:13 PM

I am also trimming the adapters using a custom-made TruSeq2-PE.fa file.

**Bukowski** · 12-24-2013, 06:46 AM

Originally posted by arcolombo698:

> I am dealing with a human genome (Homo sapiens) using the reference genome hg19 from UCSC.
>
> My samples start at 13 million reads, with some at 15 million and others above 20 million.
>
> If 13 million is too low, do I then sacrifice quality?

Is it exome, whole genome, RNA-Seq, smaller targeted capture?

Dan

**arcolombo698** · 12-24-2013, 09:43 AM

Thank you very much for your response.

It is an RNA-seq experiment.

**SNPsaurus** · 12-24-2013, 09:57 AM

If you are doing variant calling from RNA-seq data, 13M reads is enough to get sufficient read depth on a subset of the genes. Because the number of transcripts per gene varies 1000-fold, it is very difficult to get high depth from genes that are poorly expressed (and impossible to get any depth from genes that are not expressed). So for any given number of reads, you will be able to make SNP calls for a particular number of genes, and as the number of reads increases, you'll be able to call SNPs from more genes.
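The point about expression level can be sketched numerically. The Python below is a toy model, assuming reads are distributed in proportion to each gene's share of the library; the expression fractions, the 100 bp read length, and the 2,000 bp transcript length are made-up illustrative values, not measurements from this dataset.

```python
def gene_depth(total_reads: int, read_length: int,
               expression_fraction: float, transcript_length: int) -> float:
    """Approximate mean depth over one transcript.

    expression_fraction is the (hypothetical) share of all reads that
    come from this gene.
    """
    reads_on_gene = total_reads * expression_fraction
    return reads_on_gene * read_length / transcript_length


TOTAL_READS = 13_000_000   # from the thread
READ_LENGTH = 100          # assumed
TRANSCRIPT_LENGTH = 2_000  # assumed

# Two hypothetical genes whose expression differs 1000-fold.
high = gene_depth(TOTAL_READS, READ_LENGTH, 1e-4, TRANSCRIPT_LENGTH)
low = gene_depth(TOTAL_READS, READ_LENGTH, 1e-7, TRANSCRIPT_LENGTH)

print(f"highly expressed gene: ~{high:.1f}x")
print(f"poorly expressed gene: ~{low:.3f}x")
```

Under these assumptions the highly expressed gene comfortably clears a 10-read threshold while the poorly expressed one does not, and adding reads raises both depths proportionally.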

Edit: removed the Phred bit... I thought it was about the parameters for cutting poor quality, not the encoding!

**mastal** · 12-24-2013, 10:15 AM

Note that -phred33 in the Trimmomatic parameters refers to the Illumina encoding used for the base qualities, not to the quality cutoff value.
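To make the distinction concrete, here is a small Python sketch of what Phred+33 encoding means: each quality character in a FASTQ record is the score plus 33, stored as an ASCII character. The function name and the example string are illustrative. In the command posted earlier, the actual quality thresholds come from the LEADING:3, TRAILING:3 and SLIDINGWINDOW:4:10 steps.

```python
from typing import List


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


# '!' is ASCII 33 (score 0), '5' is ASCII 53 (score 20), 'I' is ASCII 73 (score 40).
scores = decode_phred33("!5I")
print(scores)
```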

Sufficient Reads