**SNPsaurus** · 08-19-2013, 08:14 PM

Hi fanwei,

You could run Stacks (http://creskolab.uoregon.edu/stacks/) for general RAD-Seq analysis, including how many RAD loci you have sequenced. If you have a reference genome, and are wondering how many of the in silico cut sites are present in your data, you could create a "RAD reference" of the cut sites + 100 bp adjacent DNA and align your reads against that.
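As a rough illustration of the "RAD reference" idea, the sketch below scans a sequence for a recognition site and extracts the site plus adjacent bases in both directions. The function names, the `flank` parameter, and the toy sequence are hypothetical and not part of Stacks. It assumes a palindromic site such as TCGA and ignores where within the site the enzyme actually cuts. A real genome would be read from a FASTA file one chromosome at a time.

```python
# Minimal sketch of an in silico digest (illustrative names, not a Stacks API).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cut_sites(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    pos = seq.find(motif)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(motif, pos + 1)
    return positions


def rad_reference(seq, motif, flank=100):
    """Return (name, tag) pairs, two tags per cut site (one per direction).

    Each tag is the recognition site plus up to `flank` adjacent bases.
    Assumes a palindromic motif, so only one strand is searched.
    """
    seq = seq.upper()
    tags = []
    for pos in find_cut_sites(seq, motif):
        end = pos + len(motif)
        tags.append((f"site{pos}_fwd", seq[pos:end + flank]))
        tags.append((f"site{pos}_rev", revcomp(seq[max(0, pos - flank):end])))
    return tags


# Toy example with a 3 bp flank instead of 100 bp.
for name, tag in rad_reference("AAAATCGAGGGG", "TCGA", flank=3):
    print(name, tag)
```

Writing the returned tags to a FASTA file would give a reference that reads can be aligned against, and `len(find_cut_sites(...))` gives the in silico site count directly.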

**fanwei** · 08-19-2013, 09:35 PM

Thank you!
Yes, I have a reference genome. I have finished mapping with bwa and SNP calling with GATK. Approximately 6,300 SNPs were found per sample. But when I look for sample-specific SNPs between two samples, very few are found (fewer than 10). It seems there is little overlap between the samples. My sequencing depth is 3-4X.
Can Stacks deal with this situation?
Thanks for your help!

**SNPsaurus** · 08-20-2013, 04:53 AM

If the coverage is low, you probably aren't getting enough depth to call SNPs at most loci. At 3-4X, you won't even pick up most heterozygous SNPs. If the goal is to find SNPs specific to a particular sample, you need to sequence to a high depth, feel confident that you don't have missing data, and then compare.

**fanwei** · 08-20-2013, 06:04 AM

Yes. Because heterozygous SNPs are not genetically stable, my goal is to find homozygous SNPs. Do you think the coverage is too low?
Also, the sequencing was done by a company, and they chose TaqαI (TCGA) to digest the genomic DNA. Now I'm wondering whether that is a reasonable choice, because there are too many digestion sites in the genome.

**SNPsaurus** · 08-20-2013, 07:35 PM

Was that for RAD or ddRAD or GBS, do you know? If it is RAD-Seq, then digesting with a 4-cutter enzyme will produce short fragments that are resistant to shearing, making library creation very inefficient. For any of the methods, a frequent cutter like that will produce 3-5 million tags for a moderately sized genome of 500 Mb. So it is not surprising you have low coverage, unless they sequenced just 2 samples per HiSeq lane.
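As a back-of-the-envelope check on that tag estimate, here is a minimal sketch that assumes all four bases are equally frequent and that each cut site yields two tags (one per side). Both are simplifying assumptions, and `expected_tags` is a hypothetical helper. Real genomes have skewed base composition, so an in silico digest of the actual reference will be more accurate.

```python
def expected_tags(genome_bp, site_len, tags_per_site=2):
    """Expected number of tags for a recognition site of length `site_len`,
    assuming each base occurs with probability 1/4 at every position."""
    expected_sites = genome_bp / 4 ** site_len
    return expected_sites * tags_per_site


# 4-cutter versus 6-cutter in a 500 Mb genome.
for site_len in (4, 6):
    print(site_len, round(expected_tags(500_000_000, site_len)))
```

Under these assumptions a 4-cutter gives about 3.9 million tags in 500 Mb, which falls inside the 3-5 million range quoted above.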

I'm guessing they only sequenced a portion of the possible cut sites, and so you ended up with a semi-random set of tags in one sample versus the other, with little overlap between them. If it was ddRAD or GBS, you also have to worry if they were not careful in the size distribution selection, since then one sample may end up with a bigger size range of fragments and a different set of loci selected.

Why was it paired-end sequenced? Tell me a little about the species, etc.

If a locus is sequenced at 3X, and it is diploid, then 25% of the time you'll only sequence one chromosome or the other, missing the heterozygosity. So you will often think it is homozygous for one allele in one sample and homozygous for the other allele in the other sample, when it is really heterozygous in both.
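The 25% figure can be reproduced with a short calculation. This sketch assumes each read comes from either chromosome with probability 1/2, reads are independent, and there is no sequencing error; `prob_miss_het` is a hypothetical name.

```python
def prob_miss_het(depth):
    """Probability that all `depth` reads at a heterozygous diploid site
    come from the same chromosome, so the site looks homozygous."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Two ways to see only one chromosome, each with probability 0.5 ** depth.
    return 2 * 0.5 ** depth


for depth in (3, 4, 10):
    print(depth, prob_miss_het(depth))
```

At 3X the result is 0.25 and at 4X it is 0.125. Real callers also need several reads per allele before they report a heterozygote, so the practical miss rate at low depth is higher than this lower bound.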

**fanwei** · 08-20-2013, 11:00 PM

Thank you very much! Sorry for the incomplete information. And I quite agree with you!
The species is rice. It is diploid, and the genome is about 400 Mb. We chose the paired-end RAD sequencing method. As previously mentioned, sequencing depth is 3-4X and coverage is 8%. My goal is to find SNPs specific to each sample.
Can you give me some suggestions?

**SNPsaurus** · 08-21-2013, 05:37 AM

If you got the amount of sequencing expected, then the experiment was designed poorly, since that amount of sequencing is guaranteed to give a bad outcome. If I am understanding you, only 8% of the sites are sequenced in a sample. The chance of having reads in both samples is then 0.08 x 0.08 = 0.0064, so less than 1% of the sites will be sequenced in both samples. On top of that, the low sequencing coverage of 3X at those sites guarantees that many SNPs will be miscalled.

So, I don't see any way to rescue this experiment other than lots more sequencing. But it would probably be better to start over with a good design, unfortunately.
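The overlap estimate above can be written as a small calculation. This is only a sketch: it assumes the sequenced sites are an independent random subset in each sample, and `shared_fraction` is a hypothetical helper.

```python
def shared_fraction(fraction_per_sample, n_samples=2):
    """Fraction of sites expected to be sequenced in all samples, assuming
    each sample covers an independent random subset of the sites."""
    return fraction_per_sample ** n_samples


print(f"{shared_fraction(0.08):.4f}")     # two samples at 8% each
print(f"{shared_fraction(0.08, 3):.6f}")  # three samples at 8% each
```

Because the fraction is raised to the number of samples, the shared set shrinks quickly as more samples are compared unless per-sample coverage is close to complete.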

**fanwei** · 08-23-2013, 01:05 AM

Thank you very much! I'll redesign my work.

**SNPsaurus** · 08-25-2013, 08:01 PM

You should probably sequence a number of samples in each variety to assay the full genetic diversity of each. If you are looking for SNPs specific to a sample, it is easy to be misled when looking at a small number of individuals.

I don't know enough about your system to be specific, but a typical approach would be to sequence around 100,000 loci at moderate depth (5X) for a large number of individuals (here at SNPsaurus we work in 96-well plate units). You'll get high-quality genotype calls for homozygous alleles, and can multiplex 190 individuals in a lane.
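To get a feel for the read budget this implies, here is a minimal sketch that multiplies loci by depth by individuals. It assumes every read lands on a target locus and that coverage is even across loci and individuals, which real libraries do not achieve; `reads_needed` is a hypothetical helper.

```python
def reads_needed(n_loci, depth, n_individuals):
    """Total reads required for `depth` reads per locus per individual,
    assuming perfectly even coverage and no off-target reads."""
    return n_loci * depth * n_individuals


total = reads_needed(n_loci=100_000, depth=5, n_individuals=190)
print(f"{total:,} reads")
```

For the numbers above this comes to 95 million reads, which can then be compared against the expected yield of a lane on the instrument being used.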

**fanwei** · 08-26-2013, 07:59 PM

Hi, I'm trying to run Stacks. I have read the manual downloaded from the website, but I am still running into problems. It seems complex. Are you familiar with it? Could you kindly help me run Stacks?

**SNPsaurus** · 08-26-2013, 11:06 PM

Sorry, we use our own analysis software for nextRAD. There is a user community at https://groups.google.com/forum/#!forum/stacks-users that might be able to help.

**fanwei** · 08-27-2013, 01:12 AM

You are very nice! Thank you!

How to calculate RAD-Seq digestion sites?
