Gislaine · 06-16-2016, 05:39 AM

Originally posted by maxbangs

Thank you for your comments, nucacidhunter and SNPsaurus.
I guess I should explain my math. I was trying to calculate the number of expected loci for a given size selection (300 +/- 36 bp), so I calculated the proportion of DNA in the size selection (0.47 ng / 19.4 ng) and then multiplied by the size of the genome (2.42 Gbp) to get the number of bp within the selection.

Hi!

I would like to know whether these ng/ul concentration values are from real data, because I am doing ddRAD and found similar concentrations. I haven't done qPCR yet to check whether the final concentration is good enough for sequencing.

Cheers

nucacidhunter · 05-24-2014, 03:17 PM

In GBS methods (single or double digest), one can recognise amplicons from repeat regions or organelle fragments in the prepped library and decide not to proceed to sequencing. I believe this is one of the aspects they examine during the establishment phase for a new species. Users often utilise only 50% of their GBS reads because of low coverage and a low number of loci shared among samples, but they are still happy to get around 1K polymorphic loci from their data. Some users repeat their samples in a submission to increase coverage. Obviously, the number of useful fragments depends on the study aim, population type, availability of a reference genome, and other factors. ddRAD, as it has been described, leaves the sampling of repeat regions to chance.

RAD-Seq is the most comprehensive and probably the most consistent of these methods, but its library prep costs can be prohibitive and may exceed the sequencing costs.

SNPsaurus · 05-23-2014, 09:59 PM

nucacidhunter, I have often wondered about the issue of repeats falling in a particular size range. When my lab was working on RAD-Seq, one of the reasons we liked having one side of the RAD tag sheared is that we saw talks from people doing RRL (reduced representation libraries) and how they spent so much time checking different size ranges for repeat content... it took longer to decide on a size range than to do the actual experiment. But I don't hear about that in ddRAD or GBS talks. Is it just that sequencing has gotten cheaper and it isn't worth fussing over losing 25M reads to repeats?

Good point about the Y-adapters as well. Without a good reference genome it is hard to feel that confident that the number of sites will translate from a related genome anyway, so sometimes a person has to just plunge ahead and try it!

nucacidhunter · 05-23-2014, 07:12 PM

Based on my experience, I would advise against using the gDNA ScreenTape for sizing your fragments. Its results are load-dependent: serial dilutions of the same sample (within the range specified for the tape) will give different size outcomes. In addition, the approach you are taking (estimating fragment numbers from the digestion result) may on some occasions hinder successful library prep. If the size window you select comprises fragments from repeat regions and organelles, you may not have many useful SNPs to call.

Originally posted by maxbangs

So you may be wondering why I am doing this and not just going with the in silico estimates. Most of the organisms that I work with do not have a reference genome for any closely related species. Thus, I want to develop a system to estimate the number of fragments completely de novo.

The issue with the in silico approach is that during actual size selection with the Pippin, the eluted fragments will differ from the set point, because one end of each fragment carries a Y-shaped adapter, which affects migration speed. Another issue is that Pippin size selection is also load-dependent, so one can expect different results depending on the amount of DNA loaded.

SNPsaurus · 05-23-2014, 03:19 PM

I would still worry about MspI-MspI fragments. That's 4 nucleotides that are all G or C, in a genome that is 38% GC, so you'll get an MspI site every 700 bp or so. If there is a 699/700 chance of not getting that site at any particular nucleotide, then in a 72 bp region there is a 10% chance of seeing a site (so if you have an MspI site somewhere, about 10% of those will have another MspI site 300 +/- 36 bp away). The same logic applies to MspI sites near EcoRI or PstI sites. But there are 10X more MspI sites than PstI sites to start with, so you need to divide by more than 2. EcoRI is better (MspI sites outnumber EcoRI sites only about 4-fold).
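The back-of-envelope numbers above can be reproduced with a short Python sketch. It assumes, as the estimate implicitly does, that bases are independent, with G and C each at half the GC fraction and A and T each at half the remainder. Real genomes deviate from this (vertebrate genomes are depleted in CpG, which MspI's CCGG contains), so treat the output as a rough guide. The function names are illustrative only.

```python
"""Restriction-site statistics under an independent-base (random sequence) model."""


def site_probability(site: str, gc: float) -> float:
    """Probability that a given position starts an exact match to `site`."""
    p_gc = gc / 2.0          # P(G) = P(C)
    p_at = (1.0 - gc) / 2.0  # P(A) = P(T)
    prob = 1.0
    for base in site.upper():
        prob *= p_gc if base in "GC" else p_at
    return prob


def mean_spacing(site: str, gc: float) -> float:
    """Expected distance in bp between consecutive sites (1 / p)."""
    return 1.0 / site_probability(site, gc)


def chance_of_site_in_window(spacing: float, window_bp: int) -> float:
    """Chance of at least one site in `window_bp` positions, using a
    (spacing - 1) / spacing miss probability per position."""
    return 1.0 - ((spacing - 1.0) / spacing) ** window_bp


if __name__ == "__main__":
    gc = 0.38  # genome GC content used in the post
    for name, site in [("MspI", "CCGG"), ("EcoRI", "GAATTC"), ("PstI", "CTGCAG")]:
        print(f"{name:6s} ~1 site per {mean_spacing(site, gc):,.0f} bp")
    # A 300 +/- 36 bp window spans 72 bp; 700 bp is the post's rounded MspI spacing.
    print(f"P(site in 72 bp) = {chance_of_site_in_window(700, 72):.3f}")
```

Under this model MspI comes out at roughly one site per 770 bp, EcoRI about 3.9-fold rarer and PstI about 10-fold rarer, and the 72 bp window gives just under a 10% chance of a second site, in line with the figures quoted above.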

maxbangs · 05-22-2014, 10:45 AM

TapeStation results

Thank you for your comments, nucacidhunter and SNPsaurus.
I guess I should explain my math. I was trying to calculate the number of expected loci for a given size selection (300 +/- 36 bp), so I calculated the proportion of DNA in the size selection (0.47 ng / 19.4 ng) and then multiplied by the size of the genome (2.42 Gbp) to get the number of bp within the selection. I then divided by the average fragment size (300 bp) to get the number of fragments within the size selection. Finally, I divided by 2 to get the number of loci, since the organism is diploid. However, as nucacidhunter pointed out, I forgot to take into account that some fragments are not sequence-able (e.g. EcoRI-EcoRI and MspI-MspI). To account for this, I am just going to divide by 2 again (thus assuming ~50% of the fragments are sequence-able).

2,420,000,000 bp * 0.47 ng / 19.4 ng / 300 bp / 2 / 2 = 48,857 loci
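The calculation above can be written as a small Python function, which makes it easy to rerun for other size windows, species or enzyme pairs. This is a minimal sketch that simply reproduces the steps described in the post; the function and parameter names are hypothetical, and the two divisors (2 for diploidy, and the 50% sequence-able share) are the post's assumptions rather than established constants.

```python
def estimate_loci(genome_bp: float, selected_ng: float, total_ng: float,
                  mean_fragment_bp: float, ploidy_divisor: float = 2.0,
                  sequenceable_fraction: float = 1.0) -> float:
    """Estimate ddRAD loci from the DNA mass inside a size-selection window.

    Steps, as described in the post:
      genome_bp * selected_ng / total_ng -> bp inside the window
      / mean_fragment_bp                 -> fragments inside the window
      / ploidy_divisor                   -> the post's division by 2 (diploid)
      * sequenceable_fraction            -> share with one end from each enzyme
    """
    if total_ng <= 0 or mean_fragment_bp <= 0 or ploidy_divisor <= 0:
        raise ValueError("total_ng, mean_fragment_bp and ploidy_divisor must be positive")
    bp_in_window = genome_bp * selected_ng / total_ng
    fragments = bp_in_window / mean_fragment_bp
    return fragments / ploidy_divisor * sequenceable_fraction


# First estimate in the thread (2.5 Gbp genome, 19.6 ng total, no sequence-able correction)
print(round(estimate_loci(2.5e9, 0.47, 19.6, 300)))  # 99915
# Revised estimate (2.42 Gbp genome, 19.4 ng total, ~50% sequence-able)
print(round(estimate_loci(2.42e9, 0.47, 19.4, 300, sequenceable_fraction=0.5)))  # 48857
```

Keeping the ploidy and sequence-able corrections as separate parameters makes each assumption explicit, so either can be changed (or dropped) without touching the rest of the arithmetic.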

I know it is a big assumption that 50% of the fragments are sequence-able, but after running some simulations this seems to work as long as you are not selecting anywhere the slope of the fragment-size distribution is steep (this includes the slope for all three fragment types). Thus, if I go with a size selection around 300 bp, I get a fairly consistent estimate that matches my in silico estimates. However, if I do the same for a size selection around 224 +/- 36 bp, I get a much more erratic result.

So you may be wondering why I am doing this and not just going with the in silico estimates. Most of the organisms that I work with do not have a reference genome for any closely related species. Thus, I want to develop a system to estimate the number of fragments completely de novo.

As requested, I tried to attach the TapeStation results. However, the .doc file and the raw result file are each too large. If you want to look at the results, just send me an email at [email protected].

If you do want the results, there are two files: 1) a .doc file of results given to me by the facility and 2) the raw data from the TapeStation. You may notice a warning for some samples that the concentration is too low. I thought the concentrations needed to be between 1 ng/ul and 50 ng/ul (as for normal D1000 DNA tapes), but since we were using the genomic DNA tape, they were supposed to be >20 ng/ul. The samples still ran fine, and the total concentration from the TapeStation matches that from the Qubit. If you want to play with the raw data, you can download the program for free. If anyone wants to know the total concentrations from the Qubit, the genome sizes of the fish we used (four species), or the different RE combinations we used (four combinations), just let me know. This message is already way too long.

Hope this is helpful, and thank you for the fast responses.

SNPsaurus · 05-21-2014, 06:28 PM

Good point! I was thinking that EcoRI would cut every 3 kb, so the problem would be negligible, but there will be plenty of MspI-MspI fragments in the 300 bp range that will not become ddRAD loci. The PstI-EcoRI double digest should be more accurate, but I guess even there, for every PstI-EcoRI fragment you'll have an equal number of PstI-PstI fragments and an equal number of EcoRI-EcoRI fragments in the mix. Actually, there will be many more EcoRI-EcoRI fragments, given the GC content.
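To see how the mix of fragment end types depends on how often each enzyme cuts, here is a Python sketch under an idealized model that is an assumption, not measured data: both enzymes' sites are scattered independently and at random, so each fragment end comes from enzyme A with probability p = rate_a / (rate_a + rate_b), independent of the other end and of fragment length. The function name is hypothetical.

```python
from typing import Dict


def end_type_fractions(rate_a: float, rate_b: float) -> Dict[str, float]:
    """Expected share of each fragment end type in a double digest.

    Assumes independent, randomly placed sites for both enzymes, so the
    two ends of a fragment are independent draws with P(A) = p. Only
    "A-B" fragments carry one site from each enzyme (sequence-able in ddRAD).
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("site rates must be positive")
    p = rate_a / (rate_a + rate_b)
    q = 1.0 - p
    return {"A-A": p * p, "A-B": 2 * p * q, "B-B": q * q}


# Equal site frequencies, then MspI (A) ~4x as frequent as EcoRI (B),
# the rough ratio SNPsaurus gives for these two enzymes.
for label, (a, b) in {"equal": (1.0, 1.0), "MspI:EcoRI 4:1": (4.0, 1.0)}.items():
    shares = end_type_fractions(a, b)
    print(label, {k: round(v, 2) for k, v in shares.items()})
```

Under this model the mixed-end share is 2pq: it is 0.5 when both enzymes cut equally often and falls to 0.32 at a 4:1 ratio, with most of the remainder being fragments cut by the more frequent enzyme on both ends.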

nucacidhunter · 05-21-2014, 04:37 PM

Originally posted by maxbangs

If I wanted to figure out the number of fragments for ddRAD in an organism whose genome size is ~2.5 Gbp, where the concentration of the whole sample is 19.6 ng/ul and the size selection of 300 +/- 36 bp yields 0.47 ng/ul, does the following math work out?

2,500,000,000 bp * 0.47 ng / 19.6 ng / 300 bp / 2 = 99,915 loci

It seems to me that you have calculated the number of fragments with an average size of 300 +/- XX bp resulting from your digestion. The sequence-able portion of that number will be the fragments that are flanked by the restriction sites (overhangs) of both enzymes. Fragments flanked on both ends by the site of only one of the enzymes will not contribute to your library or to the reads you obtain from sequencing.
Peterson et al. describe a protocol for estimating the number of sequence-able fragments in their supplementary material, although they neither used it nor described in the paper how it correlates with real data. I have great doubts about the practicality and accuracy of their described method; I will post my reasons in detail later. In the meantime, I am very interested to see your TapeStation results for your digests and your rationale for the way you have estimated fragment numbers in your target range.

luc · 05-21-2014, 01:32 PM

Originally posted by maxbangs

I did have another question. Once we figure out the enzyme combination we are going to order the adaptors/primers from the Peterson et al. (2011) protocol. Does anyone have suggestions on where to order the oligos and if there is anything special we need to do with them?

It is very convenient if the manufacturer supplies the oligos plated and already normalized to a standard concentration; that saves a ton of work. I believe most manufacturers will do that (LifeTechnologies, IDT, Bioneer, ...).

SNPsaurus · 05-21-2014, 12:41 PM

It might just be genome variation. Here are (rough) counts of PstI sites in zebrafish:
[genomes]$ cat Daniozv9.fa | tr -d "\n" | grep -io -E "CTGCAG" | wc -l
90066
and stickleback:
[genomes]$ cat Gasterosteus_aculeatus.BROADS1.61.dna_rm.toplevel.fa | tr -d "\n" | grep -io -E "CTGCAG" | wc -l
141198

More sites in stickleback, even though it has a genome size of 450Mbp and zebrafish is 1.5Gbp! Stickleback has a slightly higher GC content (42% vs 38%) but that is not enough to explain the difference. So I would trust the empirical data (Tapestation) more than an in silico search of a related genome, as long as you trust your Tapestation to give accurate results.
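The `tr | grep | wc` pipeline above joins the whole file into one line, so FASTA header text is concatenated onto the sequence; for a 6 bp site the effect on the counts is probably negligible, but a record-aware count avoids it. Below is a minimal Python sketch (the script and function names are hypothetical) that skips header lines, joins each record's sequence lines so sites spanning a line break are still found, and counts case-insensitively like `grep -i`. It searches one strand only, which is sufficient for palindromic sites such as PstI's CTGCAG.

```python
import re
import sys
from typing import Dict, Iterable, List, Optional


def count_site_per_record(lines: Iterable[str], site: str) -> Dict[str, int]:
    """Count non-overlapping, case-insensitive matches of `site` per FASTA record.

    Header lines (starting with '>') are never searched. Sequence lines that
    appear before the first header are ignored.
    """
    pattern = re.compile(re.escape(site), re.IGNORECASE)
    counts: Dict[str, int] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                counts[name] = len(pattern.findall("".join(chunks)))
            # Use the first word of the header as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        elif line:
            chunks.append(line)

    if name is not None:
        counts[name] = len(pattern.findall("".join(chunks)))
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python count_sites.py genome.fa CTGCAG
    fasta_path, motif = sys.argv[1], sys.argv[2]
    with open(fasta_path) as handle:
        per_record = count_site_per_record(handle, motif)
    print(sum(per_record.values()))
```

Reporting counts per record also makes it easy to spot whether a few scaffolds or the mitochondrial sequence contribute a disproportionate share of sites.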

maxbangs · 05-21-2014, 11:50 AM

We do not have the genome for the fish I am looking at, but it is a close(ish) relative of Danio, which has a published genome. The one problem is that the fish I am looking at has had a genome duplication since its split from Danio. So we are just doubling the in silico estimates for Danio and hoping they are at least in the right ballpark. These estimates also agree with those from Peterson.
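For readers who want to run an in silico double digest themselves, here is a minimal Python sketch. It is not the tool used in the thread; the enzyme tuples, function names and the random stand-in genome are assumptions for illustration. It locates top-strand cut positions for two enzymes, walks the sorted cuts, and counts fragments in a size window by end type. Length is measured between top-strand cut positions, ignoring overhangs and adapters, and any scaling (such as the doubling for a duplicated genome described above) would be applied to the result afterwards.

```python
import random
import re
from typing import Dict, List, Tuple

# (recognition site, cut offset on the top strand)
ECORI = ("GAATTC", 1)  # G^AATTC
MSPI = ("CCGG", 1)     # C^CGG
PSTI = ("CTGCAG", 5)   # CTGCA^G


def find_cuts(seq: str, site: str, cut_offset: int) -> List[int]:
    """0-based top-strand cut positions for every (possibly overlapping) match."""
    pattern = re.compile(f"(?={re.escape(site)})", re.IGNORECASE)
    return [m.start() + cut_offset for m in pattern.finditer(seq)]


def double_digest(seq: str, enzyme_a: Tuple[str, int], enzyme_b: Tuple[str, int],
                  min_len: int, max_len: int) -> Dict[str, int]:
    """Count fragments by end type whose length falls in [min_len, max_len].

    Only "A-B" fragments can receive one adapter of each type in ddRAD.
    """
    cuts = [(pos, "A") for pos in find_cuts(seq, *enzyme_a)]
    cuts += [(pos, "B") for pos in find_cuts(seq, *enzyme_b)]
    cuts.sort()
    counts = {"A-A": 0, "A-B": 0, "B-B": 0}
    for (left, end_l), (right, end_r) in zip(cuts, cuts[1:]):
        if min_len <= right - left <= max_len:
            key = "A-B" if end_l != end_r else f"{end_l}-{end_r}"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical stand-in genome: 1 Mbp of random sequence at 38% GC.
    random.seed(1)
    gc = 0.38
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    genome = "".join(random.choices("ACGT", weights=weights, k=1_000_000))
    print(double_digest(genome, ECORI, MSPI, 264, 336))
```

On a real reference, each chromosome or scaffold would be digested separately and the counts summed; a random sequence like the one in the usage block only illustrates the mechanics and will not capture repeats or base-composition biases.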

Any idea why we are so far off? We also ran the following RE combinations and got the following estimates for the number of loci:
EcoRI x PstI = 41,667 loci
EcoRI x MspI = 99,915 loci
PstI x MspI = 118,912 loci
PstI x HpyC4IV = 131,340 loci

Based on this, I was thinking about going with EcoRI x PstI so that I do not have too narrow a size selection, but it seems odd to go with two six-base cutters.

SNPsaurus · 05-21-2014, 11:31 AM

That makes sense (your calculation). There's some minor change in fragment number on the small side of the average compared to the large side, but with a tight selection it won't matter so much. Did you have an actual genome reference for the in silico digest?

maxbangs · 05-21-2014, 11:10 AM

TapeStation results

So, I ran the samples on the TapeStation and am trying to figure out the number of loci per size selection. (I can post my results later once I figure out my size selection).

If I wanted to figure out the number of fragments for ddRAD in an organism whose genome size is ~2.5 Gbp, where the concentration of the whole sample is 19.6 ng/ul and the size selection of 300 +/- 36 bp yields 0.47 ng/ul, does the following math work out?

2,500,000,000 bp * 0.47 ng / 19.6 ng / 300 bp / 2 = 99,915 loci

Thus, if I do a size selection on the Pippin Prep of 376 +/- 36 bp (adding 76 bp for adapters), I should get ~100,000 loci. Is this right? If so, why is this more than double what I expected based on Peterson's estimates and the in silico size selection? This is EcoRI x MspI, by the way; I also have results for other combinations across a few different species.

maxbangs · 05-17-2014, 07:44 AM

Adapters

Thank you for your quick response. We are testing some enzymes now (PstI and EcoRI in combination with MspI and HpyCh4V). They seem to be working well, and we are going to run them on a TapeStation next week.

I did have another question. Once we figure out the enzyme combination we are going to order the adaptors/primers from the Peterson et al. (2011) protocol. Does anyone have suggestions on where to order the oligos and if there is anything special we need to do with them?

maxbangs · 05-09-2014, 11:11 AM

Thank you for your response. It was very useful.

Do you have any suggestions for a balanced (AT/GC) 4-base cutter? I was thinking of testing PstI and EcoRI with MspI, but MspI is all GC. Is this a problem?
