Originally posted by GerryB
If you've made a big sequencing run, how do you move the data around? Isn't it tens of gigabytes for a reasonably complex run? Does everyone just use FTP over their fiber connections?
Do people tend to try to compress these big lists of sequences using standard tools like gzip or 7zip? Or is it not worth it?
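On the compression question, a quick back-of-the-envelope check is easy to run yourself. Since raw sequence text uses a 4-letter alphabet, a generic compressor like gzip typically gets a noticeable (though not dramatic) saving: roughly 2 bits per base at best on random sequence, better when there are repeats. A minimal sketch on synthetic data:

```python
import gzip
import random

# Synthetic "sequence": random ACGT, so no repeats for LZ77 to exploit.
# Real reads also carry headers and quality strings, which compress differently.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(100_000)).encode()

compressed = gzip.compress(seq)
ratio = len(compressed) / len(seq)
print(f"raw: {len(seq)} bytes, gzipped: {len(compressed)} bytes, ratio: {ratio:.2f}")
```

Even on random sequence the ratio comes out well under 0.5, so compressing before transfer is usually worth it; specialized formats (e.g. BAM/CRAM) do better still.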
In assembly/matching, what are typical error rates for single-bp reads (not counting bad ends)? Is it like 0.1%? 1%?
When you do get single-bp errors, does each sequencing strategy have its own error behavior? Maybe some error matrix that says: for this machine, C is misread as T with probability X, C is misread as G with probability Y, and so on?
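The error-matrix idea described above can be sketched directly: tally (reference base, read base) mismatch pairs over aligned positions. This is a toy gapless version with made-up sequences; real pipelines derive these tables from many alignments (e.g. via samtools stats), not a single read:

```python
from collections import Counter

def substitution_counts(reference: str, read: str) -> Counter:
    """Count (ref_base -> read_base) substitutions in a gapless alignment."""
    counts = Counter()
    for r, q in zip(reference, read):
        if r != q:
            counts[(r, q)] += 1
    return counts

ref  = "ACGTACGTAC"
read = "ACTTACGTAT"  # G->T at position 2, C->T at position 9
print(substitution_counts(ref, read))  # Counter({('G', 'T'): 1, ('C', 'T'): 1})
```

Normalizing each row of such a table by the reference-base counts gives exactly the per-machine misread probabilities the question imagines.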
How common are gaps in sequence reads? Are the gaps totally random, like joining two totally different parts of the DNA strand, or are they just small slips where somehow 10 bp are missing?
It also seems unlikely that two unrelated fragments get joined during library preparation.
However, it is quite possible that a large chunk of sequence present in the reference genome is genuinely missing from your cells, or that something present in your cells is missing from the reference. These so-called "structural variations" are currently a hot topic of research.
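One naive signal for such a missing chunk is read depth: if a region of the reference is absent from the sample, few or no reads map there. A minimal sketch that flags runs of zero coverage as deletion candidates (real SV callers use read pairs, split reads, and mappability corrections; this is only the intuition):

```python
def zero_coverage_runs(depth, min_len=3):
    """Return (start, end) half-open runs where depth is zero for >= min_len positions."""
    runs, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))
    return runs

# Toy per-position depths: positions 2-5 uncovered, single dropout at 8.
depth = [5, 6, 0, 0, 0, 0, 4, 3, 0, 7]
print(zero_coverage_runs(depth))  # [(2, 6)]
```

The `min_len` threshold is there to ignore isolated dropouts that are more likely mapping noise than a real deletion.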
Is mixed source DNA ever deliberately sampled? Something like taking samples of gut bacteria and analyzing the mix of random sequences to estimate the diversity of the flora?
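Yes, that is essentially metagenomics. Once reads from a mixed sample are binned to taxa (e.g. by 16S classification), diversity is commonly summarized with the Shannon index, H = -sum(p_i * ln p_i) over the taxa proportions. A small sketch with made-up taxon labels:

```python
import math
from collections import Counter

def shannon_diversity(assignments):
    """Shannon index over the proportions of each taxon label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical taxonomic assignments for 100 reads from a gut sample.
reads = ["E.coli"] * 50 + ["B.fragilis"] * 30 + ["L.acidophilus"] * 20
print(f"H = {shannon_diversity(reads):.3f}")
```

H is 0 when every read maps to one taxon and grows as the community gets more even and more varied, which is exactly the "diversity of the flora" estimate the question describes.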
Can sequence sampling be guided at all, or is it truly a random sample from the whole genome?
Can you try to just analyze one chromosome somehow from the very start?
Cheers
Simon