Fearing that 454 failed to compete.

rskr replied

08-26-2011, 07:42 AM
Originally posted by ajthomas

However, if you're looking for splice variants and don't care so much to quantify expression, 454 is probably a better technology. There will always be a niche for 454, although it's never going to be large.

That's a possible statistical fallacy, since it doesn't sound like you have actually done any Illumina assemblies (you're generalizing from your one 454 machine). As it turns out, paired-end data is pretty good at finding splice variants, which I wasn't expecting: we did a transcriptome assembly, then ran PASA on it, and it found plenty of legitimate splice variants (with high alignment coverage). Here again, the read length doesn't matter; what matters is the insert size during PCR.

What annoys me is when scientists do 454 transcriptome assemblies and then try to correct the errors with Illumina data, when the Illumina paired-end data does a better job of transcriptome assembly in the first place. I have had this happen a number of times and compared the builds (454 vs. Illumina vs. 454+Illumina), and 454 was not as good; even collaborators remarked on it. I wish it weren't this way, but 454 really isn't competitive at that. The problem is that, to get the raw number of transcripts up, you end up with very many contigs that consist of only one 454 read, and since 454 reads aren't very accurate in and of themselves, the result is a fairly inaccurate assembly.

Newbler also uses a lot of isotigs, which essentially amounts to adding duplicates of highly covered genes back to the assembly. I wish they wouldn't do this to pad their N50, but they do. I don't think every isotig is a legitimate splice variant; many are just unresolved bifurcations in the assembly graph. So if you do some sort of clustering analysis, Newbler actually turns out to generate many fewer contigs than comparable Illumina-based assemblies.
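A minimal Python sketch of the N50 point, using made-up contig lengths (this is not Newbler's code): appending extra copies of one long, highly covered contig raises N50 even though no new gene has been assembled.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


# Hypothetical toy assembly: one well-covered 3 kb gene plus shorter contigs.
contigs = [3000, 1200, 900, 800, 700, 600, 500, 400]
print(n50(contigs))  # 1200

# Add four extra copies of the 3 kb contig, standing in for "isotigs"
# that differ only at an unresolved bifurcation in the graph.
padded = contigs + [3000] * 4
print(n50(padded))   # 3000
```

The padded assembly contains exactly the same genes, yet its N50 is 2.5 times higher, which is why clustering the contigs before counting them gives a fairer comparison.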
ajthomas replied

08-26-2011, 05:26 AM
Originally posted by pmiguel

Hard to estimate the total number of miscalls in a read from a mean quality value. But Q30 is one error per 1000 bases. So, as long as you don't have crazy high quality values offsetting really low values, I would not expect several errors in a 200-400bp read.

Also, to the extent that the quality values are accurate, software could use them to weight the likelihood of a given base being a true variation or not. Or, trivially, you could mask out bases that had quality values lower than 30.

Let's not ignore the elephant here: Illumina is producing hundreds of gigabases of sequence per flow cell, whereas a 454 run produces hundreds of megabases. Illumina chemistry has a higher per-run cost than 454, but we are still looking at something approaching a 100x price-per-base differential.

But the same logic applies to Sanger sequencing, which is at least 100x more expensive per base.

--
Phillip

Actually, in my experience it does seem to be that most of the errors are concentrated in a few of the reads. I can't explain why that is, but nevertheless that seems to be the case. So, most of the reads are perfect and a few are riddled with errors. It's not usually difficult to find those with the errors and discard them, either.
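The post doesn't say how the error-riddled reads are identified. One common approach, assumed here purely for illustration, is to sum each base's Phred error probability into an expected-error count and drop reads above a cutoff. The cutoff, read names, and quality values below are hypothetical.

```python
def expected_errors(quals):
    """Expected number of miscalls in a read, given its Phred quality scores.

    A base at quality Q has error probability 10**(-Q/10); summing those
    probabilities gives the expected error count for the whole read.
    """
    return sum(10 ** (-q / 10) for q in quals)


def filter_reads(reads, max_ee=1.0):
    """Keep (name, quals) pairs whose expected error count is <= max_ee.

    max_ee=1.0 is an illustrative cutoff, not a recommendation.
    """
    return [(name, quals) for name, quals in reads if expected_errors(quals) <= max_ee]


reads = [
    ("clean", [35] * 300),             # about 0.09 expected errors
    ("messy", [35] * 250 + [8] * 50),  # low-quality tail: about 8 expected errors
]
print([name for name, _ in filter_reads(reads)])  # ['clean']
```

Because the errors are concentrated in a few reads, a per-read cutoff like this removes most of them while keeping the bulk of the data.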

It's true that 454 is less cost-effective than Illumina. Most applications can use the shorter read lengths obtained from Illumina, SOLiD, etc., and for those applications it makes a lot more sense to use those technologies. One thing to keep in mind, however, when comparing the amount of data produced: 454 doesn't produce as much data, but in many cases you don't need as much data with 454, either. Simply comparing numbers doesn't tell the whole story. RNA-seq provides an excellent example of where one technology might be better than the other, depending on your experiment. If you're trying to quantify gene expression, Illumina is definitely the way to go. In that case, you're just trying to identify transcripts and count them, and the high number of reads is a boon to your experiment. However, if you're looking for splice variants and don't care so much to quantify expression, 454 is probably a better technology. There will always be a niche for 454, although it's never going to be large.
pmiguel replied

08-26-2011, 04:12 AM
Originally posted by rskr

Well, except 454 would still fail to find linkage over 400bp consistently, since the median read is much shorter, and with sufficient coverage paired-end data is likely to find linkage up to 800bp, which is the maximum-length PCR product.

Hmm, I think you may just be trolling here.
If all goes well, a 454 run will have median read lengths >400 bases.
--
Phillip
pmiguel replied

08-26-2011, 04:09 AM
Originally posted by rskr

And you can do that with the 454 error model? Last I checked, a mean quality of 30 would guarantee several errors in a 200-400bp read? Might be better off with a 250-base insert size and 150bp paired-end reads, with an overlapper that finds the intersection.

Hard to estimate the total number of miscalls in a read from a mean quality value. But Q30 is one error per 1000 bases. So, as long as you don't have crazy high quality values offsetting really low values, I would not expect several errors in a 200-400bp read.
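A small Python sketch of why a mean quality value says little about the number of miscalls: Phred Q corresponds to an error probability of 10^(-Q/10), so the low-quality bases dominate the sum. Both toy reads below are hypothetical and both have a mean quality of exactly 30.

```python
from statistics import mean


def expected_errors(quals):
    # Phred: quality Q corresponds to an error probability of 10**(-Q/10).
    return sum(10 ** (-q / 10) for q in quals)


uniform = [30] * 300               # every base at Q30
mixed = [40] * 150 + [20] * 150    # same mean, but half the bases at Q20

for name, quals in [("uniform", uniform), ("mixed", mixed)]:
    print(name, mean(quals), round(expected_errors(quals), 3))
```

The uniform read carries about 0.3 expected errors; the mixed read carries about 1.5, five times as many, despite the identical mean. That is the "crazy high values offsetting really low values" case.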

Also, to the extent that the quality values are accurate, software could use them to weight the likelihood of a given base being a true variation or not. Or, trivially, you could mask out bases that had quality values lower than 30.
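The trivial masking option can be sketched like this; the function name, the use of `N` as the mask character, and the sample qualities are illustrative assumptions.

```python
def mask_low_quality(seq, quals, min_q=30, mask_char="N"):
    """Replace bases whose Phred quality is below min_q with mask_char."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return "".join(base if q >= min_q else mask_char for base, q in zip(seq, quals))


print(mask_low_quality("ACGTAC", [38, 35, 12, 31, 29, 40]))  # ACNTNC
```

Masking keeps read length and coordinates intact, so downstream pileups still line up; the weighted alternative would instead pass the per-base qualities through to the variant caller.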

Let's not ignore the elephant here: Illumina is producing hundreds of gigabases of sequence per flow cell, whereas a 454 run produces hundreds of megabases. Illumina chemistry has a higher per-run cost than 454, but we are still looking at something approaching a 100x price-per-base differential.

But the same logic applies to Sanger sequencing, which is at least 100x more expensive per base.

--
Phillip
rskr replied

08-25-2011, 05:06 PM
Originally posted by rskr

So, what is the difference between that and looking at a pileup of paired end Illumina reads? They will get the linkage just as well.

Well, except 454 would still fail to find linkage over 400bp consistently, since the median read is much shorter, and with sufficient coverage paired-end data is likely to find linkage up to 800bp, which is the maximum-length PCR product.
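A toy model of the linkage argument, assuming 0-based positions and that every fragment has exactly the stated insert size: two variants are linked if a single fragment sequences both of them. The read and insert lengths come from the thread; the function names are hypothetical.

```python
def covered(pos, start, read_len, insert_size=None):
    """True if `pos` is sequenced by a fragment that starts at `start`."""
    if insert_size is None:
        # Single-end read: one read spanning [start, start + read_len).
        return start <= pos < start + read_len
    end = start + insert_size  # fragment spans [start, end)
    in_read1 = start <= pos < start + read_len
    in_read2 = end - read_len <= pos < end
    return in_read1 or in_read2


def can_link(pos1, pos2, read_len, insert_size=None):
    """True if some single fragment sequences both positions.

    Brute force: try every fragment start that could still cover the
    leftmost position.
    """
    span = read_len if insert_size is None else insert_size
    lo, hi = min(pos1, pos2), max(pos1, pos2)
    return any(
        covered(lo, s, read_len, insert_size) and covered(hi, s, read_len, insert_size)
        for s in range(lo - span + 1, lo + 1)
    )


print(can_link(0, 399, read_len=400))                   # True: one 400bp read
print(can_link(0, 450, read_len=400))                   # False: beyond one read
print(can_link(0, 799, read_len=150, insert_size=800))  # True: read 1 + read 2
print(can_link(0, 400, read_len=150, insert_size=800))  # False: middle of insert
```

The last case shows a limitation of the fixed-size model: with one 800bp insert and 150bp reads, variants 400bp apart never land in the same pair. Real libraries have a spread of insert sizes, which helps fill such gaps when coverage is high.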
rskr replied

08-25-2011, 04:25 PM
Originally posted by ajthomas

Does the fact that I use the 454 offend you or something? I explained why I use it and why other technologies aren't appropriate for my work and you seem to think I'm an idiot for using it. I'm a little confused at your derision.

By the way, I don't look at individual reads, I look at consensus reads (usually 10-100X coverage per variant).

So, what is the difference between that and looking at a pileup of paired end Illumina reads? They will get the linkage just as well.
ECO replied

08-25-2011, 04:06 PM
Originally posted by rskr

Certainly you have some will to analyze erroneous data. If you cared, you would have seen that 454 is only 99.99% accurate with deep coverage, but you are talking about analyzing individual reads looking for variants, which suggests a different type of "work for you" than I would find acceptable. But hey, you are probably an MD analyzing a major histocompatibility complex, so you can get away with saying anything you want, because you hate statistics.

Arguments + facts + opinions please. Save the insults. Thanks.
ajthomas replied

08-25-2011, 04:02 PM
Does the fact that I use the 454 offend you or something? I explained why I use it and why other technologies aren't appropriate for my work and you seem to think I'm an idiot for using it. I'm a little confused at your derision.

By the way, I don't look at individual reads, I look at consensus reads (usually 10-100X coverage per variant).
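A minimal sketch of a per-column majority-vote consensus, assuming the reads are already aligned, trimmed to the same amplicon coordinates, and come from a single allele. For a heterozygous MHC sample the reads would first have to be clustered by allele, which this sketch does not do.

```python
from collections import Counter


def consensus(reads, min_fraction=0.5):
    """Column-wise majority-vote consensus of equal-length, pre-aligned reads.

    Columns where no base exceeds min_fraction of the reads are called 'N'.
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of identical length")
    calls = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base if count / len(column) > min_fraction else "N")
    return "".join(calls)


# Hypothetical 10x pileup of one amplicon: two reads each carry one random error.
reads = ["ACGTACGT"] * 8 + ["ACGAACGT", "ACGTACCT"]
print(consensus(reads))  # ACGTACGT
```

At 10-100X per variant, isolated read errors are outvoted, which is why per-read error rates matter less for this workflow than for single-read variant calls.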
rskr replied

08-25-2011, 02:44 PM
Originally posted by ajthomas

I'm not trying to argue with you, but I've looked at the options and the 454 is the only one that works for my application. I can't get 400bp reads any other way, and one of my amplicons is nearly that long.

Certainly you have some will to analyze erroneous data. If you cared, you would have seen that 454 is only 99.99% accurate with deep coverage, but you are talking about analyzing individual reads looking for variants, which suggests a different type of "work for you" than I would find acceptable. But hey, you are probably an MD analyzing a major histocompatibility complex, so you can get away with saying anything you want, because you hate statistics.
ajthomas replied

08-25-2011, 02:29 PM
I'm not trying to argue with you, but I've looked at the options and the 454 is the only one that works for my application. I can't get 400bp reads any other way, and one of my amplicons is nearly that long.
rskr replied

08-25-2011, 02:18 PM
Originally posted by ajthomas

without having both ends of the amplicon on the same read.

That's what paired end is for.
ajthomas replied

08-25-2011, 02:11 PM
It works just fine. Perhaps the accuracy is better than you think. I don't see nearly as many errors as you imply here. And no, I can't work with shorter reads that must be overlapped. I had to do that before switching from the older standard chemistry to the Titanium chemistry. I ended up with a number of allele misidentifications because of it. Some of the alleles are just too similar and can't be reliably identified without a full-length sequence. Say you have four different alleles: A and B differ from C and D by one base near the 5' end. A and C differ from B and D by one base near the 3' end. Any given sample may have any combination of the four (there are ~10 loci in the genome, so ~20 alleles present in a heterozygote). You can't differentiate these four alleles without having both ends of the amplicon on the same read.
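The four-allele example can be checked with toy sequences. The interior sequence is hypothetical; only the differing end bases follow the A/B/C/D layout described above. A fragment covering just one end matches two alleles, and a sample carrying A and D yields exactly the same set of half-length fragments as one carrying B and C.

```python
# Hypothetical shared interior; only the 5' and 3' end bases differ.
MIDDLE = "TTAGGCATCC"
ALLELES = {
    "A": "G" + MIDDLE + "T",
    "B": "G" + MIDDLE + "C",
    "C": "A" + MIDDLE + "T",
    "D": "A" + MIDDLE + "C",
}
LENGTH = len(MIDDLE) + 2
HALF = LENGTH // 2


def matching_alleles(fragment, offset):
    """Names of alleles that contain `fragment` at position `offset`."""
    return sorted(
        name for name, seq in ALLELES.items()
        if seq[offset:offset + len(fragment)] == fragment
    )


def half_fragments(genotype):
    """5'-half and 3'-half fragments produced by a set of alleles."""
    five = {ALLELES[n][:HALF] for n in genotype}
    three = {ALLELES[n][HALF:] for n in genotype}
    return five, three


allele_a = ALLELES["A"]
print(matching_alleles(allele_a[:HALF], 0))      # ['A', 'B']
print(matching_alleles(allele_a[HALF:], HALF))   # ['A', 'C']
print(matching_alleles(allele_a, 0))             # ['A']
print(half_fragments({"A", "D"}) == half_fragments({"B", "C"}))  # True
```

Only a read (or an unambiguous read pair) that spans both ends of the same molecule can separate the A+D genotype from B+C.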
rskr replied

08-25-2011, 01:38 PM
Originally posted by ajthomas

I'm using it primarily for genotyping highly polymorphic genes (MHC if you must know), sequencing amplicons 200-400bp long. Because some alleles only differ by one or two bases that may be at one end or the other of the amplicon, reads that are not full length cannot always differentiate some closely related alleles. I must have full-length reads of my amplicons, which I cannot get from any NGS technology except 454.

And you can do that with the 454 error model? Last I checked, a mean quality of 30 would guarantee several errors in a 200-400bp read? Might be better off with a 250-base insert size and 150bp paired-end reads, with an overlapper that finds the intersection.
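A minimal sketch of the overlapper idea, assuming a 250bp insert and 2 x 150bp reads as suggested, which leaves a 50bp overlap. The function names, the 10% mismatch budget, and the rule of keeping read 1's base on a mismatch are illustrative choices, not the behavior of any particular tool.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=20, max_mismatch_frac=0.1):
    """Merge an overlapping read pair into one sequence, or return None.

    read2 is given as sequenced (opposite strand), so it is reverse
    complemented first. Overlaps are tried from longest to shortest and the
    first one within the mismatch budget wins. On mismatches, read 1's base
    is kept; a real merger would use quality scores to decide.
    Inserts shorter than the read length (read-through) are not handled.
    """
    read2_rc = revcomp(read2)
    for olen in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        tail, head = read1[-olen:], read2_rc[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + read2_rc[olen:]
    return None


# Hypothetical 250bp insert sequenced as 2 x 150bp reads (50bp overlap).
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(250))
read1 = fragment[:150]
read2 = revcomp(fragment[-150:])
merged = merge_pair(read1, read2)
print(len(merged), merged == fragment)
```

With error-free toy reads the merge reconstructs the full 250bp insert; in the overlap, two independent calls of the same bases also give a chance to catch errors in either read.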
ajthomas replied

08-25-2011, 12:55 PM
I'm using it primarily for genotyping highly polymorphic genes (MHC if you must know), sequencing amplicons 200-400bp long. Because some alleles only differ by one or two bases that may be at one end or the other of the amplicon, reads that are not full length cannot always differentiate some closely related alleles. I must have full-length reads of my amplicons, which I cannot get from any NGS technology except 454.
rskr replied

08-25-2011, 12:48 PM
Originally posted by ajthomas

In spite of the short read technologies getting longer, their read lengths still can't compare to that achieved on the 454. Of course, those longer "short" reads also means the number of applications where the 454 is required is shrinking. In my own case, I must have at least 400bp reads, and I'm excited about the longer reads of the FLX+ because that opens up the way for some other experiments I couldn't do before. It will be a while (maybe a long while) before the short read technologies can do what I need.

Oh do tell.

I find that 100-base paired-end data subsume just about any benefit gained from error-prone reads with a mean length of 400 but a median of 200.