  • rskr
    replied
    Originally posted by ajthomas View Post
    However, if you're looking for splice variants and don't care so much to quantify expression, 454 is probably a better technology. There will always be a niche for 454, although it's never going to be large.
    Possible statistical fallacy there, since it doesn't sound like you have actually done any Illumina assemblies (you are generalizing from your one 454 machine). As it turns out, paired-end data is pretty good at finding splice variants. I wasn't expecting that, but we did a transcriptome assembly, then ran PASA on it, and it found plenty of legitimate splice variants (with high alignment coverage). There again, the length of the read doesn't matter; what matters is the insert size during PCR. I guess what annoys me is when scientists do 454 transcriptome assemblies and then try to correct errors with Illumina data, when the Illumina paired-end data does a superior job of transcriptome assembly in the first place. I have had this happen a number of times and compared the builds (454 vs. Illumina vs. 454+Illumina), and 454 was not as good; even collaborators remarked on it. I wish it weren't this way, but 454 really isn't competitive at that. The problem is that, to get the raw number of transcripts up, there end up being very many contigs that consist of only one 454 read, and since 454 reads aren't very accurate in and of themselves, the result is a fairly inaccurate assembly. Newbler does report a bunch of isotigs, which essentially amounts to adding duplicates of highly covered genes back to the assembly. I wish they wouldn't do this to pad their N50, but they do. I don't think every isotig is a legitimate splice variant; some are just a bifurcation in the graph that isn't resolved. So if you do some sort of clustering analysis, Newbler actually turns out to generate many fewer contigs than similar Illumina-based assemblies.
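
To make the N50 point concrete, here is a minimal sketch of how N50 reacts when near-duplicate copies of long contigs are added to an assembly. The contig lengths are made up for illustration, and `n50` is a hypothetical helper rather than anything taken from Newbler or another assembler.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (bp) for a small transcriptome assembly.
contigs = [3000, 2000, 1000, 500, 500, 400, 300, 300]

# The same assembly with two extra copies of the longest transcript,
# standing in for redundant isoforms of a highly covered gene.
padded = contigs + [3000, 3000]

print(n50(contigs))  # 2000
print(n50(padded))   # 3000
```

Nothing new was assembled in the second case, yet the N50 goes up, which is why clustering redundant contigs before comparing assemblies gives a fairer picture.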



  • ajthomas
    replied
    Originally posted by pmiguel View Post
    Hard to estimate total number of miscalls in a read from a mean quality value. But Q30 is one error per 1000 bases. So, as long as you don't have crazy high quality values offsetting really low values, then I would not expect several errors in a 200-400bp read.

    Also, to the extent that the quality values are accurate, software could use them to weight the likelihood of a given base being a true variation or not. Or, trivially, you could mask out bases that had quality values lower than 30.

    Let's not ignore the elephant here: Illumina is producing 100's of gigabases of sequence per flow cell whereas a 454 run produces 100's of megabases. Illumina chemistry has a higher per run cost than 454, but we are still looking at something approaching a 100x price per base differential.

    But the same logic applies to Sanger sequencing, which is at least 100x more expensive per base.

    --
    Phillip
    Actually, in my experience it does seem to be that most of the errors are concentrated in a few of the reads. I can't explain why that is, but nevertheless that seems to be the case. So, most of the reads are perfect and a few are riddled with errors. It's not usually difficult to find those with the errors and discard them, either.

    It's true that 454 is less cost effective than Illumina. Most applications can use the shorter read lengths obtained from Illumina/SOLiD, etc., and for those applications it makes a lot more sense to use those technologies. One thing to keep in mind, however, when comparing the amount of data produced: 454 doesn't produce as much data, but in many cases, you don't need as much data with 454, either. Simply comparing numbers doesn't tell the whole story. RNA-seq provides an excellent example of where one technology might be better than the other, depending on your experiment. If you're trying to quantify gene expression, Illumina is definitely the way to go. In that case, you're just trying to identify transcripts and count them. The high number of reads is a boon to your experiment. However, if you're looking for splice variants and don't care so much to quantify expression, 454 is probably a better technology. There will always be a niche for 454, although it's never going to be large.



  • pmiguel
    replied
    Originally posted by rskr View Post
    Well, except 454 would still fail to find linkage over 400bp consistently, since the median read is much shorter, and with sufficient coverage paired-end data is likely to find linkage up to 800bp, which is the maximum PCR product length.
    Hmm, I think you may just be trolling here.
    If all goes well, a 454 run will have median read lengths >400 bases.
    --
    Phillip



  • pmiguel
    replied
    Originally posted by rskr View Post
    And you can do that with the 454 error model? Last I checked, a mean quality of 30 would guarantee several errors in a 200-400bp read. Might be better off with a 250 base insert size and 150 bp paired end reads, with an overlapper that finds the intersection.
    Hard to estimate total number of miscalls in a read from a mean quality value. But Q30 is one error per 1000 bases. So, as long as you don't have crazy high quality values offsetting really low values, then I would not expect several errors in a 200-400bp read.

    Also, to the extent that the quality values are accurate, software could use them to weight the likelihood of a given base being a true variation or not. Or, trivially, you could mask out bases that had quality values lower than 30.

    Let's not ignore the elephant here: Illumina is producing 100's of gigabases of sequence per flow cell whereas a 454 run produces 100's of megabases. Illumina chemistry has a higher per run cost than 454, but we are still looking at something approaching a 100x price per base differential.

    But the same logic applies to Sanger sequencing, which is at least 100x more expensive per base.

    --
    Phillip
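
The arithmetic in this post can be sketched in a few lines. The Phred relation Q = -10 * log10(p) is the standard definition; the function names and the example quality profiles below are hypothetical and chosen only to show how two reads with the same mean quality can differ in expected miscalls.

```python
def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Expected number of miscalls in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


def mask_low_quality(seq, quals, min_q=30):
    """Replace every base whose quality is below min_q with 'N'."""
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


# Two hypothetical 400-base reads, both with a mean quality of 30.
uniform = [30] * 400
mixed = [40] * 200 + [20] * 200  # high values offsetting low ones

print(round(expected_errors(uniform), 2))  # 0.4
print(round(expected_errors(mixed), 2))    # 2.02
print(mask_low_quality("ACGT", [35, 12, 30, 29]))  # ANGN
```

A uniform Q30 read of 400 bases is expected to carry well under one error, while the mixed profile with the same mean carries about two, which is the caveat about mean quality values made above.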



  • rskr
    replied
    Originally posted by rskr View Post
    So, what is the difference between that and looking at a pileup of paired end Illumina reads? They will get the linkage just as well.
    Well, except 454 would still fail to find linkage over 400bp consistently, since the median read is much shorter, and with sufficient coverage paired-end data is likely to find linkage up to 800bp, which is the maximum PCR product length.



  • rskr
    replied
    Originally posted by ajthomas View Post
    Does the fact that I use the 454 offend you or something? I explained why I use it and why other technologies aren't appropriate for my work and you seem to think I'm an idiot for using it. I'm a little confused at your derision.

    By the way, I don't look at individual reads, I look at consensus reads (usually 10-100X coverage per variant).
    So, what is the difference between that and looking at a pileup of paired end Illumina reads? They will get the linkage just as well.



  • ECO
    replied
    Originally posted by rskr View Post
    Certainly you have some will to analyze erroneous data. If you cared you would have seen that 454 is only 99.99% accurate with deep coverage, but you are talking about analyzing individual reads looking for variants, which suggests a different type of "work for you" than I would find acceptable, but hey, you are probably an MD analyzing a major histocompatibility complex, so you can get away with saying anything you want, because you hate statistics.
    Arguments + facts + opinions please. Save the insults. Thanks.



  • ajthomas
    replied
    Does the fact that I use the 454 offend you or something? I explained why I use it and why other technologies aren't appropriate for my work and you seem to think I'm an idiot for using it. I'm a little confused at your derision.

    By the way, I don't look at individual reads, I look at consensus reads (usually 10-100X coverage per variant).
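
As a rough illustration of why consensus calls at 10-100X coverage are far more reliable than single reads, here is a minimal sketch of a majority-vote error estimate. It assumes errors are independent between reads and, as a worst case, that every erroneous read shows the same wrong base; the 1% per-read error rate is a made-up figure, not a measured 454 value, and systematic errors that recur across reads would not average out this way.

```python
from math import comb


def consensus_error_prob(coverage, per_read_error):
    """Probability that more than half of the reads are wrong at one position.

    Assumes independent errors and a simple majority-vote consensus.
    """
    p = per_read_error
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(coverage // 2 + 1, coverage + 1)
    )


# Hypothetical per-read, per-base error rate of 1%.
for depth in (1, 10, 100):
    print(depth, consensus_error_prob(depth, 0.01))
```

Under these assumptions the chance of a wrong consensus base drops by many orders of magnitude between 1X and 10X, and further still at 100X.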



  • rskr
    replied
    Originally posted by ajthomas View Post
    I'm not trying to argue with you, but I've looked at the options and the 454 is the only one that works for my application. I can't get 400bp reads any other way, and one of my amplicons is nearly that long.
    Certainly you have some will to analyze erroneous data. If you cared you would have seen that 454 is only 99.99% accurate with deep coverage, but you are talking about analyzing individual reads looking for variants, which suggests a different type of "work for you" than I would find acceptable, but hey, you are probably an MD analyzing a major histocompatibility complex, so you can get away with saying anything you want, because you hate statistics.



  • ajthomas
    replied
    I'm not trying to argue with you, but I've looked at the options and the 454 is the only one that works for my application. I can't get 400bp reads any other way, and one of my amplicons is nearly that long.



  • rskr
    replied
    Originally posted by ajthomas View Post
    without having both ends of the amplicon on the same read.
    ^^^ That's what paired end is.



  • ajthomas
    replied
    It works just fine. Perhaps the accuracy is better than you think. I don't see nearly as many errors as you imply here. And no, I can't work with shorter reads that must be overlapped. I had to do that before switching from the older standard chemistry to the Titanium chemistry. I ended up with a number of allele misidentifications because of it. Some of the alleles are just too similar and can't be reliably identified without a full-length sequence. Say you have four different alleles: A and B differ from C and D by one base near the 5' end. A and C differ from B and D by one base near the 3' end. Any given sample may have any combination of the four (there are ~10 loci in the genome, so ~20 alleles present in a heterozygote). You can't differentiate these four alleles without having both ends of the amplicon on the same read.
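
The four-allele example can be sketched directly. The 12-base sequences below are hypothetical stand-ins for real amplicons: A and B differ from C and D only at the first base, and A and C differ from B and D only at the last base. The helper names are also made up for illustration.

```python
# Hypothetical alleles: first base distinguishes {A, B} from {C, D},
# last base distinguishes {A, C} from {B, D}.
ALLELES = {
    "A": "ACCCCCCCCCCA",
    "B": "ACCCCCCCCCCG",
    "C": "GCCCCCCCCCCA",
    "D": "GCCCCCCCCCCG",
}


def end_fragments(sample, k=4):
    """Everything observable when each read covers only one end of the amplicon."""
    five_prime = {ALLELES[name][:k] for name in sample}
    three_prime = {ALLELES[name][-k:] for name in sample}
    return five_prime, three_prime


def full_length_reads(sample):
    """Everything observable when each read spans the whole amplicon."""
    return {ALLELES[name] for name in sample}


sample_1 = {"A", "D"}
sample_2 = {"B", "C"}

# Unlinked end fragments look identical for the two samples...
print(end_fragments(sample_1) == end_fragments(sample_2))          # True
# ...but full-length reads tell them apart.
print(full_length_reads(sample_1) == full_length_reads(sample_2))  # False
```

Both samples show both variants at each end, so only reads (or linked read pairs) that tie the 5' base to the 3' base on the same molecule can separate an A+D sample from a B+C sample.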



  • rskr
    replied
    Originally posted by ajthomas View Post
    I'm using it primarily for genotyping highly polymorphic genes (MHC if you must know), sequencing amplicons 200-400bp long. Because some alleles only differ by one or two bases that may be at one end or the other of the amplicon, reads that are not full length cannot always differentiate some closely-related alleles. I must have full-length reads of my amplicons which I cannot get from any NGS technology except 454.
    And you can do that with the 454 error model? Last I checked, a mean quality of 30 would guarantee several errors in a 200-400bp read. Might be better off with a 250 base insert size and 150 bp paired end reads, with an overlapper that finds the intersection.
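
For readers unfamiliar with the "overlapper" idea, here is a minimal sketch of merging a read pair by its overlap. It uses exact suffix/prefix matching only, which is a simplification (real tools tolerate mismatches and use quality scores), and the function names and the 40-base test fragment are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read_len, insert_size):
    """Bases shared by the two mates; a negative value means a gap between them."""
    return 2 * read_len - insert_size


def merge_pair(fwd, rev, min_overlap=10):
    """Merge mates on the longest exact suffix/prefix overlap; None if none is found."""
    rev_rc = reverse_complement(rev)
    for olap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-olap:] == rev_rc[:olap]:
            return fwd + rev_rc[olap:]
    return None


print(expected_overlap(150, 250))  # 50
print(expected_overlap(150, 400))  # -100

# Hypothetical 40-base fragment sequenced as two 25-base mates (10-base overlap).
fragment = "ACGTTGCAAGGCTTAACCGGATATCGCGTACGATTGACCA"
fwd = fragment[:25]
rev = reverse_complement(fragment[-25:])
print(merge_pair(fwd, rev) == fragment)  # True
```

With 150-base mates, a 250-base insert leaves a 50-base overlap to merge on, whereas an amplicon close to 400 bases would leave roughly 100 bases uncovered between the mates, so nothing could be merged without longer reads.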



  • ajthomas
    replied
    I'm using it primarily for genotyping highly polymorphic genes (MHC if you must know), sequencing amplicons 200-400bp long. Because some alleles only differ by one or two bases that may be at one end or the other of the amplicon, reads that are not full length cannot always differentiate some closely-related alleles. I must have full-length reads of my amplicons which I cannot get from any NGS technology except 454.



  • rskr
    replied
    Originally posted by ajthomas View Post
    In spite of the short read technologies getting longer, their read lengths still can't compare to that achieved on the 454. Of course, those longer "short" reads also mean the number of applications where the 454 is required is shrinking. In my own case, I must have at least 400bp reads, and I'm excited about the longer reads of the FLX+ because that opens up the way for some other experiments I couldn't do before. It will be a while (maybe a long while) before the short read technologies can do what I need.
    Oh do tell.

    I find that 100-base paired-end data subsume just about any benefit gained from error-prone reads with a mean length of 400 but a median of 200.

