  • liaoyunshi
    replied
    Originally posted by jmartin:
    It looks like the variation between quasispecies is making it difficult for tadpole to accomplish what I need, which is a sort of 'central' consensus amongst all these quasispecies which can serve as an anchor reference for mapping between samples. Tadpole ends up building a number of overlapping contigs, as well as leaving some gaps in coverage where maybe the input data is too confusing (too many 'haplotypes' of varying abundances?).

    I think tadpole would be pretty nice as an assembler if I was working with homogeneous samples, but for my use case it may not be the right tool. I don't think it's doing anything wrong since most people would probably want to keep the strains separate. I just have an unusual task.
    Hi Martin,

    Sorry for reviving this old thread, but I've run into a similar situation and wanted to ask whether you've had any new ideas in the two years since.

    If I understand correctly, your sequencing data is not from a "pure" sample: the reads share a similar backbone but show fairly high diversity/polymorphism. In that situation, most assemblers break contigs at the ambiguous sites, which produces many contigs instead of a single "consensus" contig.

    Have you found any tools that can do this more forgiving kind of assembly?

    Also, in my case I do have a reference sequence from a related species, but it has some insertions and deletions relative to the sample I'm sequencing. So I think most reference-mapping tools (e.g., BWA) can't be used to build the consensus, since they don't take indel information into account when generating it. I think my case is also similar to your question about "reference guided assembly"? If so, could you suggest any tools that would help me get the consensus sequence?

    Thanks.


  • jmartin
    replied
    It looks like the variation between quasispecies is making it difficult for tadpole to accomplish what I need, which is a sort of 'central' consensus amongst all these quasispecies which can serve as an anchor reference for mapping between samples. Tadpole ends up building a number of overlapping contigs, as well as leaving some gaps in coverage where maybe the input data is too confusing (too many 'haplotypes' of varying abundances?).

    I think tadpole would be pretty nice as an assembler if I was working with homogeneous samples, but for my use case it may not be the right tool. I don't think it's doing anything wrong since most people would probably want to keep the strains separate. I just have an unusual task.


  • Brian Bushnell
    replied
    OK! Please let me know what settings you find to be optimal in your situation, and also whether Tadpole was better or worse than other assemblers.


  • jmartin
    replied
    Thanks Brian, I'll try playing a bit more. I'll try using tadpole's error correction too in case it deals with cases that I haven't already corrected.


  • Brian Bushnell
    replied
    Tadpole cannot do reference-guided assemblies; it is purely de novo. It is also rather unforgiving of polymorphisms, intentionally, to prevent misassemblies and assembly errors. However, you can often substantially increase the contiguity of viral assemblies by adjusting the branch multiplier flags, which tell it when to stop extending a contig because there is a branch in the graph, typically caused by a repeat or polymorphism. For example:

    bm1=8 bm2=2.5

    ...will often substantially increase contiguity. You can reduce them even further from the defaults (20 and 3, respectively) to find the optimum, although setting both to 1 will not yield an optimal result. I developed the default cutoffs for bacteria, so they're not really ideal for viruses; in fact, I don't know if it's possible in general to find good defaults for viruses, because they tend to be very different from one another and mutate rapidly.
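
A minimal sketch of such a search over the branch multipliers, assuming BBTools is on the PATH. The input name `reads.fq`, the output naming scheme, the value grid and `k=62` are illustrative placeholders rather than recommendations. By default the script only prints the commands it would run.

```shell
# Print each command; execute it only when DRY_RUN=0 (needs BBTools on PATH).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Assemble once per (bm1, bm2) pair. $1 = reads file, $2 = kmer length.
# The grid starts at the defaults (20 and 3) and works downwards;
# the specific values are assumptions chosen for illustration.
sweep() {
    for bm1 in 20 8 4; do
        for bm2 in 3 2.5 2; do
            run tadpole.sh in="$1" out="contigs_bm1-${bm1}_bm2-${bm2}.fa" \
                k="$2" bm1="$bm1" bm2="$bm2"
        done
    done
}

sweep reads.fq 62
```

Setting `DRY_RUN=0` in the environment runs the nine assemblies instead of just listing them, after which the outputs can be compared for contiguity.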

    It's also worth trying different kmer lengths. You can do this automatically with tadwrapper.sh. For example:

    tadwrapper.sh in=reads.fq out=contigs%.fa k=31,62,93,124 expand bisect

    That will try various kmer lengths and try to give you the optimal one for contiguity. It's not perfect, but you can just fire it off and ignore it until it finishes, which makes things easier. I developed it for bacterial isolates and metagenomes so I'm not entirely sure what it will do for viruses, but it's worth trying, and at least I expect it to produce a better value for k than the default of 31. That default was chosen simply because it is the fastest and uses the least memory, not because it's the best. Normally, a larger value is better.
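
To check the per-k results by hand, a small sketch that summarizes each contig file. It assumes the outputs are plain FASTA files matching `contigs*.fa` (the exact names tadwrapper.sh writes are an assumption here); `fasta_summary` is a hypothetical helper, not part of BBTools.

```shell
# Summarize a FASTA file: number of contigs, total bases, longest contig.
fasta_summary() {
    awk '
        # Close out the contig read so far and reset the length counter.
        function finish_contig() {
            if (len > 0) {
                n++
                total += len
                if (len > max) {
                    max = len
                }
            }
            len = 0
        }
        /^>/ { finish_contig(); next }      # header line starts a new contig
        { len += length($0) }               # sequence line adds to current contig
        END {
            finish_contig()
            printf "contigs=%d total=%d longest=%d\n", n, total, max
        }
    ' "$1"
}

# Report every matching assembly in the current directory, if any exist.
for f in contigs*.fa; do
    [ -e "$f" ] || continue
    printf '%s\t' "$f"
    fasta_summary "$f"
done
```

For a target of known size, such as the 2625 bp region in the opening post, a total length far above the target is a quick sign that variants are being assembled as separate contigs.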

    You will often also get better contiguity if you first error-correct the reads with Tadpole. For example:

    tadpole.sh in=reads.fq out=corrected.fq ecc k=62
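
Chaining the correction step into an assembly could look like the sketch below. It reuses only flags that appear in this thread; the file names are placeholders, and the `bm1`/`bm2` values are simply the example values given above, not tuned settings. Commands are printed rather than run unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (needs BBTools on PATH).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: error-correct the reads. Step 2: assemble the corrected reads.
# $1 = reads file, $2 = kmer length used for both steps (an assumption;
# the two steps do not have to share the same k).
correct_then_assemble() {
    run tadpole.sh in="$1" out=corrected.fq ecc k="$2"
    run tadpole.sh in=corrected.fq out=contigs.fa k="$2" bm1=8 bm2=2.5
}

correct_then_assemble reads.fq 62
```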
    Last edited by Brian Bushnell; 05-18-2017, 05:59 PM.


  • jmartin
    replied
    Thanks for the reply! I went and tried Tadpole and I'm trying various things to fine tune the assembly. One thing I'm wondering is if there is a way to do a reference guided assembly in Tadpole?

    Also, are there parameters you can suggest tweaking to try and be a bit more forgiving with regards to polymorphism in my input reads?


  • Brian Bushnell
    replied
    BBMap's Tadpole (which I wrote) seems to do a good job of viral assembly for any coverage, both in my experience, and from what I've seen from others, so I suggest you give that a try. In some cases normalizing or subsampling the data can also improve assemblies, so that's worth trying as well. You already tried subsampling, but it's possible that a different tool would give different results. The BBMap package also includes BBNorm (which can normalize data) and Reformat (which can subsample the data); some assemblers simply cannot handle super-high coverage, so those operations can often make assemblers produce good assemblies from data that violates their heuristics.
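
As a rough worked example using the figures from the opening post (about 77000x coverage, downsampled to about 100x), the fraction of reads to keep is target / current. The sketch below computes that fraction and prints candidate commands. The `samplerate` and `target` flags are quoted from memory of the BBTools help text and should be checked against `reformat.sh --help` and `bbnorm.sh --help`; file names are placeholders.

```shell
# Fraction of reads to keep to go from current coverage ($1) to target ($2).
sample_rate() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.5f\n", tgt / cur }'
}

rate="$(sample_rate 77000 100)"

# Random subsampling to roughly 100x (prints the command only).
echo "reformat.sh in=reads.fq out=subsampled.fq samplerate=${rate}"

# Depth normalization to roughly 100x (prints the command only).
echo "bbnorm.sh in=reads.fq out=normalized.fq target=100"
```

Subsampling keeps the relative abundance of variants roughly intact, while normalization flattens coverage, so the two can give different assemblies from the same data.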

    Also - you did not mention anything about preprocessing. That can be very useful prior to assembly - adapter-trimming, contaminant-filtering, quality-trimming, reagent DNA removal, human DNA removal, etc. It's possible that much of your assembly is contaminant rather than genomic content of the virus in question.
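
A sketch of two of those preprocessing steps (adapter trimming, then quality trimming) using BBDuk from the same package. The parameter values follow commonly used BBDuk settings as best I recall them and are assumptions to verify against `bbduk.sh --help`; `adapters.fa` and the other file names are placeholders. Commands are printed rather than run unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (needs BBTools on PATH).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# $1 = raw reads, $2 = FASTA file of adapter sequences.
preprocess() {
    # Trim adapter sequence from the right (3') end of reads.
    run bbduk.sh in="$1" out=trimmed.fq ref="$2" ktrim=r k=23 mink=11 hdist=1
    # Quality-trim both ends of the adapter-trimmed reads to Q10.
    run bbduk.sh in=trimmed.fq out=clean.fq qtrim=rl trimq=10
}

preprocess reads.fq adapters.fa
```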
    Last edited by Brian Bushnell; 05-17-2017, 06:21 PM.


  • Best way to build consensus of short reads spanning viral gene

    I have a collection of Illumina HiSeq 2000 reads that should span a specific coding region in a viral genome. The region these reads cover is 2625bp. What I want to do is generate a consensus of that region from all my reads.

    The only thing I've tried so far is IDBA_UD. I downsampled to ~100x and ran it, but the assembled contigs summed to much more than the length of the region I know these reads should span. I also tried using all the data, but that was even further off.

    I have excessive coverage (~77000x), but the reads are from a population of quasi-species and have some variation. What would be the best tool to use to generate a consensus?
