
  • pmiguel
    replied
    Originally posted by sklages View Post
    Just to give another recommendation: as maubp has already mentioned, I'd give MIRA3 a try. It usually does a good job on genomes of this size. It also handles repetitive sequence quite effectively.

    In the 1-100Mb class we successfully use Celera Assembler (currently v6beta); results can be converted to ACE/CAF ([1], still "beta").

    Just my 2p,
    Sven

    [1]=https://sourceforge.net/projects/asm2ace/
    I finally did try MIRA3. Yes, I like the results. 37 contigs in one main scaffold, 10 of the contigs larger than 1 kb. When I look at the .ace file with consed, the contig breaks seem reasonable. That is, they are in regions with a fairly complex repeat structure.

    --
    Phillip



  • pmiguel
    replied
    As a postscript: the bacterial genome I was assembling, after examination and merging inside consed, dropped from 52 contigs in a scaffold down to a single contig.

    Still not sure what Newbler was doing breaking the contigs down like that.

    --
    Phillip



  • sklages
    replied
    Just to give another recommendation: as maubp has already mentioned, I'd give MIRA3 a try. It usually does a good job on genomes of this size. It also handles repetitive sequence quite effectively.

    In the 1-100Mb class we successfully use Celera Assembler (currently v6beta); results can be converted to ACE/CAF ([1], still "beta").

    Just my 2p,
    Sven

    [1]=https://sourceforge.net/projects/asm2ace/



  • pmiguel
    replied
    Originally posted by nickloman View Post
    What I find very helpful is the 454ContigGraph.txt file, which should give you the information you are looking for. Unhelpfully, however, Newbler does not produce it by default. Run Newbler with the -g option to get this file. This will give you a contig graph which will allow you to make joins between contigs.

    However, you must be aware that contigs that are not included in scaffolds are likely to be "repeat consensus" contigs, so they may not reflect a true biological sequence but a consensus of two or more repetitive regions.

    You'd want to do confirmatory Sanger sequencing on these contigs to be sure you get the right sequence.

    Hope that helps
    With v. 2.3 "-g" is deprecated because the ContigGraph.txt file is created by default.

    The issue I was describing--where Newbler appeared to be censoring the ends of contigs--was more bizarre than I initially thought. As near as I can tell, this censorship hides reads spanning contig junctions. That is, the contigs should be joined on the basis of reads spanning the pseudo-gap, but for a mysterious reason described below, a 0 base gap is created in the assembly.

    The reason? These breaks apparently denote multiple branch points. Got that? Contiguous reads join a stretch of sequence, but Newbler segments the stretch into multiple contigs because, if the reads were not contiguous, there would be multiple possible branch points there? Using a method I describe below, I am able to visualize the "suppressed" overlapping sequence in Assembly View of Consed, using "What to Show" "Run cross_match". Very few of them appear to branch at all--there is only one place they match to--the contig adjacent to them!

    How to see the suppressed sequence that shows that Newbler mistakenly failed to join adjacent contigs?

    Check the "reads limited to one Contig" box of the Parameters:Output pane of the GS de novo assembler GUI.

    --
    Phillip



  • westerman
    replied
    Going along with Phillip's questions (since we are working on the same project), I am having problems running the Newbler assembly through amosvalidate. I get to the point (step 600+) where amosvalidate is looking at the singletons. nucmer/mummer then crashes because the singletons file is empty.

    Has anyone run a newbler ace file through amosvalidate?

    Thanks,



  • nickloman
    replied
    What I find very helpful is the 454ContigGraph.txt file, which should give you the information you are looking for. Unhelpfully, however, Newbler does not produce it by default. Run Newbler with the -g option to get this file. This will give you a contig graph which will allow you to make joins between contigs.

    However, you must be aware that contigs that are not included in scaffolds are likely to be "repeat consensus" contigs, so they may not reflect a true biological sequence but a consensus of two or more repetitive regions.

    You'd want to do confirmatory Sanger sequencing on these contigs to be sure you get the right sequence.

    Hope that helps
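
For anyone who wants to look for candidate joins in the contig graph with a script, here is a minimal Python sketch. It rests on an assumption not confirmed anywhere in this thread: that edge records in 454ContigGraph.txt are tab-separated lines of the form `C <contig> <end> <contig> <end> <read depth>`, with ends written as 5' or 3'. Check your own file first. The function names are made up for illustration.

```python
from collections import defaultdict

def read_contig_edges(lines):
    """Collect edges from lines assumed to look like: C  1  3'  2  5'  12"""
    edges = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: edge records start with "C" and have six columns.
        if len(fields) != 6 or fields[0] != "C":
            continue
        _, a, a_end, b, b_end, depth = fields
        # Store the edge under both contig ends so either can be looked up.
        edges[(a, a_end)].append((b, b_end, int(depth)))
        edges[(b, b_end)].append((a, a_end, int(depth)))
    return edges

def unambiguous_joins(edges):
    """Return pairs of contig ends that connect only to each other."""
    joins = set()
    for end, partners in edges.items():
        if len(partners) != 1:
            continue  # this end branches to several contigs
        other = partners[0][:2]
        if len(edges[other]) == 1:
            joins.add(tuple(sorted([end, other])))
    return sorted(joins)
```

Ends that connect to exactly one other end, and vice versa, are the obvious candidates for a join; ends with several partners are the places where repeat consensus contigs are likely involved, so those still deserve the Sanger confirmation mentioned above.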



  • pmiguel
    replied
    Originally posted by kmcarr View Post
    Phillip,

    Here is something I did once when working on a herpes virus genome. I first performed the assembly with gsAssembler. I then took the ACE output and loaded that into consed. Once in consed you can use the reassembly and contig joining tools to try to bring them together.
    Yes, that is what I am doing as well. The caveat being that I normally use the sequence matches produced in consed assembly view to drive joins. But there are virtually no sequence matches among the contigs produced by gsAssembler.

    But it looks to me as if gsAssembler has deliberately excluded sequence at the ends of contigs that might match the adjacent contigs. Lacking these regions I don't know how to drive joins.

    --
    Phillip



  • kmcarr
    replied
    Phillip,

    Here is something I did once when working on a herpes virus genome. I first performed the assembly with gsAssembler. I then took the ACE output and loaded that into consed. Once in consed you can use the reassembly and contig joining tools to try to bring them together. You will need to (re)run gsAssembler with the ACE output mode set to generate a full consed folder and you will need the latest version of consed to properly recognize a 454 project.

    I also noted that consed ignores any scaffolding information from the gsAssembler; it calculates its own scaffolding based on the read pairing information in the contigs. I ran a number of independent assemblies of the viral genome in gsAssembler (I had vastly more raw data than needed for a single assembly). While the size and sequence of the contigs were consistent from one assembly to the next, the scaffolding produced by gsAssembler varied. When the assemblies were loaded into consed it recalculated the scaffolding and produced consistent arrangements for all the assemblies.



  • pmiguel
    replied
    Originally posted by bio-x View Post
    For a bacterial genome, Newbler is a good choice.
    For "52 contigs in a single scaffold", you can try to close the gaps with the paired-end information.
    Yes, that is the plan. But I am a little suspicious of the gaps.

    I have now done assemblies with both gsAssembler and phrap and examined the assemblies in consed.

    BTW, I had to run perl -i -pe 's/_left/.f/ if /^>/;s/_right/.r/ if /^>/;' on the fasta and qual files before running phrap to get consed to show the F/R paired ends in assembly view.
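
For those who prefer Python, the same header renaming can be sketched as below. This is only an illustrative equivalent of the Perl one-liner, not something used in the thread; the function names are made up, and unlike `perl -i` it writes to a new file rather than editing in place.

```python
def rename_mate_header(line):
    """Rewrite _left/_right to .f/.r on FASTA or qual header lines only."""
    if not line.startswith(">"):
        return line  # sequence or quality data: leave untouched
    # Like the Perl substitutions, replace only the first occurrence of each.
    line = line.replace("_left", ".f", 1)
    line = line.replace("_right", ".r", 1)
    return line

def rename_file(in_path, out_path):
    """Copy a FASTA or qual file, renaming mate suffixes in the headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_mate_header(line))
```

Run it on both the fasta and the qual file so the read names stay in sync.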

    As I mentioned earlier, the phrap assembly produced far more contigs than gsAssembler. This was somewhat mitigated when I cut all the quality values for the reads in half. Nevertheless, the assemblies appear qualitatively different.
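
Cutting the quality values in half can be done with a few lines of script. The sketch below is one way to do it, assuming a standard .qual file (FASTA-style headers followed by whitespace-separated integer scores) and rounding down; how the halving was actually done is not stated in the post, and the function name is made up.

```python
def halve_qual_line(line):
    """Halve each integer quality on a .qual data line; keep headers as-is."""
    if line.startswith(">") or not line.strip():
        return line  # header or blank line
    # Integer division rounds odd scores down (35 becomes 17).
    halved = [str(int(q) // 2) for q in line.split()]
    return " ".join(halved) + "\n"
```

Apply it line by line to the qual file before handing it to phrap.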

    Phrap contig consensus sequence extends fairly far into questionable areas towards the ends of contigs. This is normal for phrap. The bases at the ends of contigs tend to have very low quality values. Newbler, however, is far more conservative about how far it allows the consensus sequence to extend at the ends of contigs.

    While I can understand the purpose behind this more stringent consensus sequence generating behavior, I want to turn it off! As it is, none of the contigs show overlap at their ends--so I have no basis to join them. Anyone know a way to get gsAssembler to become less censorious?

    --
    Phillip



  • bio-x
    replied
    For a bacterial genome, Newbler is a good choice.
    For "52 contigs in a single scaffold", you can try to close the gaps with the paired-end information.



  • pmiguel
    replied
    Originally posted by nickloman View Post
    Oh right! Sounds like you have a pretty good result then if you ended up with a single scaffold. Be cautious: doing another assembly with a different program may be more confounding than confirmatory! If you want to verify the genome order, it might be better to do some confirmatory PCRs, perhaps with primers to amplify the entire genome in 10kb sections.
    Well, I neglected to add that, on a cursory examination, the optical map bore no similarity that I could perceive to the scaffold restriction map. But that might be for a trivial reason--like the optical map was done with some other restriction enzyme than the one we were told.
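
A scaffold restriction map of this kind can be approximated with a simple in silico digest, which also makes it easy to try other enzymes. The sketch below is a rough illustration under stated assumptions: it treats the scaffold as a linear sequence, searches one strand only (fine for palindromic sites), and uses EcoRI (G^AATTC) purely as an example, since the enzyme is not named in this thread. The function name is made up.

```python
def restriction_fragments(seq, site, cut_offset=0):
    """Return fragment lengths from cutting a linear sequence at every site."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        # cut_offset is the cut position within the recognition site.
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length fragments, e.g. a cut exactly at the sequence start.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

# Toy example: two EcoRI sites in a 20-base sequence.
print(restriction_fragments("AAGAATTCAAAAGAATTCAA", "GAATTC", cut_offset=1))
# prints [3, 10, 7]
```

Comparing the ordered fragment sizes against the optical map for a handful of candidate enzymes is a quick way to test the "wrong enzyme" explanation. Gaps of unknown size in the scaffold will distort the fragments that span them.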

    It is the right bacterium--everything fits previous sequencing results.

    In part I'm just trying to transition or include the tools I've long used for eukaryotic BAC assembly (phred/phrap/consed) into this brave new next-generation world.

    Thanks for your advice,
    Phillip



  • nickloman
    replied
    Originally posted by pmiguel View Post
    gsAssembler (aka "Newbler") gives me 52 contigs in a single scaffold, with lots of extraneous contigs--most or all likely deriving from contaminating eukaryotic DNA.

    It looks to me like there are very few repeats in this genome. The read lengths are standard Titanium read lengths (~400 mean) but in cases where the read contains the F/R mate split, each right/left read will be shorter, of course. The paired ends look to be 2-4 kb apart--consistent with 3kb paired-ends.

    --
    Phillip
    Oh right! Sounds like you have a pretty good result then if you ended up with a single scaffold. Be cautious: doing another assembly with a different program may be more confounding than confirmatory! If you want to verify the genome order, it might be better to do some confirmatory PCRs, perhaps with primers to amplify the entire genome in 10kb sections.
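
Laying out the 10kb sections is simple arithmetic. A minimal sketch follows, assuming a linear coordinate system and a small overlap between neighbouring amplicons; the 500 bp overlap is an illustrative choice, not something suggested in the post, and the function name is made up. It only computes the windows; primer design within each window is a separate step.

```python
def tile_amplicons(genome_length, amplicon=10_000, overlap=500):
    """Return (start, end) 0-based half-open windows covering the sequence."""
    if overlap >= amplicon:
        raise ValueError("overlap must be smaller than the amplicon size")
    step = amplicon - overlap
    windows = []
    start = 0
    while start < genome_length:
        end = min(start + amplicon, genome_length)
        windows.append((start, end))
        if end == genome_length:
            break  # the last window reached the end of the sequence
        start += step
    return windows
```

For a 25 kb sequence this gives three windows: (0, 10000), (9500, 19500) and (19000, 25000).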



  • pmiguel
    replied
    Originally posted by nickloman View Post
    How many contigs are you getting right now and is this consistent with the repeat profile (i.e. number of repeat regions >= read length) predicted from the genome in question?
    gsAssembler (aka "Newbler") gives me 52 contigs in a single scaffold, with lots of extraneous contigs--most or all likely deriving from contaminating eukaryotic DNA.

    It looks to me like there are very few repeats in this genome. The read lengths are standard Titanium read lengths (~400 mean) but in cases where the read contains the F/R mate split, each right/left read will be shorter, of course. The paired ends look to be 2-4 kb apart--consistent with 3kb paired-ends.

    --
    Phillip



  • maubp
    replied
    I'd second the suggestion to try MIRA 3 on the assembly. The manual takes you through preparing the SFF file using sff_extract etc. If you start using the tool seriously, do sign up to their mailing list.



  • nickloman
    replied
    My experience with assembling bacterial genomes from 454 data is that Newbler with default parameters will do better than any other assembly engine, when dealing purely with 454 reads.

    I don't know why this is, but I presume there are some built-in heuristics in Newbler that suit the error model for this technology (which is very different to Solexa or Sanger-ABI).

    However, if you really want to try a different assembler, then I would suggest MIRA and CLC Genomics Workbench as possible options.

    How many contigs are you getting right now and is this consistent with the repeat profile (i.e. number of repeat regions >= read length) predicted from the genome in question?

