Help with HGAP4 assembly of a highly polymorphic genome

    I'm trying to assemble a nematode genome that we estimate to be 400 Mb in length. I have 60x worth of PacBio Sequel data and I've been using HGAP4 (FALCON + Arrow). So far I'm not having much luck getting a good assembly, and I'm trying to decide whether it's worth tweaking parameters more than I already have or whether it would be better to generate more PacBio coverage.

    My primary metrics for assembly quality have been BUSCO, to look at core gene coverage, and simple contig metrics (N50, total length, number of contigs, etc.). So far my best assembly is missing 35 of the BUSCO core eukaryotic genes, and a fair number of the genes that were found are present more than once. The total contig length is also more than 100 Mb larger than we expected (although it's possible that our genome size estimate is wrong). Together, the duplicated BUSCOs and the inflated assembly size make me suspect that this worm's genome is highly polymorphic, with divergent haplotypes being assembled as separate contigs. We do have an older assembly, but it is in pretty bad shape itself (which is why we want to build a new, better one).
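    For reference, the simple contig metrics above can be computed straight from the assembly FASTA. Below is a minimal Python sketch (not part of HGAP4 or BUSCO; the function names are hypothetical) that reads contig lengths and reports the contig count, total length, N50, longest contig, and the ratio of assembly size to the expected genome size. A ratio well above 1, combined with many duplicated BUSCOs, is consistent with both haplotypes of divergent regions being kept as separate contigs.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(contig_lengths: List[int], expected_size: int) -> Dict[str, float]:
    """Contig count, total length, N50, longest contig and size ratio."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    # N50: walk contigs longest-first; the contig at which the running
    # total first reaches half of the assembly length is the N50.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "n50": n50,
        "longest": lengths[0] if lengths else 0,
        # > 1.0 means the assembly is larger than the expected genome size
        "size_ratio": total / expected_size,
    }
```

    Usage, with a placeholder file name: `with open("assembly.fasta") as fh: print(assembly_stats(fasta_lengths(fh), 400_000_000))`.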

    Can anyone suggest HGAP4 parameters that might improve contiguity in a highly polymorphic genome? So far I've tried HGAP4's default parameters, the defaults plus the 'aggressive' switch, and a range of values for cutoff_length_pr. If I set cutoff_length_pr just below the mean pre-assembly read length, contiguity goes up and the genome size drops closer to what I had expected, but I end up losing a few core genes. So I think it's really just dropping data that is needed.
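    On choosing the length cutoff: one common approach is to take the longest reads until they add up to a target coverage of the genome, so the cutoff is set by coverage rather than by a fixed length. The Python sketch below illustrates that idea only; it is not HGAP4's own code, and the function names and the 30x target mentioned afterwards are assumptions for illustration. Run on your own read lengths with the 400 Mb estimate, it shows how much coverage remains above any given cutoff.

```python
from typing import Sequence


def coverage_above(read_lengths: Sequence[int], cutoff: int, genome_size: int) -> float:
    """Fold coverage contributed by reads of length >= cutoff."""
    return sum(n for n in read_lengths if n >= cutoff) / genome_size


def length_cutoff_for_coverage(read_lengths: Sequence[int], genome_size: int,
                               target_coverage: float) -> int:
    """Largest length L such that reads of length >= L together give at
    least target_coverage-fold coverage of the genome.

    If all reads together fall short of the target, the shortest read
    length is returned (i.e. every read is kept)."""
    if not read_lengths:
        raise ValueError("read_lengths is empty")
    needed = target_coverage * genome_size
    running = 0
    # Accumulate bases from the longest reads downwards; the first read
    # that pushes the total past the target defines the cutoff.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= needed:
            return length
    return min(read_lengths)
```

    With roughly 60x of total data, a 30x target corresponds to the cutoff above which half of the sequenced bases lie. Checking `coverage_above` at the cutoff values already tried shows how much long-read coverage each one left.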

    Any advice would be appreciated.
