Assembling haplotypes of a highly heterozygous gene cluster with canu

drosophilid

Junior Member

Join Date: Nov 2019

Posts: 1
#1


11-25-2019, 06:47 AM

I am trying to assemble haplotypes for a peculiar region of the human genome that (1) has high heterozygosity, (2) varies in the presence or absence of entire genes, and (3) encompasses a gene cluster of highly similar paralogues. This makes assembly difficult: a paralogue on the same chromosome may differ from its duplicate by only about 10%, while the homologue on the other chromosome differs by about 5% because of segregating polymorphism at that locus. I currently have nanopore and short reads from this region, both at approximately 30X coverage. I would like to use canu to assemble the nanopore reads and then polish with the short reads, but I am getting nowhere near a full assembly. My command is:

Code:

canu -p prefix -d canu_run genomeSize=250k correctedErrorRate=0.144 minOverlapLength=500 -nanopore-raw sample.fastq

Here, sample.fastq contains only reads filtered for my region of interest, so the total assembly is fairly small. So far, I have tried varying correctedErrorRate between 0.1 and 0.2 and minOverlapLength between 500 and 1000, with no luck. Using BLAST, I can see large chunks of my genes of interest in the prefix.unassembled.fasta file. It seems that varying the error rate should help find a sweet spot that separates three expected levels of divergence: between reads from the same allele at a locus, between reads from different alleles at a locus, and between reads from different paralogous loci. Are there any other parameters I can vary to get a more complete assembly? Is there any preprocessing I can do with the more accurate short reads that would lead to a more complete assembly? Ideally, I eventually want phased haplotype information.
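A minimal sketch of the parameter sweep described above, assuming Python 3.8+. It only builds canu command lines from the options already used in the original command (genomeSize, correctedErrorRate, minOverlapLength, -nanopore-raw) and gives each combination its own -p prefix and -d directory so runs don't overwrite each other. The grid values, the function name canu_commands and the cer144_mol500-style naming are hypothetical choices; nothing is executed.

```python
import shlex
from itertools import product

# Hypothetical grid spanning the ranges tried so far (0.1-0.2, 500-1000).
ERROR_RATES = [0.10, 0.144, 0.20]
MIN_OVERLAPS = [500, 750, 1000]


def canu_commands(reads="sample.fastq", genome_size="250k",
                  error_rates=ERROR_RATES, min_overlaps=MIN_OVERLAPS):
    """Return one canu argument list per (correctedErrorRate, minOverlapLength) pair.

    Commands are only built, not run. Each gets a unique -p prefix and -d
    directory (e.g. cer144_mol500) so outputs from different runs don't collide.
    """
    commands = []
    for rate, overlap in product(error_rates, min_overlaps):
        # Encode the rate as an integer (0.144 -> 144) to keep dots out of names.
        tag = f"cer{round(rate * 1000)}_mol{overlap}"
        commands.append([
            "canu",
            "-p", tag,
            "-d", f"canu_{tag}",
            f"genomeSize={genome_size}",
            f"correctedErrorRate={rate:g}",
            f"minOverlapLength={overlap}",
            "-nanopore-raw", reads,
        ])
    return commands


if __name__ == "__main__":
    # Print shell-ready commands for inspection before launching anything.
    for cmd in canu_commands():
        print(shlex.join(cmd))
```

Printing the commands first makes them easy to check before launching. Comparing the size of each run's <prefix>.unassembled.fasta is one simple way to see which combination leaves the fewest reads unassembled.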
Tags: assembly, canu, heterozygous, nanopore, phase

SNPsaurus

Registered Vendor

Join Date: May 2013

Posts: 525
#2

11-25-2019, 10:15 AM

This doesn't directly solve the problem you asked about, but PacBio HiFi reads might be a better choice, since they can give 15 kb sequences at 1% or 0.1% error rates. They might segregate more obviously into 4 bins (2 haplotypes at one locus, 2 at the paralogous locus).
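The 4-bin intuition can be made concrete with a rough calculation. As a simplification that is not from the thread, assume the mismatch rate between two reads is roughly their true sequence divergence plus each read's independent error rate, using the ~5% allelic and ~10% paralogous divergence given in the original question. From any one read's point of view, other reads then fall into three distance classes, which together make up the four groups (2 haplotypes × 2 loci). The helper names below are hypothetical, and the additive model ignores indels and the spread of real per-read error rates.

```python
# Approximate pairwise mismatch between two reads: underlying sequence
# divergence plus an independent error contribution from each read.
# This additive model is only reasonable when all rates are small.

DIVERGENCE = {
    "same haplotype, same locus": 0.0,
    "other haplotype, same locus": 0.05,
    "paralogous locus": 0.10,
}


def expected_distances(read_error, divergence=DIVERGENCE):
    """Expected fraction of mismatching bases between two reads, per class."""
    return {label: d + 2 * read_error for label, d in divergence.items()}


def nearest_bin(observed, read_error, divergence=DIVERGENCE):
    """Assign an observed read-to-read distance to the closest expected class."""
    expected = expected_distances(read_error, divergence)
    return min(expected, key=lambda label: abs(expected[label] - observed))


if __name__ == "__main__":
    # Compare the two HiFi error rates mentioned above.
    for err in (0.01, 0.001):
        print(err, expected_distances(err))
```

With 1% error per read the expected distances are about 2%, 7% and 12%. A higher per-read error rate leaves the gaps between these expectations unchanged, but in practice it widens the scatter around each one, which is why lower-error reads should make the bins easier to tell apart.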

Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com