Should I try hybrid assembly with my PacBio data?

MiniMicrobe

Junior Member

Join Date: Nov 2015

Posts: 6
#1

12-09-2015, 08:01 AM

Hi all,

I recently had the genome of a bacterial strain I am working with sequenced using both PacBio and Illumina paired-end sequencing.

I have managed to assemble the Illumina data into ~200 contigs using Soap2. The PacBio data came back already assembled into 22 contigs, which was a little disappointing, especially because other people in my lab have sequenced different strains of the same species and got their data back as a single contig! The original plan was to map the Illumina reads to the PacBio assembly to look for errors.

But anyway, now I am not sure what to do with the data I have. The four longest PacBio contigs cover ~97% of my estimated 4.5 Mb genome size, but all the other contigs also map to the same species according to the BLASR output, although some have low coverage. Now I'm not sure what is "real", and I don't want to underestimate the genome size.
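For anyone wanting to reproduce the "four longest contigs cover ~97%" figure on their own assembly, here is a minimal Python sketch. The contig lengths are made-up placeholders, not the real data from this thread, and the function name is just for illustration.

```python
def cumulative_fraction(contig_lengths, genome_size):
    """Cumulative fraction of the estimated genome size covered by contigs,
    taking the longest contigs first."""
    total = 0
    fractions = []
    for length in sorted(contig_lengths, reverse=True):
        total += length
        fractions.append(total / genome_size)
    return fractions


# Hypothetical contig lengths in bp (placeholders, not the poster's assembly).
lengths = [2_000_000, 1_200_000, 700_000, 465_000, 50_000, 30_000]
estimated_genome_size = 4_500_000  # the 4.5 Mb estimate from the post

fracs = cumulative_fraction(lengths, estimated_genome_size)
for rank, frac in enumerate(fracs, start=1):
    print(f"top {rank} contigs: {frac:.1%} of estimated genome size")
```

The fraction can exceed 100% if the small contigs are redundant with the large ones or if the genome size estimate is too low, so it is only a rough sanity check.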

I have read that you can use PacBio sequences to scaffold Illumina contigs, so I am wondering if I should try that. But I can't really find any helpful tutorials or resources on how to do this. I'm not sure which PacBio data I should use (I have the CCS.fastq, the filtered subread fastq and the longest subread fastq files), whether I need to do anything to the data before using it, which program to use, etc.

Any help would be appreciated, even if it's just a link to a good resource.

Thanks in advance!
Tags: hybrid assembly, illumina, pacbio
rhall

Senior Member

Join Date: Aug 2012

Posts: 324
#2

12-09-2015, 10:12 AM

Rather than trying a complex hybrid approach, which is unlikely to be any more successful than the 22-contig PacBio assembly, I would try to diagnose and optimize the PacBio assembly. How do the preassembly statistics (yield, N50, number of bases) compare to the other assemblies in your lab? Was the subread N50, or the number of bases in the filtered data, lower than for the assemblies that generated single contigs?
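If the preassembly report is not to hand, the read count, number of bases and N50 can be recomputed directly from the filtered subread FASTQ. The sketch below is only a rough check: it assumes a plain, uncompressed FASTQ with exactly four lines per record, and the file name in the usage note is a placeholder.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def read_lengths_from_fastq(path):
    """Yield sequence lengths, assuming four lines per FASTQ record."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.strip())


def summarize(lengths):
    """Return read count, total bases and N50 for an iterable of read lengths."""
    lengths = list(lengths)
    return {"reads": len(lengths), "bases": sum(lengths), "n50": n50(lengths)}
```

Usage would look like `summarize(read_lengths_from_fastq("filtered_subreads.fastq"))`, run once per dataset so the numbers can be compared side by side with the runs that assembled into a single contig.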
With 22 contigs, it is feasible to run bridgemapper to order the contigs using the remaining PacBio reads, then overlap the contigs with minimus2 and validate the result using resequencing. Think of it as manual finishing. I would then use the Illumina reads to check the final base accuracy.
You mentioned contigs with lower coverage. Is it possible that the sample is not perfectly clonal, and that you are seeing a minor population that is breaking the assembly?
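One quick way to look at the low-coverage question is to compare each contig's mean coverage against that of the longest contig. The sketch below assumes you already have a per-contig table of length and mean coverage (for example from the resequencing output); the contig names, numbers and the 0.5 cutoff are all arbitrary placeholders.

```python
def flag_low_coverage(contigs, fraction=0.5):
    """Return names of contigs whose mean coverage is below `fraction` times
    the mean coverage of the longest contig.

    `contigs` is a list of (name, length_bp, mean_coverage) tuples.
    The 0.5 default is an arbitrary starting point, not a recommended value.
    """
    if not contigs:
        return []
    longest = max(contigs, key=lambda c: c[1])
    cutoff = fraction * longest[2]
    return [name for name, _length, cov in contigs if cov < cutoff]


# Hypothetical values for illustration only.
example = [
    ("contig_1", 2_000_000, 100.0),
    ("contig_2", 1_000_000, 95.0),
    ("contig_3", 20_000, 30.0),
]
print(flag_low_coverage(example))
```

A flagged contig is only a candidate for closer inspection. Low coverage alone does not show that a contig comes from a minor population, and plasmids can legitimately differ in coverage from the chromosome.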