**[email protected]** · 11-14-2012, 07:52 PM

I can't figure out how to use an incomplete reference genome for error correction. It looks like fastqToCA converts a FASTQ file to an frg file so that it can be used as the high-identity sequence for error correction. However, my incomplete genome assembly is in a FASTA file, and there are no quality score files for it. How can I get around this?

Many thanks!

Stuart
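
One generic way around the missing quality scores is to attach a constant placeholder quality to every base and write the assembly out as FASTQ. The following is only a minimal sketch of that idea: the function names `read_fasta` and `fasta_to_fastq` and the default Phred value of 40 are illustrative assumptions, and nothing in this thread confirms that the downstream correction step accepts long contigs prepared this way.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta: TextIO, fastq: TextIO, phred: int = 40) -> int:
    """Write every FASTA record as FASTQ with one constant quality value.

    The quality is a placeholder (an assumption), encoded as Phred+33.
    Returns the number of records written.
    """
    qual_char = chr(phred + 33)
    count = 0
    for name, seq in read_fasta(fasta):
        fastq.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles.
    src = io.StringIO(">contig1 draft\nACGT\nAC\n")
    dst = io.StringIO()
    fasta_to_fastq(src, dst)
    print(dst.getvalue(), end="")
```

The placeholder qualities carry no real information, so any step that weighs bases by quality will treat every base of the draft assembly as equally trustworthy.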

**mchaisso** · 11-15-2012, 09:06 AM

Perhaps use the pbjelly pipeline to fill gaps? Also, with an appropriate pipeline (quiver: https://github.com/PacificBiosciences/GenomicConsensus) you may not need error correction to call accurate consensus.

cheers,
-mark

**[email protected]** · 11-20-2012, 08:24 PM

Thanks for the tips, Mark! It looks like it will take me a while to figure this out. It does sound interesting, though, that I might not need error correction for PacBio data, given that it has a 15% error rate.

Stuart

**jbingham** · 11-22-2012, 07:56 AM

Some more tips: if you want to use pacBioToCA, the approach would be to use the raw Illumina data as input to the correction step, not the draft assembly. The advantage of going back to the raw data is you may be able to correct assembly errors. The disadvantage is it takes longer to run.

If you want to keep the assembly as is, you can install SMRT Analysis and use AHA (a hybrid assembler) to scaffold it, provided your genome is smaller than about 200 Mb. For larger genomes, or to really focus on the gap-filling, you can use pbjelly.

Finally, the "no error correction" suggestion refers to the new algorithm HGAP: http://www.pacbiodevnet.com/hgap. You'll need more PacBio coverage to go that route. The benefit is you may be able to close more gaps and get a final result that's potentially as accurate as Sanger finishing.

**[email protected]** · 11-22-2012, 09:17 AM

Thanks for your tips, jbingham! I am in the process of generating short-read Illumina data for the error correction. I don't think I have enough coverage to try the new algorithm, since my PacBio data gives only 3-4x coverage now that I have looked at it more carefully. The vast majority of the reads are shorter than 500 bp to 1,000 bp, and the longest read is 13 kb. I will post my progress later.

Thanks again to Winsettz and jbingham for helping out here!

Stuart
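
The coverage figure quoted above is total sequenced bases divided by genome size. A minimal sketch of that arithmetic, together with the read-length summary mentioned in the post, could look like the following; the function name `summarize_reads`, the 1,000 bp cutoff, and the example numbers are hypothetical, since the thread gives neither the genome size nor the read-length list.

```python
from statistics import median
from typing import Dict, Iterable


def summarize_reads(lengths: Iterable[int], genome_size: int) -> Dict[str, float]:
    """Summarize read lengths and estimate fold coverage.

    coverage = total bases in reads / genome size (both in bp).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no reads given")
    total = sum(ordered)
    short = sum(1 for n in ordered if n < 1000)
    return {
        "reads": len(ordered),
        "total_bases": total,
        "coverage": total / genome_size,
        "median_length": median(ordered),
        "longest": ordered[-1],
        "fraction_under_1kb": short / len(ordered),
    }


if __name__ == "__main__":
    # Hypothetical toy numbers, not data from this thread.
    stats = summarize_reads([400, 600, 900, 13000], genome_size=5000)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

A summary like this makes it easy to see when a few long reads contribute most of the bases while most reads stay short, which is the situation described in the post.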

pacbio sequence error correction

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News