Hi,
I am using HGAP through SMRT Analysis to assemble a 2 Mb microbial genome. I have 2 SMRT Cells with more than 200x coverage. With HGAP2 from SMRT Portal 2.1.1 I got around 20 contigs. Then I upgraded to v2.2, and with both the HGAP 2 and HGAP 3 protocols I always get 24 contigs. I have also set the target coverage to 15 since it is a microbial genome.
Also, as per the PacBio recommendation: "For samples with a lot of coverage (e.g. significantly greater than 100X coverage), you may see a larger number of contigs resulting from overwhelming the built-in contamination and chimera filtering that is part of the HGAP process. This can be addressed by using the ~100X longest subreads for HGAP, which can be selected by increasing the minimum subread length." I increased the minimum subread length to 1000 and to 2000. With a minimum subread length of 1000 I get a 24-contig assembly, and with 2000 I get a 180-contig assembly.
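For reference, here is a rough sketch of how the cutoff for the "~100X longest subreads" could be worked out instead of guessing values like 1000 or 2000. It is not part of HGAP or SMRT Analysis. The function name `min_length_for_coverage` and its arguments are made up for illustration, and it assumes the subread lengths have already been collected into a plain list.

```python
def min_length_for_coverage(subread_lengths, genome_size, target_coverage=100):
    """Return the length cutoff that keeps roughly the longest
    target_coverage-fold subreads.

    Hypothetical helper: walks the subreads from longest to shortest and
    stops as soon as the kept bases reach target_coverage * genome_size.
    Returns 0 (no filtering) if the data never reach that coverage.
    """
    needed_bases = target_coverage * genome_size
    kept_bases = 0
    for length in sorted(subread_lengths, reverse=True):
        kept_bases += length
        if kept_bases >= needed_bases:
            # Every subread at least this long is kept.
            return length
    return 0


# Toy example: a 100 bp "genome" and five subreads.
toy_lengths = [5000, 4000, 3000, 2000, 1000]
cutoff = min_length_for_coverage(toy_lengths, genome_size=100, target_coverage=100)
print(cutoff)
```

For my 2 Mb genome, the same call would use `genome_size=2_000_000`, so the loop would stop once about 200 Mb of the longest subreads had been kept. The returned length is the value I would then try as the minimum subread length.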
Based on this scenario I have some questions:
- Which parameters influence HGAP other than target_coverage and min_subread_length?
- I updated min_subread_length to 1000/2000 in the HGAP parameters. Is that the right way to do it?
- Why does HGAP from 2.1.1 give better results (20 contigs) than 2.2?
- I also have some Illumina data for this genome, and with SPAdes I got 64 contigs. What other assemblers could I try for a PacBio-only or hybrid assembly?
Thanks
Sagar