  • HGAP Parameters

    Hi,

    I am using HGAP through SMRT Analysis to assemble a 2 Mb microbial genome. I have 2 SMRT cells with more than 200x coverage. With HGAP 2 from SMRT Portal 2.1.1 I got around 20 contigs. Then I upgraded to v2.2, and with both the HGAP 2 and HGAP 3 protocols I always get 24 contigs. I have also changed the target coverage to 15, since it is a microbial genome.

    Also, as per the PacBio recommendation: "For samples with a lot of coverage (e.g. significantly greater than 100X coverage), you may see a larger number of contigs resulting from overwhelming the built-in contamination and chimera filtering that is part of the HGAP process. This can be addressed by using the ~100X longest subreads for HGAP, which can be selected by increasing the minimum subread length." I increased the minimum subread length to 1000 and to 2000. With a minimum subread length of 1000 I get a 24-contig assembly, and with 2000 I get a 180-contig assembly.
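The quoted recommendation (keep only the ~100X longest subreads by raising the minimum subread length) can be turned into a concrete cutoff. The following is a minimal sketch, not part of SMRT Analysis: the function name `min_length_for_coverage` and the toy read lengths are made up for illustration, and it assumes you already have a list of subread lengths and an estimate of the genome size.

```python
def min_length_for_coverage(subread_lengths, genome_size, target_coverage=100):
    """Return a minimum-length cutoff that keeps roughly `target_coverage`x
    of the longest subreads.

    Hypothetical helper for illustration only; it is not an HGAP or
    SMRT Analysis function. Returns 0 (no filtering) when the data set
    holds less than the requested coverage.
    """
    target_bases = genome_size * target_coverage
    running_total = 0
    # Walk from the longest subread down, accumulating bases until the
    # requested coverage is reached.
    for length in sorted(subread_lengths, reverse=True):
        running_total += length
        if running_total >= target_bases:
            # Keeping every subread of at least this length gives
            # at least the target coverage.
            return length
    return 0


# Toy numbers, chosen only to keep the arithmetic easy to follow.
toy_lengths = [5000, 4000, 3000, 2000, 1000]
cutoff = min_length_for_coverage(toy_lengths, genome_size=100, target_coverage=100)
print("suggested minimum subread length:", cutoff)
```

With real data, the lengths would come from the filtered subreads, and the resulting cutoff would be the value to try for the minimum subread length instead of guessing 1000 or 2000.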


    Based on this, I have some questions:
    1. Which parameters influence HGAP other than target_coverage and min_subread_length?
    2. I set min_subread_length to 1000/2000 in the HGAP parameters. Is that the right way to do it?
    3. Why does HGAP from 2.1.1 give better results (20 contigs) than 2.2?
    4. I also have some Illumina data for this genome, and with SPAdes I got 64 contigs. What other assemblers could I try for a PacBio-only or hybrid assembly?


    Thanks
    Sagar

  • #2
    Which parameters influence HGAP other than target_coverage and min_subread_length?
    I find HGAP to be extremely robust; tweaking parameters is not going to significantly improve the assembly.
    I set min_subread_length to 1000/2000 in the HGAP parameters. Is that the right way to do it?
    Yes.
    Why does HGAP from 2.1.1 give better results (20 contigs) than 2.2?
    At this level the number of contigs is not a good measure of assembly quality. I would try to understand how the two assemblies compare.
    I also have some Illumina data for this genome, and with SPAdes I got 64 contigs. What other assemblers could I try for a PacBio-only or hybrid assembly?
    I don't think it is worth trying other assemblers; time is probably better spent trying to understand the current assemblies. How big are the contigs? What do they BLAST against? Do they show any signs of overlap? Understanding why you are getting 20 contigs (for example, is there a repeat that the size distribution of your library does not allow you to span?) will help with the next step.
    https://github.com/PacificBioscience...ing-Assemblies
    Richard.
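For the "how big are the contigs?" question above, and for comparing the 2.1.1 and 2.2 assemblies beyond the raw contig count, a quick size summary is a reasonable starting point. This is a rough sketch in plain Python, assuming the assembly is available as a FASTA file; the helper names (`read_fasta_lengths`, `n50`, `summarize`) and the toy sequences are made up for illustration and are not part of HGAP.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Length of the contig at which half of the total bases is reached,
    walking from the longest contig down."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(lines):
    """Contig count, total size, largest contig and N50 for one assembly."""
    lengths = [length for _, length in read_fasta_lengths(lines)]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Toy assembly; with real data pass an open file handle instead,
# e.g. summarize(open(path_to_assembly_fasta)).
toy_fasta = [">c1", "ACGTACGTAC", ">c2 extra", "ACGTACGT", ">c3", "ACGT", ">c4", "AC"]
toy_summary = summarize(toy_fasta)
print(toy_summary)
```

Running this on each assembly gives numbers that can be compared side by side. If most of the sequence sits in a few large contigs and the rest are small, the small ones are the first candidates for the BLAST and overlap checks suggested above.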


    • #3
      Thanks

      Thanks for the reply and suggestions. I will look into assembly details.
