  • manjari.deshmukh
    Member
    • Mar 2015
    • 11

    PacBio assembly using SMRT portal

    Hi all,

    I am trying to assemble a bacterial genome of almost 6 Mb using HGAP 3 from SMRT Portal.
    I have 10 SMRT cells giving a total of 64X coverage.
    The data was provided from outside our lab, so I don't know the chemistry or sequencing process they used. I ran the HGAP 3 assembler with mostly default parameters, changing only the genome size, and got 240 scaffolds, which is very high. Please help me reduce the number of scaffolds.

    Regards

    Manjari
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Can you post the results from RS_subread protocol analysis? That would be useful to judge the quality of your data.

    Have you tried to do the assembly with just one or two of the SMRTcells that have the longest subreads?


    • manjari.deshmukh
      Member
      • Mar 2015
      • 11

      #3
      Hi GenoMax

      Thanks for the quick reply.
      I have attached the subreads protocol analysis. No, I didn't try the assembly with just two or three cells.

      Manjari
      Attached Files


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Is that filter report from one cell or all 10 together (I hope the answer is one)?

        If the answer is one then I would say try your assembly with one, two and three (of the best SMRTcells, separate runs) with a 4kb seed (you will have to deselect "automatic minimum length seed calculation" setting).
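To get a feel for what a fixed 4 kb seed would mean for a given cell, it can help to check how much coverage the subreads at or above that length actually contribute. The sketch below is only an illustration: the function names, the toy read lengths, and the target coverage are assumptions made for the example, and it does not reproduce how SMRT Portal computes its automatic seed cutoff.

```python
def seed_coverage(lengths, cutoff, genome_size):
    """Fold coverage contributed by reads at least `cutoff` bases long."""
    return sum(l for l in lengths if l >= cutoff) / genome_size


def pick_seed_cutoff(lengths, genome_size, target_coverage):
    """Largest read length such that reads at least that long reach the target coverage.

    Returns None when all reads together fall short of the target.
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running / genome_size >= target_coverage:
            return length
    return None


# Toy numbers (hypothetical), scaled down so the arithmetic is easy to follow.
genome_size = 10_000
lengths = [9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000, 1000]

cov_at_4kb = seed_coverage(lengths, 4000, genome_size)
cutoff_for_3x = pick_seed_cutoff(lengths, genome_size, target_coverage=3.0)
cutoff_for_10x = pick_seed_cutoff(lengths, genome_size, target_coverage=10.0)

print(f"coverage from reads >= 4 kb: {cov_at_4kb:.1f}x")
print(f"cutoff giving 3x in seed reads: {cutoff_for_3x}")
print(f"cutoff giving 10x in seed reads: {cutoff_for_10x}")
```

If the coverage above the chosen seed length is very low, there is little material to seed the assembly, whatever the setting.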


        • manjari.deshmukh
          Member
          • Mar 2015
          • 11

          #5
          This report is from all 10 cells taken together. Is there any problem with the data?


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            That is not a good amount of data from 10 SMRTcells.

            Have you run independent RS_Subread filtering on each SMRTcell? Can you identify ones that look better in terms of mean subread length/total reads? Perhaps you can try to select only those for the assembly.
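One way to compare cells outside SMRT Portal is to summarize the subread lengths of each cell and rank them. This is a minimal sketch, assuming the subread lengths per cell have already been extracted into plain lists; the cell names and lengths are made up, and ranking by total bases is just one possible choice.

```python
from statistics import mean


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize_cells(cells):
    """Per-cell subread summary, sorted by total bases (highest first)."""
    rows = []
    for name, lengths in cells.items():
        rows.append({
            "cell": name,
            "n_subreads": len(lengths),
            "total_bases": sum(lengths),
            "mean_length": mean(lengths) if lengths else 0,
            "n50": n50(lengths),
        })
    return sorted(rows, key=lambda r: r["total_bases"], reverse=True)


# Hypothetical toy data: real cells would have tens of thousands of subreads.
cells = {
    "cell_A": [8000, 6500, 5000, 1200, 900],
    "cell_B": [4000, 4100, 3900],
    "cell_C": [700, 650, 500, 400],
}

ranking = summarize_cells(cells)
for row in ranking:
    print(f"{row['cell']}: {row['n_subreads']} subreads, "
          f"{row['total_bases']} bases, mean {row['mean_length']:.0f}, N50 {row['n50']}")
```

The cells at the top of such a ranking would be the natural candidates for the one-, two- and three-cell assembly runs.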

            Dr. Hall from PacBio participates on this forum and he may have some suggestions later today.


            • rhall
              Senior Member
              • Aug 2012
              • 324

              #7
              Sorry, I missed the new thread.
              A conservative estimate of the expected coverage would be ~800x for a 6 Mb genome from 10 cells.
              Does the loading report look similar for all cells? Can you post an example?
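The gap between the two coverage figures in this thread can be made concrete with a little arithmetic. The sketch below uses only numbers quoted in the posts (a 6 Mb genome, 10 cells, 64x observed, ~800x as the conservative expectation); the variable and function names are illustrative.

```python
def total_bases(coverage, genome_size_bp):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size_bp


GENOME_SIZE = 6_000_000  # ~6 Mb, from the original post
N_CELLS = 10

observed = total_bases(64, GENOME_SIZE)   # 64x reported in post #1
expected = total_bases(800, GENOME_SIZE)  # ~800x conservative estimate

per_cell_observed = observed / N_CELLS
per_cell_expected = expected / N_CELLS
yield_fraction = observed / expected

print(f"observed: {observed / 1e6:.0f} Mb total, {per_cell_observed / 1e6:.1f} Mb per cell")
print(f"expected: {expected / 1e6:.0f} Mb total, {per_cell_expected / 1e6:.0f} Mb per cell")
print(f"observed / expected yield: {yield_fraction:.0%}")
```

In other words, the 10 cells together delivered less than a single cell would be expected to under this estimate.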


              • manjari.deshmukh
                Member
                • Mar 2015
                • 11

                #8
                Thanks, R Hall, for your response. I have attached the loading reports of 4 cells. They are more or less the same.
                Attached Files


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  @manjari: It is probably apparent by now but these are poor runs. P1 loading should normally be in 35-50% range. Inserts in your libraries appear to be small so you should actually have got a lot more data.

                  Were these libraries made with size selection (e.g. BluePippin)? Did the sequence provider try a clean-up to remove contaminants to see if the yield could be increased? This is a good example of where your local PacBio FAS would be a good resource to consult.
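The loading check described above can be written down as a small calculation. This is a sketch under assumptions: the ZMW counts are hypothetical (not taken from the attached reports), the helper names are made up, and the 35-50% window is simply the range quoted in this post.

```python
def productivity_fractions(p0, p1, p2):
    """Fraction of ZMWs in each productivity class (P0, P1, P2) from raw counts."""
    total = p0 + p1 + p2
    if total == 0:
        raise ValueError("no ZMWs counted")
    return {"P0": p0 / total, "P1": p1 / total, "P2": p2 / total}


def p1_in_expected_range(p1_fraction, low=0.35, high=0.50):
    """True when P1 loading falls inside the 35-50% range quoted in the post."""
    return low <= p1_fraction <= high


# Hypothetical counts for one poorly loaded cell.
fractions = productivity_fractions(p0=80_000, p1=12_000, p2=8_000)
ok = p1_in_expected_range(fractions["P1"])

print(f"P0 {fractions['P0']:.0%}, P1 {fractions['P1']:.0%}, P2 {fractions['P2']:.0%}")
print("P1 loading in expected range" if ok else "P1 loading outside expected range")
```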


                  • manjari.deshmukh
                    Member
                    • Mar 2015
                    • 11

                    #10
                    @GenoMax: Thanks for the quick response.
                    So, what should I do now? Can we go ahead with the de novo assembly, or should we ask the data provider for details of the run and some more data? I am lost.


                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      You can try doing assemblies with the data you have (have you tried varying any assembly parameters?). If you are lucky, perhaps one (or more) of the SMRTcells will have the critical long fragments needed to make a good assembly.

                      Since you appear to be doing this on the Amazon cloud, you may be limited by external constraints (cost) in how much you can try. Wait to see if Dr. Hall has any specific recommendations for parameters to try.

                      Did you make the libraries or did the provider make them? Having a constructive discussion with them about ways to improve the yield may be useful. Making new libraries should also be on the table if you must have a "finished" (or close to finished) genome.
                      Last edited by GenoMax; 04-01-2015, 05:12 AM.


                      • rhall
                        Senior Member
                        • Aug 2012
                        • 324

                        #12
                        The assembly is a lost cause. Of the 4 cell reports that you posted, 3 cells contain little to no sequencing of your sample; they are almost all control reads. Looking at the 'length between adapters' histogram, these cells show a single peak corresponding to the 4 kb sequencing control. A sheared library should have a continuous distribution of insert sizes. Cell 1 does have sequencing of the sample and not just the control, but not enough data to assemble (only about 1/10th of the expected sequencing yield).
                        I would not recommend sequencing more of the same library without some sample QC. At this point you should talk to whoever did the sequencing, and if possible get in contact with your PacBio FAS (Field Applications Scientist) to discuss sample QC and loading.
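The histogram reasoning above (a single peak at the control length versus a continuous spread of insert sizes) can be approximated numerically. The following is a rough sketch, not a PacBio tool: the function names, the tolerance window, the 80% threshold, and the toy insert lengths are all assumptions chosen for illustration; only the 4 kb control length comes from the post.

```python
def fraction_near(lengths, center, tolerance):
    """Fraction of insert lengths within `tolerance` bases of `center`."""
    if not lengths:
        return 0.0
    near = sum(1 for l in lengths if abs(l - center) <= tolerance)
    return near / len(lengths)


def looks_control_dominated(lengths, control_length=4000, tolerance=300, threshold=0.8):
    """Flag a cell whose inserts pile up around the sequencing-control length."""
    return fraction_near(lengths, control_length, tolerance) >= threshold


# Hypothetical insert lengths: one cell dominated by the control, one sheared library.
control_like = [3950, 4020, 4000, 3980, 4100, 4010, 3990, 4005, 1500, 4030]
sheared_like = [800, 1500, 2300, 3100, 4000, 5200, 6400, 7800, 9500, 12000]

control_flag = looks_control_dominated(control_like)
sheared_flag = looks_control_dominated(sheared_like)

print(f"control-like cell flagged: {control_flag}")
print(f"sheared-like cell flagged: {sheared_flag}")
```

A flagged cell is worth inspecting by eye in the loading report before drawing conclusions, since the cutoffs here are arbitrary.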

