
  • HGAP.3 protocol in the SMRT 2.3 --failing

    Hi,

    I am trying to assemble a 124 kb BAC and I am new to PacBio data. I have installed SMRT Analysis 2.3 and started an HGAP_Assembly.3 job through SMRT Portal, but it keeps failing at the P_PreAssemblerDagcon/hgapAlignForCorrection step. The issue is that the scattered FASTA files it requires do not exist, even though the preceding step, hgapAlignForCorrection.target.Scatter, completes without any error. When I run that script manually, it creates the files fine. Is there an option missing from the parameter file?

    A few other questions:
    1. Is there a way to change the settings for the job queues? I tried updating the files in $SMRT_ROOT/current/analysis/etc/cluster/LSF/, but the changes are not reflected in the run.
    2. Why do I need to supply a reference genome file to HGAP_Assembly when I intend to do a de novo assembly?

    I am attaching the log as well as my settings.xml file for your reference.

  • #2
    Originally posted by rachita
    Hi,

    I am trying to assemble a 124 kb BAC and I am new to PacBio data. I have installed SMRT Analysis 2.3 and started an HGAP_Assembly.3 job through SMRT Portal, but it keeps failing at the P_PreAssemblerDagcon/hgapAlignForCorrection step. The issue is that the scattered FASTA files it requires do not exist, even though the preceding step, hgapAlignForCorrection.target.Scatter, completes without any error. When I run that script manually, it creates the files fine. Is there an option missing from the parameter file?
    I see you have the genomeSize set @ 124kb, how much input coverage do you have?

    If you can zip up the rest of the logs (particularly the ones in log/P_PreAssemblerDagcon/*) it will facilitate the troubleshooting process.


    Originally posted by rachita

    A few other questions:
    1. Is there a way to change the settings for the job queues? I tried updating the files in $SMRT_ROOT/current/analysis/etc/cluster/LSF/, but the changes are not reflected in the run.
    There is information for configuring LSF here: https://github.com/PacificBioscience...llation-v2.2.0

    Granted, that's for the previous version (2.2.0, not the 2.3.0 you have installed), but the configuration should be identical.

    Originally posted by rachita
    2. Why do I need to supply a reference genome file to HGAP_Assembly when I intend to do a de novo assembly?
    You don't need to supply a reference. The reference referred to in settings.xml is the final product of the assembly process itself: the raw reads are mapped back to this freshly generated de novo reference for error correction.



    • #3
      1. I added the genome size as 124,000. I can't figure out the coverage as I have not aligned the data yet. The data consists of 89,252 long reads and 726,663,272 bases.

      2. I have manually edited the LSF .tmpl files to add a queue name ($smrtanalysis/current/analysis/etc/cluster/LSF), but the jobs are still going to the default queue.

      3. When the pipeline did not work, I manually ran the scripts in "P_PreAssemblerDagcon", followed by the scripts align.plsFofn.Scatter.sh and align_003of003.sh. This gave an IOError: The input path /PHShome/ry077/bin/smrtanalysis/userdata/jobs/016/016445/reference does not exist.


      I am attaching the logs.

      Thanks for all the help.


      • #4
        Originally posted by rachita
        1. I added the genome size as 124,000. I can't figure out the coverage as I have not aligned the data yet. The data consists of 89,252 long reads and 726,663,272 bases.
        OK, that's plenty of coverage, that's not the issue.
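
For reference, raw coverage can be estimated without aligning anything, since it is just total sequenced bases divided by the target size. The following is a minimal Python sketch using the figures quoted above; the function names are illustrative and not part of SMRT Analysis. The raw base count also includes vector and any other non-insert reads, so treat the result as an upper bound rather than the true coverage of the BAC insert.

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Return raw fold coverage: total sequenced bases / target size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def mean_read_length(total_bases: int, num_reads: int) -> float:
    """Return the average read length in bases."""
    if num_reads <= 0:
        raise ValueError("num_reads must be positive")
    return total_bases / num_reads


# Figures quoted in the thread.
TOTAL_BASES = 726_663_272
NUM_READS = 89_252
GENOME_SIZE = 124_000  # 124 kb BAC

if __name__ == "__main__":
    cov = estimate_coverage(TOTAL_BASES, GENOME_SIZE)
    avg = mean_read_length(TOTAL_BASES, NUM_READS)
    print(f"raw coverage: about {cov:,.0f}x")
    print(f"mean read length: about {avg:,.0f} bases")
```

With these numbers the raw coverage comes out at roughly 5,860x, which is why coverage is not the limiting factor here.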

        Originally posted by rachita
        2. I have manually edited the LSF .tmpl files to add a queue name ($smrtanalysis/current/analysis/etc/cluster/LSF), but the jobs are still going to the default queue.
        Who installed smrtanalysis for you? Make sure CLUSTER_MANAGER=LSF is in the smrtpipe.rc file located here:

        $SMRT_ROOT/current/analysis/etc/smrtpipe.rc

        If not, change it to LSF and restart smrtanalysis.
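
A quick way to confirm which value is in effect is to read the setting out of the file. The sketch below is in Python and rests on an assumption that should be checked against the actual file: that smrtpipe.rc uses simple KEY = VALUE lines with '#' starting a comment. The helper name get_setting is hypothetical.

```python
import re
import sys


def get_setting(rc_text: str, key: str):
    """Return the last value assigned to `key` in KEY = VALUE text, or None.

    Assumes '#' starts a comment and that later assignments override
    earlier ones (an assumption about the rc file format).
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=\s*(\S+)")
    value = None
    for line in rc_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out settings
        match = pattern.match(line)
        if match:
            value = match.group(1)
    return value


if __name__ == "__main__":
    # Usage: python check_rc.py $SMRT_ROOT/current/analysis/etc/smrtpipe.rc
    with open(sys.argv[1]) as handle:
        manager = get_setting(handle.read(), "CLUSTER_MANAGER")
    print(f"CLUSTER_MANAGER = {manager}")
    if manager != "LSF":
        print("Not set to LSF; edit the file and restart smrtanalysis.")
```

If this reports anything other than LSF, edits to the templates under etc/cluster/LSF would not be expected to take effect.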

        I'm not too familiar with LSF, but the error message in hgapAlignForCorrection_*.log:
        # Writing stdout and stderr from Popen:
        /bin/bash: /opt/lsf/conf/profile.lsf: No such file or directory
        Queue only accepts interactive jobs. Job not submitted.

        suggests that SMRT Analysis is unable to resolve your LSF settings properly.



        Originally posted by rachita
        3. When the pipeline did not work, I manually ran the scripts in "P_PreAssemblerDagcon", followed by the scripts align.plsFofn.Scatter.sh and align_003of003.sh. This gave an IOError: The input path /PHShome/ry077/bin/smrtanalysis/userdata/jobs/016/016445/reference does not exist.
        There is an intermediate step in between P_PreAssemblerDagcon and P_Mapping - and that's P_ReferenceUploader/runUploaderUnitig that formats the raw reads as a reference repository entry so that the raw reads can then be mapped and corrected prior to assembly.



        • #5
          Not to hijack this thread, but is LSF now fully supported for SMRT Analysis?



          • #6
            Originally posted by GenoMax
            Not to hijack this thread, but is LSF now fully supported for SMRT Analysis?
            Sorry, I'm not sure exactly what you mean by "fully" supported.

            The last time I tested and used SMRTanalysis 2.3.0 with LSF was sometime late last year to run some basic integration tests, and everything worked fine.

            According to the 2.3.0 install guide, it is "supported".



            On that note, if there is a key feature of LSF that we are not supporting in SMRT Analysis, we're probably not aware of it, and I wouldn't hold your breath waiting for support.

            Development efforts are currently focused on SMRT Analysis's successor, SMRT Link.



            • #7
              The last time we tried to get SMRT Portal working with LSF, we did not get very far (but that was 2+ years ago).

              We switched to SGE and a different cluster at that point.



              • #8
                Thanks for your reply. I was able to change the settings and run the HGAP assembly. The resulting BAM contained two unitigs. When I map the first contig back to the human reference, I get an 8 kb region mapping to "Cloning vector pBACe3.6, complete sequence". I followed the "https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-Whitelisting-Tutorial" and created a list of reads without vector. Then I added the following tags to the filtering module in my settings.xml:

                <param name="whiteList" label="Read IDs to whitelist">
                <value>PATH/whitelist.txt</value>
                </param>
                Is this not the correct way to do this? After changing settings.xml, I still used the portal to save and run the job. Should it instead be run with the setup.py script?
                Last edited by rachita; 08-22-2016, 11:58 AM.



                • #9
                  Originally posted by rachita
                  Thanks for your reply. I was able to change the settings and run the HGAP assembly. The resulting BAM contained two unitigs. When I map the first contig back to the human reference, I get an 8 kb region mapping to "Cloning vector pBACe3.6, complete sequence". I followed the "https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-Whitelisting-Tutorial" and created a list of reads without vector. Then I added the following tags to the filtering module in my settings.xml:



                  Is this not the correct way to do this? After changing settings.xml, I still used the portal to save and run the job. Should it instead be run with the setup.py script?

                  How long are the two contigs? Is one roughly 4Kb? The SMRTCell internal control is not removed by the HGAP assembly process, and may be what you are seeing - see this thread:
                  Single-molecule real-time observation of DNA polymerase using zero-mode waveguide (ZMW) optical confinement nanostructures


                  Is the whitelist not working for you? You didn't post any error messages. All you need to do is add the section that you mentioned and make sure the list has one read ID per line. I'm assuming the "PATH/whitelist.txt" you referenced is actually a fully resolved path, and not literally what you copied and pasted above, as that will obviously not work.
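
The two conditions mentioned above (a fully resolved path, and one read ID per line) are easy to check before submitting the job. Here is a minimal Python sketch; the function name check_whitelist and the example IDs are hypothetical, and it only checks the layout of the file, not whether the IDs match real reads in the data.

```python
import os
import sys


def check_whitelist(path: str, lines) -> list:
    """Return a list of problems found with a whitelist path and its lines."""
    problems = []
    if not os.path.isabs(path):
        problems.append(f"path is not absolute: {path}")
    seen = set()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            problems.append(f"line {number}: blank line")
        elif len(entry.split()) != 1:
            problems.append(f"line {number}: more than one token on the line")
        elif entry in seen:
            problems.append(f"line {number}: duplicate ID {entry}")
        seen.add(entry)
    return problems


if __name__ == "__main__":
    # Usage: python check_whitelist.py /full/path/to/whitelist.txt
    whitelist_path = sys.argv[1]
    with open(whitelist_path) as handle:
        issues = check_whitelist(whitelist_path, handle.readlines())
    for issue in issues:
        print(issue)
    print("whitelist looks OK" if not issues else f"{len(issues)} problem(s) found")
```

For example, passing the literal string "PATH/whitelist.txt" is flagged as a non-absolute path, which is the failure mode described above.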
