  • pacBioToCA caFailure (diff. from similar threads)

    Hi guys,


    I'm trying to assemble PacBio reads together with Illumina short reads using Celera Assembler, but I'm getting an error at the 1-overlap step.
    It must be a misconfiguration in the .spec file, since the run finished successfully when I assembled the PacBio reads only.

    What I did is:

    1) convert my illumina reads:

    fastqToCA -libraryname illumina -technology illumina -type sanger -innie -insertsize 300 50 -mates pe1.fq,pe2.fq > illumina.frg

    2) then call pacBioToCA to convert pacbio reads:

    nice pacBioToCA -length 500 -shortReads -genomeSize 2500000000 -partitions 200 -l snake -s pacbio.spec -fastq /work/snake/data/PacBio/SN051_filtered_subreads.fastq /work/snake/CeleraAssembler/20140922/illumina.frg > run.out 2>&1





    The data set is 10 million PacBio reads of about 1.5 kbp each and 20 million Illumina short reads of 99 bp.
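
As a rough sanity check on the figures above, fold coverage can be estimated as total sequenced bases divided by genome size. The sketch below uses the approximate read counts and lengths quoted in the post and the value passed to `-genomeSize`; the helper name `coverage` is made up for illustration, and the result is only as accurate as those rounded inputs.

```python
def coverage(num_reads: int, mean_read_len: int, genome_size: int) -> float:
    """Approximate fold coverage: total sequenced bases / genome size."""
    return num_reads * mean_read_len / genome_size


# Value passed to -genomeSize in the pacBioToCA command above.
GENOME_SIZE = 2_500_000_000

# Read counts and lengths are the approximate figures quoted in the post.
pacbio_cov = coverage(10_000_000, 1_500, GENOME_SIZE)
illumina_cov = coverage(20_000_000, 99, GENOME_SIZE)

print(f"PacBio:   {pacbio_cov:.2f}x")
print(f"Illumina: {illumina_cov:.2f}x")
```

With these inputs the estimate comes out at about 6x for the PacBio reads and under 1x for the Illumina reads.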



    After some hours of execution I'm getting this error:

    Code:
    runCA failed.
      
    ----------------------------------------
    Stack trace:
      
    at /usr/local/src/celera/runCA line 1501
           main::caFailure('ERROR:  Overlap job /work/snake/CeleraAssembler/2...', undef) called at /usr/local/src/celera/runCA line 3757
           main::checkOverlapper('normal') called at /usr/local/src/celera/runCA line 3808
           main::checkOverlap('normal') called at /usr/local/src/celera/runCA line 6310
    
    
    Failure message:
     
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000031 FAILED.
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000032 FAILED.
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000033 FAILED.
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000034 FAILED.
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000035 FAILED.
    ...
    ...
    a lot more...
    ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000784 FAILED.
    
    
    94 overlapper jobs failed
    log file ends with:

    Code:
    Failed to execute /usr/local/src/celera/runCA -s /work/snake/CeleraAssembler/20140922//tempsnake/snake.spec -p asm -d . ovlMerThreshold=338 ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 sgePropagateHold="pBcR_asm" stopAfter=overlapper
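
When the failure message lists many jobs, it can help to pull the job IDs out of the log rather than reading them by eye. The following is a minimal sketch that assumes the log lines look exactly like the `ERROR:  Overlap job ... FAILED.` lines quoted above; the function names are hypothetical and the sample text is a three-line excerpt, not the full log.

```python
import re

# Matches lines of the form shown in the failure message above.
FAILED_JOB = re.compile(r"ERROR:\s+Overlap job (\S+) FAILED\.")


def failed_overlap_jobs(log_text: str) -> list[str]:
    """Return the job path from every 'Overlap job ... FAILED.' line, in order."""
    return FAILED_JOB.findall(log_text)


def job_id(job_path: str) -> str:
    """Return the last path component, e.g. '000031'."""
    return job_path.rstrip("/").rsplit("/", 1)[-1]


# Three lines copied from the failure message; a real run would read run.out.
sample = """\
ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000031 FAILED.
ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000032 FAILED.
ERROR:  Overlap job /work/snake/CeleraAssembler/20140922/tempsnake/./1-overlapper/001/000784 FAILED.
"""

ids = [job_id(path) for path in failed_overlap_jobs(sample)]
print(len(ids), ids)
```

The truncated path in the stack trace (`.../CeleraAssembler/2...`) is not followed by `FAILED.`, so it is not counted twice.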



    This is the .spec file I'm using:


    Code:
    stopAfter=overlapper
    
    # original asm settings
    utgErrorRate = 0.25
    utgErrorLimit = 4.5
    
    cnsErrorRate = 0.25
    cgwErrorRate = 0.25
    ovlErrorRate = 0.25
    
    merSize=14
    
    merylMemory = 300000
    merylThreads = 30
    
    ovlStoreMemory = 500000
    
    # grid info
    useGrid = 0
    scriptOnGrid = 0
    frgCorrOnGrid = 0
    ovlCorrOnGrid = 0
    
    sge = -V -A assembly
    sgeScript = -pe threads 30
    sgeConsensus = -pe threads 30
    sgeOverlap = -pe threads 30
    sgeFragmentCorrection = -pe threads 30
    sgeOverlapCorrection = -pe threads 30
    
    #ovlMemory=8GB --hashload 0.7
    ovlHashBits = 25
    ovlThreads = 30
    ovlHashBlockLength = 20000000
    ovlRefBlockSize =  50000000
    
    # for mer overlapper
    merCompression = 1
    merOverlapperSeedBatchSize = 300000
    merOverlapperExtendBatchSize = 250000
    
    frgCorrThreads = 30
    frgCorrBatchSize = 100000
    
    ovlCorrBatchSize = 100000
    
    # non-Grid settings, if you set useGrid to 0 above these will be used
    merylMemory = 500000
    merylThreads = 30
    
    ovlStoreMemory = 500000
    
    ovlConcurrency = 30
    
    cnsConcurrency = 30
    
    merOverlapperThreads = 30
    merOverlapperSeedConcurrency = 30
    merOverlapperExtendConcurrency = 30
    
    frgCorrConcurrency = 30
    ovlCorrConcurrency = 30
    cnsConcurrency = 30
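
Several options in the spec file above are set more than once (for example `merylMemory`, first to 300000 and later to 500000). A small check like the one below can list such repeats. It is only a sketch: it assumes one `key = value` setting per line with `#` comments, the function names are invented for illustration, and the thread does not say which of two conflicting values runCA ends up using.

```python
def parse_spec(text: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs in file order, skipping blank and '#' lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs.append((key.strip(), value.strip()))
    return pairs


def repeated_keys(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each key that appears more than once to all of its values."""
    seen: dict[str, list[str]] = {}
    for key, value in pairs:
        seen.setdefault(key, []).append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


def conflicting_keys(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Subset of repeated keys whose values actually differ."""
    return {k: v for k, v in repeated_keys(pairs).items() if len(set(v)) > 1}


# Excerpt of the spec file posted above (not the whole file).
spec_excerpt = """\
merylMemory = 300000
merylThreads = 30
ovlStoreMemory = 500000
#ovlMemory=8GB --hashload 0.7
cnsConcurrency = 30
merylMemory = 500000
merylThreads = 30
ovlStoreMemory = 500000
cnsConcurrency = 30
"""

pairs = parse_spec(spec_excerpt)
print("repeated:   ", sorted(repeated_keys(pairs)))
print("conflicting:", conflicting_keys(pairs))
```

On this excerpt only `merylMemory` is reported as conflicting; the other repeats carry the same value both times.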

    I've searched this forum for similar threads but couldn't find any.
    Could someone please give me some help on how to get through this issue?



    Thanks a lot.

    Cheers,
    Condomitti.

  • #2
    Hi mates,

    After some trial and error I arrived at a spec file configuration that let the assembly finish successfully :-)

    For those who might be interested, below is the spec file I used:

    PacBio reads: 10 million of ~ 1.5kbp length each.
    Short reads: 40 million of ~99bp each.

    .spec file:
    Code:
    stopAfter=overlapper
    
    # original asm settings
    utgErrorRate = 0.25
    utgErrorLimit = 4.5
    
    cnsErrorRate = 0.25
    cgwErrorRate = 0.25
    ovlErrorRate = 0.25
    
    merSize=14
    
    merylMemory = 50000
    merylThreads = 30
    
    ovlStoreMemory = 50000
    
    # grid info
    useGrid = 0
    scriptOnGrid = 0
    frgCorrOnGrid = 0
    ovlCorrOnGrid = 0
    
    sge = -V -A assembly
    sgeScript = -pe threads 30
    sgeConsensus = -pe threads 30
    sgeOverlap = -pe threads 30
    sgeFragmentCorrection = -pe threads 30
    sgeOverlapCorrection = -pe threads 30
    
    #ovlMemory=8GB --hashload 0.7
    ovlHashBits = 26
    ovlThreads = 30
    ovlHashBlockLength = 100000000
    ovlRefBlockSize =  5000000
    
    # for mer overlapper
    merCompression = 1
    merOverlapperSeedBatchSize = 200000
    merOverlapperExtendBatchSize = 750000
    
    frgCorrThreads = 30
    frgCorrBatchSize = 200000
    
    ovlCorrBatchSize = 500000
    
    # non-Grid settings, if you set useGrid to 0 above these will be used
    merylMemory = 50000
    merylThreads = 30
    
    ovlConcurrency = 30
    
    cnsConcurrency = 30
    
    merOverlapperThreads = 30
    merOverlapperSeedConcurrency = 30
    merOverlapperExtendConcurrency = 30
    
    frgCorrConcurrency = 30
    ovlCorrConcurrency = 30
    cnsConcurrency = 30
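
To see what differs between the failing spec file from the first post and the working one above, the two can be compared key by key. The sketch below does this on excerpts of the two files; the function names are hypothetical, the parser assumes one `key = value` per line, and a key that is set twice keeps all of its distinct values because the thread does not say which one runCA uses. It shows which settings changed, not which change fixed the overlapper failure.

```python
def spec_values(text: str) -> dict[str, list[str]]:
    """Map each key to its distinct values, in order of first appearance."""
    values: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        bucket = values.setdefault(key, [])
        if value not in bucket:
            bucket.append(value)
    return values


def changed_settings(old: str, new: str) -> dict[str, tuple[list[str], list[str]]]:
    """Return {key: (old values, new values)} for keys whose values differ."""
    old_v, new_v = spec_values(old), spec_values(new)
    return {
        key: (old_v.get(key, []), new_v.get(key, []))
        for key in sorted(set(old_v) | set(new_v))
        if old_v.get(key, []) != new_v.get(key, [])
    }


# Excerpts of the two posted spec files (settings copied verbatim).
failing = """\
merylMemory = 300000
ovlStoreMemory = 500000
ovlHashBits = 25
ovlThreads = 30
ovlHashBlockLength = 20000000
ovlRefBlockSize =  50000000
merOverlapperSeedBatchSize = 300000
merOverlapperExtendBatchSize = 250000
frgCorrBatchSize = 100000
ovlCorrBatchSize = 100000
merylMemory = 500000
"""
working = """\
merylMemory = 50000
ovlStoreMemory = 50000
ovlHashBits = 26
ovlThreads = 30
ovlHashBlockLength = 100000000
ovlRefBlockSize =  5000000
merOverlapperSeedBatchSize = 200000
merOverlapperExtendBatchSize = 750000
frgCorrBatchSize = 200000
ovlCorrBatchSize = 500000
merylMemory = 50000
"""

changes = changed_settings(failing, working)
for key, (before, after) in changes.items():
    print(f"{key}: {', '.join(before)} -> {', '.join(after)}")
```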


    Cheers,
    Condomitti.
