
  • speed up the cuffdiff calculation

    Dear Community,

    I have 12 sets of RNA-seq data (4 groups) and mapped them using STAR. The resulting BAM files range from 4 GB to 13 GB. I then ran cuffdiff on those BAM files to get the FPKMs and differentially expressed genes. The process has been running for over a day, and the log file is still stuck at "Inspecting maps and determining fragment length distributions". Does cuffdiff have a limit on sample size? Is it normal for it to be this slow?

    Thanks a lot for any input!

    C.

  • #2
    Hi Capricy,

    I recently used cuffdiff on 12 samples (BAM sizes ~2 GB each) and it took 5-6 hours. This was on a server with 128 GB of memory, using 20 threads.

    I'd say your process does seem a bit slow, although your BAM files are larger than mine. How much memory and how many threads are you using? Could you use the 'top' command to check whether the program is actually still running?
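
As a rough alternative to watching `top` by hand, a short Python sketch can confirm that a process ID still exists and report how much CPU time it has accumulated. The function names are hypothetical, and the sketch assumes a POSIX system with `ps` available.

```python
import os
import subprocess


def is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def cpu_time(pid: int) -> str:
    """Return the cumulative CPU time that `ps` reports for this PID."""
    result = subprocess.run(
        ["ps", "-o", "time=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `cpu_time(pid)` twice, a minute or so apart, gives a crude signal: if the process exists but its CPU time does not advance, it is probably stalled or waiting on I/O rather than computing.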

    Cheers,

    Matt.

    • #3
      Hi, Matt,

      Thank you very much for the reply.

      I am running on an HPC cluster with 96 GB of memory and 40 processors. The jobs are still running, and last night I started to see output in the file var_model.info.

      I wonder if the uneven file sizes could be the issue.

      I am not sure how long it will take to actually finish.

      C.
      Last edited by capricy; 11-29-2017, 03:55 AM.

      • #4
        Hmm, well, at least it's not just hanging!

        Not sure why it's taking that long. If you've given cuffdiff all the threads ('-p 40'), I'd have thought that would be plenty. Maybe someone else has a better idea?

        Matt.

        • #5
          After 8 threads the speedup for cufflinks/cuffdiff is marginal...

          In my experience, the speedup of cufflinks/cuffdiff is marginal beyond 8 threads.
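
One common way to reason about this kind of plateau is Amdahl's law, which bounds the speedup on n threads by 1 / ((1 - p) + p / n) when a fraction p of the work can run in parallel. The sketch below uses p = 0.8 purely as an illustrative assumption; it is not a measured value for cufflinks or cuffdiff.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup from Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    p = 0.8  # assumed parallel fraction, for illustration only
    for n in (1, 8, 16, 40):
        print(f"{n:>2} threads: at most {amdahl_speedup(p, n):.2f}x")
```

Under that assumption, going from 8 to 40 threads raises the bound only from about 3.3x to about 4.5x, which is the kind of diminishing return described here. The model covers the plateau only; it says nothing about runs that get slower with more threads.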

          In some cases the runtime with 32-48 threads may be much longer than with 8-16, especially on systems with 4+ CPU sockets, because of bottlenecks caused by memory-interconnect saturation and latency.

          Also make sure the system/program is using NUMA properly and that CPU interleaving is not enabled in the BIOS setup.

          For tophat/cufflinks I would run several jobs in parallel using 1-8 threads each, rather than one job at a time using 40 threads in series (provided enough RAM is available).
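
The "several smaller jobs in parallel" approach could be sketched in Python roughly as follows. The helper names, the output directory naming, and the default core counts are hypothetical; only the cufflinks options `-p` (threads), `-G` (reference GTF), and `-o` (output directory) are taken from the tool itself. This is a sketch for a single node, not a replacement for a cluster scheduler.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def concurrent_jobs(total_cores: int, threads_per_job: int) -> int:
    """How many jobs fit on the node at once without oversubscribing cores."""
    return max(1, total_cores // threads_per_job)


def cufflinks_command(bam: str, gtf: str, out_dir: str, threads: int) -> List[str]:
    """Build one per-sample cufflinks command line."""
    return ["cufflinks", "-p", str(threads), "-G", gtf, "-o", out_dir, bam]


def run_all(bams: List[str], gtf: str,
            total_cores: int = 40, threads_per_job: int = 8) -> List[int]:
    """Run one cufflinks job per BAM file, several at a time; return exit codes."""
    commands = [
        # Output directory names are made up for this example.
        cufflinks_command(bam, gtf, f"cufflinks_out_{i}", threads_per_job)
        for i, bam in enumerate(bams)
    ]
    workers = concurrent_jobs(total_cores, threads_per_job)
    # Each worker thread just waits on one external cufflinks process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

With 40 cores and 8 threads per job, `concurrent_jobs(40, 8)` allows 5 samples to run at once. The sketch does not account for memory, so the number of concurrent jobs may need to be lowered by hand if each job needs a large share of the node's RAM.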

          PS: And be patient... leave the job running overnight/over the weekend/over the Christmas holiday :-)

          • #6
            Thank you very much for the advice about lowering the value of -p.

            I will try that with more memory.

            Actually, all my jobs are hanging at:

            ChkbCpt1b
            > Processing Locus chr15:100479569-100495239 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:100469033-100479252 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:103562759-103565081 [******************** ] 81%

            I am working with mouse data and use the mm10 GTF as the reference.

            C.
