  • speed up the cuffdiff calculation

    Dear Community,

    I have 12 RNA-seq samples (4 groups) that I mapped with STAR. The resulting BAM files range from 4 GB to 13 GB. I then ran cuffdiff on those BAM files to get FPKMs and differentially expressed genes. The run has been going for over a day, and the log file is still stuck at "Inspecting maps and determining fragment length distributions". Does cuffdiff have a limit on sample size? Is it normal for it to be this slow?

    Thanks a lot for any inputs!

    C.

  • #2
    Hi Capricy,

    I recently used cuffdiff on 12 samples (BAM sizes ~2 GB each) and it took 5-6 hours. This was on a server with 128 GB of memory and 20 threads.

    I'd say your process does seem a bit slow, although your BAM files are larger than mine. How much memory and how many threads are you using? You could use the 'top' command to check whether the program is still actually running.
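For anyone who prefers a scripted check over watching 'top', here is a minimal Python sketch of the same idea. It assumes a Linux-style `ps` that accepts `-eo pid,stat,pcpu,comm`; the helper names `parse_ps` and `find_running` are made up for this illustration.

```python
import subprocess


def parse_ps(ps_text, name):
    """Return (pid, state, cpu_percent) for processes whose command contains `name`.

    `ps_text` is expected to be the output of `ps -eo pid,stat,pcpu,comm`,
    i.e. a header line followed by one line per process.
    """
    rows = []
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, stat, pcpu, comm = parts
        if name in comm:
            rows.append((int(pid), stat, float(pcpu)))
    return rows


def find_running(name="cuffdiff"):
    """Run `ps` once and report matching processes (assumes `ps` is on PATH)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, name)


# Example use on the machine running the job:
#     for pid, state, cpu in find_running("cuffdiff"):
#         print(pid, state, cpu)
```

A state starting with `R` means the process is running, `S` means sleeping, and `D` means uninterruptible sleep (usually waiting on disk I/O). A job that shows steady CPU use is still working, even if the log has not moved.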

    Cheers,

    Matt.

    • #3
      Hi, Matt,

      Thank you very much for the reply.

      I am running on an HPC cluster, with 96 GB of memory and 40 processors. The jobs are still running, and last night I started to see output for the file var_model.info.

      I wonder if the uneven file sizes could be the issue.

      Not sure how long it will take to actually finish.

      C.
      Last edited by capricy; 11-29-2017, 03:55 AM.

      • #4
        hmm, well at least it's not just hanging!

        Not sure why it's taking that long. If you've given cuffdiff all the threads ('-p 40'), I'd have thought that would be plenty. Maybe someone else has a better idea?

        Matt.

        • #5
          After 8 threads the speedup for cufflinks/cuffdiff is marginal...

          In my experience, cufflinks/cuffdiff gains very little from more than 8 threads.

          In some cases the runtime with 32-48 threads may be much longer than with 8-16, especially on systems with 4+ CPU sockets, because of bottlenecks caused by memory interconnect saturation and latency.

          Also make sure the system/program is using NUMA properly and that CPU interleaving is not enabled in the BIOS setup.
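As a first step toward checking the NUMA layout, the sketch below counts the NUMA nodes the kernel exposes. It assumes a Linux system that lists nodes as `node0`, `node1`, ... under `/sys/devices/system/node`; the function name `numa_nodes` is made up for this illustration, and the check says nothing about BIOS settings.

```python
import os
import re


def numa_nodes(sys_node_dir="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found in sysfs.

    Returns an empty list if the directory does not exist
    (for example on non-Linux systems).
    """
    if not os.path.isdir(sys_node_dir):
        return []
    ids = []
    for entry in os.listdir(sys_node_dir):
        match = re.fullmatch(r"node(\d+)", entry)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


# Example use:
#     print(numa_nodes())
```

If this reports more than one node, a job can be kept on a single node with something like `numactl --cpunodebind=0 --membind=0 <command>`, so its threads and memory stay on the same socket. Whether that helps for a given run is something to measure rather than assume.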

          For tophat/cufflinks I would rather run several jobs in parallel with 1-8 threads each than one job at a time with 40 threads in series (provided enough RAM is available).
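A rough Python sketch of that "several small jobs" approach for the per-sample cufflinks step is shown here. It assumes `cufflinks` is on the PATH and uses its `-p` (threads), `-G` (reference GTF) and `-o` (output directory) options; the function names, the output directory naming and the file names in the example are all hypothetical. It does not apply to cuffdiff itself, which takes all samples in a single run.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def plan_jobs(bams, gtf, total_cores, threads_per_job=8):
    """Build one cufflinks command per BAM file.

    Returns (commands, concurrent), where `concurrent` is how many jobs
    fit into `total_cores` when each job uses `threads_per_job` threads.
    """
    threads = max(1, min(threads_per_job, total_cores))
    concurrent = max(1, total_cores // threads)
    commands = []
    for bam in bams:
        sample = os.path.splitext(os.path.basename(bam))[0]
        commands.append([
            "cufflinks", "-p", str(threads),
            "-G", gtf,
            "-o", f"cufflinks_{sample}",  # hypothetical output directory name
            bam,
        ])
    return commands, concurrent


def run_jobs(commands, concurrent):
    """Run the commands with at most `concurrent` at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Example use (file names are placeholders):
#     cmds, n = plan_jobs(["s1.bam", "s2.bam"], "genes.gtf", total_cores=40)
#     exit_codes = run_jobs(cmds, n)
```

With 40 cores and 8 threads per job this runs up to 5 jobs at once. Each job holds its own data in memory, so the RAM caveat above applies: lower the number of concurrent jobs if the node starts swapping.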

          PS: And be patient... leave the job running overnight / over the weekend / over the Christmas holiday :-)

          • #6
            Thank you very much for the advice about lowering the value of -p.

            I will try that with more memory.

            Actually, all my jobs are hanging at:

            ChkbCpt1b
            > Processing Locus chr15:100479569-100495239 [******************** ] 81%
            Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:100469033-100479252 [******************** ] 81%
            Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:103562759-103565081 [******************** ] 81%

            I am working on mouse data, using the mm10 GTF as the reference.

            C.
