  • adiallo
    Member
    • Mar 2012
    • 17

    How to speed up Cuffdiff? It is taking forever!

    I have a machine with 16 cores, 64GB memory and 16TB hard drive.

    I have a very large RNA-Seq data set to analyse:
    25 cases (100 million reads in each file).
    25 controls (100 million reads in each file).

    The cufflinks and cuffmerge part are all done.

    Cuffdiff is giving me problems: when I use more than one thread (i.e., -p > 1), I run out of memory after 1 day.

    I am now running cuffdiff with -p 1; the job started 3 weeks ago and is still running.

    How can I speed up the process? Or what are the other solutions?

    Can I split my files per chromosome and run separate analyses? If so, will my results be usable?

    Can I ask cuffdiff to write some partial results?

    Can I ask cuffdiff to compute just the gene expression levels and discard the other results?


    Thanks for any advice.
    Alpha
  • yueluo
    Member
    • Aug 2013
    • 82

    #2
    Supposedly, Cufflinks-2.2.0 introduced a new workflow. You can now run cuffquant to estimate transcript abundance for each sample before running cuffdiff, which speeds up the process and solves some runtime issues. However, I have encountered some minor issues with the output of cuffnorm. You can check one of my posts about it in this forum (I also posted the problem on the Google Group), but so far there has been no feedback from other users.
    If you only care about examining differences between your two groups then it shouldn't be much of a problem.

    Comment

    • adiallo
      Member
      • Mar 2012
      • 17

      #3
      Thanks for the quick reply. I will run cuffquant/cuffnorm and cuffquant/cuffdiff and let you know if everything goes well.

      Cheers,
      Alpha

      Comment

      • Wallysb01
        Senior Member
        • Feb 2011
        • 286

        #4
        In terms of speed, cuffquant made the difference between me being able to use Cufflinks or not. I tried to use cuffdiff a while back on my data and it looked like it would take around a month on 12 cores. Now, with cuffquant, it’s more like overnight. And once you’ve run cuffquant, you can rerun cuffdiff very quickly, since you only have to generate those CXB files once.
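
A minimal sketch of the two-step workflow described above: quantify each sample once with cuffquant, then pass the resulting CXB files to cuffdiff. The sample names, the `bam/` and `quant/` directories, the GTF path and the thread count are placeholders, not taken from this thread. By default the script only prints the commands (`RUN=echo`); run it with `RUN=` to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

merged_gtf="merged_asm/merged.gtf"   # cuffmerge output (hypothetical path)
threads=8                            # assumption: adjust to your machine
cases=(case1 case2)                  # hypothetical sample names
controls=(ctrl1 ctrl2)

# RUN=echo prints each command instead of running it; use RUN= to execute.
RUN="${RUN-echo}"

# Print a comma-separated list of CXB paths for the given sample names.
cxb_list() {
  local out="" s
  for s in "$@"; do
    out+="${out:+,}quant/$s/abundances.cxb"
  done
  echo "$out"
}

# Step 1: quantify every sample once; cuffquant writes abundances.cxb
# into the directory given with -o.
for s in "${cases[@]}" "${controls[@]}"; do
  $RUN cuffquant -o "quant/$s" -p "$threads" "$merged_gtf" "bam/$s.bam"
done

# Step 2: cuffdiff on the CXB files. Replicates of one condition are joined
# by commas, conditions are separated by spaces.
$RUN cuffdiff -o diff_out -p "$threads" -L CASE,CONTROL "$merged_gtf" \
  "$(cxb_list "${cases[@]}")" "$(cxb_list "${controls[@]}")"
```

Because step 1 is independent per sample, the cuffquant calls can also be spread over several machines, and only step 2 has to be repeated when the comparison changes.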

        Comment

        • jeales
          Member
          • Oct 2012
          • 13

          #5
          Try the --no-diff argument to cuffdiff

          I can't see your original command line
          But if you didn't specify labels with -L or comma delimit your list of case and control bams
          then it will be doing pairwise DE tests for all against all samples
          and this is likely to be the slowest step

          Comment

          • adiallo
            Member
            • Mar 2012
            • 17

            #6
            Hello,
            Here is an example of my command line:
            I have a lot of bash variables.
            $cuffdiff -o ${output_path_diff} -b ${genomeIndex} -p 1 -L TEST_ALL,CONTROL_ALL -u ${merged_gtf} $bam14,$bam16,$bam26,$bam28,$bam30,$bam34,$bam36,$bam40,$bam42,$bam44,$bam46,$bam48,$bam50,$bam52,$bam54,$bam56,$bam58,$bam60,$bam64,$bam66,$bam68,$bam117,$bam118,$bam119,$bam32 $bam2,$bam4,$bam6,$bam70,$bam72,$bam74,$bam76,$bam78,$bam80,$bam82,$bam84,$bam86,$bam90,$bam92,$bam94,$bam96,$bam98,$bam100,$bam102,$bam104,$bam108,$bam110,$bam112,$bam114,$bam116

            Cheers,
            Alpha

            Comment

            • jeales
              Member
              • Oct 2012
              • 13

              #7
              That looks ok to me
              as long as the line breaks are actually spaces

              I'd definitely try the new cufflinks workflow to see if it reduces the ram usage by splitting up the tasks
              i.e. tophat > cufflinks > cuffmerge > cuffquant > cuffdiff
              but you are supplying a huge amount of data, so it's going to need a lot of memory

              As a comparator i've got a cuffdiff running with 32 threads on 72 bams (average size 5GB) and that is using 90GB of ram

              I predict, based on progress from the verbose (-v) output, that it'll take 6 days for my job to finish, that doesn't bode well for your analysis runtime

              Comment

              • jeales
                Member
                • Oct 2012
                • 13

                #8
                Also if you just want expression values per sample then omit the cuffdiff
                You can always do your own DE testing in R etc

                Comment

                • jeales
                  Member
                  • Oct 2012
                  • 13

                  #9
                  New cufflinks workflow compared to old
                  cuffnorm outputs expression values from the CXB files generated by cuffquant
                  then you could do your own testing on the output
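
A rough sketch of that route, assuming cuffquant has already produced one `abundances.cxb` per sample. The paths, labels and thread count are placeholders. The `row_means` helper is not part of Cufflinks; it is only a quick sanity check that assumes a tab-delimited table with one header row and an identifier in the first column. By default the cuffnorm command is only printed (`RUN=echo`).

```shell
#!/usr/bin/env bash
set -euo pipefail

RUN="${RUN-echo}"                    # RUN=echo: dry run; RUN= executes
merged_gtf="merged_asm/merged.gtf"   # hypothetical path

# cuffnorm takes the same positional arguments as cuffdiff:
# commas between replicates, spaces between conditions.
$RUN cuffnorm -o norm_out -p 8 -L CASE,CONTROL "$merged_gtf" \
  quant/case1/abundances.cxb,quant/case2/abundances.cxb \
  quant/ctrl1/abundances.cxb,quant/ctrl2/abundances.cxb

# Hypothetical helper: print "id<TAB>mean" for every data row of a
# tab-delimited table (header skipped, column 1 is the identifier).
row_means() {
  awk -F'\t' 'NR > 1 {
    s = 0
    for (i = 2; i <= NF; i++) s += $i
    printf "%s\t%.3f\n", $1, s / (NF - 1)
  }' "$1"
}

# Peek at the first few genes if the table is there.
if [ -f norm_out/genes.fpkm_table ]; then
  row_means norm_out/genes.fpkm_table | sed -n '1,5p'
fi
```

The tables in the cuffnorm output directory can then be loaded into R or another tool for the actual testing.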


                  Comment

                  • adiallo
                    Member
                    • Mar 2012
                    • 17

                    #10
                    Thanks jeales,
                    I am using the new version of cufflinks, the cuffquant is done. I am running the cuffdiff part.
                    I am testing on the different servers I have access to, to speed up the process.
                    I will let you know the results, computation time and resources soon.

                    Cheers,
                    Alpha

                    Comment

                    • vishnuamaram
                      Member
                      • Jun 2013
                      • 41

                      #11
                      That's great news from Alpha.

                      I do have a suggestion, since the process runs out of memory and your RAM is limited (64 GB): try creating a tmp folder on your server's hard drive and pass that folder to the command when running the analysis.

                      Comment

                      • adiallo
                        Member
                        • Mar 2012
                        • 17

                        #12
                        Thanks vishnuamaram
                        I will try this solution. I am still trying to run cuffdiff with all my datasets; I can only run it with 1 CPU and it's a very long process.
                        Since I have 100 samples, 4 conditions (25 samples/condition) and the samples in a condition are not replicates, cuffdiff is not the best!
                        Do you have any suggestions for that?
                        For now I am exploring another idea: writing an R script with DESeq and using the cuffnorm results to do my differential expression analysis.

                        Alpha

                        Comment

                        • adiallo
                          Member
                          • Mar 2012
                          • 17

                          #13
                          Hello vishnuamaram,
                          I realize that the cufflinks programs don't have a parameter for a tmp folder!
                          How can I make it work?

                          Alpha

                          Comment

                          • shangzhong0619
                            Member
                            • Nov 2013
                            • 17

                            #14
                            Cuffquant takes a long time

                            Hi all,
                            I have a problem running cuffquant. When I don't use the option '-b/--frag-bias-correct <genome.fa>', I get results quickly. However, if I add that option, it always gets stuck at some processing percentage and seems to take forever.

                            I also tried the old pipeline, and there cuffdiff also takes forever. I searched online and found that removing the lines of the annotation file whose 3rd field (feature) is 'gene' can increase the speed. I did that, but the speed didn't increase much. Does anyone know what the issue might be? Thanks.
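
For reference, the annotation filtering mentioned above can be sketched with awk as follows. This assumes a standard tab-delimited GTF in which the third column holds the feature type. The function name and the demo records are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Drop records whose 3rd column (feature type) is "gene".
# Comment lines starting with '#' are kept unchanged.
# usage: strip_gene_records in.gtf > out.gtf
strip_gene_records() {
  awk -F'\t' '/^#/ || $3 != "gene"' "$1"
}

# Tiny made-up GTF to show the effect: one gene record, one exon record.
demo=$(mktemp)
printf 'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n' > "$demo"
printf 'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> "$demo"
strip_gene_records "$demo"   # prints only the exon record
rm -f "$demo"
```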

                            Comment

                            • adiallo
                              Member
                              • Mar 2012
                              • 17

                              #15
                              Hello shangzhong0619,

                              Here are the parameters of cuffquant:

                              General Options:
                                -o/--output-dir                   write all output files to this directory [ default: ./ ]
                                -M/--mask-file                    ignore all alignment within transcripts in this file [ default: NULL ]
                                -b/--frag-bias-correct            use bias correction - reference fasta required [ default: NULL ]
                                -u/--multi-read-correct           use 'rescue method' for multi-reads [ default: FALSE ]
                                -p/--num-threads                  number of threads used during quantification [ default: 1 ]
                                --library-type                    Library prep used for input reads [ default: below ]

                              Advanced Options:
                                -m/--frag-len-mean                average fragment length (unpaired reads only) [ default: 200 ]
                                -s/--frag-len-std-dev             fragment length std deviation (unpaired reads only) [ default: 80 ]
                                -c/--min-alignment-count          minimum number of alignments in a locus for testing [ default: 10 ]
                                --max-mle-iterations              maximum iterations allowed for MLE calculation [ default: 5000 ]
                                -v/--verbose                      log-friendly verbose processing (no progress bar) [ default: FALSE ]
                                -q/--quiet                        log-friendly quiet processing (no progress bar) [ default: FALSE ]
                                --seed                            value of random number generator seed [ default: 0 ]
                                --no-update-check                 do not contact server to check for update availability [ default: FALSE ]
                                --max-bundle-frags                maximum fragments allowed in a bundle before skipping [ default: 500000 ]
                                --max-frag-multihits              Maximum number of alignments allowed per fragment [ default: unlim ]
                                --no-effective-length-correction  No effective length correction [ default: FALSE ]
                                --no-length-correction            No length correction [ default: FALSE ]


                              I suggest changing some of the default parameters, for example lowering --max-bundle-frags to 50000.
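
As a sketch of that suggestion, the script below builds a cuffquant call with a lowered `--max-bundle-frags` and adds `-b` only when a reference FASTA is supplied, which makes it easy to compare runs with and without bias correction. The `GENOME_FA` variable, the file names, the thread count and the helper function are assumptions for illustration. The command is only printed unless the script is run with `RUN=`.

```shell
#!/usr/bin/env bash
set -euo pipefail

RUN="${RUN-echo}"             # RUN=echo: dry run; RUN= executes
genome_fa="${GENOME_FA-}"     # empty means: no -b bias correction
max_bundle_frags=50000        # value suggested above (default is 500000)

# Fill the global array `args` with the cuffquant arguments.
# usage: build_cuffquant_args out_dir transcripts.gtf sample.bam
build_cuffquant_args() {
  args=(-o "$1" -p 4 --max-bundle-frags "$max_bundle_frags")
  if [ -n "$genome_fa" ]; then
    args+=(-b "$genome_fa")
  fi
  args+=("$2" "$3")
}

build_cuffquant_args quant/sample1 merged.gtf sample1.bam
$RUN cuffquant "${args[@]}"
```

Note that, per the help text above, bundles exceeding the limit are skipped, so a lower value trades completeness in very deeply covered loci for run time.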

                              Cheers,
                              Alpha

                              Comment
