  • ksiowa
    Junior Member
    • Jul 2011
    • 2

    Cufflinks Runtime

    Hello, we're beginners who are using cufflinks for the first time to assemble a transcriptome. We have ~80 million tophat aligned reads and have been surprised by the significant amount of time that it has taken to assemble transcripts with cufflinks. We're not quite sure if it's an issue of computing power, something we've done incorrectly, or, since we're beginners, just the standard amount of time required. We are not using a reference annotation for assembly. At the pace the assembly has been moving, it looks like it will take 6-7 days to complete the assembly for this ~80 million read library. Is this normal? Any suggestions or comments would be greatly appreciated! Here are the specs:

    8 CPUs running at 2.83 GHz
    32 GB RAM
    8.8 TB free disk space
  • tgenahmet
    Member
    • Apr 2009
    • 11

    #2
    We're also having very similar issues. Our cluster gives us 96 hours to complete each job, but sometimes we hit this wall-time limit. Any tips to improve speed with cufflinks would be highly appreciated.

    Comment

    • DZhang
      Senior Member
      • Jun 2010
      • 177

      #3
      At which step did you notice the program spending most of its time? Below is one entry from the Cufflinks FAQ:

      I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

      Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.
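The FAQ entry above implies that total run time can never drop below the time needed for the single slowest bundle, however many threads are available. A minimal sketch of that effect follows. It is a toy scheduling model, not Cufflinks' actual scheduler, and the bundle costs are invented for illustration.

```python
import heapq

def makespan(bundle_costs, n_threads):
    """Toy model: each free thread takes the next bundle in the queue.
    Returns the time at which the last bundle finishes."""
    # Each heap entry is the time at which one thread becomes free.
    free_at = [0.0] * n_threads
    heapq.heapify(free_at)
    for cost in bundle_costs:
        start = heapq.heappop(free_at)        # earliest available thread
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Hypothetical workload: one very deep locus (5000 time units) plus
# 1000 ordinary loci costing 1 unit each.
costs = [5000.0] + [1.0] * 1000

for p in (1, 4, 8):
    print(p, makespan(costs, p))
```

In this model, going from 1 to 4 threads saves only the time spent on the ordinary loci, and going from 4 to 8 saves nothing, because the deep locus alone fixes the finish time. Masking that locus is what removes the bottleneck.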

      Comment

      • ksiowa
        Junior Member
        • Jul 2011
        • 2

        #4
        Originally posted by DZhang
        Which step did you notice the program spent most of the time? Below is one entry in the Cufflinks FAQ:

        I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

        Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.
        We noticed it hanging around 71% for a particularly long time one day, but since we had to leave it running over several nights, it's hard to say whether or not this was unusual.

        Also, it seems to take about the same amount of time regardless of how many threads we tell it to use (we've tried 1, 4, 7, and 8). However, that answer from the FAQ makes it sound like cufflinks spawns threads automatically, so we're wondering if maybe we misunderstood the -p option?

        Comment

        • DZhang
          Senior Member
          • Jun 2010
          • 177

          #5
          I believe the -p option at least works for bowtie.

          Douglas

          Comment

          • dhiralphadke
            Junior Member
            • Mar 2011
            • 2

            #6
            We have ~175 million aligned reads. I am running cufflinks version 1.0.3. It has been running for the last 8 days and has not yet completed. The cufflinks output files are only updated every 1 or 2 days. Is this normal?

            Here are the cufflinks options I used:
            cufflinks --GTF-guide refseq.gtf --frag-bias-correct /indexes --multi-read-correct -p 12

            The machine specs are:
            32 processors (2.4 GHz each)
            512 GB RAM

            Any thoughts would be greatly appreciated!

            Thanks!

            -Dhiral.

            Comment

            • thurisaz
              Member
              • Jun 2011
              • 24

              #7
              I think you should consider using a mask file (see post #3 above). cufflinks was also taking a long time to run on my data; when I had a look at the region where it was stalling, I could see that very many reads were aligning there. Creating a GFF file to mask these regions (with -M) solved the problem in my case.
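One way to locate such pile-up regions is to count read start positions in fixed-size bins and write the overloaded bins out as GFF lines for -M. The sketch below assumes the read positions have already been extracted as (chromosome, 1-based start) pairs. The function names, the bin size, the threshold and the `mask` source label are all hypothetical, and the nine-column `exon` layout simply mirrors the mask lines posted later in this thread.

```python
from collections import Counter

def dense_regions(read_starts, bin_size=1000, min_reads=100000):
    """Return merged (chrom, start, end) regions whose bins hold >= min_reads reads."""
    counts = Counter((chrom, (pos - 1) // bin_size) for chrom, pos in read_starts)
    hot_bins = sorted(key for key, n in counts.items() if n >= min_reads)
    regions = []
    for chrom, b in hot_bins:
        start, end = b * bin_size + 1, (b + 1) * bin_size
        # Extend the previous region if this bin is directly adjacent to it.
        if regions and regions[-1][0] == chrom and regions[-1][2] + 1 == start:
            regions[-1] = (chrom, regions[-1][1], end)
        else:
            regions.append((chrom, start, end))
    return regions

def to_gff(regions, source="mask"):
    """Format regions as tab-separated, nine-column GFF 'exon' lines."""
    lines = []
    for i, (chrom, start, end) in enumerate(regions, 1):
        fields = [chrom, source, "exon", str(start), str(end),
                  ".", ".", ".", f"ID=mask_region_{i}"]
        lines.append("\t".join(fields))
    return lines

# Toy input: 3000 reads in each of the first two 1 kb bins of Chr2, 2 reads on Chr3.
reads = [("Chr2", p) for p in range(1, 2001)] * 3 + [("Chr3", 500)] * 2
mask_lines = to_gff(dense_regions(reads, bin_size=1000, min_reads=1000))
print("\n".join(mask_lines))
```

Whatever the threshold, it is worth inspecting the flagged regions before masking them, since a highly expressed gene would be flagged in the same way as an rRNA cluster.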

              Comment

              • dhiralphadke
                Junior Member
                • Mar 2011
                • 2

                #8
                Originally posted by thurisaz
                I think you should consider using a mask file (see post #3 above). cufflinks was also taking a long time to run on my data; when I had a look at the region where it was stalling, I could see that very many reads were aligning there. Creating a GFF file to mask these regions (with -M) solved the problem in my case.
                You created a mask file with the regions that it was stalling at? There could be valid transcripts in those regions if that many reads were aligning there. Or did you mask out specific ribosomal RNA and mitochondrial RNA regions?

                Comment

                • thurisaz
                  Member
                  • Jun 2011
                  • 24

                  #9
                  Yes, I created a mask file to exclude the regions it was stalling at, since they were hugely over-represented and the analysis wouldn't finish otherwise. Comparing them now, however, I see that they do cover the annotated rRNA as well as some extra regions:

                  Code:
                  Problem areas in my run:
                  Chr2    TAIR10  exon    1900    10200   .       .       .       ID=Chr2_problem_area
                  Chr3    TAIR10  exon    14143000        14145000        .       .       .       ID=Chr3_problem_area1
                  Chr3    TAIR10  exon    14195800        14204100        .       .       .       ID=Chr3_problem_area2
                  
                  Annotated rRNA:
                  Chr2  TAIR10  rRNA  5782  5945  . + . ID=AT2G01020.1;Parent=AT2G01020;Name=AT2G01020.1;Index=1                                                                                       
                  Chr3  TAIR10  rRNA  14197677  14199484  . + . ID=AT3G41768.1;Parent=AT3G41768;Name=AT3G41768.1;Index=1                                                                               
                  Chr3  TAIR10  rRNA  14199753  14199916  . + . ID=AT3G41979.1;Parent=AT3G41979;Name=AT3G41979.1;Index=1
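The comparison can be checked mechanically. The sketch below uses only the coordinates listed above and reports which annotated rRNA features fall entirely inside each masked region. The helper name `contains` is illustrative.

```python
# (chrom, start, end, ID) copied from the listing above
problem_areas = [
    ("Chr2", 1900, 10200, "Chr2_problem_area"),
    ("Chr3", 14143000, 14145000, "Chr3_problem_area1"),
    ("Chr3", 14195800, 14204100, "Chr3_problem_area2"),
]
rrna = [
    ("Chr2", 5782, 5945, "AT2G01020.1"),
    ("Chr3", 14197677, 14199484, "AT3G41768.1"),
    ("Chr3", 14199753, 14199916, "AT3G41979.1"),
]

def contains(outer, inner):
    """True if inner lies entirely within outer on the same chromosome."""
    return (outer[0] == inner[0]
            and outer[1] <= inner[1]
            and inner[2] <= outer[2])

# Map each masked region to the rRNA features it fully covers.
covered = {
    area[3]: [r[3] for r in rrna if contains(area, r)]
    for area in problem_areas
}

for area_id, hits in covered.items():
    print(area_id, hits if hits else "no annotated rRNA")
```

All three rRNA features sit inside a masked region, while Chr3_problem_area1 covers none of them, so it is one of the "extra regions".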

                  Comment

                  • zorph
                    Member
                    • May 2010
                    • 40

                    #10
                    Did anyone find a way to make Cufflinks run faster on a large file without masking transcripts or simply letting it run for longer?

                    I wish I could just divide the file in half and then figure out a way to merge the FPKMs.

                    Comment

                    • sudders
                      Member
                      • Dec 2011
                      • 32

                      #11
                      Originally posted by zorph
                      did anyone find a way around getting Cufflinks to work faster on a large file without masking transcripts or making cufflinks run for a longer period of time?

                      ****I wish I could just divide the file in half and then figure out a way to merge the FPKMs***
                      You could in theory divide the input BAM files by the chromosome the reads map to and then run a separate cufflinks process for each chromosome. You'd have to find some way to renormalise the FPKMs afterwards.
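A rough sketch of that renormalisation follows. FPKM is fragments per kilobase of transcript per million mapped fragments, so if each per-chromosome run divided by the number of fragments mapped to that chromosome alone, rescaling by (chromosome fragments / total fragments) restores the genome-wide denominator. That normalisation behaviour is an assumption here, as are the function name and the input dictionaries. The sketch also ignores anything estimated from the whole library, such as bias or multi-read correction.

```python
def renormalise_fpkm(fpkm_by_chrom, mapped_by_chrom):
    """Rescale per-chromosome FPKMs to a genome-wide denominator.

    fpkm_by_chrom:   {chrom: {transcript_id: fpkm}} from separate runs
    mapped_by_chrom: {chrom: mapped fragments used as that run's denominator}
    """
    total = sum(mapped_by_chrom.values())
    merged = {}
    for chrom, fpkms in fpkm_by_chrom.items():
        scale = mapped_by_chrom[chrom] / total
        for transcript_id, value in fpkms.items():
            merged[transcript_id] = value * scale
    return merged

# Toy numbers, invented for illustration.
mapped = {"Chr1": 30_000_000, "Chr2": 10_000_000}
fpkm = {"Chr1": {"t1": 40.0}, "Chr2": {"t2": 40.0}}
print(renormalise_fpkm(fpkm, mapped))
```

Both toy transcripts report 40 FPKM within their own run, but t2 comes from a chromosome with a third as many mapped fragments, so its genome-wide value ends up a third of t1's.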

                      Comment
