Hello wise people 
I need help. I am analyzing ~100 mouse RNA-seq samples with Cufflinks version 2.2.1. The reads were aligned with STAR and run through cufflinks, and the resulting transcripts.gtf files were merged with cuffmerge. The resulting merged.gtf file is ~1 GB, with 41K genes/loci and ~380K transcripts.
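For anyone who wants to check numbers like these on their own merged file, here is a minimal sketch of the counting. It assumes every record carries `gene_id` and `transcript_id` in the ninth GTF column, and the function name `count_genes_and_transcripts` is made up for illustration:

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def count_genes_and_transcripts(lines):
    """Return (number of distinct gene_id, number of distinct transcript_id)."""
    genes, transcripts = set(), set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
    return len(genes), len(transcripts)


# Usage (path is a placeholder):
# with open("path2/merged.gtf") as fh:
#     print(count_genes_and_transcripts(fh))
```

The file is read line by line, so memory use stays small even for a ~1 GB GTF.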
Question 1:
Is that too many? Do you usually filter the transcripts in each transcripts.gtf by coverage, FPKM and status before merging?
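To make the question concrete, here is a minimal sketch of such a pre-merge filter. It assumes the `transcript` records in each transcripts.gtf carry `FPKM` and `cov` attributes; the thresholds and the function names `parse_attributes` and `filter_gtf` are placeholders for illustration, not recommendations. Status is left out because this sketch only looks at the GTF attributes:

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> string."""
    return dict(ATTR_RE.findall(field))


def filter_gtf(lines, min_fpkm=1.0, min_cov=2.0):
    """Yield the GTF lines of transcripts passing both (hypothetical) cutoffs.

    The decision is made on the 'transcript' record; exon records are
    kept or dropped together with their parent transcript.
    """
    lines = list(lines)  # two passes are needed, so hold the file in memory
    keep = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        try:
            fpkm = float(attrs["FPKM"])
            cov = float(attrs["cov"])
        except (KeyError, ValueError):
            continue  # missing or malformed values: drop the transcript
        if fpkm >= min_fpkm and cov >= min_cov and "transcript_id" in attrs:
            keep.add(attrs["transcript_id"])
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        if parse_attributes(cols[8]).get("transcript_id") in keep:
            yield line


# Usage (paths are placeholders):
# with open("transcripts.gtf") as src, open("filtered.gtf", "w") as dst:
#     dst.writelines(filter_gtf(src, min_fpkm=1.0, min_cov=2.0))
```

Each sample's filtered file would then go into cuffmerge in place of the original transcripts.gtf.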
Now I am trying to run cuffquant on one sample (~40M reads) using 8 processors with 4 GB each and the merged.gtf file as the reference. It seemed to work fine, but it has now been stuck for a few hours on
"> Processing Locus chrX:151168793-151474354 [************************ ] 99%".
Question 2:
Is this a parameter issue? What should I do?
command line for cuffquant:
cuffquant -p 8 -o TEST_1sample -u -b path2/genome.fa path2/merged.gtf path2/Sample1.bam
Any help would be much appreciated! Thank you in advance
Yu
------------------------------------------ an UPDATE ----------------------------------
In Russian they say "morning is wiser than evening", so I went to sleep and let my computer continue working. It is almost 10AM now.
The good news is that cuffquant did get past this particular locus (so it was not stuck), but it is extremely slow. Specifically, I see the following:
[06:21:14] Learning bias parameters.
[07:14:21] Quantifying expression levels in locus.
> Processing Locus chr3:55586402-55587294 [*** ] 15%
(It is 9:45 AM now!)
If the progress bar scales linearly with time, cuffquant will need roughly 16 hours for this step (is it the last step?).
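That extrapolation can be written out as a quick calculation. This is only a sketch: it assumes progress is linear in time, that both timestamps fall on the same day, and that the 9:45 AM reading is exact to the minute. The function name `estimate_runtime` is made up for illustration:

```python
from datetime import datetime


def estimate_runtime(start, now, fraction_done):
    """Linear extrapolation; returns (total_hours, remaining_hours)."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(now, fmt)
               - datetime.strptime(start, fmt)).total_seconds()
    total = elapsed / fraction_done
    return total / 3600, (total - elapsed) / 3600


# Quantification started at 07:14:21 and was at 15% at about 09:45.
total_h, remaining_h = estimate_runtime("07:14:21", "09:45:00", 0.15)
print(f"total ~{total_h:.1f} h, remaining ~{remaining_h:.1f} h")
# prints: total ~16.7 h, remaining ~14.2 h
```

Loci differ a lot in complexity, so the real progress need not be linear and the figure is only a rough guide.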
So let me rephrase the question: is it normal for cuffquant to take >12 hours to process one BAM file on 8 processors?
How much memory per processor does it actually need? Will it be faster if I run it on 32 processors, but with only 1 or 2 GB each?
I would really appreciate it if some of you could share your experience with cuffquant's running times and requirements, how much it speeds up the later steps (the next one will be cuffnorm), and whether there is a better way to run it. Benchmarking data on any of this would be very welcome.
Best,
Yu
--------------------------------------------------------------------------------------------------------------------------