Hello wise people 
I am trying to run the cufflinks->cuffmerge->cuffquant->cuffnorm pipeline (Cufflinks 2.2.1) on ~300 samples. The first 3 steps worked fine, and there is one merged gtf file and ~300 .cxb files.
Now I am trying to run cuffnorm:
$ cuffnorm -p 8 -o HypoTsamples --use-sample-sheet $CUFFGTF CXB_samples.txt
This produces the following behavior:
"You are using Cufflinks v2.2.1, which is the most recent release.
[17:48:10] Loading reference annotation.
Killed " <- about an hour later
Running cuffnorm on a small subset of CXB_samples.txt works just fine (it takes ~10 minutes for 6 samples).
The head of CXB_samples.txt looks like this:
sample_id group_id
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoT129X1zSvJzOBz1301/abundances.cxb HypoT129X1zSvJzOBz1301
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoT129X1zSvJzOBz1302/abundances.cxb HypoT129X1zSvJzOBz1302
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoT129X1zSvJzOBz1303/abundances.cxb HypoT129X1zSvJzOBz1303
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAKRzJzOBz1398/abundances.cxb HypoTAKRzJzOBz1398
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAKRzJzOBz1399/abundances.cxb HypoTAKRzJzOBz1399
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAKRzJzOBz1400/abundances.cxb HypoTAKRzJzOBz1400
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAXB12zPgnJzOBz1346/abundances.cxb HypoTAXB12zPgnJzOBz1346
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAXB13zPgnJzOBz1387/abundances.cxb HypoTAXB13zPgnJzOBz1387
/u/scratch/y/yhasin/Hypo_SampleBAM/1_CuffQuant/HypoTAXB13zPgnJzOBz1388/abundances.cxb HypoTAXB13zPgnJzOBz1388
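To rule out a problem with the sheet itself, here is a minimal sanity check, sketched under the assumption that the sheet is tab-delimited with one header line. The function name check_sheet is made up for illustration and is not part of Cufflinks. It only confirms that every data line has exactly two fields and that each .cxb path exists and is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cufflinks): sanity-check a sample sheet.
# Assumption: the sheet is tab-delimited, with a single header line.
# Prints the number of problem lines to stdout; details go to stderr.
check_sheet() {
    local sheet=$1 bad=0 line=0 path group extra
    # "|| [ -n "$path" ]" also handles a last line with no trailing newline
    while IFS=$'\t' read -r path group extra || [ -n "$path" ]; do
        line=$((line + 1))
        [ "$line" -eq 1 ] && continue          # skip the header line
        if [ -z "$path" ] || [ -z "$group" ] || [ -n "$extra" ]; then
            echo "line $line: expected exactly 2 tab-separated fields" >&2
            bad=$((bad + 1))
            continue
        fi
        if [ ! -s "$path" ]; then              # file must exist and be non-empty
            echo "line $line: missing or empty file: $path" >&2
            bad=$((bad + 1))
        fi
    done < "$sheet"
    echo "$bad"
}
```

Calling check_sheet CXB_samples.txt should print 0 if every line passes these two checks. It says nothing about whether the .cxb contents are valid.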
Do you know if there is a limit on the number of samples cuffnorm can handle, imposed either by the program itself or by the system? Is it a memory issue or a number-of-processors issue (I can increase both if needed)?
If there is a maximum, what would be the best way to run cuffnorm on more samples than that? How large should each data set be, and how many samples should overlap between data sets to get consistent FPKM estimates?
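To make the question concrete, here is a rough sketch of what I mean by overlapping data sets. It splits the sample sheet into batches that all share the same first few "anchor" samples. The function name make_batches, the anchor count, and the batch size are placeholders I made up, and the sketch assumes the sheet has one header line. Whether FPKM values from separate batches are actually comparable is exactly what I am unsure about.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cufflinks): split a sample sheet into
# batches. Every batch gets the header, the same first ANCHORS data rows,
# and up to PER_BATCH of the remaining rows.
# Usage: make_batches SHEET ANCHORS PER_BATCH OUT_PREFIX
make_batches() {
    local sheet=$1 anchors=$2 per_batch=$3 prefix=$4
    awk -v a="$anchors" -v n="$per_batch" -v p="$prefix" '
        NR == 1     { header = $0; next }           # header is repeated in every batch
        NR <= a + 1 { anchor[++na] = $0; next }     # first a data rows are shared anchors
        {
            if (count % n == 0) {                   # time to start a new batch file
                if (out != "") close(out)
                out = sprintf("%s_%03d.txt", p, ++b)
                print header > out
                for (i = 1; i <= na; i++) print anchor[i] > out
            }
            print > out
            count++
        }
    ' "$sheet"
}

# Example (values are arbitrary): 10 shared anchors, 50 other samples per batch.
#   make_batches CXB_samples.txt 10 50 batch
#   for f in batch_*.txt; do
#       cuffnorm -p 8 -o "HypoTsamples_${f%.txt}" --use-sample-sheet "$CUFFGTF" "$f"
#   done
```

Each batch would then be run through the same cuffnorm command as above, one output directory per batch.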
Best,
Yehudit
