Hello everyone,
I have a small problem using the Cufflinks pipeline. My data consist of RNA-Seq reads (one single-end experiment and one paired-end experiment), which have to be mapped against the human hg19 genome and then assembled into transcripts.
I started by mapping the reads with the new TopHat version (1.4.1). I tried it both with an annotation file (the Illumina package for Cufflinks with the NCBI annotation) and without one. There seems to be no problem with the mapping, so I went on to the next step, running Cufflinks with the same annotation file.
And here is the problem. My output consists of two types of transcripts, exons, etc. Some transcripts have a name from the annotation file but an FPKM value of 0, and the others have a Cufflinks identifier and an FPKM > 0. I also tried using cuffcompare to map the annotation onto the Cufflinks transcripts, but nothing happens.
I have tried three different annotations (Ensembl, RefSeq, UCSC) and two different options for passing the annotation (-g and -G), and I also used the -b option for bias correction. But I don't understand why Cufflinks does not label my transcripts with any of the annotation names.
Does anybody have the same problem and/or a solution for me?
Here are my commands, followed by some lines of my output file transcripts.gtf:
Code:
cufflinks -p 4 -o ../cufflinks.out.deNovo/ -b ../../../human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -g ../../../human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf accepted_hits.bam

You are using Cufflinks v1.3.0, which is the most recent release.
[16:48:35] Loading reference annotation.
[16:48:41] Inspecting reads and determining fragment length distribution.
> Processed 408173 loci. [*************************] 100%
> Map Properties:
> Total Map Mass: 141260632.27
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 161.78
> Estimated Std Dev: 47.67
[17:55:29] Assembling transcripts and estimating abundances.
> Processed 408173 loci. [*************************] 100%
[18:51:09] Loading reference annotation and sequence.
Warning: couldn't find fasta record for 'chr1'!
This contig will not be bias corrected.
Warning: couldn't find fasta record for 'chr10'!
This contig will not be bias corrected.
...
Warning: couldn't find fasta record for 'Un|NT_167236.1'!
This contig will not be bias corrected.
[18:51:53] Learning bias parameters.
> Processed 66680 loci. [*************************] 100%
[19:48:34] Re-estimating abundances with bias correction.
> Processed 66680 loci.
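The warnings in this log name sequences that Cufflinks could not find in genome.fa. A minimal Python sketch for listing such names could look like the following. It assumes the FASTA record name is the first word after `>` and that the sequence name is the first tab-separated column of a GTF file. The function names and the miniature inputs are made up for illustration and are not taken from my files.

```python
def fasta_names(lines):
    """Return the set of record names (first word after '>') in FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines):
    """Return the set of values in column 1 of GTF lines, skipping comments."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(fasta_lines, gtf_lines):
    """Sequence names used in the GTF with no same-named FASTA record."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))


# Hypothetical miniature inputs, only to show the call.
example_fasta = [">1 some description", "ACGT", ">10", "ACGT"]
example_gtf = [
    'chr1\tCufflinks\ttranscript\t135727\t136178\t1000\t.\t.\tgene_id "CUFF.3";',
    '10\tsource\texon\t100\t200\t.\t+\t.\tgene_id "X";',
]
print(missing_from_fasta(example_fasta, example_gtf))
```

For real files, the two arguments could be open file handles, for example `open("genome.fa")` and `open("transcripts.gtf")`, since both helpers only iterate over lines.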
Code:
cufflinks -p 4 -b /Daten2/rna-seq/human_genom/Illumina_paket/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa -o cufflinks.out.ensembl -g ../../human_genom/Homo_sapiens_ENSEMBL.gtf tophat.out.sample1.reference/accepted_hits.bam
Code:
chr   feature     start     end       score  strand  gene_id          transcript_id    FPKM          frac  conf_lo   conf_hi   cov       full_read_support
chr1  transcript  135727    136178    1000   .       CUFF.3           CUFF.3.1         0.2633463896  1     0.104543  0.42215   1.920432  yes
chr1  transcript  165727    169225    1      -       CUFF.2           CUFF.2.1         0             0     0         0         0         yes
chr1  transcript  167772    169225    1000   -       CUFF.2           CUFF.2.2         0.9961441143  1     0.769102  1.223186
11    transcript  57219163  57219495  1      +       ENSG00000222998  ENST00000411066  0             0     0         0         0         no
11    transcript  57174429  57194523  1      -       ENSG00000134802  ENST00000395123  0             0     0         0         0         no
11    transcript  57174429  57194594  1      -       ENSG00000134802  ENST00000395124  0             0     0         0         0         no
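To see the same split over a whole file instead of a few lines, a short Python sketch like the one below could be used. It assumes the whitespace-separated table layout shown above (transcript_id in column 8, FPKM in column 9). The function name `summarize` and the grouping labels are my own choice for illustration. The rows are copied from the excerpt, including the one that is cut off after conf_hi.

```python
from collections import Counter


def summarize(rows):
    """Count transcripts by ID type, sequence-name style and FPKM status."""
    counts = Counter()
    for row in rows:
        fields = row.split()
        # Skip the header and anything too short to contain an FPKM column.
        if len(fields) < 9 or fields[1] != "transcript":
            continue
        seqname, transcript_id, fpkm = fields[0], fields[7], float(fields[8])
        kind = "cufflinks" if transcript_id.startswith("CUFF.") else "annotation"
        prefix = "chr prefix" if seqname.startswith("chr") else "no chr prefix"
        expressed = "FPKM>0" if fpkm > 0 else "FPKM=0"
        counts[(kind, prefix, expressed)] += 1
    return counts


rows = [
    "chr feature start end score strand gene_id transcript_id FPKM frac conf_lo conf_hi cov full_read_support",
    "chr1 transcript 135727 136178 1000 . CUFF.3 CUFF.3.1 0.2633463896 1 0.104543 0.42215 1.920432 yes",
    "chr1 transcript 165727 169225 1 - CUFF.2 CUFF.2.1 0 0 0 0 0 yes",
    # This row is truncated in the excerpt; the last two columns are missing.
    "chr1 transcript 167772 169225 1000 - CUFF.2 CUFF.2.2 0.9961441143 1 0.769102 1.223186",
    "11 transcript 57219163 57219495 1 + ENSG00000222998 ENST00000411066 0 0 0 0 0 no",
    "11 transcript 57174429 57194523 1 - ENSG00000134802 ENST00000395123 0 0 0 0 0 no",
    "11 transcript 57174429 57194594 1 - ENSG00000134802 ENST00000395124 0 0 0 0 0 no",
]

for key, n in sorted(summarize(rows).items()):
    print(key, n)
```

On these six rows, all transcripts with a Cufflinks identifier sit on a sequence name with a `chr` prefix, and all transcripts with an annotation name sit on one without it and have FPKM 0. Whether that holds for the whole file is something the same count would show.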