Hi all,
I am trying to set up an RNA-seq workflow:
1. Generated the genome files for STAR using .fna files from the NCBI FTP site and GTF files from Gencode.
2. Aligned the FASTQ files with STAR, converted SAM to BAM, and sorted the BAM files.
3. Used the sorted BAM files to test cufflinks, comparing different GTF files for the -G option. The cufflinks outputs report different positions for the same genes:
RefSeq:
gene_id gene_short_name locus
PDIA3 - chr15:44038589-44064804
CD276 - chr15:73976621-74006859
PROM2 - chr2:95940200-95957055
Gencode:
gene_id gene_short_name locus
ENSG00000167004.12 PDIA3 chr15:43746391-43773279
ENSG00000103855.17 CD276 chr15:73683965-73714518
ENSG00000155066.15 PROM2 chr2:95274452-95291308
As a result, the FPKM values are very different between the two outputs.
What am I missing here, and how can I fix it? If the two GTF files inherently differ in their gene loci, which one should I trust?
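In case it helps to see how far apart the coordinates are, here is a minimal sketch in plain Python that computes the start and end shift for each gene. The loci are hard-coded from the two tables above; the helper names `parse_locus` and `locus_offsets` are my own and do not come from cufflinks or any other tool. If the shifts are roughly constant within a chromosome, that might suggest the two GTF files use different coordinate systems rather than different gene models, but that is only a guess on my part.

```python
# Loci copied from the two cufflinks outputs above (RefSeq run vs Gencode run).
refseq = {
    "PDIA3": "chr15:44038589-44064804",
    "CD276": "chr15:73976621-74006859",
    "PROM2": "chr2:95940200-95957055",
}
gencode = {
    "PDIA3": "chr15:43746391-43773279",
    "CD276": "chr15:73683965-73714518",
    "PROM2": "chr2:95274452-95291308",
}


def parse_locus(locus):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = locus.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def locus_offsets(a, b):
    """Return {gene: (chrom, start_shift, end_shift)} for genes present in both dicts.

    Shifts are computed as a minus b. Genes reported on different
    chromosomes in the two inputs are skipped.
    """
    out = {}
    for gene in sorted(set(a) & set(b)):
        chrom_a, start_a, end_a = parse_locus(a[gene])
        chrom_b, start_b, end_b = parse_locus(b[gene])
        if chrom_a != chrom_b:
            continue
        out[gene] = (chrom_a, start_a - start_b, end_a - end_b)
    return out


offsets = locus_offsets(refseq, gencode)
for gene, (chrom, start_shift, end_shift) in offsets.items():
    print(gene, chrom, start_shift, end_shift)
```

The same function could be pointed at the full locus columns of both cufflinks output tables instead of three hand-copied genes.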
Best,
Grace