Hi,
I've been using Cufflinks/TopHat to work with 28 RNA-Seq libraries (some of which belong to group A, and some to group B), and so far I have the following workflow:
- Ran tophat on each library with the refseq gene model annotations
- Ran cufflinks on each library, using refseq for reference-based assembly
- Ran cuffmerge on all the libraries with refseq
- Running cuffdiff between the two groups, using merged.gtf from cuffmerge
When I look at the merged.gtf file produced by cuffmerge, I see the attribute p_id added to some of the lines. According to the cufflinks website, p_id means:
"The ID of the coding sequence this transcript contains. This attribute is attached by Cuffcompare to the .combined.gtf records only when it is run with a reference annotation that include CDS records. Further, differential CDS analysis is only performed when all isoforms of a gene have p_id attributes, because neither Cufflinks nor Cuffcompare attempt to assign an open reading frame to transcripts."
I've been struggling to understand exactly what this means. Presumably they are referring to the CDS features in my refseq.gtf file? Cuffdiff is supposed to output a cds_exp.diff file, which according to the website contains:
"Coding sequence differential FPKM. Tests differences in the summed FPKM of transcripts sharing each p_id independent of tss_id"
When I look at the merged.gtf file, I have yet to see two transcripts share the same p_id. Only exons of the same transcript share the same p_id. Am I missing something here? What does the p_id attribute actually mean? Is my pipeline wrong somewhere?
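To check whether any transcripts in merged.gtf share a p_id, rather than eyeballing the file, something like the following minimal Python sketch could be used. It assumes the standard 9-column, tab-separated GTF layout with attributes written as key "value"; pairs in the last column. The function names transcripts_per_p_id and shared_p_ids are my own and are not part of Cufflinks.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: transcript_id "TCONS_00000001"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_p_id(gtf_lines):
    """Map each p_id to the set of transcript_ids whose records carry it."""
    groups = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: a valid GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "p_id" in attrs and "transcript_id" in attrs:
            groups[attrs["p_id"]].add(attrs["transcript_id"])
    return groups


def shared_p_ids(groups):
    """Keep only the p_ids that appear on more than one transcript."""
    return {p: sorted(t) for p, t in groups.items() if len(t) > 1}


# Example use (the path is wherever cuffmerge wrote its output):
# with open("merged.gtf") as handle:
#     print(shared_p_ids(transcripts_per_p_id(handle)))
```

If shared_p_ids comes back empty, then no p_id in the file is attached to more than one transcript, which would match what I am seeing by eye.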
Thanks,
Fong