I believe this is a very simple question, but since I am a scripting noob, I am completely lost:
In the merged.gtf (cuffmerge output), each transcript is assigned a TCONS_##### number. I have a list of differentially expressed isoforms with exactly these IDs, and I am interested in the corresponding refGene annotations. This won't work for novel transcripts, sure, but let's say I disregard them. Could anyone please tell me how to extract these annotations from the merged.gtf? I don't mind using R, grep, sed or awk, whatever works. I just need an idea of where to start.
In brief: I have a list of TCONS numbers and I would like to extract the corresponding nearest_ref from this type of data:
Code:
chr1 Cufflinks exon 943908 944581 . + . gene_id "XLOC_000005"; transcript_id "TCONS_00000016"; exon_number "14"; gene_name "SAMD11"; oId "NM_152486"; nearest_ref "NM_152486"; class_code "="; tss_id "TSS6"; p_id "P2";
Thanks a lot!
Markus
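One possible starting point, as a minimal sketch in Python rather than a definitive solution: read the attribute column of each GTF line into a dictionary and look up nearest_ref for the TCONS IDs of interest. It assumes every attribute is written as key "value"; as in the example line above. The function names parse_attributes and nearest_refs are made up for illustration and are not part of Cufflinks.

```python
import re

# Matches GTF attributes written as: key "value";
# (assumption: all attributes follow this pattern, as in the example line)
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(gtf_line):
    """Return the key/value attributes of one GTF line as a dict."""
    return dict(ATTR_RE.findall(gtf_line))


def nearest_refs(gtf_lines, wanted_ids):
    """Map each wanted transcript_id to its nearest_ref.

    Transcripts without a nearest_ref attribute (e.g. novel ones)
    are mapped to None. IDs never seen in the input are left out.
    """
    wanted = set(wanted_ids)
    result = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        attrs = parse_attributes(line)
        tid = attrs.get("transcript_id")
        # Each transcript spans several exon lines; keep the first hit only.
        if tid in wanted and tid not in result:
            result[tid] = attrs.get("nearest_ref")
    return result


# The example line from the post
example = (
    'chr1 Cufflinks exon 943908 944581 . + . gene_id "XLOC_000005"; '
    'transcript_id "TCONS_00000016"; exon_number "14"; gene_name "SAMD11"; '
    'oId "NM_152486"; nearest_ref "NM_152486"; class_code "="; '
    'tss_id "TSS6"; p_id "P2";'
)

print(nearest_refs([example], ["TCONS_00000016"]))
# prints {'TCONS_00000016': 'NM_152486'}
```

For the real data, an open file handle on merged.gtf can be passed as gtf_lines, and the ID list can be read from a text file with one TCONS ID per line.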