Hey,
I've run the Cufflinks pipeline with both the FlyBase release 5 and UCSC release 3 Drosophila melanogaster annotations, and I'm getting strange annotations in the outputs. The example below is for the UCSC annotation, since it ended up working better with Cufflinks overall (but the same problem also occurred with the FlyBase annotation).
CuffMerge entries for the "galectin" gene
chr2L Cufflinks exon 21821 22941 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000005"; exon_number "1"; gene_name "galectin"; oId "CUFF.59.1"; nearest_ref "NM_001272859"; class_code "j"; tss_id "TSS2";
chr2L Cufflinks exon 22998 23422 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000005"; exon_number "2"; gene_name "galectin"; oId "CUFF.59.1"; nearest_ref "NM_001272859"; class_code "j"; tss_id "TSS2";
chr2L Cufflinks exon 74903 75018 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000005"; exon_number "3"; gene_name "galectin"; oId "CUFF.59.1"; nearest_ref "NM_001272859"; class_code "j"; tss_id "TSS2";
chr2L Cufflinks exon 75078 76276 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000005"; exon_number "4"; gene_name "galectin"; oId "CUFF.59.1"; nearest_ref "NM_001272859"; class_code "j"; tss_id "TSS2";
UCSC annotation entries for "galectin"
chr2L unknown exon 71757 71804 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown exon 71950 72081 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown CDS 72013 72081 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown start_codon 72013 72015 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown exon 72387 72977 . + . gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown CDS 72603 72977 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown start_codon 72603 72605 . + . gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown exon 73485 73692 . + . gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown exon 73485 73692 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown CDS 73570 73692 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown start_codon 73570 73572 . + . gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown exon 73820 73897 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown exon 74129 74572 . + . gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown CDS 74501 74572 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown start_codon 74501 74503 . + . gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown CDS 74903 75018 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown CDS 74903 75018 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown CDS 74903 75018 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown CDS 74903 75018 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown exon 74903 75018 . + . gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown exon 74903 75018 . + . gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown exon 74903 75018 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown exon 74903 75018 . + . gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown exon 74903 75018 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown CDS 75078 76095 . + 1 gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown CDS 75078 76095 . + 1 gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown CDS 75078 76095 . + 1 gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown CDS 75078 76095 . + 1 gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown exon 75078 76211 . + . gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown exon 75078 76211 . + . gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown exon 75078 76211 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown exon 75078 76211 . + . gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown exon 75078 76211 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown CDS 75280 76095 . + 0 gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown start_codon 75280 75282 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; gene_name "galectin"; p_id "P8803"; transcript_id "NM_001169367"; tss_id "TSS3981";
chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; gene_name "galectin"; p_id "P12530"; transcript_id "NM_134643"; tss_id "TSS12137";
chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; gene_name "galectin"; p_id "P18001"; transcript_id "NM_001258884"; tss_id "TSS6545";
chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; gene_name "galectin"; p_id "P7409"; transcript_id "NM_001169366"; tss_id "TSS12421";
chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; gene_name "galectin"; p_id "P9464"; transcript_id "NM_001272859"; tss_id "TSS3981";
As you can see, in the annotation galectin starts at 71757 on 2L and ends at 76098; Cufflinks, however, has placed it starting at 21821 and ending at 76276.
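For anyone who wants to check the same thing on their own files, here is a minimal sketch of how the two spans can be compared. It is not part of Cufflinks; `gene_spans` is a hypothetical helper, and it splits on whitespace rather than tabs only because the records pasted above lost their tabs. The sample lines are shortened copies of the first and last records from each listing.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gene_spans(gtf_lines, key="gene_name"):
    """Return {attribute value: (min start, max end)} over all records.

    Hypothetical helper for illustration only. Fields are split on any
    whitespace (real GTF is tab-delimited) so pasted lines still parse.
    """
    spans = {}
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 8)  # 8 fixed columns + attribute string
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        name = attrs.get(key)
        if name is None:
            continue  # record has no such attribute
        lo, hi = spans.get(name, (start, end))
        spans[name] = (min(lo, start), max(hi, end))
    return spans


# Shortened copies of records from the post (attributes trimmed).
merged = [
    'chr2L Cufflinks exon 21821 22941 . + . gene_id "XLOC_000002"; '
    'transcript_id "TCONS_00000005"; exon_number "1"; gene_name "galectin";',
    'chr2L Cufflinks exon 75078 76276 . + . gene_id "XLOC_000002"; '
    'transcript_id "TCONS_00000005"; exon_number "4"; gene_name "galectin";',
]
reference = [
    'chr2L unknown exon 71757 71804 . + . gene_id "galectin"; '
    'gene_name "galectin"; transcript_id "NM_001258884";',
    'chr2L unknown stop_codon 76096 76098 . + . gene_id "galectin"; '
    'gene_name "galectin"; transcript_id "NM_001272859";',
]

print(gene_spans(merged))     # {'galectin': (21821, 76276)}
print(gene_spans(reference))  # {'galectin': (71757, 76098)}
```

Run against the full files (for example `gene_spans(open("merged.gtf"))`, where the file name is a placeholder), this lists every gene whose merged span differs from the reference span.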
Any thoughts on this would be very much appreciated.
Thanks,
Gordon