  • Cufflinks Duplicate GFF ID: will it affect the result?

    Dear All,



    I recently started using Cufflinks to assemble transcripts from my RNA-seq data. To do a Reference Annotation Based Transcript (RABT) assembly, I supplied Cufflinks with an hg19 annotation GTF file downloaded from GENCODE, and used the following command line on several samples:



    Code:
    cufflinks -p 16 -g gencode.v10.annotation.gtf -M gencode.v10.annotation.rRNA.gtf -b hg19.fa -u -N -o output sorted.bam


    Every run ended with an almost identical error message; across 3 samples, only the ID inside the single quotes differed.



    The error:



    Code:
    [23:28:52] Loading reference annotation and sequence.
    
    Error: duplicate GFF ID 'ENST00000361547.2' encountered!
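To compare which IDs each run complained about, the error lines can be pulled out of the saved logs. A minimal Python sketch, assuming each run's stderr was captured to a text file (the helper names `duplicate_ids` and `compare_runs` are hypothetical, not part of Cufflinks):

```python
import re

# Captures the ID inside: Error: duplicate GFF ID '...' encountered!
DUP_RE = re.compile(r"duplicate GFF ID '([^']+)' encountered")


def duplicate_ids(log_text):
    """Return the set of IDs named in Cufflinks 'duplicate GFF ID' errors."""
    return set(DUP_RE.findall(log_text))


def compare_runs(logs):
    """Given {run_name: log_text}, return (IDs per run, IDs shared by every run)."""
    per_run = {name: duplicate_ids(text) for name, text in logs.items()}
    # set.intersection(a, b, c) keeps only IDs reported by all runs
    shared = set.intersection(*per_run.values()) if per_run else set()
    return per_run, shared
```

For example, read each hypothetical `sampleN.log` into a dict and pass it to `compare_runs`; IDs that differ between runs stand out immediately, which may help tell an annotation problem apart from something run-specific.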


    Many transcript IDs from the annotation file appear in the Cufflinks output transcripts.gtf with non-zero FPKM and coverage values, so I guess the 'duplicate GFF ID' errors didn't stop the run. But could they affect the assembly result in other ways? (For example, ENST00000361547 being ignored and novel transcripts being identified at the same locus instead.)
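One way to check a specific transcript without rerunning is to look it up in the existing output. A minimal Python sketch, assuming transcripts.gtf has one `transcript` record per assembled transcript with a quoted `FPKM` attribute, and that reference transcripts keep their original IDs in RABT mode (consistent with what is described above); the function names are hypothetical:

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9);
# unquoted pairs such as `level 2` are ignored.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict of the quoted key/value pairs in a GTF attribute column."""
    return dict(ATTR_RE.findall(field))


def transcript_fpkm(gtf_lines, transcript_id):
    """Return the FPKM of transcript_id's 'transcript' record, or None if absent.

    The ID must match exactly, including the version suffix (e.g. '.2').
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level records carry the summary FPKM
        attrs = parse_attributes(cols[8])
        if attrs.get("transcript_id") == transcript_id:
            return float(attrs.get("FPKM", "nan"))
    return None
```

Passing an open `output/transcripts.gtf` handle and `"ENST00000361547.2"` returns its FPKM, or `None` if no such transcript record exists; in the latter case, the next thing to inspect would be any novel transcripts overlapping chr1:26126667-26144713.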



    I noticed that someone chose to amend the annotation file to overcome the problem. But each Cufflinks run can take more than a week to finish, so I hope somebody can suggest a quicker fix that doesn't require rerunning Cufflinks from the very beginning. Many thanks.
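If the annotation is amended, one option is to drop feature types Cufflinks does not need and record which transcripts were touched, so they can be compared with the IDs named in the errors. A minimal Python sketch; the `KEEP_TYPES` allow-list is an assumption, and the idea that GENCODE's `Selenocysteine` records (present in the ENST00000361547 listing) trigger the error is an untested hypothesis, not something established in this thread:

```python
import re

TID_RE = re.compile(r'transcript_id "([^"]+)"')

# Assumed allow-list of feature types to keep; anything else
# (e.g. GENCODE's "Selenocysteine" records) is dropped.
KEEP_TYPES = {"gene", "transcript", "exon", "CDS", "UTR", "start_codon", "stop_codon"}


def filter_gtf(lines, affected, keep_types=KEEP_TYPES):
    """Yield the GTF lines to keep, streaming line by line.

    Comment, blank and malformed lines pass through unchanged. The
    transcript_id of every dropped record is added to the `affected` set.
    """
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] in keep_types:
            yield line
            continue
        match = TID_RE.search(cols[8])
        if match:
            affected.add(match.group(1))
```

Usage would be along the lines of `dst.writelines(filter_gtf(src, affected))` with `src` and `dst` opened on the original and amended GTF files. This still requires rerunning Cufflinks with the amended file; it only narrows down which transcripts are involved.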



    Here are the ENST00000361547 records from the GTF annotation file:



    Code:
    chr1	HAVANA	transcript	26126667	26144713	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	Selenocysteine	26128584	26128586	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	Selenocysteine	26139280	26139282	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26126667	26126904	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26126722	26126904	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	start_codon	26126722	26126724	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26127534	26127651	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26127534	26127651	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26128507	26128608	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26128507	26128608	.	+	2	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26131633	26131766	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26131633	26131766	.	+	2	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26135071	26135280	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26135071	26135280	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26135517	26135641	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26135517	26135641	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26136174	26136311	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26136174	26136311	.	+	1	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26137945	26138026	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26137945	26138026	.	+	1	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26138182	26138370	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26138182	26138370	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26139178	26139283	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26139178	26139283	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26140372	26140484	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26140372	26140484	.	+	2	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26140568	26140669	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26140568	26140669	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	exon	26142039	26144713	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	CDS	26142039	26142206	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	stop_codon	26142207	26142209	.	+	0	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	UTR	26126667	26126721	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    
    chr1	HAVANA	UTR	26142207	26144713	.	+	.	gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
    Last edited by byb121; 03-26-2012, 12:30 AM.

  • #2
    I hope it's not too stupid a question.

    Any suggestion is welcome.



    • #3
      I have had exactly the same problem. It would be lovely to know how to get around this bug...
