  • Robin
    Member
    • Nov 2009
    • 10

    cuffcompare warning message

    This is the first time I have used the Cufflinks software. I don't understand some of the warning messages from the cuffcompare command line. I am using the latest version, cufflinks-0.8.2.Linux_x86_64.
    I downloaded the reference annotation GTF files (human Ensembl and RefSeq) from the UCSC Table Browser.
    1) UCSC human Ensembl GTF file:
    chr1 hg19_ensGene CDS 67126196 67126207 0.000000 + 0 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene exon 67126196 67126207 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene CDS 67133213 67133224 0.000000 + 0 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene exon 67133213 67133224 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene CDS 67136678 67136702 0.000000 + 0 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene exon 67136678 67136702 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
    chr1 hg19_ensGene CDS 67137627 67137678 0.000000 + 2 gene_id "ENST00000237247"; transcript_id "ENST00000237247";

    2) cuffcompare command line:
    /usca/clscratch/geru1/cufflinks-0.8.2.Linux_x86_64/cuffcompare -r /usca/home/geru1/gtf/refgene.gtf -o s_1_and_s_2.txt -R -s /usca/clscratch/geru1/bowtie-0.12.5/indexes/ ./testme/transcripts.gtf ./testme_s2/transcripts.gtf

    3) Warning messages from cuffcompare:

    GFF Warning: discarded overlapping feature segment (3019321-3021003) for GFF ID ENST00000416194
    GFF Warning: discarded overlapping feature segment (2990575-2990576) for GFF ID ENST00000439917
    GFF Warning: discarded overlapping feature segment (2904529-2904530) for GFF ID ENST00000431516
    GFF Warning: discarded overlapping feature segment (2933284-2934966) for GFF ID ENST00000383431
    GFF Warning: discarded overlapping feature segment (2953771-2953772) for GFF ID ENST00000436814
    GFF Warning: discarded overlapping feature segment (2982531-2984213) for GFF ID ENST00000457089
    GFF Warning: discarded overlapping feature segment (2941694-2941695) for GFF ID ENST00000423612
    GFF Warning: discarded overlapping feature segment (2970446-2972128) for GFF ID ENST00000437010
    Warning: transcript ENST00000370343 discarded (structural errors found, length=88047).
    Warning: transcript ENST00000401006 discarded (structural errors found, length=22054).
    Warning: transcript ENST00000465119 discarded (structural errors found, length=35491).
    Warning: transcript ENST00000448632 discarded (structural errors found, length=26138).
    Warning: transcript ENST00000444385 discarded (structural errors found, length=41396).
    Warning: transcript ENST00000447431 discarded (structural errors found, length=30178).
    Warning: transcript ENST00000372433 discarded (structural errors found, length=2407).

    Thank you in advance!

    Robin
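With this many warnings, it can help to tally which IDs are affected before looking at the GTF itself. The following is a minimal sketch, assuming the warnings have been captured to text and that they follow exactly the two formats quoted above; the function name `summarize_warnings` is hypothetical and not part of Cufflinks.

```python
import re
from collections import Counter

# Patterns written against the warning text quoted in the post above.
SEGMENT_RE = re.compile(
    r"GFF Warning: discarded overlapping feature segment "
    r"\((\d+)-(\d+)\) for GFF ID (\S+)"
)
TRANSCRIPT_RE = re.compile(
    r"Warning: transcript (\S+) discarded "
    r"\(structural errors found, length=(\d+)\)"
)


def summarize_warnings(lines):
    """Return (segments_per_id, discarded_lengths) from warning lines.

    segments_per_id counts discarded segments for each GFF ID;
    discarded_lengths maps each discarded transcript to its reported length.
    """
    segments_per_id = Counter()
    discarded_lengths = {}
    for line in lines:
        match = SEGMENT_RE.search(line)
        if match:
            segments_per_id[match.group(3)] += 1
            continue
        match = TRANSCRIPT_RE.search(line)
        if match:
            discarded_lengths[match.group(1)] = int(match.group(2))
    return segments_per_id, discarded_lengths


sample = [
    "GFF Warning: discarded overlapping feature segment (3019321-3021003) for GFF ID ENST00000416194",
    "Warning: transcript ENST00000370343 discarded (structural errors found, length=88047).",
]
segments, discarded = summarize_warnings(sample)
print(dict(segments))
print(discarded)
```

The resulting ID lists can then be used to pull the matching lines out of the reference GTF for inspection.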
  • zorph
    Member
    • May 2010
    • 40

    #2
    bump


    • mfischer
      Junior Member
      • Mar 2010
      • 9

      #3
      Hi everybody,

      I ran into the same warnings when running cuffcompare (v0.8.4) with the refFlat or refGene GTF files downloaded from the UCSC Table Browser as the reference parameter. When using Ensembl's GTF reference file (which the Cufflinks manual refers to), everything works fine.

      Here are the first few warnings:
      GFF Warning: discarded overlapping feature segment (43916982-43916984) for GFF ID HYI
      GFF Warning: discarded overlapping feature segment (43916824-43916982) for GFF ID HYI
      Warning: transcript HYI discarded (structural errors found, length=2680).
      And the refFlat entries which seem to cause them: (I don't show all of HYI's exons and CDS)
      chr1 hg19_refFlat stop_codon 43916981 43916983 0.000000 - . gene_id "HYI"; transcript_id "HYI";
      chr1 hg19_refFlat CDS 43916984 43916982 0.000000 - 2 gene_id "HYI"; transcript_id "HYI";
      chr1 hg19_refFlat exon 43916824 43916982 0.000000 - . gene_id "HYI"; transcript_id "HYI";
      chr1 hg19_refFlat CDS 43919266 43919464 0.000000 - 0 gene_id "HYI"; transcript_id "HYI";
      chr1 hg19_refFlat start_codon 43919462 43919464 0.000000 - . gene_id "HYI"; transcript_id "HYI";
      chr1 hg19_refFlat exon 43919266 43919660 0.000000 - . gene_id "HYI"; transcript_id "HYI";
      I noticed that the stop codon extends past the last exon (ending at 43916982), which causes the first warning. Am I using the wrong GTF reference?

      Are there any recommendations as to which reference GTF files should be used with Cufflinks?

      Thanks in advance
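The two oddities in the HYI entries above (a CDS whose start is greater than its end, and a stop codon reaching one base past the exon) can be searched for across a whole file. This is a rough sketch, not anything Cufflinks provides: `find_problems` is a hypothetical helper, it assumes 1-based inclusive coordinates, and it splits on any whitespace so that it accepts both real tab-separated GTF and the space-separated form pasted here (assuming the first eight fields contain no spaces).

```python
import re
from collections import defaultdict

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def find_problems(lines):
    """Report features with start > end, or CDS/codon features
    that do not lie inside any exon of the same transcript."""
    exons = defaultdict(list)
    to_check = []
    problems = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        match = TRANSCRIPT_ID_RE.search(fields[8])
        tid = match.group(1) if match else None
        if start > end:
            problems.append((tid, feature, start, end, "start > end"))
        elif feature == "exon":
            exons[tid].append((start, end))
        elif feature in ("CDS", "start_codon", "stop_codon"):
            to_check.append((tid, feature, start, end))
    # Second pass: every coding feature should sit within one exon.
    for tid, feature, start, end in to_check:
        if not any(s <= start and end <= e for s, e in exons[tid]):
            problems.append((tid, feature, start, end, "not inside any exon"))
    return problems


sample_lines = [
    'chr1 hg19_refFlat stop_codon 43916981 43916983 0.000000 - . gene_id "HYI"; transcript_id "HYI";',
    'chr1 hg19_refFlat CDS 43916984 43916982 0.000000 - 2 gene_id "HYI"; transcript_id "HYI";',
    'chr1 hg19_refFlat exon 43916824 43916982 0.000000 - . gene_id "HYI"; transcript_id "HYI";',
]
for problem in find_problems(sample_lines):
    print(problem)
```

The containment check is only meaningful when all exons of a transcript are present in the input, so it should be run on the full GTF rather than on an excerpt.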


      • zun
        Member
        • Oct 2010
        • 26

        #4
        me too

        Hi everyone,

        I have the same warning, shown below:

        GFF Warning: discarded overlapping feature segment (1610953-1611069) for GFF ID Os06t0130100-02
        Warning: transcript Os06t0130100-02 discarded (structural errors found, length=6310).

        I checked my reference GTF file and found that the gene (ID: Os06t0130100) has alternative splicing.

        But there are many other genes which have alternative splicing and produce no warnings.

        What should I do?

        I gave up on that gene.
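One way to tell whether a single transcript really contains overlapping segments (as opposed to exons merely being shared between isoforms) is to compare the exons within each transcript ID. The sketch below assumes the exons have already been grouped per transcript as (start, end) pairs; `overlapping_segments` is a hypothetical helper, and apart from the segment 1610953-1611069 taken from the warning, the coordinates in the example are invented for illustration.

```python
def overlapping_segments(exons_by_transcript):
    """Map each transcript ID to the exons that overlap an earlier exon
    of the same transcript (coordinates are 1-based and inclusive)."""
    hits = {}
    for tid, exons in exons_by_transcript.items():
        flagged = []
        max_end = None  # furthest end coordinate seen so far
        for start, end in sorted(exons):
            if max_end is not None and start <= max_end:
                flagged.append((start, end))
            max_end = end if max_end is None else max(max_end, end)
        if flagged:
            hits[tid] = flagged
    return hits


# Hypothetical exon lists; only (1610953, 1611069) comes from the warning.
example = {
    "Os06t0130100-02": [(1610000, 1610960), (1610953, 1611069), (1612000, 1612100)],
    "Os06t0130100-01": [(1610000, 1610960), (1612000, 1612100)],
}
print(overlapping_segments(example))
```

Exons that appear in several isoforms are not flagged by this check, since each transcript is examined on its own.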


        • Bacilo
          Junior Member
          • May 2010
          • 5

          #5
          Same problem

          I have the same problem using Cufflinks and using the -G option in TopHat (1.1.2, which accepts a GTF annotation file). Does anyone have a solution or an explanation for this warning message?

          If the problem is alternative splicing, perhaps the program is discarding the duplicated exon, which is present in several mRNAs, and only counts this exon once when building the junctions database.


          • adumitri
            Member
            • Jan 2010
            • 27

            #6
            Hi,

            It does not seem like anyone has solved the problem mentioned in this thread, but I am hoping that someone can help me with a similar problem. I am using the latest version of Cufflinks (v0.9.3) and I am getting a lot of warnings that look like this:

            GFF warning: merging adjacent/overlapping segments of ENST00000323801 on chr1 (245133554-245133622, 245133624-245133839)
            GFF warning: merging adjacent/overlapping segments of ENST00000400934 on chr1 (247206093-247206248, 247206251-247206433)
            GFF warning: merging adjacent/overlapping segments of ENST00000400934 on chr1 (247206093-247206433, 247206436-247206753)
            The .gtf file I used is the one downloaded from the UCSC browser.
            Does anyone have a clue what the problem might be?

            Moreover, later in the Cufflinks run, I get these errors:

            > Processed 32736 loci. [*************************] 100%
            [14:57:01] Re-estimating abundances with bias correction.
            > Processing Locus chr20:18118498-18169031 [************ ] 51%
            Error: sqrt(det(cov)) == 0, 0.000000 after rounding.
            > Processing Locus chr3:12919020-12926710 [************** ] 56%
            Error: sqrt(det(cov)) == 0, 0.000000 after rounding.
            > Processing Locus chr3:49977439-50226508 [************** ] 57%
            Error: sqrt(det(cov)) == 0, 0.000000 after rounding.
            > Processing Locus chr7:99686576-99689823 [******************* ] 79%
            Error: sqrt(det(cov)) == 0, 0.000000 after rounding.
            > Processed 32736 loci. [*************************] 100%

            Any help would be appreciated,
            Alexandra
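The coordinates in the merge warnings above show how far apart the two segments are: in all three, the second segment starts only one or two bases after the first one ends. A small sketch for computing that gap from captured warning text follows; it assumes the message format is exactly as quoted, and `segment_gaps` is a hypothetical name.

```python
import re

MERGE_RE = re.compile(
    r"merging adjacent/overlapping segments of (\S+) on (\S+) "
    r"\((\d+)-(\d+), (\d+)-(\d+)\)"
)


def segment_gaps(lines):
    """Return (transcript_id, chromosome, gap) for each merge warning.

    gap is the number of bases between the two segments: 0 means they
    are directly adjacent, a negative value means they overlap.
    """
    results = []
    for line in lines:
        match = MERGE_RE.search(line)
        if not match:
            continue
        tid, chrom = match.group(1), match.group(2)
        start1, end1, start2, end2 = map(int, match.group(3, 4, 5, 6))
        results.append((tid, chrom, start2 - end1 - 1))
    return results


warnings = [
    "GFF warning: merging adjacent/overlapping segments of ENST00000323801 on chr1 (245133554-245133622, 245133624-245133839)",
    "GFF warning: merging adjacent/overlapping segments of ENST00000400934 on chr1 (247206093-247206248, 247206251-247206433)",
]
for row in segment_gaps(warnings):
    print(row)
```

Tabulating the gaps for a whole run shows whether the merges are limited to such very short gaps or also involve genuinely overlapping segments.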


            • jb2
              Member
              • Jun 2010
              • 25

              #7
              I am also getting similar error messages to adumitri. The sqrt(det(cov)) issue was also mentioned in this thread: http://seqanswers.com/forums/showthread.php?t=6178


              • josiah42
                Junior Member
                • Apr 2011
                • 1

                #8
                Since no one else has responded with a solution, I thought this might help:
                Here is the error I was getting:
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000170708 on chr19 (9090282-9092073, 9092076-9092111)
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000170708 on chr19 (9090282-9092111, 9092116-9093685)
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000073056 on chr19 (9290834-9291148, 9291150-9291487)
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000088040 on chr19 (9357869-9358351, 9358354-9358368)
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000088040 on chr19 (9357869-9358368, 9358371-9358391)
                GFF warning: merging adjacent/overlapping segments of ENSMUST00000088040 on chr19 (9357869-9358391, 9358394-9358654)
                I was running cufflinks using the -G option with a gtf file that I downloaded from the UCSC Genome Browser "Tables" page.

                The issue is that I was using Ensembl gene names and they didn't match my data. Switching to using RefSeq gene names fixed the problem. For me, it was as simple as changing the "Track" dropdown box. I hope that helps someone in the future.


                • jhb1980
                  Junior Member
                  • Dec 2010
                  • 7

                  #9
                  Hi all,

                  I'm unsure whether this issue has been resolved yet, but I recently encountered the same GFF warning messages using Ensembl's v64 (mm9) *.gtf when running Cufflinks v1.1.0, e.g.:

                  Code:
                  GFF warning: merging adjacent/overlapping segments of ENSMUST00000098967 on chr2 (181331877-181332007, 181332010-181332048)
                  Looking at the gene tracking output files, Cufflinks seems to have merged well over 1,000 reference gene loci. I went through a few of them on the UCSC browser, and it would appear that these merges occur when a reference transcript is annotated to extend into a downstream gene on the same strand. In the attached example, Cufflinks merged Lypla1 and Tcea1 into a single gene locus due to ENMUST**0155020 supposedly extending into Tcea1. I guess it's hard to tell if this is genuine alternative splicing or just an annotation artifact.

                  Looking at the merged reference genes, none of them is of apparent interest to me, so I guess I'll live with it for the time being. Other than manually removing the individual transcripts causing the merge from the reference *.gtf, is there any way to suppress these merges in Cufflinks? If so, please let me know!
                  Attached Files
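Removing individual transcripts from the reference *.gtf, as mentioned above, does not have to be done by hand. The following is a minimal sketch that filters GTF lines by `transcript_id`; `drop_transcripts` is a hypothetical helper, and the IDs in the example are placeholders rather than real Ensembl identifiers.

```python
import re

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def drop_transcripts(lines, unwanted_ids):
    """Yield GTF lines unchanged, skipping those whose transcript_id
    is listed in unwanted_ids. Lines without a transcript_id are kept."""
    unwanted = set(unwanted_ids)
    for line in lines:
        match = TRANSCRIPT_ID_RE.search(line)
        if match and match.group(1) in unwanted:
            continue
        yield line


# Placeholder records; TX_KEEP and TX_DROP are made-up IDs.
gtf_lines = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TX_KEEP";\n',
    'chr1\tsrc\texon\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "TX_DROP";\n',
]
kept = list(drop_transcripts(gtf_lines, ["TX_DROP"]))
print(len(kept))
```

Because the function works on any iterable of lines, it can be fed an open file handle and its output written straight to a new GTF, leaving the original annotation untouched.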
