Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • tguo
    Junior Member
    • Sep 2010
    • 5

    cufflinks - having difficulty reading reference GTF

    Hi all,

    I'm using cufflinks 0.9.0 beta. But the problem persists when I use v0.8.4.

    Everything seems to work fine if I don't use a reference GTF.
    But when I do use a reference (downloaded from the Ensembl FTP "current GTF" folder, Mouse release 59), cuffcompare doesn't seem to read from it.
    (I did manually delete the giant transcript ENSMUST00000127664 from the GTF file to preclude the error message.)

    When I run:
    cuffcompare -r ../Mus_musculus.NCBIM37.56.gtf A.gtf B.gtf
    where A.gtf and B.gtf are the two "transcripts.gtf" generated by cufflinks

    the resulting "combined.gtf" file from cuffcompare does not contain any gene ID or gene name from the mouse reference GTF, and the refmap file is completely empty, too, besides the header row.

    Does anyone have any idea what is going on?

    Another weird thing is, if I run cufflinks with
    "cufflinks -G ../Mus_musculus.NCBIM37.56.gtf A.sam",
    then cufflinks does read the reference GTF, but in the output "transcripts.gtf", the estimated transcript levels are 0 for all transcripts...
    However, if I run cufflinks without "-G", then the level estimation is correct.

    Any ideas or help are highly appreciated!

    TG
    Last edited by tguo; 09-30-2010, 10:03 PM.
  • ldong
    Member
    • May 2010
    • 15

    #2
    Hi, tguo,
    It looks like you don't have enough reads as input. Please check how many reads are in your fastq file? To test cufflinks, I first used 1000 reads and got the same problem as yours. Later on, I used 10000 reads, then cuffcompare produces transcript expression level. But the problem is that I don't know how to get the gene expression level. Do you have any idea on this?
    Best,
    ldong

    Comment

    • tguo
      Junior Member
      • Sep 2010
      • 5

      #3
      Hi ldong,

      I guess gene expression levels are in the "genes.expr" output file.

      I didn't think too few reads were the reason causing the problem. I was using a 10M read RNA-seq result as an input.. the SAM input into cufflinks were several gigabytes..

      Thanks,

      Ting

      Comment

      • ldong
        Member
        • May 2010
        • 15

        #4
        New version of tophat works!
        Last edited by ldong; 10-08-2010, 07:49 AM.

        Comment

        • Pejman
          Member
          • Jul 2010
          • 23

          #5
          Originally posted by tguo View Post
          Hi all,


          Everything seems to work fine if I don't use a reference GTF.
          I have the same problem. I'm using cufflinks-0.9.1.Linux_x86_64 and trying to get it to produce expression levels based on a GTF file. If I run it without a reference GTF file, it works fine, gives some expressions. But with reference:
          Code:
          $cufflinks -G hg18_annotation.gtf MyFile.sam
          it gives:
          Code:
          [20:24:17] Inspecting reads and determining fragment length distribution.
          > Processing Locus chr6:27032750-27099732      [*****************        ]  69%
          Error: this SAM file doesn't appear to be correctly sorted!
                  current hit is at chr10:272501, last one was at chr9:139953334
          Cufflinks requires that if your file has SQ records in
          the SAM header that they appear in the same order as the chromosomes names 
          in the alignments.
          If there are no SQ records in the header, or if the header is missing,
          the alignments must be sorted lexicographically by chromsome
          name and by position.

          The GTF file I downloaded from UCSC genomebrowser portal and .sam file is produced by Bowtie, and then sorted by Samtools. Does anybody have any clue?

          Comment

          • ldong
            Member
            • May 2010
            • 15

            #6
            Hi, Pejman,
            I got the same error. To walk around it I use the gtf file when I call cuffcompare. See our wiki site for detail: https://sites.google.com/a/brown.edu...nks-and-tophat
            Best,
            ldong

            Comment

            • adarob
              Member
              • Jul 2010
              • 71

              #7
              Pejman,

              Try remapping with the newest version of TopHat and your problem should disappear. Alternatively, you can add a proper header to your SAM file.

              -Adam

              Comment

              • Pejman
                Member
                • Jul 2010
                • 23

                #8
                @ ldong Thanks for the nice link!

                I managed to get around the problem, by manually adding the SQ headers to the sorted same file! They were dumped out during the SAM->BAM->sort->SAM process

                But about TopHAt, I have tried to use it, so far have not been so successful, it gives errors. I'm using SOLiD data.

                Comment

                • Pejman
                  Member
                  • Jul 2010
                  • 23

                  #9
                  @Adam:
                  I'm not interested in new splice junctions, just looking for differential expression of the known transcripts using Cufflinks. However, I kinda think that it would be still better to use tophat (without new splice junction prediction) for the alignment part than bowtie, in order to get more accurate results on the genes that have isoforms. What do you think as a Cufflinker?

                  Comment

                  • ldong
                    Member
                    • May 2010
                    • 15

                    #10
                    Hi adarob,
                    Thank you for pointing to the right direction. I downloaded new version tophat binary. Now it can read gtf file. I assume in this way, tophat will be more accurate. But when I compare the output file accepted_hit.sam between using and not using gtf on a small data set, there is no difference. By the way, the new tophat does not output sam, I have to use samtools to convert the file:
                    samtools view -h -o accepted_hits.sam accepted_hits.bam
                    Any commnents on this? Best, ldong

                    Comment

                    • adarob
                      Member
                      • Jul 2010
                      • 71

                      #11
                      Idong,

                      I don't work on TopHat so I can't answer your question about the GTF. You can contact Daehwan from the TopHat page. As for the bam output, conversion is not necessary since Cufflinks 0.9.x takes bam as input.

                      -Adam

                      Comment

                      • ldong
                        Member
                        • May 2010
                        • 15

                        #12
                        Hi, Adam,
                        Thanks. I didn't realize new cufflnks can read bam. Now I am trying to see why UCSC genome browser can not read the bam file. Best, ldong

                        Comment

                        • tguo
                          Junior Member
                          • Sep 2010
                          • 5

                          #13
                          I think I've figured out what the problem was...

                          I have used a UCSC mm9 reference assembly against an EMSEMBL GTF, and that apparently was the problem. Some attributes of the two apparently doesn't match (didn't look in to see exactly why).

                          If instead, I use a UCSC GTF for mm9 reference, or conversely, use the EMSEMBL m_musculus_ncbi37 reference for EMSEMBL GTF, it seems to work.

                          Comment

                          • ldong
                            Member
                            • May 2010
                            • 15

                            #14
                            Hi, tguo,
                            I think the reason is that reference sequence ID in mm9 does not match the chromosome Id in the GTF file.
                            By the way, where did you download the UCSC GTF file? Could you remind me? I could not find it anywhere. I use a GFF file from AceView.
                            Thanks,
                            ldong

                            Comment

                            • tguo
                              Junior Member
                              • Sep 2010
                              • 5

                              #15
                              Sure, from the UCSC table browser:

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM
                              • SEQadmin2
                                Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                by SEQadmin2

                                Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                05-06-2026, 09:04 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 08:59 AM
                              0 responses
                              14 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              22 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-28-2026, 11:40 AM
                              0 responses
                              32 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...