  • Cufflinks Dropping Transcripts

    Hello all,

    I am currently using Cufflinks version 2.0.2 to analyze a set of pig RNA-seq data. I am running Cufflinks with the -g option using the Ensembl GTF for the latest build. When I count the unique gene names in the Ensembl GTF, there are 25,009 genes. After running the data through Cufflinks with the default settings plus the -g option, the output GTF contains only 23,917 unique Ensembl gene names.

    The documentation for the -g option says that the output will include all reference transcripts:
    "Tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled."
    My question is this: why would Cufflinks be dropping these genes? Are there built-in settings in Cufflinks that would cause it to drop genes from certain regions?

    To confirm that reads do align to the regions these genes are in, I ran HTSeq on the TopHat output with the original Ensembl GTF file, and I do get read counts for those genes.
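
One way to pin down exactly which genes went missing, rather than only comparing totals, is to diff the identifiers in the two GTF files. The following is a minimal sketch, not anything Cufflinks provides. The function names are made up for illustration, and the attribute key is a parameter because which attribute carries the Ensembl identifier in the Cufflinks output (gene_id, gene_name, or something else) is an assumption you should check against your own files.

```python
import re


def attribute_values(gtf_lines, key="gene_id"):
    """Collect the distinct values of one attribute (column 9) from GTF lines.

    `key` is an assumption: use whichever attribute holds the reference
    identifier in the file you are reading.
    """
    # \b keeps "gene_id" from matching inside a longer key such as "ref_gene_id".
    pattern = re.compile(r'\b' + re.escape(key) + r' "([^"]+)"')
    values = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = pattern.search(fields[8])
        if match:
            values.add(match.group(1))
    return values


def missing_genes(reference_lines, assembled_lines, key="gene_id"):
    """Return identifiers present in the reference but absent from the output."""
    return attribute_values(reference_lines, key) - attribute_values(assembled_lines, key)
```

With real files this could be called as `missing_genes(open("reference.gtf"), open("transcripts.gtf"))`, where both file names are placeholders. The resulting set can then be matched against coordinates in the reference GTF to see whether the dropped genes cluster in high-coverage regions.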

  • #2
    We have found, in human as well as several other model systems, that Cufflinks will "trip" and not assemble transcripts (and, surprisingly, not even include the ones in the reference!) when there are too many reads in the region. Basically, I think it just gives up and falls over when trying to untangle the overlap graph for that locus.

    I circumvent this by adding the reference GTF at the cuffmerge stage, using alternative tools for DE, and visually checking any calls that Cufflinks makes in regions I am interested in by comparing the Cufflinks output against wiggle or BED tracks of the actual reads.

    I am also interested in WTF it is doing here and why. I am leaving this comment so you know you're not alone.

    • #3
      Thanks for the info. We had assumed that for some reason, there were too many reads in the area and *cufflinks* was choking.

      Edit: Sorry, I was writing in a hurry and said tophat when I meant cufflinks. We know it's cufflinks choking because we have run other analyses and know that there are millions of reads in the particular region we were looking at, so tophat did align reads to that region.
      Last edited by ercfrtz; 11-12-2012, 06:33 AM.

      • #4
        tophat was choking.
        An easy way to see whether it's TopHat or Cufflinks that is choking is to visualize the data, using either a bamToBed or bamToWiggle conversion, and see how many reads are being mapped to the offending loci. I do this with bedtools or RSeQC + UCSC tools.
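
If you want a number rather than a picture, the BED output of a BAM-to-BED conversion can be counted directly. This is only a rough sketch under a few assumptions: the input is standard BED (chromosome, 0-based start, exclusive end in the first three tab-separated columns), the locus is given in the same 0-based half-open coordinates, and the function name is invented for this example.

```python
def count_overlapping_reads(bed_lines, chrom, start, end):
    """Count BED records on `chrom` that overlap the half-open interval [start, end)."""
    count = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != chrom:
            continue
        read_start, read_end = int(fields[1]), int(fields[2])
        # Two half-open intervals overlap when each starts before the other ends.
        if read_start < end and read_end > start:
            count += 1
    return count
```

A count in the millions for a dropped locus would suggest that the aligner did its job and that the problem lies in the assembly step. Note that GTF coordinates are 1-based and inclusive, so subtract 1 from a GTF start before passing it in.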

        • #5
          Originally posted by dvanic
          visualize the data by either a bamToBed or bamToWiggle to see how many reads are being mapped to the offending loci. I do this with bedtools or RSeQC + ucsc tools.
          Or, more simply, drag and drop the BAM file into IGV browser and look at the locus.

          • #6
            drag and drop the BAM file into IGV browser and look at the locus.
            Yup, also works. I just upload both the tophat wiggles and the cufflinks gtfs and compare them visually in UCSC with known annotations + features. You could do this in IGV as well.

            • #7
              Check out the --max-bundle-frags option for cufflinks. I don't know why anyone would need the option to skip loci due to excessive coverage, but they put it in there. Maybe raising that value will fix the issue... or maybe cufflinks will explode. The documentation says that skipped loci are reported in the skipped.gtf file, but I've never seen anything in that file, even though every time I run it, it skips at least one bundle.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
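
For anyone who wants to try raising the limit, here is a small sketch of how a rerun could be assembled and how skipped.gtf could be checked afterwards. The flags -g, --max-bundle-frags and -o are Cufflinks options. The helper names, file paths and the value 5000000 are placeholders chosen for illustration, and whether a higher limit actually recovers the dropped genes (or just costs a lot of memory and time) is untested here.

```python
import os


def build_cufflinks_command(bam_path, reference_gtf, output_dir, max_bundle_frags):
    """Return an argument list for a RABT run with a raised per-locus fragment limit."""
    return [
        "cufflinks",
        "-g", reference_gtf,                          # reference-guided (RABT) assembly
        "--max-bundle-frags", str(max_bundle_frags),  # fragments allowed before a locus is skipped
        "-o", output_dir,
        bam_path,
    ]


def skipped_file_is_empty(output_dir):
    """True if skipped.gtf is absent or has zero bytes in the output directory."""
    path = os.path.join(output_dir, "skipped.gtf")
    return not os.path.exists(path) or os.path.getsize(path) == 0


# Example with placeholder paths; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = build_cufflinks_command("accepted_hits.bam", "reference.gtf", "cufflinks_out", 5000000)
print(" ".join(cmd))
```

Building the command as a list keeps it safe to hand to subprocess without shell quoting. Given the report above that skipped.gtf stays empty even when bundles are skipped, an empty file should not be taken as proof that nothing was dropped; comparing gene identifiers between the reference and the output is the more reliable check.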
