  • cufflinks filter output gtf

    I have been running cufflinks on a number of RNA-seq files to find novel transcripts/isoforms. I sequenced fairly deeply and it seems to be calling a ton of novel, short, single exon transcripts, a number of which I think are junk. Do people usually filter these out before running cuffmerge and cuffcompare? Do you just write a small script to manually filter these?
    Thanks

  • #2
    I have the same question. Waiting...



    • #3
      Have you tried constraining your cufflinks assembly parameters? You could increase the -F and -j options (--min-isoform-fraction and --pre-mrna-fraction). But it's been my general experience that if you're looking for novel isoforms with just the cufflinks RABT assembly, you're going to have to filter through a lot of junk no matter what you do with cufflinks.

      You can also try something a little more sophisticated with MAKER. It has a nice way of handling cufflinks assemblies and potentially creating a "reannotation" with them. Check out: http://gmod.org/wiki/MAKER_Tutorial

      Another option is the PASA pipeline: http://pasa.sourceforge.net. Neither MAKER nor PASA is really easy to get up and running for the casual bioinformatics tool user, but they aren't that bad either.
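
For the short single-exon calls raised in the original question, a small script is usually enough. The sketch below is one possible version, not a standard tool: it assumes a GTF with `exon` rows carrying a `transcript_id` attribute (as in cufflinks output), and the function name and the 200 bp cutoff are made up for illustration.

```python
import re
from collections import defaultdict

TID_RE = re.compile(r'transcript_id "([^"]+)"')


def drop_short_single_exon(lines, min_length=200):
    """Return GTF lines minus single-exon transcripts shorter than min_length.

    min_length is a hypothetical cutoff; choose one that suits your data.
    """
    exon_count = defaultdict(int)
    exon_bases = defaultdict(int)
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TID_RE.search(cols[8])
        tid = match.group(1) if match else None
        if cols[2] == "exon" and tid is not None:
            exon_count[tid] += 1
            # GTF coordinates are 1-based and inclusive.
            exon_bases[tid] += int(cols[4]) - int(cols[3]) + 1
        records.append((tid, line))

    junk = {
        tid
        for tid, n in exon_count.items()
        if n == 1 and exon_bases[tid] < min_length
    }
    return [line for tid, line in records if tid not in junk]
```

Multi-exon transcripts are always kept, whatever their length; only one-exon transcripts below the cutoff are removed, together with any `transcript` row that shares their ID.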



      • #4
        I generally only keep those of the new transcripts that have class_code "j". There might be something real in the other classes, but too many of them do not look like they're real.
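
A class-code filter like this can be sketched in a few lines of Python. This is only an illustration: it assumes a cuffcompare/cuffmerge-style GTF where a `class_code` attribute appears on at least one row per transcript, and the function names are invented here.

```python
import re

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def filter_by_class_code(lines, keep_codes=("j",)):
    """Keep only rows whose transcript has a class_code in keep_codes."""
    records = []
    codes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        # Remember the code even if only some rows of the transcript carry it.
        if "class_code" in attrs:
            codes[tid] = attrs["class_code"]
        records.append((tid, line))
    return [line for tid, line in records if codes.get(tid) in keep_codes]
```

To keep known transcripts alongside the novel "j" isoforms, pass `keep_codes=("=", "j")`.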

        Concerning the -F and -j options: I tested those quite extensively but in my experience you just get more pre-mRNA in the transcripts.

        I have found that it matters how you filter your reads before aligning. If you have paired-end reads, only use the fragments where both reads map for building the transcripts.
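
One way to read "both reads map" is in terms of SAM FLAG bits: keep a record only if it is paired (0x1) and neither the read (0x4) nor its mate (0x8) is unmapped. The sketch below works on plain SAM text lines; the function names are made up, and it is an interpretation of the advice above rather than the poster's actual procedure.

```python
PAIRED = 0x1         # template has multiple segments
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def both_mates_mapped(flag):
    """True if the record is paired and both ends are mapped."""
    return bool(flag & PAIRED) and not flag & (UNMAPPED | MATE_UNMAPPED)


def keep_fully_mapped_pairs(sam_lines):
    """Yield header lines plus alignments where both mates mapped."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass SAM header lines through unchanged
            continue
        flag = int(line.split("\t", 2)[1])
        if both_mates_mapped(flag):
            yield line
```

The same selection can be made on a BAM file with `samtools view -f 1 -F 12`.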

        I have no experience with MAKER or PASA, but they look interesting. Thanks for the pointer.



        • #5
          Originally posted by jake13
          I have been running cufflinks on a number of RNA-seq files to find novel transcripts/isoforms. I sequenced fairly deeply and it seems to be calling a ton of novel, short, single exon transcripts, a number of which I think are junk. Do people usually filter these out before running cuffmerge and cuffcompare? Do you just write a small script to manually filter these?
          Thanks
          Hi, do you have any idea about why cufflinks gives a large number of predicted transcripts and how to filter the result now? Thank you.



          • #6
            Originally posted by 11xinqi
            Hi, do you have any idea about why cufflinks gives a large number of predicted transcripts and how to filter the result now? Thank you.
            I don't really know why this happens (it does in my data as well; I suspect it is influenced by the library construction method and by any adapter and rRNA contamination that may be present). But here is one way I have seen people deal with it: http://www.ncbi.nlm.nih.gov/pubmed/23237380. Basically, they look at the FPKM distributions of the "c" and "=" class codes. Based on the hypothesis that artifacts (an unknown subset of j, o, x, u, etc.) and partially assembled transcripts (c) have FPKM distributions that are separate from, and generally lower than, those of perfectly assembled transcripts (=), they build a simple classifier that labels transcripts as artifacts if their FPKM falls below a certain threshold. The output GTF from cufflinks is then filtered using this threshold as a cutoff.
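
The final filtering step described above (not the classifier that picks the threshold) could look roughly like this. It assumes the cufflinks `transcripts.gtf` layout, where each `transcript` row carries an `FPKM "..."` attribute; the function name is invented, and the cutoff value has to come from your own analysis.

```python
import re

TID_RE = re.compile(r'transcript_id "([^"]+)"')
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def filter_by_fpkm(lines, min_fpkm):
    """Keep rows of transcripts whose FPKM is at least min_fpkm.

    Transcripts with no FPKM on their "transcript" row are dropped.
    """
    fpkm = {}
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        tid_match = TID_RE.search(cols[8])
        tid = tid_match.group(1) if tid_match else None
        if cols[2] == "transcript":
            fpkm_match = FPKM_RE.search(cols[8])
            if fpkm_match:
                fpkm[tid] = float(fpkm_match.group(1))
        records.append((tid, line))
    return [line for tid, line in records if fpkm.get(tid, 0.0) >= min_fpkm]
```

Exon rows are kept or dropped together with their parent transcript, so the output stays a consistent GTF.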

            This is another approach (FRFE), which according to the authors performs better with respect to single-exon transcripts: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882232/.

            If any of you know any other approaches, I am certainly interested in them.
