  • Cufflinks and CuffDiff bugs?

    I posted the following as a reply in another thread, but I would like to start a new thread to draw more attention to these problems. Thanks!

    1. In the newly released 1.3.0, after running Cuffcompare, the FPKM column contains all zeros: the FPKM values are missing even though the tracking files have them.

    2. In all versions of CuffDiff, if you compare different conditions against the same control samples, the FPKMs of those control samples differ between comparisons. For example:
    CuffDiff I: condition 1 vs. control;
    CuffDiff II: condition 2 vs. control;

    After running CuffDiff and tracking the FPKM values, the FPKM of Gene X in the control condition in CuffDiff I differs from the FPKM of Gene X in the control condition in CuffDiff II. Genes like Gene X make up roughly 20-30% of all annotated genes; the rest have the same FPKM in both runs.
    Does anybody have an explanation or suggestions for this? Thanks!
    Last edited by lewewoo; 06-06-2011, 08:18 AM.

  • #2
    I've also been hoping for a response to this thread (as well as to the other thread you posted this question in).

    Cufflinks 1.0.3 gives FPKM values of zero for everything with paired-end reads from SOLiD, although it works fine with single-end data. The same paired-end data runs fine through Cufflinks 0.9.3, and the FPKM values are calculated correctly.

    Code:
    cufflinks --output-dir $out --num-threads 8 --GTF-guide $gtf --multi-read-correct --library-type fr-secondstrand --upper-quartile-norm --label l --frag-bias-correct $hg19All.fa $bam
    
    (assume my variable references are correct)
    Does anyone have ideas for a workaround? Is anyone else having similar issues?

    • #3
      There seem to be a lot of confusing changes in Cufflinks. I have not been able to find a fix yet.

      • #4
        I was having a problem similar to the one lewewoo described in #1: Cufflinks was not generating accurate FPKMs. Specifically, they were all zero.

        Setup: Cufflinks 1.0.3, with SOLiD paired-end reads (50 bp x 35 bp) mapped using Bioscope.

        1. Add the XS tag as described in the Cufflinks manual.
        Code:
        samtools view -F 0x04 -h unedited.bam | awk 'BEGIN{OFS="\t"} (!/^@/){minus=and($2, 0x10); print $0"\tXS:A:"(minus ? "-":"+") } (/^@/){ print }' | samtools view -bhS - > xs.bam
        This runs through Cufflinks and gives FPKM = 0 for everything.

        2. Increment the NH tag by 1, as suggested by Cufflinks developer Adam Roberts.
        Code:
        samtools view -F 0x04 -h xs.bam | awk 'BEGIN{OFS="\t"}(! /^@/){ split($12,a,":"); $12 = a[1]":"a[2]":"a[3]+1; print $0 } (/^@/){ print }' > xs.nh.sam
        This seems to be working, but I don't have the output of a full run yet.
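The two awk one-liners above can be folded into a single pass. Below is a minimal Python sketch (standard library only, working on SAM text) that does both steps, but finds NH by its tag name instead of assuming it sits in column 12, and skips adding XS when a record already has one. The strand rule (FLAG bit 0x10 gives "-") is copied from the awk command; whether the read strand is the right transcript strand for a given paired-end library type is not checked here. The script name `fix_sam.py` is hypothetical.

```python
import sys
from typing import Iterable, Iterator


def fix_sam_line(line: str) -> str:
    """Add XS:A from FLAG bit 0x10 (if absent) and increment NH:i by 1."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    strand = "-" if int(fields[1]) & 0x10 else "+"
    out = fields[:11]  # the 11 mandatory SAM columns
    has_xs = False
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields, in any order
        if tag.startswith("NH:i:"):
            tag = "NH:i:" + str(int(tag[5:]) + 1)
        elif tag.startswith("XS:A:"):
            has_xs = True
        out.append(tag)
    if not has_xs:
        out.append("XS:A:" + strand)
    return "\t".join(out) + "\n"


def process(lines: Iterable[str]) -> Iterator[str]:
    """Drop unmapped reads (like samtools view -F 0x04) and fix the rest."""
    for line in lines:
        if not line.startswith("@") and int(line.split("\t", 2)[1]) & 0x4:
            continue
        yield fix_sam_line(line)


def main(path: str) -> None:
    src = sys.stdin if path == "-" else open(path)
    try:
        sys.stdout.writelines(process(src))
    finally:
        if src is not sys.stdin:
            src.close()


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Possible usage: `samtools view -h unedited.bam | python fix_sam.py - | samtools view -bS - > xs.nh.bam`.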

        • #5
          I'm seeing the same thing as lewewoo's point 2.

          I am getting different FPKM values for the same control sample when it is compared against two different samples in two separate cuffdiff runs. Is this expected? Does cuffdiff use all of the samples provided when calculating FPKMs? If so, what is the best workflow for getting FPKM values for samples that you want to analyze further outside the Cufflinks suite? Should I run cufflinks on individual samples and work with those FPKMs, or should I put all the samples I'm interested in into a single cuffdiff run and use those FPKMs, since they might be normalized across samples?

          Any suggestions or ideas about what is happening would be great!
          Thanks!

          • #6
            The FPKMs should come with confidence intervals. Do those ranges overlap?
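One way to answer this systematically is to check, gene by gene, whether the confidence intervals reported for the control in the two runs overlap. A minimal sketch, assuming the (FPKM, lower bound, upper bound) triples have already been read out of each run's tracking file; parsing is left out because column names differ between versions, and the gene names and numbers below are made up.

```python
from typing import Dict, List, Tuple, Union

# (FPKM, conf_lo, conf_hi) for one gene in one run
Estimate = Tuple[float, float, float]


def intervals_overlap(a: Estimate, b: Estimate) -> bool:
    """True if the two confidence intervals share at least one point."""
    _, lo_a, hi_a = a
    _, lo_b, hi_b = b
    return lo_a <= hi_b and lo_b <= hi_a


def compare_runs(
    run1: Dict[str, Estimate], run2: Dict[str, Estimate], tol: float = 1e-9
) -> Dict[str, Union[int, List[str]]]:
    """Count genes whose FPKM changed, and list those whose intervals do not overlap."""
    shared = sorted(set(run1) & set(run2))
    differ = [g for g in shared if abs(run1[g][0] - run2[g][0]) > tol]
    disjoint = [g for g in differ if not intervals_overlap(run1[g], run2[g])]
    return {"shared": len(shared), "differ": len(differ), "disjoint_intervals": disjoint}


# Hypothetical control-sample estimates from two cuffdiff runs
run_a = {"geneX": (10.0, 8.0, 12.0), "geneY": (5.0, 4.0, 6.0), "geneZ": (3.0, 2.5, 3.5)}
run_b = {"geneX": (11.5, 9.0, 14.0), "geneY": (5.0, 4.0, 6.0), "geneZ": (6.0, 5.0, 7.0)}
print(compare_runs(run_a, run_b))
```

Genes that differ but still overlap (geneX here) are consistent with run-to-run noise; genes in `disjoint_intervals` (geneZ here) are the ones worth a closer look.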

            • #7
              Good point, thanks gringer. On a quick look, the ranges do seem to overlap. I made a scatter plot, and the values are concordant, with a very tight spread at the extremes but quite a large spread in the middle. I guess I just expected much more agreement across the range, especially since it is the same sample.

              Sorry to ask again, but does this mean that cuffdiff does not consider both samples when calculating FPKM? (I assume so, but I'm not 100% sure that assumption is correct.) What would be the recommended workflow for just getting FPKM values for further analysis? Can I use cuffdiff (perhaps with all the samples analyzed together, if some cross-sample normalization is occurring), or should I use cufflinks? BTW, I should mention that I was not using the -N option (upper-quartile normalization) in cuffdiff.

              Thanks so much for the help! Which approach to take to get FPKMs has been a big source of discussion here. Really appreciate it!
              Last edited by jaldrich; 07-14-2011, 09:21 AM.

              • #8
                I would recommend using cuffdiff for analysing FPKM, because the FPKM calculations may make assumptions that are not obvious to people who didn't write the cufflinks/cuffdiff code.

                It's probably worth looking at a couple of runs to see the difference with and without upper-quartile normalisation (-N). I would expect that cufflinks is "good enough" without it, because they haven't made it a default option even though it's relatively simple to calculate.
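In Cufflinks, -N is documented as --upper-quartile-norm: it normalizes by the upper quartile of the number of fragments mapping to individual loci instead of by the total number of sequenced fragments. The sketch below illustrates only that idea, with made-up counts; how Cufflinks picks the percentile and treats zero-count loci is an assumption here, not taken from its code.

```python
from typing import Dict, List


def upper_quartile(counts: List[int]) -> float:
    """75th percentile (linear interpolation) of per-locus counts, ignoring zero-count loci."""
    xs = sorted(c for c in counts if c > 0)
    if not xs:
        raise ValueError("no loci with fragments")
    pos = 0.75 * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def size_factors(samples: Dict[str, List[int]], method: str) -> Dict[str, float]:
    """Per-sample denominator: total fragments, or upper quartile of per-locus counts."""
    if method == "total":
        return {name: float(sum(c)) for name, c in samples.items()}
    if method == "upper_quartile":
        return {name: upper_quartile(c) for name, c in samples.items()}
    raise ValueError(f"unknown method: {method}")


# Sample B differs from A only by one very highly expressed locus.
samples = {"A": [10, 20, 30, 40, 50], "B": [10, 20, 30, 40, 5000]}
print(size_factors(samples, "total"))           # {'A': 150.0, 'B': 5100.0}
print(size_factors(samples, "upper_quartile"))  # {'A': 40.0, 'B': 40.0}
```

With total-count normalization, the single dominant locus in B shrinks every other gene's normalized value; the upper-quartile denominator is unaffected by it.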

                There's a bit of information on how things are calculated on the cufflinks website:

                Cuffdiff calculates the FPKM of each transcript, primary transcript, and gene in each sample. Primary transcript and gene FPKMs are computed by summing the FPKMs of transcripts in each primary transcript group or gene group.

                Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use.... The above attributes, along with the gene_id required by the GTF specification, make each transcript a member of a "gene group", "primary transcript group", and "CDS group".
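The summing rule quoted above can be sketched in a few lines: parse each transcript's GTF attribute column, then add transcript FPKMs into whichever group an attribute names. This assumes the primary-transcript group is carried in a `tss_id` attribute, as in Cuffcompare's combined GTF; the transcripts and FPKMs below are made up, and this is an illustration, not Cuffdiff's code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Matches key "value" pairs in a GTF attribute column
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    return dict(ATTR_RE.findall(attr_field))


def group_fpkms(transcripts: Iterable[Tuple[str, float]], key: str) -> Dict[str, float]:
    """Sum transcript FPKMs into groups keyed by one attribute (e.g. gene_id or tss_id)."""
    totals: Dict[str, float] = defaultdict(float)
    for attr_field, fpkm in transcripts:
        totals[parse_attributes(attr_field)[key]] += fpkm
    return dict(totals)


rows = [
    ('gene_id "G1"; transcript_id "T1"; tss_id "TSS1";', 2.0),
    ('gene_id "G1"; transcript_id "T2"; tss_id "TSS1";', 3.0),
    ('gene_id "G1"; transcript_id "T3"; tss_id "TSS2";', 1.5),
]
print(group_fpkms(rows, "gene_id"))  # {'G1': 6.5}
print(group_fpkms(rows, "tss_id"))   # {'TSS1': 5.0, 'TSS2': 1.5}
```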
                And later...
                Cuffdiff pools the fragments before calculating the individual isoform abundances and then examines the likelihood surface of the replicate pool via importance sampling.
                Note the magic word right at the end of that, sampling. This suggests that you should expect slightly different results from running cuffdiff twice on the same data (it is unlikely that the sampling will be done in exactly the same way on each run).
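To see why sampling alone produces run-to-run differences, here is a generic importance-sampling estimate (not Cuffdiff's actual procedure). It estimates E[x^2] = 1 for x ~ N(0, 1) by drawing from a wider N(0, 2^2) proposal and reweighting each draw. Identical inputs with different random seeds give estimates that agree closely but not exactly; only a fixed seed reproduces a result.

```python
import math
import random


def importance_estimate(n: int, seed: int) -> float:
    """Estimate E[x^2] for x ~ N(0, 1) using draws from an N(0, 2^2) proposal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)
        target = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)          # N(0, 1) density
        proposal = math.exp(-x * x / 8) / (2.0 * math.sqrt(2 * math.pi))  # N(0, 4) density
        total += (x * x) * target / proposal  # value times importance weight
    return total / n


# Same "data", different seeds: close to 1, but not identical.
for seed in (1, 2, 3):
    print(seed, round(importance_estimate(10_000, seed), 4))
```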
