  • Cufflinks and CuffDiff bugs?

    I posted the following as a reply in another thread, but I would like to start a new thread to draw more attention to these problems. Thanks!

    1. In the newly released 1.0.3, after running Cuffcompare, the FPKM column contains all zeros: the FPKM values are missing even though the tracking files have them.

    2. In all versions of Cuffdiff, if you compare different conditions against the same control samples, the control FPKMs differ between comparisons. For example:
    Cuffdiff I: condition 1 vs. control;
    Cuffdiff II: condition 2 vs. control.

    After running Cuffdiff and tracking the FPKM values, the FPKM of gene X in the control in Cuffdiff I differs from the FPKM of gene X in the control in Cuffdiff II. Genes like gene X make up roughly 20-30% of all annotated genes; the rest have identical values.
    Does anybody have an explanation or suggestions for this? Thanks!
    Last edited by lewewoo; 06-06-2011, 08:18 AM.
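A minimal sketch for quantifying the discrepancy between two runs. It assumes each run's gene_exp.diff is tab-separated with a header naming the columns test_id, sample_1, sample_2, value_1 and value_2 (the names expected in Cuffdiff 1.x output; check the header of your own files). The helper name control_fpkm_diff is hypothetical. For every gene tested in both runs, it prints the control FPKM from each run and their ratio:

```shell
# control_fpkm_diff RUN1.diff RUN2.diff SAMPLE   (hypothetical helper)
# For each test_id present in both files, print:
#   test_id <TAB> FPKM of SAMPLE in run 1 <TAB> FPKM in run 2 <TAB> run2/run1
# Columns are located by header name, so it does not matter whether SAMPLE
# was given as sample_1 or sample_2 in each run.
control_fpkm_diff() {
  awk -F'\t' -v s="$3" '
    FNR == 1 {                                  # header line of either file
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      if ($(col["sample_1"]) == s)      v = $(col["value_1"])
      else if ($(col["sample_2"]) == s) v = $(col["value_2"])
      else next                                 # SAMPLE not in this comparison
      id = $(col["test_id"])
    }
    NR == FNR { run1[id] = v; next }            # first file: remember the value
    (id in run1) {                              # second file: report both values
      ratio = (run1[id] + 0 > 0) ? v / run1[id] : "NA"
      print id "\t" run1[id] "\t" v "\t" ratio
    }
  ' "$1" "$2"
}
```

For example, `control_fpkm_diff runI/gene_exp.diff runII/gene_exp.diff control`, where runI and runII are placeholders for the two Cuffdiff output directories and control is whatever sample label you used.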

  • #2
    I've also been hoping for a response to this thread (as well as the other thread you posted this question in).

    Cufflinks 1.0.3 is not giving FPKM values other than zero for paired-end reads from SOLiD. 1.0.3 works fine with single-end data, and the same paired-end data runs fine through Cufflinks 0.9.3, which calculates the FPKM values correctly.

    Code:
    cufflinks --output-dir $out --num-threads 8 --GTF-guide $gtf --multi-read-correct --library-type fr-secondstrand --upper-quartile-norm --label l --frag-bias-correct $hg19All.fa $bam
    
    (Assume my variable references are correct.)
    Does anyone have any ideas for a workaround? Is anyone else having similar issues?
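One quick way to confirm the all-zero symptom, as a sketch: count zero and nonzero FPKM values in a genes.fpkm_tracking or isoforms.fpkm_tracking file. It assumes a tab-separated header containing a column named exactly FPKM, as in Cufflinks 1.x tracking files; fpkm_summary is a hypothetical helper name.

```shell
# fpkm_summary FILE   (hypothetical helper)
# Report how many rows have FPKM == 0 and how many have FPKM > 0.
fpkm_summary() {
  awk -F'\t' '
    NR == 1 {                                   # locate the FPKM column by name
      for (i = 1; i <= NF; i++) if ($i == "FPKM") c = i
      if (!c) { print "no FPKM column in header" > "/dev/stderr"; exit 1 }
      next
    }
    { if ($c + 0 > 0) nz++; else z++ }
    END { if (c) printf "zero\t%d\nnonzero\t%d\n", z, nz }
  ' "$1"
}
```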



    • #3
      There seem to be a lot of confusing changes in Cufflinks. I haven't been able to find a fix yet.



      • #4
        I was having a similar problem to the one lewewoo describes in #1: Cufflinks was not generating accurate FPKMs. Specifically, they were all zero.

        Cufflinks 1.0.3, using SOLiD paired-end reads (50 bp x 35 bp) mapped with Bioscope.

        1. Add the XS tag as per the Cufflinks manual
        Code:
        samtools view -F 0x04 -h unedited.bam | awk 'BEGIN{OFS="\t"} (!/^@/){minus=and($2, 0x10); print $0"\tXS:A:"(minus ? "-":"+") } (/^@/){ print }' | samtools view -bhS - > xs.bam
        This runs through Cufflinks and gives FPKM = 0 for everything.

        2. Increment the NH tag by 1, as per Cufflinks developer Adam Roberts
        Code:
        samtools view -F 0x04 -h xs.bam | awk 'BEGIN{OFS="\t"}(! /^@/){ split($12,a,":"); $12 = a[1]":"a[2]":"a[3]+1; print $0 } (/^@/){ print }' > xs.nh.sam
        This seems to be working, but I don't have the output of a full run yet.
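The two steps above can be sketched as reusable functions. This is an assumption-laden rewrite, not the Cufflinks manual's recipe: it tests the reverse-strand bit (0x10) with integer arithmetic instead of gawk's and(), so it also runs under mawk or BSD awk, and it finds the NH:i tag by name instead of assuming it sits in column 12. The function names add_xs and bump_nh are hypothetical.

```shell
# add_xs: append XS:A:+ or XS:A:- to every alignment line, based on SAM flag
# bit 0x10 (read mapped to the reverse strand). Header lines pass through unchanged.
add_xs() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }
       {
         minus = int($2 / 16) % 2            # 1 if bit 0x10 is set, else 0
         print $0, "XS:A:" (minus ? "-" : "+")
       }'
}

# bump_nh: add 1 to the value of the NH:i tag wherever it appears among the
# optional fields (column 12 onward). Lines without an NH tag are left unchanged.
bump_nh() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }
       {
         for (i = 12; i <= NF; i++)
           if ($i ~ /^NH:i:/) { $i = "NH:i:" (substr($i, 6) + 1); break }
         print
       }'
}

# Example pipeline (file names taken from the post):
#   samtools view -F 0x04 -h unedited.bam | add_xs | bump_nh > xs.nh.sam
```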



        • #5
          I'm seeing the same thing that lewewoo describes in point 2 of #1.

          I am getting different FPKM values for the same control when it is compared against two different samples in two separate Cuffdiff runs. Is this expected? Does Cuffdiff consider all the samples provided when calculating FPKM? If so, what is the best workflow for getting FPKM values for samples that you want to perform further analysis on outside the Cufflinks suite? Should I run Cufflinks on individual samples and work with those FPKMs, or should I put all the samples I'm interested in analyzing into Cuffdiff and use those FPKMs, since they might be normalized across samples?

          Any suggestions or ideas about what is happening would be great!
          Thanks!



          • #6
            The FPKMs should have confidence ranges included. Do those ranges overlap?
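To check this systematically, here is a sketch that reports, gene by gene, whether the confidence intervals from two runs overlap. It assumes you have first extracted each run's values into a hypothetical headerless, tab-separated file with four columns (id, FPKM, conf_lo, conf_hi); the layout of the real Cuffdiff tracking files is not assumed. The helper name ci_overlap is hypothetical.

```shell
# ci_overlap RUN1.tsv RUN2.tsv   (hypothetical helper)
# Input columns (no header): id, FPKM, conf_lo, conf_hi.
# For ids present in both files, print id <TAB> overlap|disjoint.
ci_overlap() {
  awk -F'\t' '
    NR == FNR { lo[$1] = $3; hi[$1] = $4; next }     # first file: store interval
    ($1 in lo) {
      # [a, b] and [c, d] overlap when c <= b and a <= d
      ov = ($3 + 0 <= hi[$1] + 0 && lo[$1] + 0 <= $4 + 0) ? "overlap" : "disjoint"
      print $1 "\t" ov
    }
  ' "$1" "$2"
}
```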



            • #7
              Good point, thanks gringer. At a quick look, the ranges do seem to overlap. I made a scatter plot, and the values are concordant, with a very tight spread at the extremes but quite a big spread in the middle. I guess I just expected much more agreement across the range, especially since it is the same sample.

              Sorry to ask again, but does this mean that Cuffdiff does not consider both samples when calculating FPKM? (I assume so, but I'm not 100% sure this assumption is correct.) What would be the recommended workflow for just getting FPKM values for further analysis? Can I use Cuffdiff (perhaps with all the samples analyzed together, if some cross-sample normalization is occurring), or should I use Cufflinks? By the way, I should mention that I was not using the -N option (upper-quartile normalization) in Cuffdiff.

              Thanks so much for the help! Which approach to take to get FPKMs has been a big source of discussion. Really appreciate it!
              Last edited by jaldrich; 07-14-2011, 09:21 AM.



              • #8
                I would recommend using cuffdiff for analysing FPKM, because the FPKM calculations may make assumptions that are not obvious to the people who didn't write the cufflinks/cuffdiff code.

                It's probably worth looking at a couple of runs to see the difference with and without upper-quartile normalisation (-N). I would expect Cufflinks to be "good enough" without it, because the developers haven't made it a default option even though it's relatively simple to calculate.
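For reference, the Cufflinks manual describes -N/--upper-quartile-norm as normalizing by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. The sketch below only computes such an upper quartile (nearest-rank 75th percentile of the nonzero per-locus counts) from a hypothetical two-column file of locus and fragment count; Cufflinks' internal definition may differ in detail, for example in which loci it includes. upper_quartile is a hypothetical helper name.

```shell
# upper_quartile COUNTS.tsv   (hypothetical helper)
# COUNTS.tsv: tab-separated, no header, columns: locus_id, fragment_count.
# Prints the nearest-rank 75th percentile of the nonzero counts.
upper_quartile() {
  awk -F'\t' '$2 > 0 { print $2 }' "$1" | sort -n | awk '
    { v[NR] = $1 }                           # sorted nonzero counts, 1-based
    END {
      if (NR == 0) exit 1                    # no nonzero counts at all
      r = int(0.75 * NR)
      if (r < 0.75 * NR) r++                 # rank = ceil(0.75 * n)
      print v[r]
    }'
}
```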

                There's a bit of information on how things are calculated on the cufflinks website:

                Cuffdiff calculates the FPKM of each transcript, primary transcript, and gene in each sample. Primary transcript and gene FPKMs are computed by summing the FPKMs of transcripts in each primary transcript group or gene group.

                Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use.... The above attributes, along with the gene_id required by the GTF specification, make each transcript a member of a "gene group", "primary transcript group", and "CDS group".
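The summing rule quoted above is easy to reproduce by hand as a sanity check. A sketch, assuming a tab-separated isoform table whose header names the columns gene_id and FPKM (as in Cufflinks' isoforms.fpkm_tracking); the helper name is hypothetical, and it reproduces only the point estimates, not the confidence intervals.

```shell
# gene_fpkm_from_isoforms FILE   (hypothetical helper)
# Sum isoform FPKMs per gene_id; print gene_id <TAB> summed FPKM, sorted by gene_id.
gene_fpkm_from_isoforms() {
  awk -F'\t' '
    NR == 1 {                                   # locate columns by header name
      for (i = 1; i <= NF; i++) { if ($i == "gene_id") g = i; if ($i == "FPKM") f = i }
      next
    }
    { sum[$g] += $f }
    END { for (k in sum) print k "\t" sum[k] }
  ' "$1" | sort
}
```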
                And later...
                Cuffdiff pools the fragments before calculating the individual isoform abundances and then examines the likelihood surface of the replicate pool via importance sampling.
                Note the magic word right at the end of that, sampling. This suggests that you should expect slightly different results by running cuffdiff on the same data (it is unlikely that the sampling will be done in exactly the same way on each run).
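A toy illustration of the point about sampling, not Cuffdiff's importance-sampling code: a plain Monte Carlo estimate of E[U^2] for U uniform on (0, 1), whose exact value is 1/3. With the same seed the answer is reproducible; with different seeds the estimates differ slightly while both stay close to 1/3. mc_mean is a hypothetical helper name.

```shell
# mc_mean SEED N   (hypothetical helper)
# Plain Monte Carlo estimate of E[U^2], U ~ Uniform(0,1); the exact value is 1/3.
mc_mean() {
  awk -v seed="$1" -v n="$2" 'BEGIN {
    srand(seed)                              # fix the pseudo-random sequence
    for (i = 0; i < n; i++) { u = rand(); s += u * u }
    printf "%.6f\n", s / n
  }'
}

# Example: compare mc_mean 1 100000 with mc_mean 2 100000
```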

