  • cufflinks FPKM >>> Cuffdiff FPKM

    I cannot understand why the FPKM estimated in cufflinks is SO much larger than that in cuffdiff:

    Cufflinks
    Code:
    cufflinks -p8 -m320 -u -o /media/hd/working/tuco/17Jan12socialcuff -L social \
    --upper-quartile-norm --max-mle-iterations 20000 \
    /media/hd/working/tuco/b2.social/social.bam
    
    cat transcripts.gtf | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1; FPKM "1630419.4581286784";
    I merged the .gtf files from each cufflinks run and fed the result to cuffdiff.
    I have 5 biological replicates for each group.

    Cuffdiff
    Code:
    mkdir /media/hd/working/tuco/17Jan.cuffdiff
    cd /media/hd/working/tuco/17Jan.cuffdiff
    
    cuffdiff -p8 -L social,solitary -N -u \
    --max-mle-iterations 10000 /media/hd/working/tuco/17Jan12cuffcompare/*gtf \
    /media/hd/working/tuco/b2.bams/406A.bam,\
    /media/hd/working/tuco/b2.bams/4262.bam,\
    /media/hd/working/tuco/b2.bams/2354.bam,\
    /media/hd/working/tuco/b2.bams/4241.bam,\
    /media/hd/working/tuco/b2.bams/401C.bam \
    /media/hd/working/tuco/b2.bams/6236.bam,\
    /media/hd/working/tuco/b2.bams/2226.bam,\
    /media/hd/working/tuco/b2.bams/5B5C.bam,\
    /media/hd/working/tuco/b2.bams/255D.bam,\
    /media/hd/working/tuco/b2.bams/4572.bam
    
    cat gene_exp.diff | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1:0-1977	social	solitary	10.5437	8.08172

    ok... 1630419.4581286784 >>> 10.5437 Why??

  • #2
    I should note that 'social.bam' is just the product of samtools merge for all the individuals in the social treatment. Those BAM files are listed individually in Cuffdiff to indicate that they are biological replicates.

    So, in essence, the FPKM from social.bam from cufflinks should be the average value from all the individuals in that group.
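As a rough check on the averaging intuition above, here is the textbook FPKM definition applied to a merged BAM. This is a simplified sketch: it ignores effective-length and bias corrections as well as upper-quartile normalization, and the symbols $C_i$ (fragments assigned to the transcript in replicate $i$), $N_i$ (total mapped fragments in replicate $i$), and $L$ (transcript length in bases) are introduced here for illustration, not taken from the Cufflinks source.

```latex
\[
\mathrm{FPKM}_i = \frac{10^{9}\, C_i}{N_i\, L},
\qquad
\mathrm{FPKM}_{\text{merged}}
= \frac{10^{9} \sum_i C_i}{L \sum_i N_i}
= \sum_i \frac{N_i}{\sum_j N_j}\, \mathrm{FPKM}_i .
\]
```

Under these assumptions the merged value is a depth-weighted mean of the per-replicate values, so merging alone would not explain a gap of five orders of magnitude.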



    • #3
      Just at first glance, in your cufflinks run you specify two different parameters that will affect the FPKM calculation.
      Code:
      --upper-quartile-norm --max-mle-iterations 20000
      I would try changing --max-mle-iterations to match cuffdiff, disabling quartile normalization, and running the biological replicates through cufflinks separately to see whether the difference persists. Then I would try cufflinks with the merged BAMs. Internally, the same code does the quantification in both cufflinks and cuffdiff.
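One way to set up the per-replicate runs suggested above is to generate the commands first and review them before running anything. The sketch below only prints the command lines; the flags are the ones already used in this thread (-m and -L from the original run are omitted), while the function name and the `<outdir>/<sample>` output layout are hypothetical choices.

```shell
#!/bin/sh
# Print (do not run) one cufflinks command per replicate BAM so the settings
# can be reviewed first. Flags are copied from the posts above; the output
# directory layout (<outdir>/<sample>) is a hypothetical choice.
print_cufflinks_cmds() {
    outdir=$1
    shift
    for bam in "$@"; do
        sample=$(basename "$bam" .bam)
        # --max-mle-iterations 10000 matches the cuffdiff run;
        # --upper-quartile-norm is deliberately left out.
        printf 'cufflinks -p 8 -u --max-mle-iterations 10000 -o %s/%s %s\n' \
            "$outdir" "$sample" "$bam"
    done
}

# Example: print_cufflinks_cmds percuff /media/hd/working/tuco/b2.bams/406A.bam
```

The printed lines can be inspected and then piped to `sh`. Paths containing spaces would need extra quoting, which this sketch does not handle.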

      Also, I noticed you're looking in transcripts.gtf for cufflinks and gene_exp.diff for cuffdiff. It would be better to look in isoforms.fpkm_tracking for both cufflinks and cuffdiff, as gene_exp.diff lists quantification at the locus level while transcripts.gtf is at the isoform level.
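To compare like with like, the same transcript can be pulled from both isoforms.fpkm_tracking files. The helper below is a minimal sketch: it assumes the files are tab-delimited with a header row, that the identifier is in the first column, and that the FPKM column is headed `FPKM` in cufflinks output and `<label>_FPKM` (for example `social_FPKM`) in cuffdiff output. Check the header of your own files before relying on those names; the function name is made up for this example.

```shell
#!/bin/sh
# Print the value of a named column for one tracking_id in a tab-delimited
# *.fpkm_tracking file. The column is located by its header name, so the
# same helper works for cufflinks output (assumed header "FPKM") and
# cuffdiff output (assumed header "<label>_FPKM", e.g. "social_FPKM").
fpkm_for_id() {
    file=$1; id=$2; column=$3
    awk -F '\t' -v id="$id" -v column="$column" '
        NR == 1 {
            # Find the index of the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == column) col = i
            if (!col) { print "column not found: " column > "/dev/stderr"; exit 1 }
            next
        }
        $1 == id { print $col }
    ' "$file"
}

# Example (paths are placeholders):
#   fpkm_for_id cufflinks_out/isoforms.fpkm_tracking comp14388_c0_seq1 FPKM
#   fpkm_for_id cuffdiff_out/isoforms.fpkm_tracking  comp14388_c0_seq1 social_FPKM
```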



      • #4
        Also, I just realized that log10(1630419.4581286784) is about 6.2, which is at least on the same scale as 10.5. I wonder if the difference is that simple.
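The log-transform idea is quick to check from the command line. This is a small sketch using awk (whose `log()` is the natural logarithm); the function name is made up for the example.

```shell
#!/bin/sh
# log10 of a positive number, rounded to two decimals
# (awk's log() is the natural log, so divide by log(10)).
log10_of() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", log(x) / log(10) }'
}

log10_of 1630419.4581286784   # cufflinks value from the first post; prints 6.21
```

The result, 6.21, is still well short of the 10.5437 reported by cuffdiff, so a plain base-10 log does not by itself account for the gap.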



        • #5
          Did you ever find a solution to this? We ran into the same problem.

          Our pipeline is thus:
          We map reads with tophat for each sample
          Run cufflinks on each sample to generate a transcriptome assembly

          the command looks something like:
          Code:
          cufflinks --label tax-Pre-R5 \
                    --num-threads 4 \
                    --library-type fr-secondstrand \
                    --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
                    --multi-read-correct \
                    --upper-quartile-norm \
                    /ifs/projects/proj004/rnaseq4/tax-Pre-R5.accepted.bam
          Run Cuffmerge and Cuffcompare to generate merged gene sets.

          We also run cuffdiff to test for differences.

          Our cuffdiff commands look like:

          Code:
          cuffdiff --output-dir abinitio.cuffdiff.dir \
                   --library-type fr-secondstrand \
                   --upper-quartile-norm \
                   --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
                   --multi-read-correct \
                   --verbose \
                   --num-threads 16 \
                   --labels Prostate-Pre-agg,Prostate-Post-agg,tax-Pre-agg,tax-Post-agg \
                   --FDR 0.050000 \
                   abinitio.gtf \
                   Prostate-Pre-R7.accepted.bam,Prostate-Pre-R1.accepted.bam,Prostate-Pre-R4.accepted.bam,Prostate-Pre-R2.accepted.bam,Prostate-Pre-R8.accepted.bam,Prostate-Pre-R5.accepted.bam,Prostate-Pre-R3.accepted.bam,Prostate-Pre-R6.accepted.bam \
                   Prostate-Post-R7.accepted.bam,Prostate-Post-R8.accepted.bam,Prostate-Post-R6.accepted.bam,Prostate-Post-R3.accepted.bam,Prostate-Post-R5.accepted.bam,Prostate-Post-R2.accepted.bam,Prostate-Post-R4.accepted.bam,Prostate-Post-R1.accepted.bam \
                   tax-Pre-R1.accepted.bam,tax-Pre-R3.accepted.bam,tax-Pre-R2.accepted.bam,tax-Pre-R6.accepted.bam,tax-Pre-R4.accepted.bam,tax-Pre-R5.accepted.bam \
                   tax-Post-R6.accepted.bam,tax-Post-R1.accepted.bam,tax-Post-R4.accepted.bam,tax-Post-R5.accepted.bam,tax-Post-R2.accepted.bam,tax-Post-R3.accepted.bam
          If we compare the FPKMs coming out of cuffcompare and cuffdiff, they are not even within two or three orders of magnitude of each other: the cuffcompare FPKMs are in the millions or tens of millions, while the cuffdiff values are in the more sensible range of 0 to several hundred.

          We're using cufflinks 1.3.1.



          • #6
            Hi,

            We had the same problem and tried the new Cufflinks version 2.0.2, and it seems the values from Cufflinks and Cuffdiff are now the same (I still have to check this more carefully).

            These are the commands I used:

            Code:
            cufflinks -o ./Sample001_cufflinks_out_No_N_2.0.2 -u -g ../genes.gtf -p 2 --total-hits-norm ../Sample_001_accepted_hits.bam
            Code:
            cuffdiff -o ./COMPARISON1_SAMPLE1_SAMPLE1BIS_cuffdiff_out/ -L SAMPLE1,SAMPLE1BIS -p 2 -u -v --emit-count-tables --total-hits-norm ../Sample001_cufflinks_out/transcripts.gtf ../Sample_001_accepted_hits.sam ../Sample_001_bis_accepted_hits.sam
            I know it's weird to use cuffdiff to compare one sample to itself but I had no other choice...

            HTH

            Marina

            EDIT: Although the FPKM values from Cufflinks and Cuffdiff are now more similar, I still get unreasonably high FPKM values, especially for very short genes (around 37 nt; regulatory RNAs, I guess). Searching for an explanation, I found this thread: http://seqanswers.com/forums/showthread.php?t=20702. It is worth reading; Cole Trapnell gives a good explanation of why small genes can get extremely high FPKM values.
            Last edited by mmanrique; 08-04-2012, 07:25 AM.
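The length effect mentioned in the edit above can be illustrated with the textbook FPKM formula. This is only a sketch with made-up numbers: Cufflinks uses an effective length and further corrections, so its actual values will differ, and the function name is hypothetical.

```shell
#!/bin/sh
# Naive FPKM = 1e9 * fragments / (total_mapped_fragments * length_in_bases),
# rounded to two decimals. Cufflinks itself uses an effective length and
# further corrections, so this is only an illustration of the length effect.
naive_fpkm() {
    awk -v c="$1" -v n="$2" -v len="$3" \
        'BEGIN { printf "%.2f\n", 1e9 * c / (n * len) }'
}

# Hypothetical numbers: 100 fragments out of 10 million mapped.
naive_fpkm 100 10000000 2000   # 2000-nt transcript: prints 5.00
naive_fpkm 100 10000000 37     # 37-nt transcript:   prints 270.27
```

The same fragment count gives a value more than 50 times higher for the 37-nt feature, simply because the length sits in the denominator.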



            • #7
              Hi all, I had the same problem and was told to run cuffdiff WITHOUT the -N option (upper-quartile normalization).

              hope it helps....
              ib
