  • interpreting cuffdiff output with and without replicates

    Hi SEQers

    I am calling on your expertise!

    I have an RNA-Seq data set from a growth time series.

    I have used Tophat->Cufflinks->Cuffdiff on these data.

    To look at differential gene expression between the 6 time points, I have used the cuffdiff output file gene_exp.diff.

    I have run cuffdiff three different ways:
    1. with the -T (time series) option
    2. without the -T option
    3. using just two time points at a time, comparing the first time point to each successive time point.

    Each of these runs produces a different number of significantly different genes between any two time points. As one might expect, overall the number of genes with significant differential expression is much lower in cases 1 and 2 than in case 3.
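
    One way to tabulate these counts for each run is a short Python script. This is a minimal sketch, assuming gene_exp.diff is tab-separated with a header row that includes columns named sample_1, sample_2 and significant (check the header of your own file, since the column layout has varied between cuffdiff versions). The function name count_significant is hypothetical.

```python
import csv
from collections import Counter


def count_significant(diff_lines):
    """Count rows flagged significant == 'yes' for each (sample_1, sample_2) pair.

    diff_lines: an open gene_exp.diff file or any iterable of its lines.
    Assumes a tab-separated file whose header names the columns
    'sample_1', 'sample_2' and 'significant'.
    """
    counts = Counter()
    for row in csv.DictReader(diff_lines, delimiter="\t"):
        if row["significant"].strip() == "yes":
            counts[(row["sample_1"], row["sample_2"])] += 1
    return counts
```

    Usage: `with open("gene_exp.diff", newline="") as fh: counts = count_significant(fh)`. Running it on the gene_exp.diff from each of the three kinds of run gives per-pair counts that can be set side by side. Pairs with no significant rows simply report 0, because a Counter returns 0 for missing keys.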

    I would like to understand the reason for this.

    Looking at the gene_exp.diff output from cuffdiff, particularly column 10 (the "test stat"), this number is different in a .diff file that came from a comparison of two files (case 3) than in one that came from all 6 time points (cases 1 and 2).
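
    To see exactly where the test stat values diverge, the .diff files from two runs can be lined up row by row. A minimal sketch, assuming both files are tab-separated with header columns named test_id, sample_1, sample_2 and test_stat; the helper names load_test_stats and compare_runs are hypothetical. Rows whose test_stat does not parse as a number are skipped.

```python
import csv


def load_test_stats(diff_lines):
    """Map (test_id, sample_1, sample_2) -> test_stat as a float."""
    stats = {}
    for row in csv.DictReader(diff_lines, delimiter="\t"):
        key = (row["test_id"], row["sample_1"], row["sample_2"])
        try:
            stats[key] = float(row["test_stat"])
        except ValueError:
            # Non-numeric entry (e.g. a placeholder for an untested row): skip it.
            continue
    return stats


def compare_runs(diff_a, diff_b):
    """Return {key: (stat_in_a, stat_in_b)} for keys present in both runs."""
    a = load_test_stats(diff_a)
    b = load_test_stats(diff_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}
```

    Rows are matched on (test_id, sample_1, sample_2), so the sample labels have to agree between the runs. If the two-sample runs were labelled differently from the six-sample run, rename the labels (cuffdiff's -L/--labels option sets them) before comparing.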

    Reading the Cufflinks manual and associated information online, it looks like the "test stat" is calculated from the variance of the FPKMs of each replicate.

    I am assuming that these differences arise because when I use these 6 files (.bams) in cuffdiff, with or without the -T option (cases 1 and 2), cuffdiff treats these 6 files as though they were replicates. In case 3, where I have used only two bam files for cuffdiff, some genes have two replicates and others have none.

    My question is: how would you interpret the data from case 3? Should I disregard case 3 and only consider the data from cases 1 and 2?

    Thanks for your insights,

    Cynthia


    From http://cufflinks.cbcb.umd.edu/howitworks#hdif:

    "Note that in order to calculate the test statistic T, we need to know the variance of the expression level in each condition. The variance needs to include the variability in the number of fragments generated by the transcript across replicates, and should also incorporate any uncertainty in the expression estimate itself."
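
    The quoted passage says the test statistic depends on the variance of the expression level in each condition. As a generic illustration only (not cuffdiff's actual implementation), the sketch below computes a z-like statistic for the log ratio of two FPKM values, using the delta-method approximation Var[ln X] ≈ Var[X] / E[X]^2 and assuming the two conditions are independent. The function log_ratio_z and its inputs are hypothetical. With the same FPKMs, larger variance estimates push the statistic toward zero, so anything that changes the variance estimates between runs would change the statistic even if the FPKMs barely move.

```python
import math


def log_ratio_z(fpkm_a, var_a, fpkm_b, var_b):
    """Illustrative statistic: ln(fpkm_b / fpkm_a) divided by its approximate SD.

    Delta-method approximation (an assumption, not cuffdiff's exact model):
        Var[ln X] ~= Var[X] / E[X]**2
    with the two conditions treated as independent, so their variances add.
    """
    if fpkm_a <= 0 or fpkm_b <= 0:
        raise ValueError("FPKM values must be positive to take a log ratio")
    log_ratio = math.log(fpkm_b / fpkm_a)
    var_log_ratio = var_a / fpkm_a ** 2 + var_b / fpkm_b ** 2
    if var_log_ratio <= 0:
        raise ValueError("need a positive variance estimate")
    return log_ratio / math.sqrt(var_log_ratio)
```

    For example, log_ratio_z(10, 25, 20, 100) gives ln(2)/sqrt(0.5) ≈ 0.98, while quadrupling both variances, log_ratio_z(10, 100, 20, 400), halves it.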

  • #2
    Whoops, where I said .bam I meant .sam!



    • #3
      Actually, I think this assumption of mine is not correct:

      I am assuming that these differences arise because when I use these 6 files (.sam) in cuffdiff, with or without the -T option (cases 1 and 2), cuffdiff treats these 6 files as though they were replicates. In case 3, where I have used only two bam files for cuffdiff, some genes have two replicates and others have none.

      These are not considered replicates, so why am I getting different "test stat" numbers in the gene_exp.diff files?



      • #4
        I use TopHat output files (.sam) in cuffdiff, always without the -T option.

