  • Cuffdiff, replicates, and weird results

    Hi everyone. Please let me know if this has been answered in another thread. I have a lab meeting coming up, so I'm a bit desperate to figure out this problem.

    So I'm doing RNA-Seq on some yeast strains. I have two conditions, one wild time and one with a gene KO, and I have two replications of each for four total. I also have a sequenced and annotated genome, so I have reliable .GTF and .GFF files to work with.

    Here's my issue: When I compare one WT data set to one KO data set, I get a gene expression change considered significant for ~200-300 genes out of 6100. When I try to merge my replicates, I wind up with something like 3000 significant changes. This gene KO shouldn't be affecting everything in the cell, so I'm assuming I'm doing something wrong. What is the appropriate way to use replication with Tuxedo? Should I be mapping my replicates together with Tophat? Should I map separately and then use cuffdiff to combine the mapped data sets?

    Any help would be much appreciated.

    -DrAlexander

  • #2
    If you need a fast and more conservative result, I would suggest you compare one against one for both replicates and then take the overlap of both comparisons.

    This way you reduce your false positives.
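
    The overlap step suggested here could be sketched roughly as follows. This is only an illustration: it assumes each one-vs-one run produced a tab-separated table with `gene_id` and `significant` (yes/no) columns, as in cuffdiff's `gene_exp.diff`, and the gene IDs and in-memory tables are made up for the example.

```python
import csv
import io


def significant_genes(diff_file):
    """Return the gene_id values flagged significant in a cuffdiff-style table."""
    reader = csv.DictReader(diff_file, delimiter="\t")
    return {row["gene_id"] for row in reader if row["significant"] == "yes"}


# Tiny made-up tables standing in for the gene_exp.diff files of two
# separate one-vs-one comparisons (WT1 vs KO1 and WT2 vs KO2).
run_a = io.StringIO(
    "gene_id\tsignificant\nYAL001C\tyes\nYAL002W\tyes\nYAL003W\tno\n"
)
run_b = io.StringIO(
    "gene_id\tsignificant\nYAL001C\tyes\nYAL002W\tno\nYAL003W\tyes\n"
)

# Keep only the genes called significant in both comparisons.
overlap = significant_genes(run_a) & significant_genes(run_b)
print(sorted(overlap))
```

    With real output you would pass open file handles for the two `gene_exp.diff` files instead of the `io.StringIO` stand-ins.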

    Comment


    • #3
      read counts

      I have seen this before when the total read counts in the samples (replicates) vary significantly, for example 10M vs 30M or 3M vs 15M aligned reads. In which case DrAlexander is the best solution.

      How many reads did you get for each sample? You can also try baySeq as you then have better control of your scaling factor used between libraries/replicates.

      Comment


      • #4
        Originally posted by DerSeb View Post
        If you need a fast and more conservative result, I would suggest you compare one against one for both replicates and then take the overlap of both comparisons.

        This way you reduce your false positives.
        Cool, I was gonna do this anyway, but it's nice to know I'm not crazy for thinking it.

        Comment


        • #5
          Originally posted by severin View Post
          I have seen this before when the total read counts in the samples (replicates) vary significantly, for example 10M vs 30M or 3M vs 15M aligned reads. In which case DrAlexander is the best solution.

          How many reads did you get for each sample? You can also try baySeq as you then have better control of your scaling factor used between libraries/replicates.
          There is a big variation between my two KO data sets: one is at 31M reads and the other is at 15M reads. What I can't figure out is why this matters, as I thought that cuffdiff was comparing the frequency of each fragment based on the total number of fragments. Am I just naive?

          Comment


          • #6
            Originally posted by DrAlexander View Post
            There is a big variation between my two KO data sets: one is at 31M reads and the other is at 15M reads. What I can't figure out is why this matters, as I thought that cuffdiff was comparing the frequency of each fragment based on the total number of fragments. Am I just naive?
            yes, you are right in principle. the fragment counts are normalized.

            however, no normalization works perfectly and this is a rather complex problem. therefore it will give you different results for different sequencing depths.
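
            One reason depth can still matter after normalization, as a small sketch: scaling by library size equalizes the expected value but not the counting noise. The function below is the textbook FPKM definition, not necessarily what cuffdiff computes internally, and the counts and transcript length are made-up numbers chosen to mirror the 31M vs 15M libraries in this thread.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def poisson_relative_noise(fragments):
    """Relative standard deviation of a Poisson count (1 / sqrt(count))."""
    return 1.0 / math.sqrt(fragments)


# Hypothetical 2 kb gene at the same relative abundance in two libraries.
deep_count, deep_total = 620, 31_000_000
shallow_count, shallow_total = 300, 15_000_000

deep_fpkm = fpkm(deep_count, 2000, deep_total)
shallow_fpkm = fpkm(shallow_count, 2000, shallow_total)

# The normalized values agree, but the shallower library is noisier.
print(deep_fpkm, shallow_fpkm)
print(poisson_relative_noise(deep_count), poisson_relative_noise(shallow_count))
```

            So two replicates with very different depths contribute estimates of different precision, even when their normalized values line up.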

            Comment


            • #7
              When you say you "merge" your replicates, are you saying you merge them into a single file and run it as one? I hope not, but if so that is going to be the problem because Cuffdiff isn't going to be able to calculate variance in any reliable way and the increased read count will end up giving you more DEGs.

              Not really knowing much about doing RNA-seq in yeast, two replicates doesn't really cut it for RNA-seq. That doesn't help you for your group meeting but you need to do more to get the statistics in your favor.

              Wild time - I like that. Are those wild type yeast cells that have more fun?
              --------------
              Ethan

              Comment


              • #8
                Originally posted by DerSeb View Post
                yes, you are right in principle. the fragment counts are normalized.

                however, no normalization works perfectly and this is a rather complex problem. therefore it will give you different results for different sequencing depths.
                I see. I could also jackknife my read counts before I map them.

                When you say you "merge" your replicates, are you saying you merge them into a single file and run it as one? I hope not, but if so that is going to be the problem because Cuffdiff isn't going to be able to calculate variance in any reliable way and the increased read count will end up giving you more DEGs.
                No, I'm not doing that. Sorry if I'm being vague, I really have little clue what I'm doing here. I'm a biochemist, not a computer scientist, dammit! Anyway, I'm mapping all four data sets separately via Tophat, then designating the files as replicates when I send them through cuffdiff.
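
                For reference, the setup described here (separate Tophat runs, replicates grouped at the cuffdiff step) corresponds to a command line along these lines. This is a sketch only: the directory names, the GTF file name, and the output directory are hypothetical, and the helper just assembles the arguments (comma-separated BAMs within a condition, a space between conditions) without running anything.

```python
def cuffdiff_command(gtf, groups, out_dir="diff_out"):
    """Build a cuffdiff argument list.

    groups maps a condition label to its list of replicate BAM files;
    replicates of one condition are joined with commas.
    """
    cmd = ["cuffdiff", "-o", out_dir, "-L", ",".join(groups), gtf]
    cmd += [",".join(bams) for bams in groups.values()]
    return cmd


# Hypothetical Tophat output directories, one per sequenced sample.
groups = {
    "WT": ["wt_rep1/accepted_hits.bam", "wt_rep2/accepted_hits.bam"],
    "KO": ["ko_rep1/accepted_hits.bam", "ko_rep2/accepted_hits.bam"],
}

cmd = cuffdiff_command("genes.gtf", groups)
print(" ".join(cmd))
```

                The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.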

                Not really knowing much about doing RNA-seq in yeast, two replicates doesn't really cut it for RNA-seq. That doesn't help you for your group meeting but you need to do more to get the statistics in your favor.
                Well, I had three for each condition, but a bunch of reads didn't get through, and this is just a fishing expedition to give me a clue on what this particular gene is doing in this cell, since the bloody thing shouldn't even be there in the first place according to evolutionary theory.

                Wild time - I like that. Are those wild type yeast cells that have more fun?
                Yeast cells are always ready to party, dude.

                Comment


                • #9
                  Originally posted by DerSeb View Post
                  If you need a fast and more conservative result, I would suggest you compare one against one for both replicates and then take the overlap of both comparisons.

                  This way you reduce your false positives.
                  No, this will produce lots of false positives. If you present cuffdiff with a single control and a single treatment sample, it cannot estimate biological variation and will assume it to be zero (in my view not a good design (though a common one); it should rather simply refuse to provide an analysis). The zero variance will result in a huge number of genes and hence a large overlap, too.

                  A good sanity check, by the way, is to assign one control and one KO sample as one group of replicates and the other control together with the other KO sample as the other group of replicates. If cuffdiff still gives you a large number of hits for this kind of test, something is severely wrong.
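
                  The mixed-group sanity check can be set up like this. It is a sketch under assumptions: the BAM paths and group labels are hypothetical, and the helper only builds the two comma-separated sample arguments that cuffdiff expects, one per group.

```python
def mixed_groups(wt_bams, ko_bams):
    """Pair each WT sample with a KO sample so both groups mix conditions."""
    if len(wt_bams) != 2 or len(ko_bams) != 2:
        raise ValueError("this sketch expects exactly two WT and two KO samples")
    return {
        "mixA": [wt_bams[0], ko_bams[0]],
        "mixB": [wt_bams[1], ko_bams[1]],
    }


# Hypothetical alignment files for the four samples.
wt = ["wt_rep1.bam", "wt_rep2.bam"]
ko = ["ko_rep1.bam", "ko_rep2.bam"]

null_design = mixed_groups(wt, ko)

# Positional sample arguments for cuffdiff: commas within a group.
sample_args = [",".join(bams) for bams in null_design.values()]
print(",".join(null_design), sample_args)
```

                  Because each group now contains one WT and one KO sample, the knockout effect shows up as within-group variation, so very few genes should come out as significant between `mixA` and `mixB`.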

                  Comment


                  • #10
                    Originally posted by Simon Anders View Post
                    No, this will produce lots of false positives. If you present cuffdiff with a single control and a single treatment sample, it cannot estimate biological variation and will assume it to be zero (in my view not a good design (though a common one); it should rather simply refuse to provide an analysis). The zero variance will result in a huge number of genes and hence a large overlap, too.

                    yes, I agree with you. The major problem is the low number of replicates. However, if he uses 1 sample vs 1 sample he gets 200-300 DEGs. Using 2 vs 2 he gets several thousand. This is of course not reasonable! Nevertheless for his meeting I would rather use the overlap of 2x 1vs1 (probably 100 DEGs) or try something totally different!

                    Also, I have no experience with yeast. From my experience with HEK cells it is very hard to analyze cells in culture. I once compared 3 replicates each from 2 different batches and it was a difference like comparing two different tissues. Effects of a single gene which has maybe 20 targets would be impossible to detect through genome-wide analyses.

                    Comment


                    • #11
                      Originally posted by DerSeb View Post
                      yes, I agree with you. The major problem is the low number of replicates. However, if he uses 1 sample vs 1 sample he gets 200-300 DEGs. Using 2 vs 2 he gets several thousand.
                      I somehow did not register this fact. This is strange, indeed. Maybe the cuffdiff authors fixed the issue with the zero variance and now use an extra conservative approach when no replication is available. (This is what we do in DESeq with our "blind" mode.)

                      Also, I have no experience with yeast. From my experience with HEK cells it is very hard to analyze cells in culture. I once compared 3 replicates each from 2 different batches and it was a difference like comparing two different tissues. Effects of a single gene which has maybe 20 targets would be impossible to detect through genome-wide analyses.
                      Yeast is much more well-behaved than HEK. As a single-cell organism, yeast cultures are much more homogeneous (especially if it's liquid cultures). So, while it is not ideal to have only two replicates, the issue is that one might misjudge variability. The signal-to-noise ratio is usually good enough.

                      Comment


                      • #12
                        "If you present cuffdiff with a single control and a single treatment sample, it cannot estimate biological variation and will assume it to be zero"

                        This statement is not correct. Cuffdiff is explained here http://cufflinks.cbcb.umd.edu/ and a paper with details of the algorithm in version 2.0 is forthcoming.

                        Comment


                        • #13
                          I don't see how it could be possible to estimate biological variance without biological replicates. If this is possible, I would find it truly amazing, almost god like.
                          --------------
                          Ethan

                          Comment


                          • #14
                            It is possible, although not ideal. Since one can assume that most genes are not differentially expressed, the control and single treatment can be viewed as biological replicates for the majority (although obviously not all) genes. This can be used to estimate the extent of variability even in the absence of replicates. The idea is described for RNA-Seq in the DESeq paper but has been discussed many times in the microarray context. For example:

                            Comment
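
                            A toy version of the idea in the post above, as a sketch only: treat the two samples as if they were replicates, compute a per-gene variance, and take a robust summary so that the minority of truly changed genes does not dominate. This is a deliberately simplified method-of-moments estimate under a negative binomial assumption (variance = mean + dispersion × mean²); it is not the procedure DESeq or cuffdiff actually implements, and the counts are made up.

```python
from statistics import median


def pooled_dispersion(sample_a, sample_b):
    """Median per-gene dispersion, treating two samples as replicates.

    Inputs are library-size-normalized counts for the same genes in the
    same order. Negative per-gene estimates are clipped to zero.
    """
    estimates = []
    for a, b in zip(sample_a, sample_b):
        mean = (a + b) / 2
        if mean <= 0:
            continue  # no information from genes with no reads
        variance = (a - b) ** 2 / 2  # unbiased sample variance for n = 2
        estimates.append(max((variance - mean) / mean**2, 0.0))
    return median(estimates)


# Made-up normalized counts; the last gene is strongly changed.
control = [100, 200, 50, 1000]
treated = [120, 180, 55, 300]

dispersion = pooled_dispersion(control, treated)
print(dispersion)
```

                            Using the median is what makes the "most genes are unchanged" assumption do the work: the one strongly changed gene produces a large per-gene estimate but barely moves the summary.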


                            • #15
                              It makes some sense. I guess there is a god, albeit a less than ideal one. It still seems like a waste of time and money to do an RNA-seq experiment without replicates.
                              --------------
                              Ethan

                              Comment
