Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fruit Fly CummeRbund Plots Are Not The Same

    Hello all,

    After several days of moving through the TopHat Cufflinks workflow with the example fruit fly data (http://www.ncbi.nlm.nih.gov/pubmed/22383036), I have finally made it to CummeRbund.

    When I execute the following commands in R, I get plots that are not the same as the ones found in the publication. Is that a problem?
    library(cummeRbund)
    cuff_data <- readCufflinks('diff_out')
    csDensity(genes(cuff_data))
    csScatter(genes(cuff_data), 'C1', 'C2')
    csVolcano(genes(cuff_data), 'C1', 'C2')

    Attached is the volcano plot. I just took a screen shot and saved it as ppt since I didn't know how to export or save the R graph.

    Thanks and God bless,
    Jason
    Attached Files

  • #2
    csDensity is really different

    I also wanted to share my csDensity plot because it is really different.
    Attached Files

    Comment


    • #3
      This does look like a problem. Assuming that you are using the right original data (GSE32038), you should get the same volcano plot. Most concerning to me is that you're missing any significant genes.

      I imagine that you are using updated versions of the software relative to the paper, but in principle this shouldn't change the results so dramatically. Several default options have changed recently, though. Without looking over everything, it's tough to say for sure.

      An interesting and potentially educational for you approach would be to run the same data through Galaxy's tophat/cufflinks/cummerbund pipeline to see if you get the same results (or if perhaps you or the tutorial had done something incorrectly).

      Comment


      • #4
        I'd like to check the assumption of using the correct data with you.

        I went to http://0-www.ncbi.nlm.nih.gov.elis.t...i?acc=GSE32038, and copied the links for the RAW.tar and simulated_fastq_files.tar.gz.

        My commands were:
        (in GSE32038 directory)
        wget ftp://ftp.ncbi.nlm.nih.gov/geo/serie...E32038_RAW.tar -O GSE32038_RAW.tar
        tar -xvf GSE32038_RAW.tar
        gunzip GSM79448*
        (in the my_rnaseq_exp directory)
        wget ftp://ftp.ncbi.nlm.nih.gov/geo/serie...q_files.tar.gz -O GSE32038_simulated_fastq_files.tar.gz #Could not find any other fastq files in the GEO Assession
        tar -xvf GSE32038_simulated_fastq_files.tar.gz
        gunzip GSM79448*

        The TopHat Cufflinks journal article says to download the "raw sequencing reads, alignment reads, assembled transfrags and differential analysis", but I was not exactly sure which of files were the ones I needed. Did I download the correct ones?

        Thank you!
        Jason

        Comment


        • #5
          Based on the name of the file (GSE32038_simulated_fastq_files.tar.gz) you seem to have downloaded the right data file.

          Figure 8 in the paper has different numeric ranges for axes (specially the Y-axis) that may be why your volcano plot is looking different. We can't eliminate the possibility that you have different results in the underlying data. Same is true for the other graph.

          Also keep in mind that the example in the nature protocols paper used previous versions of the Tophat/Cufflinks suite. You are probably using the newer versions (v.2.x).
          Last edited by GenoMax; 06-17-2013, 11:38 AM.

          Comment


          • #6
            Okay, and for the fruit fly genome I went to ftp://ussd-ftp.illumina.com/.
            wget ftp://igenome:[email protected] -O Drosophila_melanogaster_Ensembl_BDGP5.25.tar.gz
            tar zxvf Drosophila_melanogaster_Ensembl_BDGP5.25.tar.gz

            Is that also correct?

            I also noticed the axis were different in Figure 6. If FPKM (fragments per kilobase of exon per million fragments mapped) is a decimal, log(FPKM) is negative. This is what we see in my figure but not theirs. If FPKM is a decimal than that means there are fewer fragments per kb of exon than million fragments mapped.

            In a previous thread (http://seqanswers.com/forums/showthread.php?t=31074), I posted several warnings that I got from ran Cuffmerge. These warnings were for missing fasta records such as heterochromatin and mitochondrial genome. Perhaps, without these index files, fragments mapped incorrectly all the way back at the TopHat step. This may account for negative values for log10(FPKM) in the csDensity graph.

            If that is the case, for me to get the same results as in the publication, I will need a fasta index with everything including heterochromatin and mitochondrial genome. If the website (ftp://ussd-ftp.illumina.com/) is not the correct source, does anyone know what is the website with the full genome?

            Comment


            • #7
              I actually have similar problems to yours.
              I also analyzed the simulated data, in combination with the iGenomes data. My results also differ from the results shown in the paper. However, my results also differ from yours.
              The reason you're not showing any significant genes in your volcano plots, is because you did not switch on the significant genes. For some reason you have to do this explicitly:

              e.g. csVolcano(genes(cuff_data), 'WT', 'KO', alpha=0.05, showSignificant=T)

              My error bars/standard deviation is also really high for each gene. Much bigger than one standard deviation, I don't know why this is.

              Did you ever figure out if you did something wrong, or what the reason for the changes are?

              Cheers
              Last edited by rubbertjes; 07-08-2013, 08:42 AM.

              Comment


              • #8
                Hello rubbertjes,

                I was never able to test my hypothesis since I could never find (or was given) a full fruit fly fasta index with the heterochromatin and mitochondrial genomes.

                Thanks for tip about showSignificant=T. I will give that a try this week to see what happens.

                God bless

                Comment


                • #9
                  Hello jmwhitha,

                  Did you finally guessed what were the reasons for the different plot outcomes? I am getting exactly the same "wrong" plots as you. If I run cummerbund using the reference result files from the tutorial, some of the plots are corrected (such as the volcano and the scatter plots), but others are not (eg the density plot and the barplots, which only show the error bars). I get some warnings/errors from cummerbund for these last commands (these, even when using the result data provided as reference, thus might be something due to R packages version I guess):

                  > csDensity(genes(cuff_data))
                  Warning messages:
                  1: Removed 4075 rows containing non-finite values (stat_density).
                  2: Removed 4082 rows containing non-finite values (stat_density).

                  > mygene <- getGene(cuff_data,'regucalcin')
                  > expressionBarplot(mygene)
                  Error : Mapping a variable to y and also using stat="bin".
                  With stat="bin", it will attempt to set the y value to the count of cases in each group.
                  This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
                  If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
                  If you want y to represent values in the data, use stat="identity".
                  See ?geom_bar for examples. (Defunct; last used in version 0.9.2)


                  I am using Tophat2, cufflinks2 and R3.1.1 for these analysis, while I think in the tutorial they used older versions for these tools...however, I guess resutls should not differ so much, right?

                  Santi

                  Comment


                  • #10
                    Hello Santi,

                    I found a variety of issues with this pipeline so I went with DEGseq. Besides not having as many issues, the processing time was much faster. So, actually I'd recommend you try DEGseq or another pipeline rather than trying to get TopHat Cufflinks... to work.

                    God bless,
                    Jason

                    Comment


                    • #11
                      Hi Jason,

                      Thanks for the feedback! Cheers,

                      Santi

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Best Practices for Single-Cell Sequencing Analysis
                        by seqadmin



                        While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                        Yesterday, 07:15 AM
                      • seqadmin
                        Latest Developments in Precision Medicine
                        by seqadmin



                        Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                        Somatic Genomics
                        “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                        05-24-2024, 01:16 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 08:18 AM
                      0 responses
                      11 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, Yesterday, 08:04 AM
                      0 responses
                      12 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 06-03-2024, 06:55 AM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-30-2024, 03:16 PM
                      0 responses
                      27 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X