  • Volcano plot with R

    Hello everyone!

    We are cooperating with an institute that performed Illumina sequencing (HiSeq 3000) on our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table containing the gene name, expression count, p-value, FDR, Bonferroni-corrected p-value, fold change and RPKM value.
    I wrote an R script to make a volcano plot (log2(FC) on the x-axis, -log10(p) on the y-axis).
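For readers who want the two axes spelled out, here is a minimal sketch of the coordinate transform. It is written in Python rather than R purely for illustration, and the same two logarithms apply in any language. The gene names and values are hypothetical, and the sketch assumes the fold change is a positive ratio (a signed fold-change column would need converting first).

```python
import math

def volcano_point(fold_change, p_value):
    """Return (log2 fold change, -log10 p) for one gene.

    Assumes fold_change is a positive ratio and p_value is in (0, 1].
    """
    x = math.log2(fold_change)
    y = -math.log10(p_value)
    return x, y

# Hypothetical rows: (gene, fold change, p-value)
rows = [("geneA", 4.0, 0.001), ("geneB", 0.5, 0.01), ("geneC", 1.0, 1.0)]
points = {gene: volcano_point(fc, p) for gene, fc, p in rows}
```

A gene with p = 1 gets y = 0 and therefore sits on the x-axis, while p = 0 has no finite y value at all.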


    The issues:
    (1) It turns out that roughly 66% of our genes have a p-value of 1. I excluded these genes because they all fall on the x-axis (-log10(1) = 0). Is it okay to pre-filter data for a volcano plot, or do people usually plot the whole data set?
    (2) Roughly another 180 genes have a p-value of exactly 0. Since the logarithm of 0 is undefined, I first wanted to replace the zeros with the second smallest p-value in my dataset (that is, the smallest nonzero one). However, as there are so many genes with p=0, it is hard to randomly assign a small p-value without creating a suspicious pattern of dots in my plot. How do people plot genes with p=0?
    (3) We figured that maybe the p=0 and p=1 values are rounded values that appear when they ask the software to create an Excel file. Could that be possible?
    Our collaborator claims that none of the values are rounded. Yet, when they ask their automated software (CLC Genomics Workbench) to create a volcano plot, it looks normal, without any horizontal lines.
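One way to handle issue (2) without inventing random values is to cap every p = 0 at the smallest nonzero p-value and flag those genes so they can be drawn with a different symbol. The sketch below shows that idea only; it is not how CLC Genomics Workbench draws its own plot, and the function name and example values are hypothetical. Python is used for illustration; the logic carries over to R directly.

```python
import math

def neg_log10_with_floor(p_values):
    """Map p-values to -log10(p), capping p == 0 at the smallest nonzero p.

    Returns a list of (y, capped) pairs; capped is True where p was 0.
    """
    nonzero = [p for p in p_values if p > 0]
    if not nonzero:
        raise ValueError("all p-values are zero; no floor can be derived")
    floor = min(nonzero)
    return [(-math.log10(max(p, floor)), p == 0) for p in p_values]

# Hypothetical p-values, including two exact zeros and one exact 1
p_values = [0.0, 1e-8, 0.05, 1.0, 0.0]
result = neg_log10_with_floor(p_values)
```

All capped genes end up at the same height, so the horizontal line at the top of the plot remains; the flag simply makes it explicit that those points are lower bounds rather than measured values.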

    Any input is greatly appreciated!!

    Best wishes
    DCseq

  • #2
    Why not ask them to export the normalized values from CLC (or, better still, the raw counts)? You can then do your own analysis with that data (e.g. with DESeq2), since it sounds like you are comfortable with R.

    • #3
      I asked them for DESeq2 files, but they replied that they cannot give me such output, quoting the following:
      "
      ************************************
      Export of tables

      Tables can be exported in four different formats; CSV, tab-separated, Excel, or html. When exporting a table in CSV, tab-separated, or Excel format, numbers with many decimals are printed in the exported file with 10 decimals, or in 1.123E-5 format when the number is close to zero.

      When exporting a table in html format, data are exported with the number of decimals that have been defined in the workbench preference settings. When tables are exported in html format from the server or using command line tools, the default number of exported decimals is 3.
      ************************************
      "
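To make the rounding question concrete, the sketch below shows what the two number formats mentioned in the quoted text do to a very small p-value and to one just below 1. It uses Python string formatting as a stand-in; which format CLC actually applies to a given value is not something the quoted text settles, and the example values are hypothetical.

```python
# Hypothetical p-values: one far below 1e-10, one just below 1
tiny_p = 3e-12
near_one_p = 1 - 1e-11

# Fixed notation with 10 decimals, as described for "numbers with many decimals"
fixed_tiny = f"{tiny_p:.10f}"
fixed_near_one = f"{near_one_p:.10f}"

# Scientific notation, as described for numbers "close to zero"
sci_tiny = f"{tiny_p:.3E}"

# Reading the fixed-notation strings back gives exactly 0 and exactly 1
reread_tiny = float(fixed_tiny)
reread_near_one = float(fixed_near_one)
```

With fixed notation the tiny value is written as 0.0000000000 and reads back as exactly 0, whereas scientific notation preserves it. Checking whether the Excel cells hold values like 3.000E-12 or literal zeros would show which case applies.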

      Nonetheless, they said they could give me BAM files. I have not worked with BAM files before. Would they be helpful in my case?

      Many thanks

      • #4
        Everything would be fixed if you can get the BAM files, because then you can do your own analysis. Getting counts (in R) is easy, and doing differential expression analysis isn't too hard.

        • #5
          If you get the BAM files, you can use featureCounts (via the Rsubread package in R) followed by DESeq2. Ask them for the exact genome build they used (or, better still, the corresponding GTF files), since you need the matching annotation to count reads from BAM files with featureCounts.

          • #6
            Great, many thanks for your responses. I requested the GTF files from our collaborator and will let you know how everything goes!
