  • Rebasing gene lengths when combining the FPKMs of several studies

    We're planning to combine FPKMs from several GEO datasets for our project. However, these projects use different gene annotations on GRCh37--some use hg37 and some use hg19--for their gene counting.

    Do you think, in such a case, that it is necessary to "rebase" the FPKMs using gene lengths from the same annotation?
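
For reference, "rebasing" in the sense asked about here can be sketched from the standard FPKM definition, FPKM = reads × 10^9 / (gene length in bp × total mapped reads). If the read count and library size stay the same, swapping the gene length only rescales the value by old length / new length. The function name `rebase_fpkm` and the example numbers are hypothetical, and the sketch assumes both annotations would have assigned the same reads to the gene, which may not hold in practice.

```python
def rebase_fpkm(fpkm, old_length_bp, new_length_bp):
    """Rescale an FPKM value from one gene length to another.

    Assumes the underlying read count and library size are unchanged,
    so only the length term in the FPKM formula differs.
    """
    if old_length_bp <= 0 or new_length_bp <= 0:
        raise ValueError("gene lengths must be positive")
    # FPKM is inversely proportional to gene length.
    return fpkm * old_length_bp / new_length_bp


# Hypothetical example: a gene annotated as 2000 bp in one study
# and 1000 bp in the common annotation.
rebased = rebase_fpkm(10.0, old_length_bp=2000, new_length_bp=1000)
print(rebased)
```

This corrects only the length term; differences in which reads were counted for the gene are not addressed by a rescaling like this.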

  • #2
    I'd suggest regenerating the FPKMs from the raw data with a uniform pipeline, as the processing methodologies obviously differ, which can have an unexpectedly large effect on results.


    • #3
      Originally posted by Brian Bushnell
      I'd suggest regenerating the FPKMs from the raw data with a uniform pipeline, as the processing methodologies obviously differ, which can have an unexpectedly large effect on results.
      The biggest problem is that some of the datasets have non-public SAM/BAM files, so I have to live with count-level data. Not to mention that some of these sets use Ensembl IDs, some use EntrezGene IDs, and some use symbols...
      Last edited by SamCurt; 01-03-2017, 12:36 PM.
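
When only count-level data is available, one possible approach is to map each dataset's gene IDs to a common namespace and then compute FPKM with a single shared length table. The sketch below is a minimal illustration under stated assumptions: the ID mapping (`id_map`), the length table, and all gene identifiers are hypothetical placeholders, and the total mapped reads is approximated by the sum of counts in the table, which may differ from the library size the original authors used.

```python
def to_common_ids(counts, id_map):
    """Re-key a {gene_id: count} table to a common ID namespace.

    Genes without a mapping are dropped; counts of IDs that map to
    the same common ID are summed.
    """
    merged = {}
    for gene_id, count in counts.items():
        common_id = id_map.get(gene_id)
        if common_id is None:
            continue
        merged[common_id] = merged.get(common_id, 0) + count
    return merged


def counts_to_fpkm(counts, lengths_bp):
    """Compute FPKM from raw counts using one shared length table.

    The library size is approximated by the sum of all counts given.
    Genes missing from the length table are skipped.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in count table")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
        if gene in lengths_bp
    }


# Hypothetical data: two genes, IDs and lengths are made up.
raw_counts = {"ID_A": 100, "ID_B": 300}
id_map = {"ID_A": "GENE1", "ID_B": "GENE2"}
lengths_bp = {"GENE1": 1000, "GENE2": 2000}

fpkm = counts_to_fpkm(to_common_ids(raw_counts, id_map), lengths_bp)
print(fpkm)
```

Applying the same length table to every dataset removes the annotation-length difference, but not differences in alignment or counting rules between studies.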


      • #4
        Personally, I am not very interested in unexplainable or unreproducible data. I think that reproducibility is essential to science; using black-box software or secret data is not explainable or reproducible, and thus not scientific, in my opinion. I suggest that you pursue something that is reproducible.


        • #5
          In addition to what Brian wrote: currently, more than 50% of the public data I'd like to compare with my own data is not reproducible with the published analysis pipelines. This is either because there are different 1-man-1-time-0-comments custom scripts that simply do not work, or because essential information (e.g. program parameters!) is not provided anywhere. However, I cannot simply ignore this existing data, as it is sometimes the only fitting reference in terms of sample prep, sequencing, etc.

          Hence, I currently take the most annoying and time-consuming route: start at the FASTQ files, contact the authors, and dig into the code. If the data is published in a peer-reviewed journal, you should be able to get access to the raw data, so that you can reproduce the published results.


          • #6
            The complete code for one of those sets--the one I mentioned as currently having only "open" count data--is available as open source (I have already seen the code). I just want to know how many days I would need to re-align those, since the SRA Toolkit is slow at my institution for some reason...
