  • Rebasing gene lengths when combining the FPKMs of several studies

    We're planning to combine FPKMs from several GEO datasets for our project. However, these projects use different gene annotations on GRCh37--some use hg37 and some use hg19--for their gene counting.

    Do you think, in such a case, that it is necessary to "rebase" the FPKMs using gene lengths from the same annotation?
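For readers unfamiliar with what "rebasing" would mean here, a minimal sketch follows. It assumes the usual definition FPKM = 1e9 * count / (gene_length * total_fragments), so that with the count and library size held fixed, changing the gene length rescales the value by old_length / new_length. The function names and the example numbers are hypothetical, and this only corrects the length term; it does not account for reads being assigned to genes differently under the two annotations.

```python
def rebase_fpkm(fpkm, old_length, new_length):
    """Rescale one FPKM value from an old gene length to a new one.

    Assumes FPKM = 1e9 * count / (length * total_fragments), with the
    count and total_fragments unchanged between annotations.
    """
    if old_length <= 0 or new_length <= 0:
        raise ValueError("gene lengths must be positive")
    return fpkm * old_length / new_length


def rebase_table(fpkms, old_lengths, new_lengths):
    """Rebase every gene present in both length tables; skip the rest."""
    rebased = {}
    for gene, value in fpkms.items():
        if gene in old_lengths and gene in new_lengths:
            rebased[gene] = rebase_fpkm(value, old_lengths[gene], new_lengths[gene])
    return rebased


# Hypothetical example values, not taken from any real dataset.
example = rebase_table(
    {"GENE_A": 10.0, "GENE_B": 4.0},
    {"GENE_A": 2000, "GENE_B": 1500},
    {"GENE_A": 1000},  # GENE_B has no length in the new annotation
)
```

Genes missing from either length table are dropped rather than guessed, which is a design choice of this sketch: a silently unscaled value would be harder to spot later than a missing one.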

  • #2
    I'd suggest regenerating the FPKMs from the raw data with a uniform pipeline, as the processing methodologies obviously differ, which can have an unexpectedly large effect on results.

    • #3
      Originally posted by Brian Bushnell
      I'd suggest regenerating the FPKMs from the raw data with a uniform pipeline, as the processing methodologies obviously differ, which can have an unexpectedly large effect on results.
      The biggest problem is that some of the datasets have non-public SAM/BAM files, so I have to live with count-level data. Not to mention that some of these sets use Ensembl IDs, some use EntrezGene IDs, and some use symbols...
      Last edited by SamCurt; 01-03-2017, 12:36 PM.
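When only count-level data is available, one way to get comparable values is to map every dataset's gene IDs into a single namespace and then compute FPKM from the counts with one shared gene-length table. The sketch below assumes the standard formula FPKM = 1e9 * count / (gene_length * total_fragments) and uses the sum of the retained counts as the library size, which is itself an assumption. The `id_map` dictionary, the function names, and all example values are hypothetical; a real mapping would come from an annotation resource.

```python
def harmonize_ids(counts, id_map):
    """Translate gene IDs into a common namespace, summing collisions.

    IDs with no entry in id_map are dropped, which also lowers the
    library size used downstream.
    """
    merged = {}
    for gene_id, count in counts.items():
        target = id_map.get(gene_id)
        if target is None:
            continue
        merged[target] = merged.get(target, 0) + count
    return merged


def fpkm_from_counts(counts, lengths):
    """Compute FPKM per gene from raw counts and a shared length table."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no counts to normalize")
    return {
        gene: 1e9 * count / (lengths[gene] * total)
        for gene, count in counts.items()
        if gene in lengths
    }


# Hypothetical IDs, counts, and lengths for illustration only.
raw = {"id_1": 60, "id_2": 40, "id_3": 300, "id_unknown": 5}
id_map = {"id_1": "GENE_A", "id_2": "GENE_A", "id_3": "GENE_B"}
shared_lengths = {"GENE_A": 1000, "GENE_B": 2000}

harmonized = harmonize_ids(raw, id_map)
fpkm = fpkm_from_counts(harmonized, shared_lengths)
```

This does not replace re-aligning from raw reads, since the counts themselves still reflect each study's own pipeline; it only makes the length and ID conventions uniform.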

      • #4
        Personally, I am not very interested in unexplainable or unreproducible data. I think that reproducibility is essential to science; using black-box software or secret data is not explainable or reproducible, and thus not scientific, in my opinion. I suggest that you pursue something that is reproducible.

        • #5
          In addition to what Brian wrote: Currently, >50% of the public data I'd like to use for comparison to my data is not reproducible with the published analysis pipelines. This is either because there are different 1-man-1-time-0-comments-custom-scripts that simply do not work or because essential information (e.g. program parameters!) is not provided anywhere. However, I cannot simply ignore this existing data, as it is sometimes also the only fitting reference in terms of sample prep, sequencing, etc...

          Hence, I currently take the most annoying and time-consuming route: start from the FASTQ files, contact the authors, and dig into the code. If the data is published in a peer-reviewed journal, you should be able to get access to the raw data, so that you can reproduce the published results.

          • #6
            The complete code for one of those sets--the one I mentioned that currently only has "open" count data--is available as open source (I have already seen the code). I just want to know how many days I would need to re-align those, since the SRA Toolkit is slow at my institution for some reason...
