  • Normalizing mutation counts by Gene size

    What is the recommended way to normalize mutation counts so that the impact of gene size is eliminated or reduced, and where can I obtain the data needed to do this?

    I am working with TCGA MAF files, so I have Entrez IDs and Hugo names for the genes. I would like to normalize the counts according to the size of the gene.

    I THINK TCGA only includes mutations that were within the expressed sequences, but I would have to double-check to be sure. (If any of you know the answer for certain, that would be appreciated.)

    Would the answer affect whether I should normalize only by the total number of nucleotides in exon sequences (leaving out the intron lengths), or should I use the total gene start-to-stop length anyway?

    The data is human. To be clear, by counts I mean the number of mutations per gene, per sample or grouping of samples.
    Last edited by Kotoro; 08-24-2015, 08:24 AM.

  • #2
    I'm surprised. I figured this one would be easy.


    • #3
      I guess no one felt like answering. Normalize by total exonic length, thereby excluding introns. In theory, one could add a few bases per intron to account for finding splice-site mutations, but the difference due to that will be minuscule. You can get the exon coordinates from Ensembl/GENCODE/UCSC. Just parse the GTF file (it might be convenient to do this in R). I wouldn't be surprised if the UCSC table browser already has this info in some random table as well.
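
A minimal Python sketch of the GTF parsing suggested above, offered as one possible approach rather than a standard tool. It assumes a GENCODE/Ensembl-style GTF whose attribute column carries a `gene_name` tag, and it merges overlapping exons from different transcripts so that each base is counted once. That merging choice, the function name `exonic_lengths`, and the `key` parameter are assumptions for illustration, not something specified in this thread.

```python
import re
from collections import defaultdict


def exonic_lengths(gtf_lines, key="gene_name"):
    """Return {gene: number of exonic bases}, counting overlapping exons once.

    gtf_lines: any iterable of GTF text lines (e.g. an open file handle).
    key: attribute holding the gene identifier (assumed to be gene_name).
    """
    attr_re = re.compile(re.escape(key) + r' "([^"]+)"')
    intervals = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        match = attr_re.search(fields[8])
        if match is None:
            continue  # exon without the requested attribute
        # GTF coordinates are 1-based and inclusive.
        start, end = int(fields[3]), int(fields[4])
        # Group by (gene, chromosome) so intervals on different
        # chromosomes are never merged with each other.
        intervals[(match.group(1), fields[0])].append((start, end))

    lengths = defaultdict(int)
    for (gene, _chrom), ivs in intervals.items():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                lengths[gene] += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        lengths[gene] += cur_end - cur_start + 1
    return dict(lengths)
```

Something like `with open(path) as fh: lengths = exonic_lengths(fh)` would produce the per-gene totals, where `path` points at whichever annotation file you downloaded. If the file stores the gene symbol under a different attribute, pass that attribute name as `key`.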


      • #4
        Thank you for your response. I figured that summing exon lengths was the best way to go for this, but wasn't sure. I've pulled down some data from UCSC and am starting to look at my options in the table browser for gene lists.

        I don't really KNOW R, though I could easily find a library in either Perl or Python to compute the numbers relatively quickly. (Something this simple doesn't warrant a C program, I would think.)

        Any idea which of the known human gene tracks in the genome browser would be best to pair with the TCGA data? (Their MAF files use Hugo gene symbol and Entrez Gene ID identifiers.)
        Last edited by Kotoro; 09-05-2015, 08:50 PM.


        • #5
          Yup, a little Python or Perl script would make more sense than something in C. I expect that the RefSeq track matches best, since it likely uses Entrez IDs already.
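
For the normalization step itself, a rough Python sketch: count mutations per gene from MAF rows and divide by exonic length, reported here as mutations per megabase of exon. `Hugo_Symbol` is a standard MAF column. The function name `mutations_per_mb`, the per-megabase scaling, and the decision to skip genes with no known length are illustrative assumptions.

```python
from collections import Counter


def mutations_per_mb(maf_rows, exonic_length):
    """Return {gene: mutations per megabase of exonic sequence}.

    maf_rows: iterable of dicts with a 'Hugo_Symbol' key (one per mutation).
    exonic_length: {gene symbol: exonic bases}, e.g. computed from a GTF.
    Genes without a positive length are skipped rather than guessed at.
    """
    counts = Counter(row["Hugo_Symbol"] for row in maf_rows)
    rates = {}
    for gene, n_mut in counts.items():
        length = exonic_length.get(gene)
        if not length:
            continue  # symbol not found in the annotation, or zero length
        rates[gene] = n_mut * 1e6 / length
    return rates
```

The rows could come from `csv.DictReader(fh, delimiter="\t")` after dropping any leading `#` comment lines from the MAF. For per-sample values, filter the rows by `Tumor_Sample_Barcode` before calling the function.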
