
  • How to Normalize NGS data? Tags per million?

    Hi all,

    I have three libraries from three experiments. Our sequences are tags with a poly(A) tail, like the poly(A)-tail sequences in mRNA-seq but produced with a different protocol. Most of the tags are located in 3'-UTR regions.

    I think I need to normalize the data before comparing different libraries.

    There is a normalization method called TPM (tags per million), which scales each tag count by the total tag count in its library:
    $\mathrm{TPM}_{j,i} = C_{j,i} \times 10^6 / T_j$, where $\mathrm{TPM}_{j,i}$ is the TPM for SAGE tag $i$ in SAGE library $j$, $C_{j,i}$ is the count of SAGE tag $i$ in library $j$, and $T_j$ is the total SAGE tag count in SAGE library $j$.

    It seems that this method simply divides each sequence count by the total sequence count in its library.

    But if I use TPM, should the total be the raw sequence count or only the count of mapped sequences? Or should I use some other normalization method?

    Thanks a lot.
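
For readers following along, the TPM formula quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `tags_per_million`, the tag names, and the counts are made up, and the denominator defaults to the sum of the counts passed in, which is an assumption rather than something the formula prescribes.

```python
def tags_per_million(counts, total=None):
    """Scale raw tag counts from one library to tags per million.

    counts: dict mapping tag -> raw count C_{j,i} in library j
    total:  denominator T_j; if omitted, the sum of the given counts is used
            (an assumption made for this sketch)
    """
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        raise ValueError("total tag count must be positive")
    return {tag: c * 1_000_000 / total for tag, c in counts.items()}


# Hypothetical library with 1,000 tags in total.
library = {"tagA": 50, "tagB": 150, "tagC": 800}
tpm = tags_per_million(library)

for tag, value in tpm.items():
    print(tag, value)
```

Passing `total` explicitly lets the same function be used with either the raw or the mapped total, which is the choice the question is about.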

  • #2
    I would think you would want to use mapped tags as the denominator; otherwise poorly sequenced or poorly prepared libraries will have artificially low expression values.
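
A small numeric sketch of this point, using invented numbers: two hypothetical libraries have the same raw depth and the same tag abundance among mapped reads, but one of them maps far fewer reads. The library names, totals, and the helper `per_million` are all assumptions for illustration.

```python
def per_million(count, denominator):
    """Scale one tag count to a per-million value for the given denominator."""
    return count * 1_000_000 / denominator


# Hypothetical libraries: same raw depth, different mapping rates.
# In both, the tag makes up 1% of the mapped reads.
libraries = {
    "good_prep": {"raw_total": 1_000_000, "mapped_total": 900_000, "tag_count": 9_000},
    "poor_prep": {"raw_total": 1_000_000, "mapped_total": 500_000, "tag_count": 5_000},
}

by_raw = {
    name: per_million(lib["tag_count"], lib["raw_total"])
    for name, lib in libraries.items()
}
by_mapped = {
    name: per_million(lib["tag_count"], lib["mapped_total"])
    for name, lib in libraries.items()
}

print("raw denominator:   ", by_raw)
print("mapped denominator:", by_mapped)
```

With the raw total as denominator the poorly mapped library appears to express the tag at roughly half the level; with the mapped total the two libraries agree.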



    • #3
      Form follows function

      Hi Xhuister

      when asking 'how' to do normalisation it is a good idea to first ask 'what for'. In RNA-Seq, a typical reason is to avoid spurious differential expression calls just because of differential library coverage. Another consideration is that you want to keep track of the actual counts when assessing statistical confidence in differential expression calls, since the counting noise is relatively more important when the numbers are small, even for the same fold-change.
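
To make the point about counting noise concrete, here is a minimal sketch that assumes pure Poisson counting noise, for which the relative standard deviation of a count with expected value n is 1/sqrt(n). The Poisson assumption is a simplification chosen for illustration (real data are usually noisier), and the function name and the example counts are hypothetical.

```python
import math


def poisson_relative_sd(expected_count):
    """Relative standard deviation (sd / mean) of a Poisson count."""
    if expected_count <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 / math.sqrt(expected_count)


# Two hypothetical comparisons with the same two-fold change.
for low, high in [(10, 20), (1000, 2000)]:
    print(
        f"{low} vs {high}: relative sd "
        f"{poisson_relative_sd(low):.3f} and {poisson_relative_sd(high):.3f}"
    )
```

Both comparisons show a two-fold change, but under this assumption the small counts carry relative noise of roughly 20 to 30 percent, against 2 to 3 percent for the large ones. Converting to per-million values first and discarding the counts would hide that difference.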

      A paper by M. Robinson and A. Oshlack discusses especially the first aspect in quite some detail: http://genomebiology.com/2010/11/3/R25

      A paper by S. Anders and myself combines a very similar normalisation method with the error modeling needed for confidence computations: http://precedings.nature.com/documents/4282/version/2

      Best wishes
      Wolfgang Huber
      EMBL
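
For orientation, the kind of normalisation discussed in this post can be sketched as a median-of-ratios size factor calculation. This is a simplified, plain-Python illustration, not the DESeq code itself: the function name `size_factors`, the gene names, and the counts are hypothetical, and genes with a zero count in any library are simply skipped.

```python
import math
from statistics import median


def size_factors(count_table):
    """Median-of-ratios size factors, one per library.

    count_table: dict mapping gene -> list of raw counts, one per library.
    For each gene, each count is compared with the gene's geometric mean
    across libraries; a library's size factor is the median of those ratios.
    """
    n_libs = len(next(iter(count_table.values())))
    log_ratios = [[] for _ in range(n_libs)]
    for counts in count_table.values():
        if any(c <= 0 for c in counts):
            continue  # geometric mean undefined; skip this gene
        log_geo_mean = sum(math.log(c) for c in counts) / n_libs
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical table: library 2 is sequenced twice as deeply as library 1,
# and gene g4 is strongly up-regulated in library 2.
table = {
    "g1": [10, 20],
    "g2": [100, 200],
    "g3": [30, 60],
    "g4": [5, 500],
}
factors = size_factors(table)
normalised = {g: [c / f for c, f in zip(counts, factors)] for g, counts in table.items()}
print(factors)
```

In this toy table the ratio of the two size factors is 2, matching the depth difference, whereas the ratio of total counts (780 to 145, about 5.4) is dominated by the single strongly changed gene. Dividing counts by the size factors is only for display; for significance testing the raw counts would still be needed, as noted above.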



      • #4
        Thank you Wolfgang and Krobison,

        I'm reading the DEG paper now. But I'm not sure whether it is suitable for my case.

        In my case, each gene has only a few locations (normally fewer than 5) with reads, and most of these locations are in the 3'-UTR. This is unlike the usual case, where reads are distributed along the whole transcript.

        Do you think it's OK to normalize with DESeq, or should I just use 'tags per million' and normalize by the total count in each library? Thank you!



        • #5
          Dear Xhuister

          as long as you have reason to believe that your counts are roughly proportional to the true target gene abundance (with unknown proportionality factors that depend on sample, lane and gene), these normalisations are in principle suitable. (And if that were not the case, then I am not sure what you want to normalise.)

          Best wishes
          Wolfgang
          Wolfgang Huber
          EMBL



          • #6
            Thank you Wolfgang. Maybe I'll try both TPM and DESeq.
