



  • Up-to-date ERCC spike-in RNA-seq analysis

    Hello everybody,

    I'm expecting to receive my ERCC-spiked RNA-seq files soon.
    Therefore I would like to find out the best way to analyze them.
    There are several posts where people ask specific questions about parts of the analysis, but I couldn't find an end-to-end workflow.

    My idea is to:
    - map the data with TopHat2
    - count the reads per gene in the BAM file from TopHat2 with HTSeq-count
    - normalize these counts to the ERCCs. But I have no idea how to do that part.
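    One common way to do the last step, sketched here under assumptions: given raw per-gene counts from HTSeq-count merged into one table, with ERCC features named with an `ERCC-` prefix, compute DESeq-style median-of-ratios size factors from the ERCC rows only and divide every gene's counts by them. In DESeq2 the same idea is available through the `controlGenes` argument of `estimateSizeFactors`; the Python below only illustrates the arithmetic, and the function names and table layout are hypothetical.

```python
import math
from statistics import median


def spikein_size_factors(counts, spike_prefix="ERCC-"):
    """Median-of-ratios size factors computed from spike-in rows only.

    counts: dict mapping feature ID -> list of raw counts (one per sample),
    e.g. HTSeq-count outputs merged into one table (hypothetical layout).
    """
    spikes = [c for g, c in counts.items() if g.startswith(spike_prefix)]
    # Spike-ins with a zero in any sample have no finite geometric mean: skip them.
    usable = [c for c in spikes if all(x > 0 for x in c)]
    if not usable:
        raise ValueError("no spike-in detected in all samples")
    n = len(usable[0])
    log_geo = [sum(math.log(x) for x in c) / n for c in usable]
    factors = []
    for j in range(n):
        # Per-sample median of log(count / geometric mean across samples).
        ratios = [math.log(c[j]) - g for c, g in zip(usable, log_geo)]
        factors.append(math.exp(median(ratios)))
    return factors


def normalize(counts, factors):
    """Divide every feature's counts by the matching sample's size factor."""
    return {g: [x / s for x, s in zip(c, factors)] for g, c in counts.items()}


if __name__ == "__main__":
    table = {
        "ERCC-00002": [100, 200],
        "ERCC-00003": [50, 100],
        "ERCC-00004": [10, 20],
        "GENE_A": [300, 300],
    }
    sf = spikein_size_factors(table)
    print(normalize(table, sf)["GENE_A"])
```

    In the toy table the spike-ins have twice as many reads in the second sample, so GENE_A, with equal raw counts, comes out about twofold lower there relative to the spike-ins. Because the ERCCs are added per equal number of cells, these factors scale each library to the spike-in amount rather than to total expression, so a global shift is not normalized away. Drop HTSeq-count's summary rows (those starting with `__`) before building the table.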

    I found a post suggesting loess normalization.

    But there is also a paper which claims that loess normalization is not a good way to go. Additionally, I don't understand how to run their suggested solution.

    So maybe someone has a suggestion for how to do the ERCC normalization in 2016: which program I can use and which file format it needs. Or maybe someone knows a thread where it is written down exactly how to use the code.

    I guess I would finally need to use DESeq on the counts to compare my triplicates across the different time points.

    Thanks a lot, Alex

  • #2
    Actually, in this paper, if you refer to the last sentence of the antepenultimate paragraph, I understand that you will get better results by NOT using the ERCC spike-ins... I opened a discussion about these spike-ins right after I read this paper...


    • #3
      Could you please say which part you mean? If you quote the first few words of the sentence, it will be easier to find. Thanks.

      I used the ERCCs because I think my factor might upregulate a lot of genes. If it does, normalization without ERCCs will distort the result.
      This means upregulated genes might be interpreted as downregulated.

      Figure 1 (without ERCCs) and Figure 2A (with ERCCs) in the paper show the problem clearly. I'm afraid there is no way to rule this out other than using ERCCs.


      • #4
        Of course, in some cases (as blancha wrote in the other discussion), they may help... Personally, I have never used them, and I wanted to have the opinion of the SEQanswers community. It seems that, depending on the project, they can bring more issues than they solve (but again, it depends on the project).

        I was referring to the part where the authors say that their normalization is robust when applied to a set of control genes or a set of replicates, while it gave reasonable results using the ERCC spike-ins...


        • #5
          @Alex852013: Search here using "ERCC" and you will find the threads that @SylvainL is alluding to.


          • #6
            Hi Alex,

            The ERCC spike-ins do not contain any splice junctions. Thus, running TopHat2 on the ERCC reference alone will cause some trouble. Either you combine your "host" annotation with the ERCC spike-in annotation, or you run e.g. Bowtie2 on the ERCC sequences first and use the unmapped reads for the further analysis.
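            A minimal sketch of the first option, assuming the ERCC FASTA is appended to the genome FASTA so that each ERCC sequence acts as its own chromosome: write one single-exon GTF record per ERCC sequence and append these lines to the host GTF. The `source` label and the `gene_id`/`transcript_id` naming are assumptions; if you use the provided ERCC annotation files instead, the same idea applies.

```python
def read_fasta_lengths(lines):
    """Return [(record_id, sequence_length), ...] from FASTA text lines."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # Use the first whitespace-delimited token of the header as the ID.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def ercc_gtf_lines(records, source="ERCC"):
    """One exon spanning each whole ERCC 'chromosome' (GTF is 1-based, inclusive)."""
    out = []
    for seq_id, length in records:
        attrs = f'gene_id "{seq_id}"; transcript_id "{seq_id}";'
        fields = [seq_id, source, "exon", "1", str(length), ".", "+", ".", attrs]
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    demo = [">ERCC-00002 example", "ACGT", "ACG", ">ERCC-00003", "AAAAAA"]
    for row in ercc_gtf_lines(read_fasta_lengths(demo)):
        print(row)
```

            With htseq-count's default feature type (`exon`) and ID attribute (`gene_id`), each ERCC is then counted like any other gene.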

            Moreover, I'd suggest using the ERCC-Dashboard to get an overview of how the ERCCs behave in your experiment.
            IMHO, the ERCC transcripts do not reflect the complexity of the transcriptome. They can be useful for controlling coverage, strandedness, and the correlation between input amount and read counts. But they are not designed to control for differential junction/PAS usage, overlapping genes, SNP detection, etc.
            You might also have a look at this issue: the 5' ends are not described correctly in the provided annotation files, whilst the polyA sequence is included in the FASTA.
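            The input/read-count correlation can also be checked roughly without the dashboard. A minimal sketch, assuming you have a dict of the nominal ERCC concentrations for your mix and a dict of counts per ERCC for one sample: fit log2(count) against log2(concentration) by ordinary least squares. A slope near 1 with a high Pearson r suggests that counts scale with input. This is a plain regression, not the ERCC-Dashboard method; undetected spike-ins are dropped because their log is undefined.

```python
import math
from statistics import mean


def dose_response(concentrations, counts):
    """Least-squares slope and Pearson r of log2(count) vs log2(concentration).

    Both arguments map ERCC ID -> value; IDs missing from counts or with a
    zero count are ignored.
    """
    ids = [i for i in concentrations if counts.get(i, 0) > 0]
    if len(ids) < 3:
        raise ValueError("need at least three detected spike-ins")
    x = [math.log2(concentrations[i]) for i in ids]
    y = [math.log2(counts[i]) for i in ids]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r
```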

            tl;dr The ERCCs were designed for microarrays and can nicely control for a limited set of quality parameters. For normalising data in a more complex sample space, I would not use them.


            • #7
              I quickly read the paper you linked, Alex, and I think it would be interesting to re-analyze their data with a pipeline adjusted for RNA-seq and a splice-aware mapper. They used Bowtie and the package "affy". They also used RPKM values, and it does not seem that they had replicates... To me, it looks like they more or less did everything wrong there (not an RNA-seq analysis pipeline).
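              For reference, RPKM rescales each gene's count by transcript length (per kilobase) and by library size (per million mapped reads), whereas count-based tools such as DESeq expect raw counts. A minimal sketch for one library; here library size is approximated by the sum of the gene counts, which is an assumption (RPKM is usually defined with the total number of mapped reads).

```python
def rpkm(counts, lengths_bp):
    """RPKM for one library.

    counts: gene -> raw read count; lengths_bp: gene -> transcript length in bp.
    Library size is approximated by the sum of gene counts (an assumption).
    """
    total = sum(counts.values())
    # count / (length in kb) / (library size in millions)
    # == count * 1e9 / (length_bp * total)
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}
```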
              Last edited by SylvainL; 02-24-2016, 06:41 AM.


              • #8
                @SylvainL:

                The paper "Revisiting Global Gene Expression Analysis" was meant to give people an idea of the problem I'm facing. I don't think it makes sense to discuss its quality here.
                To give everybody an idea of the problem without having to check the paper, here is what the figure shows:

                1st row: for a transcription factor regulating only a few genes, spike-ins are certainly not required. Normalization works perfectly.
                2nd row: if a transcription factor changes most genes (I've heard that even 20% of all genes is enough), the normalization will be biased, because normalization programs assume that the expression of most genes stays the same.
                3rd row: with the ERCCs included, the normalization bias from row 2 can be avoided. That's what I want to use the spike-ins for.
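                Rows 2 and 3 can be reproduced with a toy example. All numbers below are invented: a hypothetical factor doubles 8 of 10 genes, three ERCCs are added in equal amounts to both samples, both libraries are sequenced to the same depth, and there is no sampling noise (counts are expected values). Median-of-ratios size factors over the genes take the doubled majority as the baseline, so the unchanged genes appear halved; size factors from the ERCCs alone recover the true fold changes.

```python
import math
from statistics import median


def size_factors(rows):
    """DESeq-style median-of-ratios size factors.

    rows: one [count_sample0, count_sample1, ...] list per feature (all > 0).
    """
    n = len(rows[0])
    log_geo = [sum(math.log(x) for x in r) / n for r in rows]
    return [math.exp(median([math.log(r[j]) - g for r, g in zip(rows, log_geo)]))
            for j in range(n)]


def simulate(depth=1_000_000):
    # Hypothetical molecules per tube: 10 genes at 100 copies in control;
    # the factor doubles genes 0-7 and leaves genes 8-9 unchanged.
    ctrl_genes = [100] * 10
    trt_genes = [200] * 8 + [100] * 2
    spikes = [50, 20, 10]  # same ERCC amounts added to both samples

    def library(genes):
        # Equal sequencing depth: each molecule type gets reads in proportion
        # to its share of all molecules (genes + spike-ins) in the tube.
        total = sum(genes) + sum(spikes)
        return ([depth * m / total for m in genes],
                [depth * s / total for s in spikes])

    cg, cs = library(ctrl_genes)
    tg, ts = library(trt_genes)
    gene_rows = [[c, t] for c, t in zip(cg, tg)]
    spike_rows = [[c, t] for c, t in zip(cs, ts)]

    fold_changes = {}
    for label, rows in (("global", gene_rows), ("spike-in", spike_rows)):
        sf = size_factors(rows)
        # Treated/control ratio of size-factor-normalized counts per gene.
        fold_changes[label] = [(t / sf[1]) / (c / sf[0]) for c, t in gene_rows]
    return fold_changes


if __name__ == "__main__":
    res = simulate()
    print([round(x, 2) for x in res["global"]])
    # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
    print([round(x, 2) for x in res["spike-in"]])
    # → [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

                With global normalization the truly doubled genes look unchanged and the unchanged genes look downregulated, as in row 2; with spike-in size factors the simulated 2x and 1x changes come back, as in row 3.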

                My protein is a transcriptional activator in a viral system, but in the human system it downregulated most genes. This was rather unexpected. Therefore I want to rule out the normalization bias described in the figure.

                @ Thank you, I could also use Bowtie2; however, I have already made a reference file that includes each ERCC RNA as a separate chromosome.

                Maybe someone can still tell me how to do the normalization. I will certainly check both analyses (with and without ERCCs), but for that I need to know how to normalize with the ERCCs.
                Thanks a lot


                • #9
                  I think a better approach would be to add the ERCC spike-ins to the cells prior to RNA extraction. In that case, an equal number of cells should be used for all samples.


                  • #10
                    That's what I did. I used the same number of cells and added the ERCCs: I diluted 1 µl of the ERCC mix 1:100 and added 10 µl of this dilution to each tube (better than adding 1 µl of a 1:10 dilution, to avoid a strong pipetting bias).
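                    Both schemes deliver the same amount of ERCC stock per tube; the 1:100 scheme simply lets you pipette a larger, more accurate volume:

```latex
10\,\mu\mathrm{l} \times \tfrac{1}{100} = 0.1\,\mu\mathrm{l}\ \text{of ERCC stock per tube}
\qquad \text{vs.} \qquad
1\,\mu\mathrm{l} \times \tfrac{1}{10} = 0.1\,\mu\mathrm{l}\ \text{of ERCC stock per tube}
```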


                    • #11

                      I totally understand your points and why you want to use the ERCC spike-ins. In your case they are probably really necessary. But I believe it is important to look at how people did their analysis, to be sure their normalization method really adds something... Unfortunately, I do not have time right now, but I will re-analyze the data from this paper with different pipelines to form my own opinion about spike-in normalization...

