  • Up to date ERCC spike ins RNA seq analysis

    Hello everybody,

    I'm expecting to get my ERCC-spiked RNA-seq sequencing files soon,
    so I would like to find out the best way to analyze them.
    There are several posts where people ask specific questions about parts of the analysis, but I couldn't find a beginning-to-end description.

    My idea is to
    - map the data with TopHat2
    - find the number of reads per gene in the BAM file from TopHat2 with htseq-count
    - normalize these read counts to the ERCCs. But I have no idea how to do that part
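
    A minimal sketch of the normalization step in the list above, assuming the per-sample htseq-count outputs have already been merged into one table and that the spike-in IDs start with "ERCC-" (adjust the prefix if your reference names them differently). It computes median-of-ratios size factors from the spike-in rows only and divides the gene counts by them. The function names and the toy numbers are made up for illustration; this is one simple way to scale to spike-ins, not the method of any particular tool. DESeq2's estimateSizeFactors has a controlGenes argument that serves a similar purpose.

```python
import math
from statistics import median

def split_counts(counts):
    """Split {feature_id: [count per sample]} into gene rows and spike-in rows."""
    genes, spikes = {}, {}
    for feature, row in counts.items():
        if feature.startswith("__"):        # htseq-count summary rows, e.g. __no_feature
            continue
        if feature.startswith("ERCC-"):     # assumed spike-in ID prefix
            spikes[feature] = row
        else:
            genes[feature] = row
    return genes, spikes

def spike_in_size_factors(spikes):
    """Median-of-ratios size factors, computed from the spike-in rows only."""
    n_samples = len(next(iter(spikes.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in spikes.values():
        if min(row) <= 0:                   # skip spike-ins with a zero count in any sample
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # median() raises StatisticsError if no spike-in is detected in every sample
    return [math.exp(median(r)) for r in log_ratios]

def normalize(genes, size_factors):
    """Divide every gene count by the size factor of its sample."""
    return {g: [c / s for c, s in zip(row, size_factors)] for g, row in genes.items()}

# Toy table with two samples (hypothetical numbers).
counts = {
    "ERCC-00002": [1000, 2000],
    "ERCC-00003": [100, 200],
    "ERCC-00004": [10, 20],
    "geneA": [500, 1000],
    "geneB": [300, 300],
    "__no_feature": [50, 60],
}
genes, spikes = split_counts(counts)
factors = spike_in_size_factors(spikes)
normalized = normalize(genes, factors)
print(factors)
print(normalized)
```

    In this toy table the second sample has twice the ERCC counts of the first, so after scaling geneA is equal in both samples and geneB is halved in the second.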

    I found a post suggesting loess normalization.


    But there is also a paper which claims that loess normalization is not a really good way to go. Additionally, I don't understand how to run their suggested solution.


    So maybe someone has a suggestion for how to do the ERCC normalization in 2016: which program I can use and which file or format it needs. Maybe someone knows a thread where it is written down exactly how to use the code.

    I guess that, finally, I would need to use DESeq on the counts to compare my triplicates across the different time points.

    Thanks a lot, Alex

  • #2
    Actually, in this paper, if you look at the last sentence of the antepenultimate paragraph, my understanding is that you get better results when NOT using the ERCC spike-ins... I opened a discussion about these spike-ins right after I read this paper...


    • #3
      Could you please say which part you mean? If you quote the first few words of the passage, it will be easier to find. Thanks.

      I used the ERCCs because I think that my factor might upregulate a lot of genes. If it does, normalization without ERCCs will distort the result.
      This means upregulated genes might be interpreted as downregulated.

      Figure 1 (without ERCCs) and Figure 2A (with ERCCs) in the paper http://www.sciencedirect.com/science...92867412012263 show the problem clearly. I'm afraid there is no way to exclude this other than with ERCCs.


      • #4
        Of course, in some cases (as blancha wrote in the other discussion), they may help... Personally, I have never used them, and I wanted to have the opinion of the SEQanswers community. It seems that, depending on the project, they can bring more issues than they solve (but again, depending on the project).

        I was referring to the part where the authors say that their normalization is robust when applied to a set of control genes or a set of replicates, while it gave reasonable results using the ERCC spike-ins...


        • #5
          @Alex852013: Search here using "ERCC" and you will find the threads that @SylvainL is alluding to.


          • #6
            Hi Alex,

            The ERCC spike-ins do not contain any junctions. Thus, using TopHat2 solely on the ERCC reference will cause some trouble. Either you combine your "host" annotation with the ERCC spike-in one, or you run e.g. Bowtie2 on the ERCC sequences first and use the unmapped reads for the further analysis.

            Moreover, I'd suggest using the ERCC-Dashboard to get an overview of how the ERCCs behave in your experiment.
            IMHO, the ERCC transcripts do not reflect the complexity of the transcriptome. They can be useful for controlling coverage, strandedness, and input/gene-read correlation. But they are not designed to control for different junction/PAS usage, overlapping genes, SNP detection, ...
            You might have a look at https://www.biostars.org/p/170234/. The 5' ends are not described correctly in the provided annotation files, whilst the polyA sequence is included in the FASTA.

            tl;dr The ERCCs were designed for microarrays and can control nicely for a limited set of quality parameters. For normalising data in a more complex sample space, I would not use them.


            • #7
              I quickly read the paper you linked to, Alex, and I think it would be interesting to re-analyze their data with a pipeline adjusted for RNA-seq and a splice-aware mapper. They used bowtie and the package "affy". They also used RPKM counts, and it does not seem they did replicates... To me, it looks like they did more or less everything wrong there (not an RNA-seq analysis pipeline).
              Last edited by SylvainL; 02-24-2016, 06:41 AM.


              • #8
                @ SylvainL:

                The paper "Revisiting Global Gene Expression Analysis" was meant to give people an idea of the problem I'm facing. I don't think it makes sense to discuss its quality here.
                To give everybody an idea of the problem without checking the paper, here is the picture:



                1st row: for a transcription factor regulating only a few genes, no spike-ins are required for sure. Normalization works perfectly.
                2nd row: if a transcription factor changes most of the genes (I've heard that already 20 % of all genes is enough), the normalization will be biased, because the normalization programs assume that the expression of most genes stays the same.
                3rd row: with the ERCCs included, the normalization bias mentioned in row 2 can be avoided. That's what I want to use the spike-ins for.
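
                The effect described in rows 2 and 3 can be reproduced with a toy calculation. The sketch below uses invented numbers: ten genes, eight of which are truly halved in the treated sample, plus a single spike-in added in the same amount per cell to both samples. Sequencing is idealised as sharing out a fixed number of reads in proportion to abundance. All names and values are hypothetical and only illustrate the argument.

```python
# True transcript amounts per cell (arbitrary units, hypothetical numbers).
control = {f"gene{i}": 100.0 for i in range(1, 11)}
# gene1..gene8 are truly halved, gene9 and gene10 are unchanged.
treated = {g: (50.0 if i < 8 else 100.0) for i, g in enumerate(control)}
SPIKE = 100.0  # same spike-in amount per cell in both samples

def sequence(amounts, spike_amount, depth=1_000_000):
    """Idealised sequencing: reads are shared out in proportion to abundance."""
    total = sum(amounts.values()) + spike_amount
    reads = {g: depth * a / total for g, a in amounts.items()}
    reads["ERCC"] = depth * spike_amount / total
    return reads

def fold_changes(treated_reads, control_reads, scale_t, scale_c):
    """Treated/control ratio per gene after dividing each sample by its scale."""
    return {g: (treated_reads[g] / scale_t) / (control_reads[g] / scale_c)
            for g in control_reads if g != "ERCC"}

def gene_total(reads):
    """Total reads assigned to genes (spike-in excluded)."""
    return sum(c for g, c in reads.items() if g != "ERCC")

reads_c = sequence(control, SPIKE)
reads_t = sequence(treated, SPIKE)

# Row 2: scale each sample by its total gene reads (assumes most genes are unchanged).
fc_total = fold_changes(reads_t, reads_c, gene_total(reads_t), gene_total(reads_c))
# Row 3: scale each sample by its spike-in reads.
fc_spike = fold_changes(reads_t, reads_c, reads_t["ERCC"], reads_c["ERCC"])

for g in ("gene1", "gene10"):
    print(g, round(fc_total[g], 3), round(fc_spike[g], 3))
```

                With total-count scaling the halved genes appear only mildly reduced and the unchanged genes appear upregulated, whereas scaling to the spike-in recovers the true ratios of 0.5 and 1.0.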

                My protein is a transcriptional activator in a viral system, but in the human system it downregulated most of the genes. This was rather unexpected. Therefore I want to rule out that I am seeing the normalization bias described in the picture.

                @ Thank you, I can also use Bowtie 2; however, I have already made a file which includes each ERCC RNA as a single chromosome.

                Maybe someone can nevertheless tell me how to do the normalization. I will for sure check both ways of analysis (with and without ERCCs), but for that I need to know how to normalize with the ERCCs.
                Thanks a lot


                • #9
                  I think a better approach would have been adding ERCC spike-in to cells prior to RNA extraction. In this case an equal number of cells should be used for all samples.


                  • #10
                    That's what I did. I used the same number of cells and added the ERCCs: I took 1 µl, diluted it 1:100, and added 10 µl to each tube (better than adding 1 µl from a 1:10 dilution, to avoid a strong pipetting bias).
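
                    The arithmetic behind that choice, as a small sketch. It assumes "1:100" means one part stock in 100 parts total, and it uses a hypothetical absolute pipetting error of ±0.1 µl purely for illustration; neither number comes from a pipette specification.

```python
from fractions import Fraction

def stock_equivalent(volume_ul, dilution_factor):
    """Volume of undiluted stock contained in volume_ul of a 1:dilution_factor dilution."""
    return Fraction(volume_ul) / dilution_factor

def relative_error(volume_ul, abs_error_ul):
    """Relative error for a given pipetted volume and absolute error."""
    return Fraction(abs_error_ul) / Fraction(volume_ul)

ABS_ERROR_UL = Fraction(1, 10)          # hypothetical ±0.1 µl pipetting error

large = stock_equivalent(10, 100)       # 10 µl of a 1:100 dilution
small = stock_equivalent(1, 10)         # 1 µl of a 1:10 dilution
rel_large = relative_error(10, ABS_ERROR_UL)
rel_small = relative_error(1, ABS_ERROR_UL)

print(large, small)                     # stock equivalent in µl for both options
print(float(rel_large), float(rel_small))
```

                    Both options deliver the same amount of stock per tube, but under the assumed error the larger pipetted volume has a tenfold smaller relative error.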


                    • #11
                      @Alex852013,

                      I totally understand your points and why you want to use the ERCC spike-ins. Probably in your case it is really necessary. But I believe it is important to look at how people did their analysis, to be sure their normalization method really brings a benefit... Unfortunately, I do not have time right now, but I will re-analyze the data from this paper using different pipelines to form my own opinion about this spike-in normalization...
