  • Deseq2 question

    Hi all,

    As a newbie in seq analysis, i have a question..: i have imported my data to the Deseq2 package. My data have nothing to do with comparisons (they are coming from the same cell population). What i'm trying to identify is how to find out which genes characterise my cell population (highest expression). Do i have to observe the baseMean value? and if yes, what is the threshold i would use? what is for example a baseline for a baseMean value?

    I hope i am clear and not cause any confusion..

    Thanks!!

  • #2
    Hi,
    If you don't have a comparison (differential expression, the DE part of DESeq2), I am not sure why you would go that route.

    You have some other things to think about, e.g. gene length and other factors that might inflate the counts of gene 1 relative to gene 2 even when the two genes have the same cellular abundance.

    Once you have decided how to account for gene-specific factors, i.e. how to normalize gene by gene, simply sort after normalization and take the top N rows. If you aren't going to do this, you could do it all via grep|sort|head at the command line and skip R. Something like

    grep "gene_specific_prefix" HTSeq_count_file.out | sort -k 2,2nr | head -n N >results_file.txt

    So, for human, the gene-specific prefix would be something like ENS. Or, since you are in R already, simply sort and take the top genes, after getting rid of the couple of lines of read stats at the bottom of the count table. Instead of specifying a count threshold (on baseMean, or on a mean I would compute myself via apply), I might choose the genes that make up the top 10% (or 20% or ...) of the total; see cumsum in R.
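The cumulative-share idea above could be sketched roughly as follows, here in Python rather than with R's cumsum. This is only a minimal sketch under stated assumptions: the counts are already loaded into a dict, the HTSeq read-stat rows are assumed to start with "__" (older HTSeq output may differ), and the gene names and numbers are made up.

```python
from itertools import accumulate

def top_genes_by_cumulative_share(counts, share=0.10):
    """Return the most abundant genes that together reach `share` of all gene counts."""
    # Drop the read-stat rows (assumed to start with "__", e.g. "__no_feature").
    genes = {g: c for g, c in counts.items() if not g.startswith("__")}
    total = sum(genes.values())
    # Sort from highest to lowest count, then walk down the running total.
    ranked = sorted(genes.items(), key=lambda kv: kv[1], reverse=True)
    running = accumulate(c for _, c in ranked)
    selected = []
    for (gene, count), cum in zip(ranked, running):
        selected.append((gene, count))
        if cum >= share * total:
            break
    return selected

# Made-up counts for illustration only.
example_counts = {
    "ENSG_A": 500, "ENSG_B": 300, "ENSG_C": 150, "ENSG_D": 50,
    "__no_feature": 1000, "__ambiguous": 200,
}
top_half = top_genes_by_cumulative_share(example_counts, share=0.5)
top_sixty = top_genes_by_cumulative_share(example_counts, share=0.6)
print(top_half)
print(top_sixty)
```

The share is taken relative to the gene counts only, so the read-stat rows do not dilute the total.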


    • #3
      I should have stated: before you sort in R, make sure you have removed the bottom couple of rows.


      • #4
        Thanks a lot for your reply. I got the first point, and you are absolutely right that I don't need to put it into R. One reason that comes to mind is to merge the count matrices from the samples (e.g. from the same timepoint). Specifically, I have 8 samples from the same developmental timepoint. Of course I could merge them using a simple Python function.
        For the second part (about the prefix), you lost me a bit (newbie).
        My question is: when I sort by baseMean (e.g. >15000) I get 30 specific genes with high values. Is this not a way to get a first glance at the data? Even if I open the count matrices I see the same genes.
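The "simple python function" for merging per-sample counts mentioned above might look something like this. It is a hedged sketch, not anyone's actual pipeline: it assumes tab-separated "gene, count" lines as written by HTSeq, assumes the read-stat rows start with "__", and uses two made-up samples in place of the eight replicates. The names parse_counts and merge_samples are hypothetical.

```python
def parse_counts(text):
    """Parse tab-separated 'gene<TAB>count' lines, skipping '__' read-stat rows."""
    counts = {}
    for line in text.strip().splitlines():
        gene, value = line.split("\t")
        if gene.startswith("__"):
            continue
        counts[gene] = int(value)
    return counts

def merge_samples(samples):
    """Turn {sample: {gene: count}} into (sample names, {gene: [count per sample]})."""
    names = sorted(samples)
    genes = sorted(set().union(*(samples[n] for n in names)))
    # A gene absent from one sample is filled with 0 (an assumption of this sketch).
    matrix = {g: [samples[n].get(g, 0) for n in names] for g in genes}
    return names, matrix

# Two made-up samples standing in for the eight replicates.
sample_texts = {
    "rep1": "ENSG_A\t10\nENSG_B\t0\n__no_feature\t99\n",
    "rep2": "ENSG_A\t12\nENSG_C\t5\n__no_feature\t80\n",
}
samples = {name: parse_counts(text) for name, text in sample_texts.items()}
names, matrix = merge_samples(samples)
print(names)
print(matrix)
```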


        • #5
          Ahh, now you have added new info: you have replicates AND a structure to your experiment.

          Yes, that would be ok.

          If you had only a single sample AND the counts were from HTSeq, you could extract only the genes using grep on a string specific to your gene IDs. For human genes with Ensembl IDs, all names start with ENS.

          Since you have replicates AND an experiment, this would not be the best way to go. Your way is fine, although you should still consider gene length etc. if you are going to make specific statements about abundance. Even then, a lot of unknown factors make such statements difficult when comparing across genes rather than within a single gene across experimental units.
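To make the gene-length point concrete, one common way to account for length is a TPM-style scaling: divide each count by the gene length, then rescale so the values sum to one million. The sketch below is illustrative only; the thread does not say which length correction to use, and the two genes, their counts and their lengths are made up.

```python
def tpm(counts, lengths_bp):
    """TPM-style values: count per base, rescaled so all genes sum to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    return {g: r / scale * 1e6 for g, r in rates.items()}

# Two made-up genes with identical raw counts but different lengths.
raw_counts = {"gene1": 1000, "gene2": 1000}
gene_lengths = {"gene1": 1000, "gene2": 4000}
tpm_values = tpm(raw_counts, gene_lengths)
print(tpm_values)
```

With equal raw counts, the shorter gene ends up with the higher length-corrected value, which is why a ranking on raw counts alone can mislead when comparing across genes.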


          • #6
            Perfect! Thanks a lot for your help. It is true that I didn't include a lot of details. We sorted cells (belonging to the same population) from 3 different developmental stages. What we need to do is characterize these populations with multiple markers, so I assume that I don't need to make any comparison. And you are right about the internal controls: we have housekeeping genes and also experimental RNA controls against which we can normalize the gene values. The whole confusion was: what could I reply if somebody asked me what baseMean means, and what its units are? Otherwise I think I'm starting to understand the analysis.
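On the baseMean question: in DESeq2 the baseMean column is the mean of the normalized counts across all samples, where each sample's counts are divided by its size factor. The units are therefore normalized read counts, with no gene-length correction. The sketch below illustrates the idea with a median-of-ratios size factor on a made-up matrix. It is a simplified illustration in Python, not DESeq2's own code, and the function names are hypothetical.

```python
import math
from statistics import median

def size_factors(matrix):
    """Median-of-ratios size factors; `matrix` maps gene -> list of raw counts per sample."""
    n_samples = len(next(iter(matrix.values())))
    # Use only genes with a nonzero count in every sample, so the geometric mean is defined.
    log_geo_means = {
        g: sum(math.log(c) for c in row) / n_samples
        for g, row in matrix.items() if all(c > 0 for c in row)
    }
    return [
        math.exp(median(math.log(matrix[g][j]) - lgm for g, lgm in log_geo_means.items()))
        for j in range(n_samples)
    ]

def base_mean(matrix):
    """Mean over samples of count / size factor, per gene."""
    sf = size_factors(matrix)
    return {g: sum(c / s for c, s in zip(row, sf)) / len(row) for g, row in matrix.items()}

# Made-up counts: the second sample was sequenced exactly twice as deeply as the first.
raw = {"geneA": [100, 200], "geneB": [10, 20], "geneC": [50, 100]}
sf = size_factors(raw)
bm = base_mean(raw)
print(sf)
print(bm)
```

In this toy matrix the size factors differ by a factor of two, so after normalization both samples give the same value for each gene and baseMean is that common value.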
