  • DESeq: filtering genes by supplementary information

    Hi all,

    I have read several threads on this forum about filtering data in DESeq. However, the filtering I'm after is not primarily about data quality or about reducing the number of hypotheses to limit the FDR.

    My situation is that I will run the nbinomTest() function from the DESeq package on a subset of the data, chosen by supplementary information about the genes: namely, only those genes that are located on sex chromosomes. I will also compare the results when using different conditions ('cds_object@phenoData@data$condition').

    I already know which genes are located on sex chromosomes, so that isn't the problem. The problem arises when I want to extract those genes from a CountDataSet object (cds object) in R.
    After the size factors and dispersions have been estimated, this is not a trivial thing to do, since the gene names are now stored inside the cds object and are no longer a column of the count table.

    The workaround is to use a different, already filtered, input (count table). But that leads to different size factors, and I don't want to call estimateSizeFactors() on the filtered data set, since it contains only genes from the sex chromosomes (we expect a higher overall expression in the homogametic sex, depending on the level of dosage compensation, and that should not affect the size factors). So the full dataset is needed only to acquire the correct size factors. Then, instead of filtering this dataset, I use another, already filtered, dataset (which is really just a subset of the first one) and assign the previously acquired size factors to the new cds object. This feels quite awkward and is also inefficient, especially if you run into a situation where several different kinds of filtering are to be done on the same dataset, all of them based on supplementary information.
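
    A minimal sketch of the concern above, in base R only (no DESeq needed). It assumes a median-of-ratios size factor estimate, which is the approach described for DESeq; the size_factors() helper, the simulated counts, and the gene and sample names are all hypothetical. The point is only that size factors estimated on the sex-linked genes alone absorb the expression difference between the sexes, while those estimated on the full table stay close to 1.

    ```r
    # Hypothetical helper: median-of-ratios size factors for a count matrix
    # (genes in rows, samples in columns).
    size_factors <- function(counts) {
      log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean
      apply(counts, 2, function(sample_counts) {
        exp(median((log(sample_counts) - log_geo_means)[is.finite(log_geo_means)]))
      })
    }

    set.seed(1)
    # Simulated data: 200 genes, two female (F) and two male (M) samples
    counts_full <- matrix(rpois(200 * 4, lambda = 100), nrow = 200,
                          dimnames = list(paste0("gene", 1:200),
                                          c("F1", "F2", "M1", "M2")))

    # Assumption for illustration: the first 20 genes are sex-linked and
    # expressed at twice the level in the female samples.
    sex_genes <- paste0("gene", 1:20)
    counts_full[sex_genes, c("F1", "F2")] <- counts_full[sex_genes, c("F1", "F2")] * 2L

    sf_full <- size_factors(counts_full)               # estimated on all genes
    sf_sub  <- size_factors(counts_full[sex_genes, ])  # estimated on the subset only

    print(round(rbind(full = sf_full, subset = sf_sub), 2))
    ```

    In this simulation the subset-based factors for the female samples come out clearly above 1 and those for the male samples clearly below 1, so normalising with them would hide exactly the difference being tested.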

    I feel there must be a better way of doing this. I'm new to both R and DESeq, so it might be something simple that I'm just missing. For example, let's say that I have an R vector where each element corresponds to the name of a gene I want to keep. Is there a way to look up these gene names in the CountDataSet object, so that I get the gene names with the corresponding per-sample counts, with the size factors untouched, saved in a new cds object?

    Thanks in advance
    Markus

  • #2
    Have you tried just subsetting it as normal (i.e., "cds_object[IDX,]" where IDX is an index of genes of interest)? The CountDataSet extends an eSet, and that's how you'd subset that.


    • #3
      Originally posted by dpryan:
      Have you tried just subsetting it as normal (i.e., "cds_object[IDX,]" where IDX is an index of genes of interest)? The CountDataSet extends an eSet, and that's how you'd subset that.
      Hi, and thank you for your quick reply.

      Using standard indexing did not help, since the actual count table is just one part of the cds_object, not the cds_object itself. However, I did find a way to solve it: the count data can be found in cds_object@assayData$counts. So from there, there shouldn't be any problem with indexing and filtering.


      • #4
        Right, but if an object offers a method for subsetting, then that will typically apply to all of its components. So generally:

        Code:
        # Subset the whole object by row (gene) index
        cds_sub <- cds_object[IDX, ]
        # Compare the subsetted counts with the manually indexed count matrix
        table(counts(cds_sub) == counts(cds_object)[IDX, ])
        (or something like that) will yield all TRUE.
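
        For the case asked about in the first post (a vector of gene names rather than a numeric index), the same idea could look roughly like the sketch below. It assumes the legacy DESeq package (not DESeq2) is installed; the simulated count table, the condition labels and the sex_genes vector are hypothetical stand-ins for real data.

        ```r
        library(DESeq)  # legacy DESeq package, assumed to be installed

        set.seed(1)
        # Hypothetical count table: 10 genes x 6 samples, integer counts
        count_table <- matrix(rpois(60, lambda = 100), nrow = 10,
                              dimnames = list(paste0("gene", 1:10),
                                              paste0("sample", 1:6)))
        conditions <- factor(rep(c("female", "male"), each = 3))

        # Build the object and estimate size factors on the FULL table
        cds <- newCountDataSet(count_table, conditions)
        cds <- estimateSizeFactors(cds)

        # Hypothetical vector of gene names to keep (e.g. sex-chromosome genes)
        sex_genes <- c("gene2", "gene5", "gene9")

        # Row-subset the whole object by name; the per-sample data travel with it
        cds_sex <- cds[rownames(counts(cds)) %in% sex_genes, ]

        # The size factors are the ones estimated on the full table
        stopifnot(isTRUE(all.equal(sizeFactors(cds_sex), sizeFactors(cds))))
        rownames(counts(cds_sex))
        ```

        If the dispersions were also estimated on the full object before subsetting, the subsetted object could then be passed to nbinomTest() with the two condition labels, without re-estimating anything on the filtered genes.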
