  • sdriscoll
    replied
    I usually trim out genes with zero counts (across all samples) before calling newCountDataSet, like so:

    Code:
    mycounts <- mycounts[rowSums(mycounts) > 0,]

  • chadn737
    replied
    When you create resSig, every gene whose padj is NA produces a row filled with NA (genes that simply fail the cutoff are dropped). The NA rows themselves are not a problem.

    To get rid of them, just use the na.omit function:

    Code:
    resSig<-na.omit(resSig)
    This omits every row that contains an NA, leaving only the rows for differentially expressed genes.
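
A minimal base-R sketch of that clean-up, rebuilding only four columns (id, baseMeanA, baseMeanB, padj) of the table posted in the original question; the remaining columns are left out for brevity. Keep in mind that na.omit() removes a row if any of its columns is NA, not just padj.

```r
# resSig as in the original question: two all-NA rows plus GPR128
# (only four of the columns are reproduced here)
resSig <- data.frame(id        = c(NA, NA, "GPR128"),
                     baseMeanA = c(NA, NA, 0),
                     baseMeanB = c(NA, NA, 234.4561004),
                     padj      = c(NA, NA, 1.16e-12),
                     row.names = c("NA", "NA.1", "616"),
                     stringsAsFactors = FALSE)

# na.omit() drops every row that has an NA in any column
resSig <- na.omit(resSig)

nrow(resSig)       # 1
rownames(resSig)   # "616"
```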

  • fatakias
    replied
    Dear DESeq experts,

    Apologies for continuing this thread, but I have an identical problem.

    I am trying to identify differentially expressed genes (DEGs) in a known dataset, and I want to understand why over 80% of the entries in the results I obtain from the counts table with DESeq_1.8.2 are 'NA'. I have seen similar queries in the forum, but I believe I am using the latest release of DESeq, not a development version. If this issue has been fixed in a newer version, please let us know, and tell us how to load that library in R. Thanks.
    Best,
    sarosh


    My workflow has two parts:


    Part_1 - extract the dataset (pasilla)
    Part_2 - use DESeq library calls to identify DEGs and show the 'NA' values

    This dataset (countstable.txt) has 14,470 entries with count information, of which ~2,500 have a count of 0 in all replicates.


    Code:
    ################################################
    #
    #Part_1- extract a dataset
    #
    
    rm(list = ls());
    
    #require(DESeq);
    require(pasilla);
    
    data("pasillaGenes");
    
    head(counts(pasillaGenes));
    
    #save_data to view and contrast
    write.table(counts(pasillaGenes), file="countstable.txt", quote=FALSE, sep="  ", row.names=TRUE);

    ################################################

    #edit countstable.txt - remove header
    #count the number of entries with all counts 0
    # (use grep command ..)
    #start R again
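
The all-zero count from the grep step can also be taken inside R. This is a sketch on a small made-up matrix standing in for counts(pasillaGenes) (gene and sample names are hypothetical, with columns ordered like the conds vector in Part_2); it also picks out genes that are 0 in every "T" replicate but have reads elsewhere. With the real object, the first check would presumably be sum(rowSums(counts(pasillaGenes)) == 0), which has not been run here.

```r
# Hypothetical stand-in for the pasilla counts: 4 genes x 7 samples
cnt <- matrix(c(0, 0, 0, 0,  0,  0,  0,   # gene1: zero everywhere
                5, 3, 4, 6,  0,  0,  0,   # gene2: zero in all "T" samples only
                9, 8, 7, 9, 20, 25, 18,   # gene3: expressed in both groups
                0, 1, 0, 0,  2,  0,  1),  # gene4: low counts
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4),
                              c("U1", "U2", "U3", "U4", "T1", "T2", "T3")))
conds <- c("U", "U", "U", "U", "T", "T", "T")

# Genes with a count of 0 in every sample (what the grep step counts)
zeroAll <- rowSums(cnt) == 0

# Genes with 0 in every "T" replicate that still have reads in "U"
zeroInT <- rowSums(cnt[, conds == "T", drop = FALSE]) == 0 & !zeroAll

sum(zeroAll)              # 1
rownames(cnt)[zeroInT]    # "gene2"
```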


    Code:
    ################################################
    ################################################
    #
    #Part_2
    
    require(DESeq);
    require(pasilla);
    
    countsTable <-read.table("countstable.txt", header=TRUE, stringsAsFactors=TRUE)
    rownames( countsTable ) <- countsTable$gene
    countsTable <- countsTable[,-1]
    conds=c("U","U","U","U","T","T","T");
    
    cds <- newCountDataSet( countsTable, conds);
    cds <-estimateSizeFactors(cds);
    
    #normcds <- counts( cds, normalized=TRUE );
    #write.table(normcds, file="normalized.countstable.txt", quote=FALSE, sep="\t", row.names=TRUE);
    
    cds <- estimateDispersions( cds, sharingMode="fit-only" );
    res <- nbinomTest(cds, "U","T");
    
    resSig <- res[ res$padj < 0.05,];
    resSig <- resSig[ order(resSig$pval), ];
    write.table(resSig, file="DEGsig_list.txt", quote=FALSE, sep="\t", row.names=FALSE);
    
    #############################################
    The final list of DEGs consists mostly of NA entries.

  • Simon Anders
    replied
    Typically, such entries appear in R when subsetting with a conditional expression that may contain or result in NA. Please post your full R code (and the output of sessionInfo()), then we can have a look.
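
A minimal base-R illustration of that mechanism on a made-up results table (gene names hypothetical): each NA in the logical index selects a row filled with NA, and R names those rows "NA", "NA.1", and so on, just like the rows in the original question. Wrapping the condition in which(), or additionally requiring !is.na(res$padj), keeps only the genes that actually pass the cutoff.

```r
# Made-up results table; padj is NA for geneB and geneC
res <- data.frame(id   = c("geneA", "geneB", "geneC", "geneD"),
                  padj = c(0.001, NA, NA, 0.4),
                  stringsAsFactors = FALSE)

# res$padj < 0.05 is c(TRUE, NA, NA, FALSE): each NA selects an all-NA row
bad <- res[res$padj < 0.05, ]
rownames(bad)   # "1" "NA" "NA.1"

# which() returns only the positions that are TRUE, so the NAs drop out
good <- res[which(res$padj < 0.05), ]
good$id         # "geneA"
```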

  • marcowanger
    replied
    idyll_ty, have you checked your data?

  • marcowanger
    replied
    Specifically, do you mean that no reads are mapped at all?

    Or do you mean that the majority of samples within one group have no reads at all, and thus show "0"?

    Visually, suppose there are 2 samples per group:

    [group1] sample 1: 0, sample 2: 0
    [group2] sample 1: 1, sample 2: 2

    Is that the case?
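
A sketch of what this example does to the fold change, assuming (as the baseMeanA, baseMeanB and foldChange columns in the original question suggest) that the fold change is the ratio of the two group means, and treating these counts as already normalized. A zero mean in one group gives Inf, as in the GPR128 row of the original question; a gene with 0 in every sample gives 0/0, which R evaluates to NaN and is.na() treats as missing. Whether DESeq itself reports such a gene as NaN or NA is not checked here.

```r
# The example above: 2 samples per group, counts assumed already normalized
group1 <- c(0, 0)
group2 <- c(1, 2)

meanA <- mean(group1)   # 0
meanB <- mean(group2)   # 1.5

# Assumption: fold change = mean of group B / mean of group A
foldChange <- meanB / meanA
log2(foldChange)        # Inf

# A gene with 0 reads in every sample: 0/0 is NaN, which is.na() flags
zeroGene <- mean(c(0, 0)) / mean(c(0, 0))
is.na(zeroGene)         # TRUE
```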

  • idyll_ty
    replied
    Yes, the gene names are consistent.

    I found the problem: in my input read-count data, some genes have no reads mapped at all, and those genes cause the NA values in the results.

  • marcowanger
    replied
    Did you check that the same set of gene IDs used for read counting is identical for every sample (with 0 for a gene with no reads)?
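
One way to run that check in base R, sketched with hypothetical per-sample count vectors named by gene ID: setequal() compares the sets of IDs, and any gene missing from a sample is filled in with 0 reads before the samples are combined into one table. The helper fill() is illustrative only, not part of DESeq.

```r
# Hypothetical per-sample counts named by gene ID; sample2 lacks geneC
sample1 <- c(geneA = 10, geneB = 0, geneC = 4)
sample2 <- c(geneA = 7,  geneB = 2)

# Do both samples report exactly the same gene IDs?
same_ids <- setequal(names(sample1), names(sample2))
same_ids   # FALSE

# Combine on the union of IDs, treating a missing gene as 0 reads
all_ids <- union(names(sample1), names(sample2))
fill <- function(x) {
  v <- setNames(x[all_ids], all_ids)  # IDs absent from x come back as NA
  v[is.na(v)] <- 0                    # ...and are set to 0 reads
  v
}
countsTable <- data.frame(sample1 = fill(sample1), sample2 = fill(sample2))
countsTable["geneC", "sample2"]   # 0
```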

  • DESeq: "NA" generated in the resulting differentially expressed genes

    I am using DESeq to analyze my RNA-seq data. However, I found a bunch of "NA" entries among the differentially expressed genes it generated. Please see the table below for details. The number of these "NA" genes varies between comparisons.

          id      baseMean     baseMeanA  baseMeanB    foldChange  log2FoldChange  pval      padj      resVarA  resVarB
    NA    NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    NA.1  NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    616   GPR128  187.5648803  0          234.4561004  Inf         Inf             1.19E-15  1.16E-12  0        19.90527498


    Has anyone else experienced this before? What could be the problem? Thanks.
