  • DESeq not recognizing row.names

    Hi,

    I'm trying to use DESeq to find the differentially expressed genes in my datasets, but DESeq does not seem to recognize my row.names, so I can't create my cds.

    My .csv input file looks like:

    Code:
    transcript_id,C4,CRL_2APR10,CRL_1_15JUL11,CRL_2_15JUL11 
    comp1000201_c0_seq1,5.00,0.00,0.00,0.00
    comp1000297_c0_seq1,7.00,0.00,0.00,0.00
    comp100036_c0_seq1,0.00,0.00,0.00,0.00
    comp10003_c1_seq1,2.00,0.00,0.00,0.00
    comp100041_c0_seq1,3.00,0.00,0.00,0.00
    comp100041_c0_seq2,0.00,0.00,0.00,0.00
    comp100041_c0_seq3,0.00,0.00,0.00,0.00
    comp100051_c0_seq1,0.00,0.00,0.00,0.00
    comp1000890_c0_seq1,3.00,0.00,0.00,0.00
    This is what I'm running:

    Code:
    > spercysts_vs_embryos = read.csv (
    +   file.choose(), 
    +   header = TRUE, 
    +   row.names=1, 
    +   sep = ",", 
    +   dec = ".")
    
    > head(spercysts_vs_embryos)
                        C4 CRL_2APR10 CRL_1_15JUL11 CRL_2_15JUL11
    comp1000201_c0_seq1  5          0             0             0
    comp1000297_c0_seq1  7          0             0             0
    comp100036_c0_seq1   0          0             0             0
    comp10003_c1_seq1    2          0             0             0
    comp100041_c0_seq1   3          0             0             0
    comp100041_c0_seq2   0          0             0             0
    
    > cond = factor(c("SP", "SP", "EB", "EB"))
    
    > spercysts_vs_embryosDesign = data.frame(
    +   row.names = colnames( spercysts_vs_embryos ), 
    +   condition = c( "SP", "SP", "EB", "EB" ), 
    +   libType = c( "paired-end", "paired-end", "paired-end", "paired-end" ) )
    > spercysts_vs_embryosDesign
                  condition    libType
    C4                   SP paired-end
    CRL_2APR10           SP paired-end
    CRL_1_15JUL11        EB paired-end
    CRL_2_15JUL11        EB paired-end
    
    > str(spercysts_vs_embryos)
    'data.frame':	307048 obs. of  4 variables:
     $ C4           : num  5 7 0 2 3 0 0 0 3 0 ...
     $ CRL_2APR10   : num  0 0 0 0 0 0 0 0 0 0 ...
     $ CRL_1_15JUL11: num  0 0 0 0 0 0 0 0 0 10 ...
     $ CRL_2_15JUL11: num  0 0 0 0 0 0 0 0 0 3 ...
    So, everything looks fine to me. But when I try to create my cds:

    Code:
    > cds <- newCountDataSet(spercysts_vs_embryos, cond)
    Error in newCountDataSet(spercysts_vs_embryos, cond) : 
      The countData is not integer.
    So, if I check what is happening:

    Code:
    > which( is.na(spercysts_vs_embryos), arr.ind=TRUE )
         row col
    Any suggestions???
    Thanks!
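
    A note on the check above: is.na() only reports missing values, so an empty result does not rule out fractional counts. A minimal sketch of a more direct test, using a small made-up table (the name "counts" and its values are hypothetical stand-ins for spercysts_vs_embryos), compares each value with its rounded version:

    ```r
    # Toy stand-in for the count table (hypothetical values, not the real data).
    counts <- data.frame(
      C4         = c(5, 7.32, 0),
      CRL_2APR10 = c(0, 0, 1.6),
      row.names  = c("comp1", "comp2", "comp3")
    )

    # is.na() finds nothing here: fractional values are not missing values.
    na_hits <- which(is.na(counts), arr.ind = TRUE)

    # A value is a whole number only if it equals its rounded version,
    # so this comparison locates the fractional entries by row and column.
    m <- as.matrix(counts)
    frac_hits <- which(m != round(m), arr.ind = TRUE)
    print(frac_hits)
    ```

    On a table with hundreds of thousands of rows this is more reliable than eyeballing head() or tail(), since the offending rows can sit anywhere in the file.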

  • #2
    Looks like there are some non-integer values. Have you tried tail(spercysts_vs_embryos)? I once had some non-integer values at the tail.
    pbseq

    • #3
      Please do not crosspost the same question simultaneously in two forums (SeqAnswers and Bioconductor mailing list).

      • #4
        Originally posted by Simon Anders
        Please do not crosspost the same question simultaneously in two forums (SeqAnswers and Bioconductor mailing list).
        Sorry Simon, I was desperate...

        • #5
          Originally posted by pbseq
          Looks like there are some non-integer values. Have you tried tail(spercysts_vs_embryos)? I once had some non-integer values at the tail.
          pbseq
          > tail(spercysts_vs_embryos)
                             C4 CRL_2APR10 CRL_1_15JUL11 CRL_2_15JUL11
          comp99965_c0_seq1   3          0            11             0
          comp99972_c0_seq1   0          0            22             0
          comp99988_c0_seq2   0          0             0             0
          comp99995_c0_seq1   2          0             0             0
          comp999991_c0_seq1  3          0             9             0
          comp99999_c0_seq1   5          0             0             0

          • #6
            Hi,

            Run colSums(spercysts_vs_embryos) to see if there are any decimal values in the sums.

            Thanks
            --
            Muthu

            • #7
              Originally posted by muthu545
              Hi,

              Run colSums(spercysts_vs_embryos) to see if there are any decimal values in the sums.

              Thanks
              --
              Muthu

              > colSums(spercysts_vs_embryos)
                         C4    CRL_2APR10 CRL_1_15JUL11 CRL_2_15JUL11
                   17856472       4152157      27308366       3531719

              • #8
                Hi,

                Could you attach the CSV file (if you do not mind) so that we can replicate the problem you are encountering?

                Thanks
                --
                Muthu

                • #9
                  It's not very polished, but I'd try:
                  new_DF = data.frame(cbind(as.integer(spercysts_vs_embryos[,1]), as.integer(spercysts_vs_embryos[,2]), as.integer(spercysts_vs_embryos[,3]), as.integer(spercysts_vs_embryos[,4])))

                  Then, to get back to the proper colnames:
                  colnames(new_DF) = colnames(spercysts_vs_embryos)
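
                  One way to do the same conversion for all columns at once, while keeping the row and column names, is sketched below. It assumes that rounding to the nearest integer is acceptable for the analysis; note that as.integer() on its own truncates toward zero rather than rounding. The table "counts" and its values are hypothetical stand-ins for spercysts_vs_embryos.

                  ```r
                  # Toy stand-in for the count table (hypothetical values).
                  counts <- data.frame(
                    C4         = c(5, 7.32),
                    CRL_2APR10 = c(0, 1.6),
                    row.names  = c("comp1", "comp2")
                  )

                  # Round first, then switch the storage mode to integer.
                  # Changing storage.mode keeps the dimensions and dimnames intact.
                  int_counts <- round(as.matrix(counts))
                  storage.mode(int_counts) <- "integer"

                  str(int_counts)
                  ```

                  The result is an integer matrix with the transcript IDs still attached as row names, so nothing has to be renamed afterwards.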

                  • #10
                    Hi all,

                    I discovered that the problem was that RSEM was generating (for some reason that I cannot explain) decimal numbers in the expected count column, where you are supposed to have only integers... I fixed it with Excel (I know that is not a fancy way, but I didn't know how else to do it).

                    Thanks,
                    alisrpp
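
                    For anyone who would rather stay in R than use Excel, here is a rough sketch of the same fix. RSEM's expected counts can be fractional because reads that map to several transcripts are split between them, so rounding is a common workaround, though whether it suits a given analysis is a judgment call. The file names and values below are made up for illustration; a temporary file stands in for the real CSV.

                    ```r
                    # Write a small example file in place of the real CSV (hypothetical data).
                    in_file <- tempfile(fileext = ".csv")
                    writeLines(c(
                      "transcript_id,C4,CRL_2APR10",
                      "comp1000201_c0_seq1,5.00,0.00",
                      "comp1000297_c0_seq1,7.32,1.60"
                    ), in_file)

                    # Read the table with transcript IDs as row names, as in the first post.
                    counts <- read.csv(in_file, header = TRUE, row.names = 1)

                    # Round every expected count to the nearest whole number.
                    rounded <- round(counts)

                    # Save the cleaned table; row names are written as the first column.
                    out_file <- tempfile(fileext = ".csv")
                    write.csv(rounded, out_file, quote = FALSE)
                    print(readLines(out_file))
                    ```

                    The cleaned file can then be read back with read.csv(..., row.names = 1) in the same way as the original.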
