
  • [DEXSeq] problem with estimateSizeFactors

    Dear colleagues,

    I have a problem with the estimateSizeFactors function.
    After loading my ExonCountSet object and running makeCompleteDEUAnalysis(ecs), I got this error:
    Error in .local(object, ...) : Estimate size factors first.

    Then I tried to run the analysis step by step:
    ecs <- estimateSizeFactors(ecs)
    This call runs without any warnings,
    but sizeFactors(ecs) shows that all size factors in ecs are still NA.
    At the same time, counts(ecs) shows that the feature counts look fine.

    Another strange thing happened in this run while creating the ExonCountSet, but maybe I misunderstand how to assign sample names. The design was:

    design <- data.frame(
    condition = conds,
    replicate = reps,
    row.names = tags,
    stringsAsFactors = TRUE,
    check.names = FALSE)


    where tags is a vector of sample names, so the row names of the design data frame are the entries of tags.
    But after creating the ExonCountSet:

    ecs = read.HTSeqCounts(
    countfiles = files,
    design = design,
    flattenedfile = "dexseq.gtf"
    );


    the sample names in the ecs slots (such as counts or sizeFactors) were not the row names of the design data frame but the full file paths from the files vector.

    Any ideas what could be the problem?

    Thank you,
    Yerbol

  • #2
    Hi Yerbol,

    Can you include the output of your sessionInfo()?

    I think the function estimateSizeFactors, which is called inside makeCompleteDEUAnalysis, is returning NAs. Do you have NA values in your counts, or any non-integer values?

    Best wishes,
    Alejandro
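Both of Alejandro's questions (NA counts and non-integer counts) can be checked in one pass. This is a minimal base R sketch: `check_counts` is a hypothetical helper, not a DEXSeq function, and with real data the matrix would be `counts(ecs)` rather than the toy matrix used here.

```r
# Hypothetical helper (not part of DEXSeq): count entries in a count matrix
# that a size-factor estimator would not expect.
check_counts <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m))
  observed <- m[!is.na(m)]  # drop NAs before the integer and sign checks
  c(n_na       = sum(is.na(m)),
    n_nonint   = sum(observed != round(observed)),
    n_negative = sum(observed < 0))
}

# Toy matrix: 3 features x 2 samples, with one NA and one non-integer value.
toy <- matrix(c(10, 0, 3, NA, 2.5, 7), nrow = 3,
              dimnames = list(paste0("exon", 1:3), c("s1", "s2")))
check_counts(toy)
# With a real ExonCountSet this would be: check_counts(counts(ecs))
```

All three entries should be zero for a clean count matrix produced by dexseq_count.py.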

    • #3
      Originally posted by areyes
      Do you have NA values in your counts, or any non-integer values?

      I see NAs in my counts.

      • #4
        That should not happen: you should have either 0 or more counts, so NA counts are strange.
        What are you using to generate the counts? Do you also see NAs in the count files?

        • #5
          Originally posted by areyes
          Do you also see NAs in the count files?

          Oops, my mistake. The NAs only show up after running the R script, not in the count files. Sorry, my friend.

          • #6
            Originally posted by areyes
            Can you include the output of your sessionInfo()?

            R version 2.15.1 (2012-06-22)
            Platform: x86_64-pc-linux-gnu (64-bit)

            locale:
            [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
            [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
            [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
            [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

            attached base packages:
            [1] stats graphics grDevices utils datasets methods base

            other attached packages:
            [1] SAJR_0.0 multicore_0.1-7 DEXSeq_1.2.1 Biobase_2.16.0 BiocGenerics_0.2.0

            loaded via a namespace (and not attached):
            [1] biomaRt_2.12.0 hwriter_1.3 MASS_7.3-7 plyr_1.7.1 RCurl_1.91-1 statmod_1.4.15
            [7] stringr_0.6 tools_2.15.1 XML_3.9-4

            Originally posted by areyes
            Do you have NA values in your counts, or any non-integer values?

            No, I checked with is.na(ecs). The counts seem to be OK; they were generated by dexseq_count.py.

            Moreover, I suspect the problem is related to the number of conditions. When I run DEXSeq on two conditions (2 conditions x 3 replicates), it works, but when the condition vector in the design contains more than two levels, it fails.

            • #7
              Could you check the output of:

              any(is.na(counts(ecs)))

              • #8
                Originally posted by areyes
                Could you check the output of:

                any(is.na(counts(ecs)))

                Yes, I mistyped earlier; that is what I actually ran. It returns FALSE.
                And as I said, it works when I subset only two conditions from the same dataset.
                Last edited by yerbol; 10-11-2012, 04:20 AM.

                • #9
                  The number of conditions or replicates should not be a problem, and I am unable to reproduce this error. Would you mind sending me your ExonCountSet object so I can take a closer look?

                  • #10
                    Yes, of course. What email address should I use?

                    • #11
                      Hi Yerbol,

                      Thanks for sending me your ExonCountSet object. I ran:

                      Code:
                      > toOut <- colSums(counts(ecs1))
                      > names(toOut) <- NULL
                      > toOut
                       [1]  1779963   136044    26189  3148937  6446319  3636717  1587331   609052
                       [9]  1453709  5970441    57469  2804561       18       17       19  3630791
                      [17] 12749375  5368012
                      You have samples with very low counts (26189, 57469, 18, 19 and 17), which is definitely not normal. Which sequencing technology are you using? Maybe you should check your read and alignment files. When I remove these problematic samples, the normalization (size) factors are no longer NA.

                      Alejandro Reyes
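A likely reason why a few near-empty libraries can turn every size factor into NA, assuming DEXSeq's estimateSizeFactors uses the DESeq-style median-of-ratios estimator (an assumption here, not checked against DEXSeq 1.2.1): the size factor of sample j is the median, over features i, of k_ij divided by the geometric mean of feature i across all samples. A feature with a zero in any sample has a geometric mean of zero and is skipped. If the failed libraries put a zero in every feature, no feature is left and the median is NA for all samples, healthy ones included. The sketch below re-implements that idea in base R; `median_ratio_size_factors` is a hypothetical name.

```r
# Sketch of median-of-ratios size factors (DESeq-style), assumed to match what
# estimateSizeFactors does; the function name is hypothetical.
median_ratio_size_factors <- function(counts) {
  # Per-feature log geometric mean; -Inf whenever any sample has a zero count.
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)  # features positive in every sample
  apply(counts, 2, function(cnts) {
    # median() of an empty vector is NA, so no usable features -> NA
    exp(median((log(cnts) - log_geo_means)[usable]))
  })
}

healthy <- cbind(s1 = c(100, 0, 50, 80),
                 s2 = c(210, 390, 95, 170))
median_ratio_size_factors(healthy)      # finite: features 1, 3, 4 are usable

# Add a failed library with almost no reads: now every feature has a zero
# somewhere, so all size factors (including s1 and s2) become NA.
with_failed <- cbind(healthy, s3 = c(0, 3, 0, 0))
median_ratio_size_factors(with_failed)  # NA for s1, s2 and s3

# Dropping the failed sample restores the original estimates.
median_ratio_size_factors(with_failed[, c("s1", "s2")])
```

In this sketch the experimental design plays no part; only the count matrix matters. That would fit Alejandro's point that the number of conditions should not matter, and it would explain the two-condition runs working if those subsets happened to exclude the failed libraries.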

                      • #12
                        Yes, I know; some samples failed and have no coverage.
                        And I haven't done any prefiltering yet. I thought it wouldn't influence the results, because features with low counts simply won't reach significance in the DEU test.

                        So, formally, what is the lowest (and maybe the highest?) count number for estimateSizeFactors to run correctly? And it's really weird that it affects the other samples too.

                        And thank you VERY much for your help!

                        • #13
                          I suggest you run standard quality controls and discard the samples that do not pass them (% of aligned reads, quality per cycle, PCR duplicates, contamination, etc.). It is simply impossible to compare a library with 12749375 read counts with one with 17 read counts.

                          No problem!
                          Alejandro
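There is no single minimum count, but one simple way to flag candidates for removal before estimating size factors is to compare each library with the median library size. The sketch below applies an arbitrary 5% cutoff to the column sums printed in post #11; `flag_small_libraries` and the cutoff are hypothetical choices for illustration, not a DEXSeq recommendation, and flagged samples should still be checked with the QC metrics listed above rather than dropped automatically.

```r
# Column sums of counts(ecs1) as printed in post #11, in the same order.
lib_sizes <- c(1779963, 136044, 26189, 3148937, 6446319, 3636717, 1587331,
               609052, 1453709, 5970441, 57469, 2804561, 18, 17, 19,
               3630791, 12749375, 5368012)

# Hypothetical rule: flag libraries smaller than a fraction of the median size.
# The 5% default is an arbitrary illustration.
flag_small_libraries <- function(lib_sizes, min_fraction = 0.05) {
  lib_sizes < min_fraction * median(lib_sizes)
}

small <- flag_small_libraries(lib_sizes)
which(small)       # samples 3, 11, 13, 14, 15
lib_sizes[small]   # 26189 57469 18 17 19
```

With these numbers the rule flags exactly the five libraries Alejandro singled out; a relative cutoff was chosen over an absolute one because acceptable depth depends on the sequencing run.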

                          Comment


                          • #14
                            I am seeing a similar NA problem, but my library sizes look OK when I run your sample script:

                            > out<-colSums(counts(ecs))
                            > names(out)<-NULL
                            > out
                            [1] 48205948 43440778 22486575 22125932 40800119 51553703 47167921 14781957
                            [9] 30978061 62536983 25509126 55754959 16406873 59322632 39926544 63520796
                            [17] 71058692 58530700

                            • #15
                              More details about my data:

                              any(is.na(counts(ecs)))
                              [1] FALSE

                              > design
                              countFile condition
                              MP ./ACAGTG///accepted_hits.exonCounts MP
                              FA ./ACTTGA///accepted_hits.exonCounts FA
                              FR ./AGGTTT///accepted_hits.exonCounts FR
                              FW ./AGTCAA///accepted_hits.exonCounts FW
                              MA ./ATCACG///accepted_hits.exonCounts MA
                              FW.1 ./CAGATC///accepted_hits.exonCounts FW
                              MM ./CGATGT///accepted_hits.exonCounts MM
                              FP ./CTTGTA///accepted_hits.exonCounts FP
                              MA.1 ./GATCAG///accepted_hits.exonCounts MA
                              MW ./GCCAAT///accepted_hits.exonCounts MW
                              FM ./GGCTAC///accepted_hits.exonCounts FM
                              FR.1 ./GTCCGC///accepted_hits.exonCounts FR
                              MM.1 ./TAGCTT///accepted_hits.exonCounts MM
                              MW.1 ./TGACCA///accepted_hits.exonCounts MW
                              MP.1 ./TTAGGC///accepted_hits.exonCounts MP
                              FP.1 ./index10///accepted_hits.exonCounts FP
                              FM.1 ./index11///accepted_hits.exonCounts FM
                              FA.1 ./index9///accepted_hits.exonCounts FA
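These library sizes look healthy, so the near-empty-library explanation from post #11 does not obviously apply. Under the same median-of-ratios assumption, one more thing worth checking is how many features have a positive count in every one of the 18 samples, since only those features contribute to the size factors; if there are none, every size factor is NA. A minimal base R sketch, where `n_usable_features` is a hypothetical helper and the toy matrix stands in for `counts(ecs)`:

```r
# Hypothetical diagnostic: number of features with a positive count in every
# sample. Under the median-of-ratios assumption only these features inform the
# size factors; zero usable features means every size factor is NA.
n_usable_features <- function(counts) {
  sum(rowSums(counts > 0) == ncol(counts))
}

# Toy matrix using three of the sample labels from the design above.
toy <- cbind(MP = c(5, 0, 12, 3),
             FA = c(7, 4, 9, 0),
             FR = c(6, 2, 11, 1))
n_usable_features(toy)  # 2 (features 1 and 3)
colSums(toy == 0)       # zero counts per sample: MP 1, FA 1, FR 0

# With real data: n_usable_features(counts(ecs)); colSums(counts(ecs) == 0)
```

The per-sample zero counts help spot a single sample that is responsible for most of the unusable features.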
