  • onyaw
    Junior Member
    • Nov 2013
    • 5

    DEXseq file loading flattened

    Hi, I'm using DEXSeq for the first time. I successfully created the gff file and .counts files with the Python scripts, and the sampleTable from within R, as specified, but I'm getting the following error when creating an ExonCountSet (ecs).

    i do have all the files in the same wd on my desktop, although the gff and counts files were created on a different machine and moved over.

    sampleTable$countFile does read back the correct number of levels and file names

    any ideas; is it not recognizing the gff?

    thx

    onyaw

    > ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )

    Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
    'file' must be a character string or connection
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    That error is occurring when DEXSeq is trying to read in the counts files. Can you paste in an excerpt of "sampleTable"?

    Comment

    • onyaw
      Junior Member
      • Nov 2013
      • 5

      #3
      Thanks - the table is below. when printing this i did see one small error (an extra comma) which i fixed, but still same issue.

      > sampleTable
      countFile condition libType
      B6J_wt_thal_1 B6J_wt_thal_1.counts B6J wt
      B6J_wt_thal_3 B6J_wt_thal_3.counts B6J wt
      B6J_wt_thal_4 B6J_wt_thal_4.counts B6J wt
      B6J_wt_ssctx_1 B6J_wt_ssctx_1.counts B6J wt
      B6J_wt_ssctx_3 B6J_wt_ssctx_3.counts B6J wt
      B6J_wt_ssctx_4 B6J_wt_ssctx_4.counts B6J wt
      FeJ_wt_thal_2 FeJ_wt_thal_2.counts FeJ wt
      FeJ_wt_thal_3 FeJ_wt_thal_3.counts FeJ wt
      FeJ_wt_thal_4 FeJ_wt_thal_4.counts FeJ wt
      FeJ_wt_ssctx_2 FeJ_wt_ssctx_2.counts FeJ wt
      FeJ_wt_ssctx_3 FeJ_wt_ssctx_3.counts FeJ wt
      FeJ_wt_ssctx_4 FeJ_wt_ssctx_4.counts FeJ wt
      B6J_mut_thal_1 B6J_mut_thal_1.counts B6J mut
      B6J_mut_thal_2 B6J_mut_thal_2.counts B6J mut
      B6J_mut_thal_3 B6J_mut_thal_3.counts B6J mut
      B6J_mut_ssctx_1 B6J_mut_ssctx_1.counts B6J mut
      B6J_mut_ssctx_2 B6J_mut_ssctx_2.counts B6J mut
      B6J_mut_ssctx_3 B6J_mut_ssctx_3.counts B6J mut
      FeJ_mut_thal_1 FeJ_mut_thal_1.counts FeJ mut
      FeJ_mut_thal_2 FeJ_mut_thal_2.counts FeJ mut
      FeJ_mut_thal_3 FeJ_mut_thal_3.counts FeJ mut
      FeJ_mut_ssctx_1 FeJ_mut_ssctx_1.counts FeJ mut
      FeJ_mut_ssctx_2 FeJ_mut_ssctx_2.counts FeJ mut
      FeJ_mut_ssctx_3 FeJ_mut_ssctx_3.counts FeJ mut

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        By chance, does
        Code:
        typeof(sampleTable$countFile)
        return something other than "character"? BTW, you don't have to use "condition" and "libType" as column names. You might find "strain" and "genotype" more meaningful

        Comment

        • onyaw
          Junior Member
          • Nov 2013
          • 5

          #5
          well, that returns 'integer' not character.

          i realized about the column names (and obviously i have add'l conditions). but in trying to get it to work for me i thought i would be as literal as possible.

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            At some point you converted your file names to factors, probably by using cbind(). Something like
            Code:
            sampleTable$countFile <- levels(sampleTable$countFile)[sampleTable$countFile]
            should fix that problem. In the future, don't use cbind() to create the sampleTable, but instead:
            Code:
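            # list.files() returns names alphabetically; check the row order matches strain/genotype
            sampleTable <- data.frame(countFile=list.files(pattern="counts$"),
                strain=factor(c(rep(c(rep("B6J",6), rep("FEJ",6)),2))),
                genotype=factor(c(rep("WT",12), rep("MUT",12))),
                stringsAsFactors=FALSE)  # keep the file names as character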
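For anyone hitting the same error, here is a minimal base-R sketch of why typeof() reports "integer" for the file names and of two equivalent ways to turn them back into strings. The file names are made up for illustration, and stringsAsFactors=TRUE is set explicitly only to reproduce the behaviour that data.frame() had by default before R 4.0; none of this calls DEXSeq itself.

```r
# Hypothetical sample table; stringsAsFactors = TRUE mimics the old data.frame() default
sampleTable <- data.frame(
    countFile = c("a.counts", "b.counts", "c.counts"),
    condition = c("wt", "wt", "mut"),
    stringsAsFactors = TRUE)

# A factor is stored as integer codes plus a "levels" attribute
typeof(sampleTable$countFile)    # "integer"
class(sampleTable$countFile)     # "factor"

# Two equivalent conversions back to plain strings
viaLevels <- levels(sampleTable$countFile)[sampleTable$countFile]
viaAsChar <- as.character(sampleTable$countFile)
identical(viaLevels, viaAsChar)  # TRUE

sampleTable$countFile <- viaAsChar
typeof(sampleTable$countFile)    # "character"
```

Either conversion should do; as.character() is simply shorter to read.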

            Comment

            • onyaw
              Junior Member
              • Nov 2013
              • 5

              #7
              thanks - the table was ultimately constructed using literally the example in the pdf file, with names substituted. although originally i made it on my desktop as a csv file.

              i ran 'levels' as you suggested - it got further but now i'm getting this:

              > ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )
              Error: all(unlist(lapply(design, class)) == "factor") is not TRUE

              Comment

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                I should have mentioned that originally

                Code:
                countFiles <- sampleTable$countFile
                design <- sampleTable[,-1]
                ecs <- read.HTSeqCounts(countFiles,design,"C57BL6J_dexseq.gff" )
                or something like that will probably work.
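The second error message suggests that every column of the design has to be a factor. Below is a rough base-R sketch of splitting a sample table into file names and design and checking that condition before handing anything to read.HTSeqCounts. The table contents are hypothetical, and the check expression is simply copied from the error message above rather than taken from the DEXSeq source.

```r
# Hypothetical table: file names as character, one design column left as character
sampleTable <- data.frame(
    countFile = c("a.counts", "b.counts", "c.counts", "d.counts"),
    strain    = c("B6J", "B6J", "FeJ", "FeJ"),
    genotype  = factor(c("wt", "mut", "wt", "mut")),
    stringsAsFactors = FALSE)

countFiles <- as.character(sampleTable$countFile)
# drop = FALSE keeps a data frame even if only one design column remains
design <- sampleTable[, -1, drop = FALSE]

# Same test as in the error message; FALSE here because strain is character
all(unlist(lapply(design, class)) == "factor")

# Convert every design column to a factor, then re-check
design[] <- lapply(design, factor)
designOK <- all(unlist(lapply(design, class)) == "factor")
designOK
```

If designOK is TRUE and countFiles is a character vector, both of the errors reported in this thread should be out of the way.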

                Comment

                • onyaw
                  Junior Member
                  • Nov 2013
                  • 5

                  #9
                  Devon, thanks again. I tried that and got the same error. But maybe it's because I have two conditions now ("condition" and "libType") and the value in the example was "-1". So I changed it to "-2" and it ran without error! So I'll move on to the next steps... wish me smoothness, please!!

                  BTW, if I have multiple conditions that I want to test separately, do I need to specify the design formula beyond the design specified above? Or am I better off making a separate sample table for each 'experiment', looking at just one condition per sample table at a time?

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    The -1 just removes the first column (the count file names) and -2 would remove the second (mouse strain), which you probably want to keep. So, I'm a bit surprised that didn't then produce an error (though perhaps I'm incorrectly visualizing the dataframe that you're using).
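To make the difference between -1 and -2 concrete, here is a small sketch with a hypothetical three-column table laid out like the one earlier in the thread (file names first, then two design columns). Selecting the design columns by name is one possible way to avoid depending on column order.

```r
# Hypothetical three-column table: file names, then two design columns
sampleTable <- data.frame(
    countFile = c("a.counts", "b.counts"),
    condition = factor(c("B6J", "FeJ")),
    libType   = factor(c("wt", "mut")),
    stringsAsFactors = FALSE)

names(sampleTable[, -1])   # "condition" "libType": file names dropped
names(sampleTable[, -2])   # "countFile" "libType": strain dropped, file names kept

# Selecting by name does not depend on where the columns sit
design <- sampleTable[, c("condition", "libType")]
```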

                    Anyway, I would recommend that you keep the full design when you do the analyses. Mouse strains have enough behavioral and other differences that, if unaccounted for, will end up killing your statistical power (all of the variances will be larger than need be). You could just remove the samples you don't need, but that will also decrease power. So leaving everything in is your best bet.

                    Comment

                    • thanhhoang
                      Member
                      • Jul 2013
                      • 16

                      #11
                      Hi Onyaw, Dpryan and everyone!
                      I have a similar problem when running read.HTSeqCounts. Could you guys please help me with that?
                      I counted the 6 SAM files from GSNAP output using dexseq_count.py, following the DEXSeq manual, then made the sample table. Here is what I did:
                      >sampleTable <- data.frame(row.names = c( "E1", "E2", "E3","F1", "F2", "F3" ), countFile = c( "E1.count", "E2.count", "E3.count", "F1.counts","F2.count", "F3.count" ), condition = c( "E", "E", "E",
                      + "F", "F", "F" ))
                      >sampleTable
                         countFile condition
                      E1 E1.count  E
                      E2 E2.count  E
                      E3 E3.count  E
                      F1 F1.counts F
                      F2 F2.count  F
                      F3 F3.count  F
                      >ecs <-read.HTSeqCounts(sampleTable$countFile,sampleTable,"protein_coding_flattened.gff")

                      Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
                      'file' must be a character string or connection

                      I really appreciate your help.
                      Thanh

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        I saw your post on biostars first, so I replied there.

                        Comment

                        • thanhhoang
                          Member
                          • Jul 2013
                          • 16

                          #13
                          Hi dpryan,
                          Thank you
                          I just replied in Biostar. Here is what I just did:
                          >list.files()
                          [1] "CITATION" "DESCRIPTION"
                          [3] "DEXSeq note 11.11.13.odt" "DEXSeq_1.8.0.tar"
                          [5] "doc" "E1.count"
                          [7] "E2.count" "E3.count"
                          [9] "F1.count" "F2.count"
                          [11] "F3.count" "help"
                          [13] "html" "INDEX"
                          [15] "Meta" "NAMESPACE"
                          [17] "NEWS" "protein_coding_flattened.gff"
                          [19] "python_scripts" "R"

                          head -10 E1.count
                          ENSMUSG00000000001:001 1222
                          ENSMUSG00000000001:002 75
                          ENSMUSG00000000001:003 29
                          ENSMUSG00000000001:004 200
                          ENSMUSG00000000001:005 61
                          ENSMUSG00000000001:006 61
                          ENSMUSG00000000001:007 27
                          ENSMUSG00000000001:008 36
                          ENSMUSG00000000001:009 134
                          ENSMUSG00000000003:001 0

                          All files seem to be fine to me. I don't know what's going on.
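Two things can be checked mechanically before calling read.HTSeqCounts: whether the file-name column is really character, and whether every listed name exists on disk. Note that the table in #11 lists "F1.counts" while the directory listing above shows "F1.count". The sketch below reproduces that kind of mismatch in a scratch directory; the directory, the files and the variable names (scratch, notFound) are made up for illustration.

```r
# Scratch directory with made-up, empty count files
scratch <- tempfile("counts_")
dir.create(scratch)
existing <- c("E1.count", "E2.count", "F1.count")
invisible(file.create(file.path(scratch, existing)))

# Table with one misspelled name, stored as a factor on purpose
sampleTable <- data.frame(
    countFile = c("E1.count", "E2.count", "F1.counts"),
    condition = c("E", "E", "F"),
    stringsAsFactors = TRUE)

is.character(sampleTable$countFile)   # FALSE: the column is a factor

# Convert to character, then list every name that is not on disk
files <- as.character(sampleTable$countFile)
notFound <- files[!file.exists(file.path(scratch, files))]
notFound                              # "F1.counts"
```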

                          Comment

                          • dpryan
                            Devon Ryan
                            • Jul 2011
                            • 3478

                            #14
                            I mentioned this over on biostars too, but the common cause of this (and the one that affected onyaw) is that the file names aren't actually characters. If you used cbind() at some point to create the sampleTable, then these are actually factors now, which won't work very well. If this is the case, I'll try to get the authors to clarify this in the vignette for the next update. If it affects more than one user in a week then it's probably a common issue.

                            Comment

                            • areyes
                              Senior Member
                              • Aug 2010
                              • 165

                              #15
                              Thanks for pointing this out! It indeed needed to be corrected and clarified in DEXSeq.

                              I have added a check in the function that verifies that the count files are all characters. I have also changed the vignette to specify "as.character" for the count files specified in the data.frame, e.g.:

                              Code:
                              > ecs <- read.HTSeqCounts(
                              + as.character( sampleTable$countFile ),
                              + sampleTable,
                              + "Dmel_flattenend.gff" )

                              Comment
