  • DEXseq file loading flattened

    Hi, I'm using DEXSeq for the first time. I successfully created the GFF file and the .counts files with the Python scripts, and the sampleTable from within R, as specified, but I am getting the following error when creating an ecs.

    I do have all the files in the same working directory on my desktop, although the GFF and counts files were created on a different machine and moved over.

    sampleTable$countFile does read back the correct number of levels and file names.

    Any ideas? Is it not recognizing the GFF?

    thx

    onyaw

    > ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )

    Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
    'file' must be a character string or connection

  • #2
    That error is occurring when DEXSeq is trying to read in the counts files. Can you paste in an excerpt of "sampleTable"?


    • #3
      Thanks - the table is below. When printing this I did see one small error (an extra comma), which I fixed, but I still have the same issue.

      > sampleTable
      countFile condition libType
      B6J_wt_thal_1 B6J_wt_thal_1.counts B6J wt
      B6J_wt_thal_3 B6J_wt_thal_3.counts B6J wt
      B6J_wt_thal_4 B6J_wt_thal_4.counts B6J wt
      B6J_wt_ssctx_1 B6J_wt_ssctx_1.counts B6J wt
      B6J_wt_ssctx_3 B6J_wt_ssctx_3.counts B6J wt
      B6J_wt_ssctx_4 B6J_wt_ssctx_4.counts B6J wt
      FeJ_wt_thal_2 FeJ_wt_thal_2.counts FeJ wt
      FeJ_wt_thal_3 FeJ_wt_thal_3.counts FeJ wt
      FeJ_wt_thal_4 FeJ_wt_thal_4.counts FeJ wt
      FeJ_wt_ssctx_2 FeJ_wt_ssctx_2.counts FeJ wt
      FeJ_wt_ssctx_3 FeJ_wt_ssctx_3.counts FeJ wt
      FeJ_wt_ssctx_4 FeJ_wt_ssctx_4.counts FeJ wt
      B6J_mut_thal_1 B6J_mut_thal_1.counts B6J mut
      B6J_mut_thal_2 B6J_mut_thal_2.counts B6J mut
      B6J_mut_thal_3 B6J_mut_thal_3.counts B6J mut
      B6J_mut_ssctx_1 B6J_mut_ssctx_1.counts B6J mut
      B6J_mut_ssctx_2 B6J_mut_ssctx_2.counts B6J mut
      B6J_mut_ssctx_3 B6J_mut_ssctx_3.counts B6J mut
      FeJ_mut_thal_1 FeJ_mut_thal_1.counts FeJ mut
      FeJ_mut_thal_2 FeJ_mut_thal_2.counts FeJ mut
      FeJ_mut_thal_3 FeJ_mut_thal_3.counts FeJ mut
      FeJ_mut_ssctx_1 FeJ_mut_ssctx_1.counts FeJ mut
      FeJ_mut_ssctx_2 FeJ_mut_ssctx_2.counts FeJ mut
      FeJ_mut_ssctx_3 FeJ_mut_ssctx_3.counts FeJ mut


      • #4
        By chance, does
        Code:
        typeof(sampleTable$countFile)
        return something other than "character"? BTW, you don't have to use "condition" and "libType" as column names. You might find "strain" and "genotype" more meaningful.


        • #5
          well, that returns 'integer' not character.

          i realized about the column names (and obviously i have add'l conditions). but in trying to get it to work for me i thought i would be as literal as possible.
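
The "integer" result reported here is what R gives for any factor: a factor is stored as integer codes plus a table of levels. A minimal sketch, using made-up file names that are not from this thread:

```r
# Hypothetical file names, for illustration only
files <- factor(c("a.counts", "b.counts", "c.counts"))

print(typeof(files))        # storage type of the underlying codes
print(class(files))         # the class attribute that marks it as a factor
print(unclass(files))       # the integer codes plus the "levels" attribute
print(is.character(files))  # a factor is not a character vector
```

Since a factor is neither a character string nor a connection, this would explain the read.table() message in the first post.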


          • #6
            At some point you converted your file names to factors, probably by using cbind(). Something like
            Code:
            sampleTable$countFile <- levels(sampleTable$countFile)[sampleTable$countFile]
            should fix that problem. In the future, don't use cbind() to create the sampleTable, but instead:
            Code:
            sampleTable <- data.frame(countFile=list.files(pattern="counts$"),
                # list.files() returns sorted names: B6J_mut, B6J_wt, FeJ_mut, FeJ_wt (6 each)
                strain=factor(rep(c("B6J","FeJ"), each=12)),
                genotype=factor(rep(c("mut","wt"), each=6, times=2)),
                stringsAsFactors=FALSE)  # keep the file names as character
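
For readers unsure what the levels() idiom does, here is a small sketch on made-up file names. It suggests that indexing the levels by the factor gives the same result as as.character(). It also shows that stringsAsFactors = FALSE keeps a column as character when the table is built. In R versions before 4.0.0, data.frame() converted strings to factors by default.

```r
# Hypothetical file names, for illustration only
f <- factor(c("b.counts", "a.counts", "b.counts"))

by_levels <- levels(f)[f]    # index the level table with the integer codes
by_cast   <- as.character(f) # direct conversion
print(identical(by_levels, by_cast))

# Building the table so the file names never become factors
tab <- data.frame(countFile = c("a.counts", "b.counts"),
                  condition = factor(c("x", "y")),
                  stringsAsFactors = FALSE)
print(is.character(tab$countFile))
print(is.factor(tab$condition))
```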


            • #7
              thanks - the table was ultimately constructed using literally the example in the pdf file, with names substituted. although originally i made it on my desktop as a csv file.

              i ran 'levels' as you suggested - it got further but now i'm getting this:

              > ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )
              Error: all(unlist(lapply(design, class)) == "factor") is not TRUE


              • #8
                I should have mentioned this originally. Something like

                Code:
                countFiles <- sampleTable$countFile
                design <- sampleTable[,-1]
                ecs <- read.HTSeqCounts(countFiles,design,"C57BL6J_dexseq.gff" )
                will probably work.
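
The error quoted in #7 is an assertion that every column of the design is a factor. A sketch of how one might see which column trips it, on a small stand-in table (the file names are hypothetical, and the check is copied from the error message rather than from the DEXSeq source):

```r
# Stand-in table: character file names, factor design columns
sampleTable <- data.frame(
  countFile = c("s1.counts", "s2.counts", "s3.counts", "s4.counts"),
  condition = factor(c("B6J", "B6J", "FeJ", "FeJ")),
  libType   = factor(c("wt", "mut", "wt", "mut")),
  stringsAsFactors = FALSE
)

# Same expression as in the error message, applied to the whole table
col_classes <- unlist(lapply(sampleTable, class))
print(col_classes)
print(all(col_classes == "factor"))   # fails because of countFile

# Dropping the file-name column leaves only factor columns
design <- sampleTable[, -1]
print(all(unlist(lapply(design, class)) == "factor"))
```

This is consistent with the advice above: once countFile is a character vector, it has to be passed separately rather than as part of the design.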


                • #9
                  Devon, thanks again. I tried that and got the same error. But maybe it's because I have two conditions now ("condition" and "libType") and the value in our example was "-1". So I changed it to "-2" and it ran without error! So I'll move on to the next steps... wish me smoothness, please!!

                  btw if I have multiple conditions, but that I want to test separately, do I need to specify the design formula beyond the design specified above? or am i better off making a separate sample table for each 'experiment' just looking at one condition/sample table at a time?


                  • #10
                    The -1 just removes the first column (the count file names) and -2 would remove the second (mouse strain), which you probably want to keep. So, I'm a bit surprised that didn't then produce an error (though perhaps I'm incorrectly visualizing the dataframe that you're using).

                    Anyway, I would recommend that you keep the full design when you do the analyses. Mouse strains have enough behavioral and other differences that, if unaccounted for, will end up killing your statistical power (all of the variances will be larger than need be). You could just remove the samples you don't need, but that will also decrease power. So leaving everything in is your best bet.
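
To make the -1 versus -2 point concrete, here is a sketch on a two-row stand-in table with the same column layout as the one posted in #3. Selecting the design columns by name is one way to avoid counting positions, though that is a suggestion of this sketch, not something the DEXSeq documentation prescribes.

```r
# Two-row stand-in with the column layout from #3
sampleTable <- data.frame(
  row.names = c("B6J_wt_thal_1", "FeJ_mut_thal_1"),
  countFile = c("B6J_wt_thal_1.counts", "FeJ_mut_thal_1.counts"),
  condition = factor(c("B6J", "FeJ")),
  libType   = factor(c("wt", "mut")),
  stringsAsFactors = FALSE
)

print(names(sampleTable[, -1]))  # drops countFile, keeps both design columns
print(names(sampleTable[, -2]))  # drops condition (the strain), keeps countFile

# Selecting by name does not depend on column order
design <- sampleTable[, c("condition", "libType")]
print(names(design))
```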


                    • #11
                      Hi Onyaw, Dpryan and everyone!
                      I have a similar problem when running read.HTSeqCounts. Could you guys please help me with that?
                      I counted the 6 SAM files from the GSNAP output using dexseq_count.py, following the DEXSeq manual, and then made the sample table. Here is what I did:
                      >sampleTable <- data.frame(row.names = c( "E1", "E2", "E3","F1", "F2", "F3" ), countFile = c( "E1.count", "E2.count", "E3.count", "F1.counts","F2.count", "F3.count" ), condition = c( "E", "E", "E",
                      + "F", "F", "F" ))
                      >sampleTable
                         countFile condition
                      E1  E1.count         E
                      E2  E2.count         E
                      E3  E3.count         E
                      F1 F1.counts         F
                      F2  F2.count         F
                      F3  F3.count         F
                      >ecs <-read.HTSeqCounts(sampleTable$countFile,sampleTable,"protein_coding_flattened.gff")

                      Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
                      'file' must be a character string or connection

                      I really appreciate your help.
                      Thanh


                      • #12
                        I saw your post on biostars first, so I replied there.


                        • #13
                          Hi dpryan,
                          Thank you
                          I just replied in Biostar. Here is what I just did:
                          >list.files()
                          [1] "CITATION" "DESCRIPTION"
                          [3] "DEXSeq note 11.11.13.odt" "DEXSeq_1.8.0.tar"
                          [5] "doc" "E1.count"
                          [7] "E2.count" "E3.count"
                          [9] "F1.count" "F2.count"
                          [11] "F3.count" "help"
                          [13] "html" "INDEX"
                          [15] "Meta" "NAMESPACE"
                          [17] "NEWS" "protein_coding_flattened.gff"
                          [19] "python_scripts" "R"

                          head -10 E1.count
                          ENSMUSG00000000001:001 1222
                          ENSMUSG00000000001:002 75
                          ENSMUSG00000000001:003 29
                          ENSMUSG00000000001:004 200
                          ENSMUSG00000000001:005 61
                          ENSMUSG00000000001:006 61
                          ENSMUSG00000000001:007 27
                          ENSMUSG00000000001:008 36
                          ENSMUSG00000000001:009 134
                          ENSMUSG00000000003:001 0

                          All files seem to be fine to me. I don't know what's going on.


                          • #14
                            I mentioned this over on biostars too, but the common cause of this (and the one that affected onyaw) is that the file names aren't actually characters. If you used cbind() at some point to create the sampleTable, then these are actually factors now, which won't work very well. If this is the case, I'll try to get the authors to clarify this in the vignette for the next update. If it affects more than one user in a week then it's probably a common issue.
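
A quick pre-flight check along these lines could be run before calling read.HTSeqCounts. This is only a sketch: `check_count_files` is a hypothetical helper, not a DEXSeq function. Besides the factor problem it also reports names that do not exist on disk, which may be worth a look, since the table in #11 lists "F1.counts" while the directory listing in #13 shows "F1.count".

```r
# Hypothetical pre-flight helper, not part of DEXSeq
check_count_files <- function(files, dir = ".") {
  if (!is.character(files)) {
    stop("count file names must be character, not ",
         paste(class(files), collapse = "/"))
  }
  missing <- files[!file.exists(file.path(dir, files))]
  if (length(missing) > 0) {
    stop("count files not found: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Demo in a scratch directory holding two empty stand-in files
tmp <- tempfile("counts_")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("E1.count", "F1.count"))))

check_count_files(c("E1.count", "F1.count"), tmp)  # passes silently

# A factor and a misspelled name each raise an error; capture the messages
msg_factor  <- tryCatch(check_count_files(factor("E1.count"), tmp),
                        error = conditionMessage)
msg_missing <- tryCatch(check_count_files(c("E1.count", "F1.counts"), tmp),
                        error = conditionMessage)
print(msg_factor)
print(msg_missing)
```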


                            • #15
                              Thanks for pointing this out! It indeed needed to be corrected and clarified in DEXSeq.

                              I have added a check to the function that verifies that the count files are all characters. I have also changed the vignette to specify an "as.character" for the count files specified in the data.frame, e.g.:

                              Code:
                              > ecs <- read.HTSeqCounts(
                              + as.character( sampleTable$countFile ),
                              + sampleTable,
                              + "Dmel_flattenend.gff" )
