
**dpryan** · 11-06-2013, 08:47 AM

That error is occurring when DEXSeq is trying to read in the counts files. Can you paste in an excerpt of "sampleTable"?

**onyaw** · 11-06-2013, 08:52 AM

Thanks - the table is below. when printing this i did see one small error (an extra comma) which i fixed, but still same issue.

> sampleTable
                             countFile condition libType
B6J_wt_thal_1     B6J_wt_thal_1.counts       B6J      wt
B6J_wt_thal_3     B6J_wt_thal_3.counts       B6J      wt
B6J_wt_thal_4     B6J_wt_thal_4.counts       B6J      wt
B6J_wt_ssctx_1   B6J_wt_ssctx_1.counts       B6J      wt
B6J_wt_ssctx_3   B6J_wt_ssctx_3.counts       B6J      wt
B6J_wt_ssctx_4   B6J_wt_ssctx_4.counts       B6J      wt
FeJ_wt_thal_2     FeJ_wt_thal_2.counts       FeJ      wt
FeJ_wt_thal_3     FeJ_wt_thal_3.counts       FeJ      wt
FeJ_wt_thal_4     FeJ_wt_thal_4.counts       FeJ      wt
FeJ_wt_ssctx_2   FeJ_wt_ssctx_2.counts       FeJ      wt
FeJ_wt_ssctx_3   FeJ_wt_ssctx_3.counts       FeJ      wt
FeJ_wt_ssctx_4   FeJ_wt_ssctx_4.counts       FeJ      wt
B6J_mut_thal_1   B6J_mut_thal_1.counts       B6J     mut
B6J_mut_thal_2   B6J_mut_thal_2.counts       B6J     mut
B6J_mut_thal_3   B6J_mut_thal_3.counts       B6J     mut
B6J_mut_ssctx_1 B6J_mut_ssctx_1.counts       B6J     mut
B6J_mut_ssctx_2 B6J_mut_ssctx_2.counts       B6J     mut
B6J_mut_ssctx_3 B6J_mut_ssctx_3.counts       B6J     mut
FeJ_mut_thal_1   FeJ_mut_thal_1.counts       FeJ     mut
FeJ_mut_thal_2   FeJ_mut_thal_2.counts       FeJ     mut
FeJ_mut_thal_3   FeJ_mut_thal_3.counts       FeJ     mut
FeJ_mut_ssctx_1 FeJ_mut_ssctx_1.counts       FeJ     mut
FeJ_mut_ssctx_2 FeJ_mut_ssctx_2.counts       FeJ     mut
FeJ_mut_ssctx_3 FeJ_mut_ssctx_3.counts       FeJ     mut

**dpryan** · 11-06-2013, 09:01 AM

By chance, does

Code:

typeof(sampleTable$countFile)

return something other than "character"? BTW, you don't have to use "condition" and "libType" as column names. You might find "strain" and "genotype" more meaningful.
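
For anyone following along, here is a minimal sketch of what that check looks like when the file names have ended up as a factor. The two-row table and its file names are made up for illustration, and `stringsAsFactors = TRUE` is set explicitly to mimic the `data.frame()` default in R versions before 4.0.

```r
# Toy table with hypothetical file names; strings are stored as factors,
# which was the data.frame() default before R 4.0.
sampleTable <- data.frame(
  countFile = c("a.counts", "b.counts"),
  condition = c("B6J", "FeJ"),
  stringsAsFactors = TRUE
)

typeof(sampleTable$countFile)        # "integer": a factor is stored as integer codes
class(sampleTable$countFile)         # "factor"
is.character(sampleTable$countFile)  # FALSE
```

If `typeof()` reports "integer" for a column that should hold file names, the column is almost certainly a factor.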

**onyaw** · 11-06-2013, 09:03 AM

well, that returns 'integer' not character.

i realized about the column names (and obviously i have add'l conditions). but in trying to get it to work for me i thought i would be as literal as possible.

**dpryan** · 11-06-2013, 09:17 AM

At some point you converted your file names to factors, probably by using cbind(). Something like

Code:

sampleTable$countFile <- levels(sampleTable$countFile)[sampleTable$countFile]

should fix that problem. In the future, don't use cbind() to create the sampleTable, but instead:

Code:

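# list.files() sorts alphabetically; check that the row order matches the factors below
sampleTable <- data.frame(countFile=list.files(pattern="counts$"),
    strain=factor(c(rep(c(rep("B6J",6), rep("FEJ",6)),2))),
    genotype=factor(c(rep("WT",12), rep("MUT",12))),
    stringsAsFactors=FALSE)  # keep the file names as character, not factor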
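
As a small sanity check on the levels() conversion above, this sketch (with made-up file names) shows that indexing the levels by the factor gives the same result as `as.character()`, so either form should turn the column back into plain strings.

```r
# A factor of hypothetical file names, deliberately not in alphabetical order.
f <- factor(c("b.counts", "a.counts", "b.counts"))

via_levels  <- levels(f)[f]      # index the level labels by the integer codes
via_as_char <- as.character(f)   # the more direct spelling

identical(via_levels, via_as_char)  # TRUE
typeof(via_levels)                  # "character"
```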

**onyaw** · 11-06-2013, 09:26 AM

thanks - the table was ultimately constructed using literally the example in the pdf file, with names substituted. although originally i made it on my desktop as a csv file.

i ran 'levels' as you suggested - it got further but now i'm getting this:

> ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )
Error: all(unlist(lapply(design, class)) == "factor") is not TRUE

**dpryan** · 11-06-2013, 09:38 AM

I should have mentioned that originally

Code:

countFiles <- sampleTable$countFile
design <- sampleTable[,-1]
ecs <- read.HTSeqCounts(countFiles,design,"C57BL6J_dexseq.gff" )

or something like that will probably work.
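
The error message quotes the condition that failed, so it can be reproduced in base R without DEXSeq. This is only a sketch: the four-row table is invented, and `all_factors()` is a hypothetical helper that wraps the expression from the error message.

```r
# Invented table: character file names plus two factor columns.
sampleTable <- data.frame(
  countFile = c("s1.counts", "s2.counts", "s3.counts", "s4.counts"),
  strain    = factor(c("B6J", "B6J", "FeJ", "FeJ")),
  genotype  = factor(c("wt", "mut", "wt", "mut")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same expression that appears in the error message.
all_factors <- function(df) all(unlist(lapply(df, class)) == "factor")

countFiles <- sampleTable$countFile
design     <- sampleTable[, -1]   # drop the file-name column

all_factors(sampleTable)  # FALSE: countFile is character
all_factors(design)       # TRUE: only factor columns remain
```

Passing the whole table as the design fails the check because the file-name column is a character vector; dropping that column leaves only factors.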

**onyaw** · 11-06-2013, 10:48 AM

Devon, thanks again. I tried that and got the same error. But maybe it's because I have two conditions now ("condition" and "libType") and the value in our example was "-1". So I changed it to "-2" and it ran without error! So I'll move on to the next steps... wish me smoothness, please!!

btw if I have multiple conditions, but that I want to test separately, do I need to specify the design formula beyond the design specified above? or am i better off making a separate sample table for each 'experiment' just looking at one condition/sample table at a time?

**dpryan** · 11-06-2013, 01:33 PM

The -1 just removes the first column (the count file names) and -2 would remove the second (mouse strain), which you probably want to keep. So, I'm a bit surprised that didn't then produce an error (though perhaps I'm incorrectly visualizing the dataframe that you're using).
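
To make the difference between -1 and -2 concrete, here is a sketch on an invented two-row table with the same column names as in this thread. Selecting the design columns by name is shown as one alternative that does not depend on column position.

```r
# Invented table using the thread's column names.
sampleTable <- data.frame(
  countFile = c("s1.counts", "s2.counts"),
  condition = factor(c("B6J", "FeJ")),
  libType   = factor(c("wt", "mut")),
  stringsAsFactors = FALSE
)

names(sampleTable[, -1])  # "condition" "libType"   (file names dropped)
names(sampleTable[, -2])  # "countFile" "libType"   (strain dropped instead)

# Selecting by name avoids counting columns.
design <- sampleTable[, c("condition", "libType")]
names(design)             # "condition" "libType"
```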

Anyway, I would recommend that you keep the full design when you do the analyses. Mouse strains have enough behavioral and other differences that, if unaccounted for, will end up killing your statistical power (all of the variances will be larger than need be). You could just remove the samples you don't need, but that will also decrease power. So leaving everything in is your best bet.

**thanhhoang** · 11-10-2013, 11:25 PM

Hi Onyaw, Dpryan and everyone!
I have a similar problem when running read.HTSeqCounts. Could you guys please help me with that?
I counted the 6 SAM files from GSNAP output using dexseq_count.py by following DEXSeq manual, then I made sample table. Here is what I did:
>sampleTable <- data.frame(row.names = c( "E1", "E2", "E3","F1", "F2", "F3" ), countFile = c( "E1.count", "E2.count", "E3.count", "F1.counts","F2.count", "F3.count" ), condition = c( "E", "E", "E",
+ "F", "F", "F" ))
>sampleTable
   countFile condition
E1  E1.count         E
E2  E2.count         E
E3  E3.count         E
F1 F1.counts         F
F2  F2.count         F
F3  F3.count         F
>ecs <-read.HTSeqCounts(sampleTable$countFile,sampleTable,"protein_coding_flattened.gff")
Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
'file' must be a character string or connection
I really appreciate your help.
Thanh

**dpryan** · 11-11-2013, 03:52 AM

I saw your post on biostars first, so I replied there.

**thanhhoang** · 11-11-2013, 06:21 AM

Hi dpryan,
Thank you.
I just replied in Biostar. Here is what I just did:
>list.files()
[1] "CITATION" "DESCRIPTION"
[3] "DEXSeq note 11.11.13.odt" "DEXSeq_1.8.0.tar"
[5] "doc" "E1.count"
[7] "E2.count" "E3.count"
[9] "F1.count" "F2.count"
[11] "F3.count" "help"
[13] "html" "INDEX"
[15] "Meta" "NAMESPACE"
[17] "NEWS" "protein_coding_flattened.gff"
[19] "python_scripts" "R"

head -10 E1.count
ENSMUSG00000000001:001 1222
ENSMUSG00000000001:002 75
ENSMUSG00000000001:003 29
ENSMUSG00000000001:004 200
ENSMUSG00000000001:005 61
ENSMUSG00000000001:006 61
ENSMUSG00000000001:007 27
ENSMUSG00000000001:008 36
ENSMUSG00000000001:009 134
ENSMUSG00000000003:001 0
All files seem to be fine to me. I don't know what's going on.

**dpryan** · 11-11-2013, 06:27 AM

I mentioned this over on biostars too, but the common cause of this (and the one that affected onyaw) is that the file names aren't actually characters. If you used cbind() at some point to create the sampleTable, then these are actually factors now, which won't work very well. If this is the case, I'll try to get the authors to clarify this in the vignette for the next update. If it affects more than one user in a week then it's probably a common issue.
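
A quick pre-flight check before calling read.HTSeqCounts can catch both problems: factor file names and names that do not match anything on disk. The sketch below uses only base R, with made-up empty files in a scratch directory; the directory name and the `missing_files` variable are assumptions for illustration. It also includes a "F1.counts" entry like the one in the sampleTable posted above, where list.files() shows "F1.count" (that may just be a typo in the post).

```r
# Scratch directory with made-up, empty count files.
scratch <- file.path(tempdir(), "dexseq_preflight")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, c("E1.count", "F1.count")))

# Toy table: file names stored as a factor, second name has a stray "s".
sampleTable <- data.frame(
  countFile = c("E1.count", "F1.counts"),
  condition = factor(c("E", "F")),
  stringsAsFactors = TRUE
)

# 1. Force the file names to character.
countFiles <- as.character(sampleTable$countFile)

# 2. List any names that do not exist on disk.
missing_files <- countFiles[!file.exists(file.path(scratch, countFiles))]
missing_files  # "F1.counts"
```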

**areyes** · 11-12-2013, 01:05 AM

Thanks for pointing this out! It indeed needed to be corrected and clarified in DEXSeq.

I have added a change to the function so that it checks that the count files are all characters. I have also changed the vignette to specify an "as.character" for the count files specified in the data.frame, e.g.:

Code:

> ecs <- read.HTSeqCounts(
+ as.character( sampleTable$countFile ),
+ sampleTable,
+ "Dmel_flattenend.gff" )


DEXseq file loading flattened
