Error at Creating Count Table for DESeq2

  • Error at Creating Count Table for DESeq2

    I have used the TopHat-Cuffdiff pipeline so far, but I want to give DESeq2 a try. I have 2 conditions with 3 replicates each, and the aim is to find the differentially expressed genes.

    For a couple of days I have been trying to use HTSeq to prepare my count files. I think that part worked, but now I am stuck at creating the count table that DESeq2 takes as input.

    I have not used R much so far, so I am having difficulties. Here is the problem:

    Code:
    > library('DESeq2')
    Loading required package: GenomicRanges
    Loading required package: BiocGenerics
    Loading required package: parallel
    
    Attaching package: ‘BiocGenerics’
    
    The following objects are masked from ‘package:parallel’:
    
        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
        parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
    
    The following object is masked from ‘package:stats’:
    
        xtabs
    
    The following objects are masked from ‘package:base’:
    
        anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq, Filter, Find,
        get, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
        pmin.int, Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply, union,
        unique, unlist
    
    Loading required package: IRanges
    Loading required package: XVector
    Loading required package: Rcpp
    Loading required package: RcppArmadillo
    
    > setwd("C:/Python27/SKMEL-5")
    > directory<-"C:/Python27/SKMEL-5/ALL"
    > sampleFiles <- grep("SKMEL-5",list.files(directory),value=TRUE)
    > sampleCondition<-c("KD","KD","KD","WT","WT","WT")
    > sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
    > sampleTable
           sampleName        fileName condition
    1 SKMEL-5_I-1.txt SKMEL-5_I-1.txt        KD
    2 SKMEL-5_I-2.txt SKMEL-5_I-2.txt        KD
    3 SKMEL-5_I-3.txt SKMEL-5_I-3.txt        KD
    4 SKMEL-5_L-1.txt SKMEL-5_L-1.txt        WT
    5 SKMEL-5_L-2.txt SKMEL-5_L-2.txt        WT
    6 SKMEL-5_L-3.txt SKMEL-5_L-3.txt        WT
    > ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
    Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,  : 
      Gene IDs (first column) differ between files.
    In addition: There were 36 warnings (use warnings() to see them)
    Here are the 36 warnings:

    Code:
    Warning messages:
    1: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    2: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    3: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    4: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    5: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    6: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    7: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    8: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    9: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    10: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    11: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    12: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    13: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    14: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    15: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    16: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    17: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    18: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    19: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    20: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    21: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    22: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    23: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    24: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    25: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    26: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    27: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    28: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    29: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    30: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    31: In read.table(file.path(directory, fn)) :
      line 1 appears to contain embedded nulls
    32: In read.table(file.path(directory, fn)) :
      line 2 appears to contain embedded nulls
    33: In read.table(file.path(directory, fn)) :
      line 3 appears to contain embedded nulls
    34: In read.table(file.path(directory, fn)) :
      line 4 appears to contain embedded nulls
    35: In read.table(file.path(directory, fn)) :
      line 5 appears to contain embedded nulls
    36: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
      embedded nul(s) found in input
    Because the error says "Gene IDs (first column) differ between files.", I checked each file. All of them have the same number of rows, and I believe the first column is the same in all of them (I used the same GTF file for every sample, so it should be).

    I know the problem is at a very basic stage, but as an R noob I have no clue.
    Last edited by sazz; 03-23-2014, 04:27 AM.

  • #2
    Solved: my files were not in tab-delimited format :/
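
For anyone who lands here with the same error, a quick format check outside R may help narrow things down. The sketch below is only an illustration: it assumes htseq-count style files (a gene ID, a tab, and an integer count on each line), and `check_count_file` is a made-up helper name, not something provided by HTSeq or DESeq2.

```python
def check_count_file(path, max_report=5):
    """Return a list of (line_number, reason) for lines that do not look like
    'gene_id<TAB>integer_count'. Assumes htseq-count style output."""
    problems = []
    # Read raw bytes so an unexpected encoding cannot raise a decode error.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            if b"\x00" in line:
                problems.append((number, "contains NUL bytes (not plain 8-bit text?)"))
            else:
                fields = line.split(b"\t")
                if len(fields) != 2:
                    problems.append(
                        (number, "expected 2 tab-separated fields, found %d" % len(fields))
                    )
                elif not fields[1].strip().isdigit():
                    problems.append((number, "count column is not an integer"))
            # Stop early; a handful of examples is enough to see the pattern.
            if len(problems) >= max_report:
                break
    return problems
```

An empty list means every line fit the expected layout. NUL bytes would match the "embedded nulls" warnings from `read.table` shown in the first post.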

    • #3
      Additional answer

      I had the same issue and found your post helpful. To solve it, I opened the file in Notepad and changed the encoding from Unicode to ANSI, after which it imported cleanly into R.
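
With many files, the Notepad step could be scripted. The following is a rough sketch, not a tested recipe for this dataset: it assumes the files were saved as what Notepad calls "Unicode" (UTF-16 with a byte order mark) and that the gene IDs are plain ASCII, in which case the UTF-8 output is byte-for-byte the same as an ANSI file. `convert_utf16_to_utf8` is a hypothetical helper name.

```python
import codecs


def convert_utf16_to_utf8(src_path, dst_path):
    """Write src_path to dst_path, re-encoding to UTF-8 if the source
    starts with a UTF-16 byte order mark.

    Returns True if a conversion happened, False if the bytes were copied as-is.
    """
    with open(src_path, "rb") as handle:
        data = handle.read()
    is_utf16 = data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE)
    if is_utf16:
        # The "utf-16" codec reads the byte order mark and drops it while decoding.
        text = data.decode("utf-16")
        data = text.encode("utf-8")
    with open(dst_path, "wb") as handle:
        handle.write(data)
    return is_utf16
```

Writing to a separate output path keeps the original count files untouched, so the conversion can be checked before pointing `DESeqDataSetFromHTSeqCount` at the new directory.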
