Announcement

Collapse
No announcement yet.

GAGE pathway analysis (R/Bioconductor): summarizeOverlaps error

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GAGE pathway analysis (R/Bioconductor): summarizeOverlaps error

    I am trying to use the GAGE pathway analysis tutorial with 3 samples of paired-end RNA-Seq reads (placed in a "tophat_all" directory) that I mapped with TopHat (.bam files have been indexed).

    An error occurred at the read count step:

    Code:
    > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    > exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
    > library(GenomicAlignments)
    > fls <- list.files("tophat_all/", pattern="bam$", full.names =T)
    > bamfls <- BamFileList(fls)
    > flag <- scanBamFlag(isSecondaryAlignment=FALSE, isProperPair=TRUE)
    > param <- ScanBamParam(flag=flag)
    > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)
    Error in validObject(.Object) : 
      invalid class "SummarizedExperiment0" object: 'assays' nrow differs from 'mcols' nrow

    When I tried the same with only one sample I get a different error:
    Code:
    > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)
    Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
      'data' must be of a vector type, was 'NULL'
    Could someone explain me the error?
    Does the error come from my reads (potential unpaired reads, different length...)?

    Thanks !

    Notes:
    (1) I am using the last version of R, Biconductor and its packages. Could it come from the newer versions of the packages that would not fit the tutorial anymore?

    Code:
    > sessionInfo()
    R version 3.2.2 (2015-08-14)
    Platform: x86_64-pc-linux-gnu (64-bit)
    
    locale:
    [1] C
    
    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
    [8] methods   base     
    
    other attached packages:
     [1] GenomicAlignments_1.6.1                
     [2] Rsamtools_1.22.0                       
     [3] Biostrings_2.38.0                      
     [4] XVector_0.10.0                         
     [5] SummarizedExperiment_1.0.1             
     [6] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
     [7] GenomicFeatures_1.22.4                 
     [8] AnnotationDbi_1.32.0                   
     [9] Biobase_2.30.0                         
    [10] GenomicRanges_1.22.1                   
    [11] GenomeInfoDb_1.6.1                     
    [12] IRanges_2.4.1                          
    [13] S4Vectors_0.8.2                        
    [14] BiocGenerics_0.16.1                    
    
    loaded via a namespace (and not attached):
     [1] zlibbioc_1.16.0      BiocParallel_1.4.0   tools_3.2.2         
     [4] DBI_0.3.1            lambda.r_1.1.7       futile.logger_1.4.1 
     [7] rtracklayer_1.30.1   futile.options_1.0.0 bitops_1.0-6        
    [10] RCurl_1.95-4.7       biomaRt_2.26.0       RSQLite_1.0.0       
    [13] XML_3.98-1.3
    (2) Here is the traceback after the error occurred:
    Code:
    > traceback()
    26: stop(msg, ": ", errors, domain = NA)
    25: validObject(.Object)
    24: initialize(value, ...)
    23: initialize(value, ...)
    22: new("SummarizedExperiment0", NAMES = names, elementMetadata = rowData, 
            colData = colData, assays = assays, metadata = as.list(metadata))
    21: new_SummarizedExperiment0([email protected], names([email protected]), 
            mcols([email protected]), [email protected], [email protected])
    20: asMethod(object)
    19: as(object, superClass)
    18: slot(x, "assays")
    17: is(slot(x, "assays"), "Assays")
    16: .valid.SummarizedExperiment0.assays_current(x)
    15: valid.func(object)
    14: validityMethod(as(object, superClass))
    13: anyStrings(validityMethod(as(object, superClass)))
    12: validObject(.Object)
    11: initialize(value, ...)
    10: initialize(value, ...)
    9: new("RangedSummarizedExperiment", rowRanges = rowRanges, colData = colData, 
           assays = assays, elementMetadata = elementMetadata, metadata = as.list(metadata))
    8: .new_RangedSummarizedExperiment(assays, rowRanges, colData, metadata)
    7: .local(assays, ...)
    6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
           colData = colData)
    5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
           colData = colData)
    4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm), 
           ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd, 
           fragments = fragments, param = param, preprocess.reads = preprocess.reads, 
           ...)
    3: .local(features, reads, mode, algorithm, ignore.strand, ...)
    2: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
           single.end = FALSE, param = param)
    1: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
           single.end = FALSE, param = param)
    Last edited by user 31888; 11-16-2015, 05:06 PM.

  • #2
    This error suggests that one of your two input arguments is empty. Try the following two commands to see if anything looks odd:
    Code:
    > str(exByGn)
    > str(bamfls)

    Comment


    • #3
      Thanks ginger for your reply !

      Here are the outputs:
      Code:
      > str(exByGn)
      Formal class 'GRangesList' [package "GenomicRanges"] with 5 slots
        [email protected] unlistData     :Formal class 'GRanges' [package "GenomicRanges"] with 6 slots
        .. .. [email protected] seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
        .. .. .. .. [email protected] values         : Factor w/ 93 levels "chr1","chr2",..: 19 8 20 18 1 23 3 14 15 11 ...
        .. .. .. .. [email protected] lengths        : int [1:19380] 15 2 12 17 17 4 1 11 10 21 ...
        .. .. .. .. [email protected] elementMetadata: NULL
        .. .. .. .. [email protected] metadata       : list()
        .. .. [email protected] ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
        .. .. .. .. [email protected] start          : int [1:272776] 58858172 58858719 58859832 58860934 58861736 58862757 58863649 58864294 58864658 58864770 ...
        .. .. .. .. [email protected] width          : int [1:272776] 224 288 663 1084 282 297 273 270 36 96 ...
        .. .. .. .. [email protected] NAMES          : NULL
        .. .. .. .. [email protected] elementType    : chr "integer"
        .. .. .. .. [email protected] elementMetadata: NULL
        .. .. .. .. [email protected] metadata       : list()
        .. .. [email protected] strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
        .. .. .. .. [email protected] values         : Factor w/ 3 levels "+","-","*": 2 1 2 1 2 1 2 1 2 1 ...
        .. .. .. .. [email protected] lengths        : int [1:11717] 15 2 46 5 11 87 1 4 1 8 ...
        .. .. .. .. [email protected] elementMetadata: NULL
        .. .. .. .. [email protected] metadata       : list()
        .. .. [email protected] elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
        .. .. .. .. [email protected] rownames       : NULL
        .. .. .. .. [email protected] nrows          : int 272776
        .. .. .. .. [email protected] listData       :List of 2
        .. .. .. .. .. ..$ exon_id  : int [1:272776] 250809 250810 250811 250812 250813 250814 250815 250816 250817 250818 ...
        .. .. .. .. .. ..$ exon_name: chr [1:272776] NA NA NA NA ...
        .. .. .. .. [email protected] elementType    : chr "ANY"
        .. .. .. .. [email protected] elementMetadata: NULL
        .. .. .. .. [email protected] metadata       : list()
        .. .. [email protected] seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
        .. .. .. .. [email protected] seqnames   : chr [1:93] "chr1" "chr2" "chr3" "chr4" ...
        .. .. .. .. [email protected] seqlengths : int [1:93] 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 ...
        .. .. .. .. [email protected] is_circular: logi [1:93] NA NA NA NA NA NA ...
        .. .. .. .. [email protected] genome     : chr [1:93] "hg19" "hg19" "hg19" "hg19" ...
        .. .. [email protected] metadata       : list()
        [email protected] elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
        .. .. [email protected] rownames       : NULL
        .. .. [email protected] nrows          : int 23459
        .. .. [email protected] listData       : Named list()
        .. .. [email protected] elementType    : chr "ANY"
        .. .. [email protected] elementMetadata: NULL
        .. .. [email protected] metadata       : list()
        [email protected] partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
        .. .. [email protected] end            : int [1:23459] 15 17 29 46 63 67 68 79 89 110 ...
        .. .. [email protected] NAMES          : chr [1:23459] "1" "10" "100" "1000" ...
        .. .. [email protected] elementType    : chr "integer"
        .. .. [email protected] elementMetadata: NULL
        .. .. [email protected] metadata       : list()
        [email protected] elementType    : chr "GRanges"
        [email protected] metadata       :List of 1
        .. ..$ genomeInfo:List of 19
        .. .. ..$ Db type                                 : chr "TxDb"
        .. .. ..$ Supporting package                      : chr "GenomicFeatures"
        .. .. ..$ Data source                             : chr "UCSC"
        .. .. ..$ Genome                                  : chr "hg19"
        .. .. ..$ Organism                                : chr "Homo sapiens"
        .. .. ..$ Taxonomy ID                             : chr "9606"
        .. .. ..$ UCSC Table                              : chr "knownGene"
        .. .. ..$ Resource URL                            : chr "http://genome.ucsc.edu/"
        .. .. ..$ Type of Gene ID                         : chr "Entrez Gene ID"
        .. .. ..$ Full dataset                            : chr "yes"
        .. .. ..$ miRBase build ID                        : chr "GRCh37"
        .. .. ..$ transcript_nrow                         : chr "82960"
        .. .. ..$ exon_nrow                               : chr "289969"
        .. .. ..$ cds_nrow                                : chr "237533"
        .. .. ..$ Db created by                           : chr "GenomicFeatures package from Bioconductor"
        .. .. ..$ Creation time                           : chr "2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)"
        .. .. ..$ GenomicFeatures version at creation time: chr "1.21.30"
        .. .. ..$ RSQLite version at creation time        : chr "1.0.0"
        .. .. ..$ DBSCHEMAVERSION                         : chr "1.1"
      > str(bamfls)
      Formal class 'BamFileList' [package "Rsamtools"] with 4 slots
        [email protected] listData       :List of 3
        .. ..$ sample_1.bam :Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_1.bam"
        .. .. ..$ index           : chr "tophat_all//sample_1.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        .. ..$ sample_2.bam    :Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_2.bam"
        .. .. ..$ index           : chr "tophat_all//sample_2.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        .. ..$ sample_3.bam:Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_3.bam"
        .. .. ..$ index           : chr "tophat_all//sample_3.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        [email protected] elementType    : chr "BamFile"
        [email protected] elementMetadata: NULL
        [email protected] metadata       : list()
      Last edited by user 31888; 11-17-2015, 01:27 PM. Reason: corrected from 'sample_2.bai' to 'sample_2.bam.ai'

      Comment


      • #4
        There's something odd about sample_2.bam:

        Code:
          .. .. ..$ path            : chr "GAGE//sample_2.bam"
          .. .. ..$ index           : chr "GAGE//sample_2.bai"
        I think this should be:

        Code:
          .. .. ..$ path            : chr "GAGE//sample_2.bam"
          .. .. ..$ index           : chr "GAGE//sample_2[b].bam.[/b]bai"
        If that's not the problem, I can't see anything else obviously out of place that might indicate an empty list.

        Comment


        • #5
          My mistake, sorry.

          It is actually 'sample_2.bam.bai' (I edited my previous post)

          Comment


          • #6
            Stupid thought: could it be something like EOF or other formatting issue that differs between my sample.bam files and TxDb.Hsapiens.UCSC.hg19.knownGene?

            (I am using a Linux cluster)

            Comment


            • #7
              Hello everyone

              Hi! user 31888, Does your issue being solved? if it does ,colud you please provide me some advice? I meet the same question ,Thank you
              Last edited by Enmity_LD; 01-19-2016, 07:01 PM.

              Comment


              • #8
                maybe see:
                https://support.bioconductor.org/p/74677/#75002

                But I ended up doing a fresh install of pretty much everything (R, Python, Bioconductor and associated packages), and it worked.

                Comment

                Working...
                X