  • Trying to apply overlap to each row of a "GRangesList"

    Dear Experts,

    I have a "GRangesList" object like this:

    GRangesList of length 1:
    GRanges with 700000 ranges and 1 metadata column:
          seqnames         ranges strand |       id
             <Rle>      <IRanges>  <Rle> | <factor>
      [1]     chr1 [    0, 10000]      * |  Factor1
      [2]     chr1 [ 9600, 20000]      * |  Factor2
      [3]     chr1 [24000, 30000]      * |  Factor2

    And I am trying to overlap each row of this list with each row of a GRanges object like this:

    GRanges with 200 ranges and 1 metadata column:
          seqnames       ranges strand |     name
             <Rle>    <IRanges>  <Rle> | <factor>
      rs1     chr1 [   0,    1]      * |      rs1
      rs2     chr1 [9700, 9701]      * |      rs2

    My goal is to get a data frame containing an overlap count for each range of the GRanges object against each row of the GRangesList, like this:

    rs1_Factor1 0 0 0
    rs_Factor2 0 1 0

    I can do this for one value at a time of the Factors of GRangesList like this:

    hits=countOverlaps(obj1, objList[1], type=c("within"))

    But how do I apply this to each row of GRangesList?

    I have tried unsuccessfully with mapply (Error in (function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any") error in evaluating the argument 'query' in selecting a method for function 'countOverlaps': Error in dots[[1L]][[1L]] : this S4 class is not subsettable)

    Thanks so much!


  • #2
    You wanted lapply() rather than mapply:
    lapply(objList, function(x,y) {countOverlaps(y,x,type="within")}, obj1)
    Having said that, I'm pretty sure you can give countOverlaps() a GRangesList and that that'll be faster than applying a function.
    Last edited by dpryan; 09-27-2015, 11:23 PM. Reason: Forgot a ")"


    • #3
      Hi, thanks, the command as you have it gives me an error but this works: lapply(objList, function(x) countOverlaps(obj1, x, type=c("within")))

      However, it does not give me what I am looking for: it loops over the list elements and gives me the overlap between each list element and the GRanges object, but I would like to loop over each row of the list, not each element...


      • #4
        I could try to do this with a double loop, with the first loop over each element of the list and the second loop over each row of that element, like this:

        for (i in names(objList)) {
          for (j in seq_along(objList[[i]])) {
            # overlap of obj1 with row j of list element i
            print(countOverlaps(obj1, objList[[i]][j], type = "within"))
          }
        }

        But is there really no better way?


        • #5
          Why not just:
          countOverlaps(obj1, unlist(objList), type="within")
          That would seem to give you what you want.


          • #6
            I had tried that but it doesn't work either... That gives me the overlap of obj1 and objList and not the overlap of obj1 with each row of the list object (what I am trying to get is the overlap of elements from obj1 with the first row of the first element in the list, then the overlap of the elements of obj1 with the second row of the first element in the list, etc...)
            Last edited by francy; 09-28-2015, 12:50 PM.


            • #7
              Just give a small example of what you would like (i.e., give an example GRanges object, a GRangesList object and the output you would like).


              • #8
                Yes sure, I am sorry for not having done that before.
                Here are the objList and obj1:

                TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"), ranges=IRanges(start=c(0,9600,24000),
                end=c(10000, 20000, 30000)), id=c("Factor1", "Factor2", "Factor3"))

                TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"), ranges=IRanges(start=c(0,9000,14000),
                end=c(13000, 20500, 30100)), id=c("Factor1", "Factor2", "Factor3"))

                objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)

                obj1 <- GRanges(seqnames = c("chr1", "chr1"), ranges=IRanges(start=c(0,9700), end=c(1, 9701)), id=c("rs1", "rs2"))

                And this is what I have working now with loops (to find the overlap of obj1 with each row of objList):


                for (name in names(objList)) {
                  id <- objList[[name]]$id
                  for (i in seq_along(id)) {
                    # overlap of obj1 with row i of this list element
                    print(countOverlaps(obj1, objList[[name]][i], type = "within"))
                  }
                }

                There must be a better way though, since the loops take a very long time...
                Thank you!


                • #9
                  I'm not sure what the point of that is, but
                  findOverlaps(obj1, unlist(objList), type='within')
                  would give you the same information faster. The output is also easier to deal with than what will likely be a gigantic and unwieldy (not to mention sparse) data frame.
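To illustrate why the Hits output is easy to work with, here is a minimal sketch that turns it into a long table with one row per overlapping pair. It reuses the example objects from post #8; the helper names `flat`, `type` and `hits` are made up for this illustration and are not part of the original posts.

```r
library(GenomicRanges)

# Example objects copied from post #8
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000),
                                  end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000),
                                  end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList(TYPE1 = TYPE1, TYPE2 = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# Flatten the list into a single GRanges (one entry per "row")
flat <- unlist(objList)
# Remember which list element each flattened row came from
type <- rep(names(objList), lengths(objList))

o <- findOverlaps(obj1, flat, type = "within")

# One row per (query, subject) pair that overlaps
hits <- data.frame(query   = obj1$id[queryHits(o)],
                   subject = paste(flat$id[subjectHits(o)],
                                   type[subjectHits(o)], sep = "_"),
                   stringsAsFactors = FALSE)
print(hits)
```

Only pairs that actually overlap get a row, so the table stays small even when the full query-by-subject grid would be huge.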


                  • #10
                    Hi again...the output of that again gives me this, which is not what I am looking for.

                    > findOverlaps(obj1, unlist(objList), type='within')
                    Hits object with 3 hits and 0 metadata columns:
                          queryHits subjectHits
                          <integer>   <integer>
                      [1]         1           1
                      [2]         2           1
                      [3]         2           2
                      queryLength: 2
                      subjectLength: 6

                    This is what I am trying to obtain (please see the example in R above):

                    > output
                        Factor1_TYPE1 Factor2_TYPE1 Factor3_TYPE1 Factor1_TYPE2 Factor2_TYPE2 Factor3_TYPE2
                    rs1             1             0             0             0             0             0
                    rs2             1             1             0             0             0             0
                    Last edited by francy; 09-29-2015, 03:35 AM.


                    • #11
                      Yes, as I said, what I wrote produces the same information much faster. If you want the data frame then just make a matrix of 0s and change values to 1 according to the output of findOverlaps.


                      • #12
                        Hi dpryan, thank you. Can you please explain how I would go from the output of findOverlaps or countOverlaps, which has 3 entries pairing queryHits with subjectHits, to the output with 6 columns indicating binary overlap for each query? I am sorry for the confusion... thank you very much for your help.


                        • #13
                          Something like:

                          o <- findOverlaps(obj1, unlist(objList), type='within')
                          m <- matrix(0, nrow=length(obj1), ncol=length(unlist(objList)))
                          m[cbind(queryHits(o), subjectHits(o))] <- 1
                          Something along those lines.

                          Edit: BTW, you might need to use a sparse matrix, depending on how much memory you have and how large your objects are.
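For completeness, a sketch of the same idea with row and column labels matching the output requested in post #10, plus a sparse variant. It assumes the example objects from post #8 and that the Matrix package is installed; `row_labels`, `col_labels` and `sm` are names invented for this illustration.

```r
library(GenomicRanges)

# Example objects copied from post #8
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000),
                                  end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000),
                                  end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList(TYPE1 = TYPE1, TYPE2 = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

flat <- unlist(objList)
o <- findOverlaps(obj1, flat, type = "within")

# Labels: one row per query range, one column per row of the list
row_labels <- obj1$id
col_labels <- paste(flat$id, rep(names(objList), lengths(objList)), sep = "_")

# Dense version: matrix of 0s, set to 1 where a hit was found
m <- matrix(0L, nrow = length(obj1), ncol = length(flat),
            dimnames = list(row_labels, col_labels))
m[cbind(queryHits(o), subjectHits(o))] <- 1L
print(m)

# Sparse version: only the hits are stored (assumes the Matrix package)
sm <- Matrix::sparseMatrix(i = queryHits(o), j = subjectHits(o), x = 1,
                           dims = c(length(obj1), length(flat)),
                           dimnames = list(row_labels, col_labels))
```

If a data frame is really needed, `as.data.frame(m)` converts the dense matrix; with objects as large as the ones in the first post, the sparse form is probably the safer choice.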


                          • #14
                            Ah I see! That is AMAZINGLY faster than my loop... Thank you so much dpryan for explaining this trick, very very thankful!!!

