  • Trying to apply overlap to each row of a "GRangesList"

    Dear Experts,

    I have a "GRangesList" object like this:

    GRangesList of length 1:
    $TYPE1
    GRanges with 700000 ranges and 1 metadata column:
          seqnames         ranges strand |       id
             <Rle>      <IRanges>  <Rle> | <factor>
      [1]     chr1 [    0, 10000]      * |  Factor1
      [2]     chr1 [ 9600, 20000]      * |  Factor2
      [3]     chr1 [24000, 30000]      * |  Factor2



    I am trying to overlap each row of this list with each row of a GRanges object like this:

    GRanges with 200 ranges and 1 metadata column:
          seqnames       ranges strand |     name
             <Rle>    <IRanges>  <Rle> | <factor>
      rs1     chr1 [   0,    1]      * |      rs1
      rs2     chr1 [9700, 9701]      * |      rs2


    My goal is to get a data frame containing the overlap count of each range in the GRanges object with each row of the GRangesList, like this:

    rs1_Factor1 0 0 0
    rs_Factor2 0 1 0

    I can do this for one value at a time of the Factors of GRangesList like this:

    hits=countOverlaps(obj1, objList[1], type=c("within"))

    But how do I apply this to each row of GRangesList?

    I have tried unsuccessfully with mapply, which fails with this error: Error in (function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any") error in evaluating the argument 'query' in selecting a method for function 'countOverlaps': Error in dots[[1L]][[1L]] : this S4 class is not subsettable


    Thanks so much!

    -fra

  • #2
    You wanted lapply() rather than mapply:
    Code:
    lapply(objList, function(x,y) {countOverlaps(y,x,type="within")}, obj1)
    Having said that, I'm pretty sure you can give countOverlaps() a GRangesList and that that'll be faster than applying a function.
    Last edited by dpryan; 09-27-2015, 11:23 PM. Reason: Forgot a ")"



    • #3
      Hi, thanks, the command as you have it gives me an error but this works: lapply(objList, function(x) countOverlaps(obj1, x, type=c("within")))

      However it does not give me what I am looking for because it loops through the lists and it gives me the overlap between each list element and the GRange object, but I would like to loop through each row of the list not each element...



      • #4
        I could try to do this with a double loop, with the first loop over the elements of the list and the second loop over the rows of each element, like this:

        for (i in names(objList)) {
          for (j in seq_along(objList[[i]])) {
            # overlap of obj1 with row j of list element i (t is overwritten on each pass)
            t = as.data.frame(countOverlaps(obj1, objList[[i]][j,], type=c("within")))
          }
        }

        But is there really no better way?



        • #5
          Why not just:
          Code:
          countOverlaps(obj1, unlist(objList), type="within")
          That would seem to give you what you want.



          • #6
            I had tried that but it doesn't work either... That gives me the overlap of obj1 and objList and not the overlap of obj1 with each row of the list object (what I am trying to get is the overlap of elements from obj1 with the first row of the first element in the list, then the overlap of the elements of obj1 with the second row of the first element in the list, etc...)
            Last edited by francy; 09-28-2015, 12:50 PM.
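
To make the distinction concrete, here is a minimal sketch using the toy objects posted later in this thread (post #8). It assumes the GenomicRanges package is installed; `flat` and `counts` are names chosen here for illustration. `countOverlaps()` returns one total per range of `obj1`, summed over all rows of the flattened list, so it does not say which rows were hit.

```r
library(GenomicRanges)

# Toy objects copied from the example in post #8
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000), end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000), end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# Flatten the list into a single GRanges with 6 rows
flat <- unlist(objList)

# One number per query range: how many rows of 'flat' contain it
counts <- countOverlaps(obj1, flat, type = "within")
counts
```

Here rs1 lies within one row and rs2 within two, but the result alone does not identify those rows, which is the information the requested table needs.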



            • #7
              Just give a small example of what you would like (i.e., give an example GRanges object, a GRangesList object and the output you would like).



              • #8
                Yes sure, I am sorry for not having done that before.
                Here are the objList and obj1:

                TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"), ranges=IRanges(start=c(0,9600,24000),
                end=c(10000, 20000, 30000)), id=c("Factor1", "Factor2", "Factor3"))

                TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"), ranges=IRanges(start=c(0,9000,14000),
                end=c(13000, 20500, 30100)), id=c("Factor1", "Factor2", "Factor3"))

                objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)

                obj1 <- GRanges(seqnames = c("chr1", "chr1"), ranges=IRanges(start=c(0,9700), end=c(1, 9701)), id=c("rs1", "rs2"))


                And this is what I have working now with loops (to find the overlap of obj1 with each row of objList):

                nameList=names(objList)

                output=data.frame(row.names=c("rs1","rs2"))
                for (name in nameList) {
                id= objList[[name]]$id
                for (i in 1:length(id)) {
                dftemp=as.data.frame(countOverlaps(obj1, objList[[name]][i,], type=c("within")))
                output=cbind.data.frame(output,dftemp)
                }
                }

                There must be a better way though, since the loops take a very long time...
                Thank you!



                • #9
                  I'm not sure what the point of that is, but
                  Code:
                  findOverlaps(obj1, unlist(objList), type='within')
                  would give you the same information faster. The output is also easier to deal with than what will likely be a gigantic and unwieldy (not to mention sparse) data frame.
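
As a rough sketch of why the Hits object is convenient, the index pairs can be mapped back to labels in a long-format table with one row per overlapping pair. This assumes the toy objects from post #8 and an installed GenomicRanges; `flat`, `row_type`, and `pairs` are illustrative names, not part of the original suggestion.

```r
library(GenomicRanges)

# Toy objects copied from the example in post #8
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000), end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000), end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# Flatten the list and remember which list element each row came from
flat <- unlist(objList, use.names = FALSE)
row_type <- rep(names(objList), lengths(objList))

o <- findOverlaps(obj1, flat, type = "within")

# One row per (query, subject) pair that overlaps
pairs <- data.frame(
  query   = obj1$id[queryHits(o)],
  subject = paste(flat$id[subjectHits(o)], row_type[subjectHits(o)], sep = "_")
)
pairs
```

Only the pairs that actually overlap are stored, so the table stays small even when the full query-by-row grid would be mostly zeros.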



                  • #10
                    Hi again...the output of that again gives me this, which is not what I am looking for.

                    > findOverlaps(obj1, unlist(objList), type='within')
                    Hits object with 3 hits and 0 metadata columns:
                          queryHits subjectHits
                          <integer>   <integer>
                      [1]         1           1
                      [2]         2           1
                      [3]         2           2
                    -------
                    queryLength: 2
                    subjectLength: 6

                    This is what I am trying to obtain (please see the example in R above):

                    > output
                        Factor1_TYPE1 Factor2_TYPE1 Factor3_TYPE1 Factor1_TYPE2 Factor2_TYPE2 Factor3_TYPE2
                    rs1             1             0             0             0             0             0
                    rs2             1             1             0             0             0             0
                    Last edited by francy; 09-29-2015, 03:35 AM.



                    • #11
                      Yes, as I said, what I wrote produces the same information much faster. If you want the data frame then just make a matrix of 0s and change values to 1 according to the output of findOverlaps.



                      • #12
                        Hi dpryan, thank you. Can you please explain in more detail how I would go from the output of findOverlaps or countOverlaps, which has 3 entries pairing queryHits with subjectHits, to the output I want, which has 6 columns indicating binary overlap for each of the query ranges? I am sorry for the confusion...thank you very much for your help.



                        • #13
                          Something like:

                          Code:
                          o <- findOverlaps(obj1, unlist(objList), type='within')
                          m <- matrix(0, nrow=length(obj1), ncol=length(unlist(objList)))
                          m[cbind(queryHits(o), subjectHits(o))] <- 1
                          Something along those lines.

                          Edit: BTW, you might need to use a sparse matrix, depending on how much memory you have and how large your objects are.
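
A possible sparse variant of the same idea, sketched under the assumption that the Matrix package is available alongside GenomicRanges. The row and column labels are built to match the Factor1_TYPE1 style requested earlier in the thread; `flat` and `col_labels` are illustrative names, and the toy objects are copied from post #8.

```r
library(GenomicRanges)
library(Matrix)

# Toy objects copied from the example in post #8
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000), end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000), end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# Flatten the list; build one label per row, e.g. "Factor1_TYPE1"
flat <- unlist(objList, use.names = FALSE)
col_labels <- paste(flat$id, rep(names(objList), lengths(objList)), sep = "_")

o <- findOverlaps(obj1, flat, type = "within")

# Store only the non-zero cells: rows are query ranges, columns are list rows
m <- sparseMatrix(i = queryHits(o), j = subjectHits(o), x = 1,
                  dims = c(length(obj1), length(flat)),
                  dimnames = list(obj1$id, col_labels))
m
```

For small inputs, `as.data.frame(as.matrix(m))` converts this to an ordinary data frame; with hundreds of thousands of columns it is probably better to keep the sparse form.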



                          • #14
                            Ah I see! That is AMAZINGLY faster than my loop... Thank you so much dpryan for explaining this trick, very very thankful!!!
