  • Matching a column from a data frame on the columns of another data frame

    I have two big data frames. One (df1) has this structure:

    V1 V2 V3
    1 Chr1 7507 10944
    2 Chr1 10944 13170
    3 Chr1 13170 20065
    4 Chr1 20065 28273
    5 Chr1 28273 29960
    6 Chr1 29960 36599
    7 Chr1 36599 37513
    8 Chr1 37513 40360
    9 Chr1 40360 48796
    10 Chr1 48796 50661

    The other (df2) has this structure:

    V1 V2 V3 V4 V5
    1 Chr1 7507 7507 1 1
    2 Chr1 10944 10944 1 2
    3 Chr1 13170 13170 1 22
    4 Chr1 20065 20065 1 3
    5 Chr1 28273 28273 1 161
    6 Chr1 29960 29960 1 10
    7 Chr1 36599 36599 1 604
    8 Chr1 37513 37513 1 117
    9 Chr1 40360 40360 1 8
    10 Chr1 48796 48796 1 3
    What I'm trying to do is check whether V2 or V3 of df2 (the two are the same) is equal to, or falls within, the range from V2 to V3 of df1. If it does, I want to write the value of V5 of df2 in a new column in df1; if not, write 0. The result that I want would look like this:

    Chr1 7507 10944 1
    Chr1 10944 13170 2
    Chr1 13170 20065 22
    Chr1 20065 28273 3
    Chr1 28273 29960 161
    Chr1 29960 36599 10
    Chr1 36599 37513 604
    Chr1 37513 40360 117
    Chr1 40360 48796 8
    .
    .
    .
    Do you know any good way to do this?
    Thank you very much.
    Last edited by zisis86; 05-28-2014, 05:38 AM.

  • #2
    The simplest solution is to make these GRanges objects and then use findOverlaps. You can then add meta information columns to the first dataset (just 2 columns of 0s) and then increment those values according to the overlap values. This has the benefit of taking care of cases where there are multiple overlaps.
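A minimal sketch of that approach, assuming the Bioconductor package GenomicRanges is installed and using only the first three rows of each data frame from the question. The column names `V5_sum` and `V5_start` are made up for illustration, and the results are stored as plain data frame columns rather than GRanges metadata columns. Because consecutive intervals in df1 share a boundary, a plain `findOverlaps` matches each boundary position to two intervals, so the first variant sums V5 over all hits; the second variant uses `type = "start"` to require matching start coordinates, which is what the desired output in the question appears to imply.

```r
library(GenomicRanges)  # Bioconductor; also attaches IRanges and S4Vectors

# Toy data: first three rows of df1 and df2 from the question
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507L, 10944L, 13170L),
                  V3 = c(10944L, 13170L, 20065L))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507L, 10944L, 13170L),
                  V3 = c(7507L, 10944L, 13170L),
                  V4 = 1,
                  V5 = c(1, 2, 22))

gr1 <- GRanges(seqnames = df1$V1, ranges = IRanges(start = df1$V2, end = df1$V3))
gr2 <- GRanges(seqnames = df2$V1, ranges = IRanges(start = df2$V2, end = df2$V3))

# Variant 1: every (interval, position) pair where the position lies in
# the closed interval; sum V5 over all hits, intervals without a hit stay 0
hits <- findOverlaps(gr1, gr2)
df1$V5_sum <- 0
sums <- tapply(df2$V5[subjectHits(hits)], queryHits(hits), sum)
df1$V5_sum[as.integer(names(sums))] <- as.numeric(sums)

# Variant 2: only pairs whose start coordinates coincide
start_hits <- findOverlaps(gr1, gr2, type = "start")
df1$V5_start <- 0
df1$V5_start[queryHits(start_hits)] <- df2$V5[subjectHits(start_hits)]

print(df1)
```

With this toy data, `V5_sum` counts the positions 10944 and 13170 twice (once for each interval they border), while `V5_start` assigns each position only to the interval it starts.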

    • #3
      Do you want to compare these line by line? Also, are V2 and V3 always the same?

      If so, why not use an ifelse statement in R?

      df3 <- cbind(df1, 0) ## just adds another column to the df, filled with 0
      ## vectorised & (not ||): df2 position must be >= df1 start AND <= df1 end
      df3[,4] <- ifelse(df2[,2] > (df1[,2]-1) & df2[,2] < (df1[,3]+1), df2[,5], df3[,4])

      not tested, but something like this should work if you are going line by line
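If the rows of the two data frames are not aligned one to one, a base R sketch along the following lines may help. It is not from the thread: it assumes a position belongs to an interval when it lies in the half-open range [V2, V3), so that a shared boundary is not matched twice, and it takes the first matching position when there are several. The toy data is adapted from the question, with one position dropped from df2 to show the 0 case. It compares every df1 row against all of df2, so for very large data frames an overlap-based tool is likely a better fit.

```r
# Toy data adapted from the question (position 13170 removed from df2)
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170, 20065),
                  V3 = c(10944, 13170, 20065, 28273))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 20065),
                  V3 = c(7507, 10944, 20065),
                  V4 = 1,
                  V5 = c(1, 2, 3))

# For each df1 row, index of the first df2 position on the same chromosome
# inside the half-open interval [V2, V3); NA when no position matches
first_hit <- vapply(seq_len(nrow(df1)), function(i) {
  which(df2$V1 == df1$V1[i] &
        df2$V2 >= df1$V2[i] &
        df2$V2 <  df1$V3[i])[1]
}, integer(1))

# Copy V5 from the matching df2 row, or 0 when there was no match
df1$V5 <- ifelse(is.na(first_hit), 0, df2$V5[first_hit])

print(df1)
```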
