Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to extract Common Genes from 2 spreadsheets

    Now I have two Excel spreadsheets, parts of them are as below:

    Spreadsheet1

    Gene Change Padj
    1 212 0.0006
    2 99 0.09
    3 15 0.7
    4 59 0.456
    5 418 0.00001
    6 566 0

    Spreadsheet2

    Gene Statistic FDR
    1 28 0.0005
    2 17 0.007
    3 3 0.81
    4 8 0.28
    5 48 0.002
    6 111 0

    Using Excel, how can I create one new spreadsheet that only includes the
    Genes with both Padj and FDR are less than 0.05?

    And how can I do this in any other statsitical software, Like R, SPSS?
    Thanks very much for any idea for any question!

  • #2
    This is cake in R...

    Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

    Code:
    [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
    > df1[/B]
      Gene Change    Padj
    1    1    212 0.00060
    2    2     99 0.09000
    3    3     15 0.70000
    4    4     59 0.45600
    5    5    418 0.00001
    6    6    566 0.00000
    [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
    > df2[/B]
      Gene Statistic    FDR
    1    1        28 0.0005
    2    2        17 0.0070
    3    3         3 0.8100
    4    4         8 0.2800
    5    5        48 0.0020
    6    6       111 0.0000
    [B]> df3 <- merge(df1,df2)
    > df3[/B]
      Gene Change    Padj Statistic    FDR
    1    1    212 0.00060        28 0.0005
    2    2     99 0.09000        17 0.0070
    3    3     15 0.70000         3 0.8100
    4    4     59 0.45600         8 0.2800
    5    5    418 0.00001        48 0.0020
    6    6    566 0.00000       111 0.0000
    [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
      Gene Change  Padj Statistic   FDR
    1    1    212 6e-04        28 5e-04
    5    5    418 1e-05        48 2e-03
    6    6    566 0e+00       111 0e+00
    [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
    "","Gene","Change","Padj","Statistic","FDR"
    "1",1,212,6e-04,28,5e-04
    "5",5,418,1e-05,48,0.002
    "6",6,566,0,111,0

    Comment


    • #3
      Thanks ECO, I run the similar codes in R, too. And I can get the result like yours Just based on the Spreadsheets showed in this thread. Sorry for not telling more info about my data:But I have more Genes(37970) in Spreadsheet2 than Spreadsheet1 (17980). It supposed to have 17980 Genes left after merging [the 19990(37970-17980) Genes which only in Spreadsheet2 will not appear in the merged sheets]. Unfortunately, it messed up after merging. Any idea will be greatly appreciated.

      Originally posted by ECO View Post
      This is cake in R...

      Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

      Code:
      [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
      > df1[/B]
        Gene Change    Padj
      1    1    212 0.00060
      2    2     99 0.09000
      3    3     15 0.70000
      4    4     59 0.45600
      5    5    418 0.00001
      6    6    566 0.00000
      [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
      > df2[/B]
        Gene Statistic    FDR
      1    1        28 0.0005
      2    2        17 0.0070
      3    3         3 0.8100
      4    4         8 0.2800
      5    5        48 0.0020
      6    6       111 0.0000
      [B]> df3 <- merge(df1,df2)
      > df3[/B]
        Gene Change    Padj Statistic    FDR
      1    1    212 0.00060        28 0.0005
      2    2     99 0.09000        17 0.0070
      3    3     15 0.70000         3 0.8100
      4    4     59 0.45600         8 0.2800
      5    5    418 0.00001        48 0.0020
      6    6    566 0.00000       111 0.0000
      [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
        Gene Change  Padj Statistic   FDR
      1    1    212 6e-04        28 5e-04
      5    5    418 1e-05        48 2e-03
      6    6    566 0e+00       111 0e+00
      [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
      "","Gene","Change","Padj","Statistic","FDR"
      "1",1,212,6e-04,28,5e-04
      "5",5,418,1e-05,48,0.002
      "6",6,566,0,111,0
      Last edited by byou678; 10-03-2011, 12:08 PM.

      Comment


      • #4
        Just need to tell merge what to do...(note I added two extra genes into df1)

        Code:
        [B]> df1[/B]
          Gene Change    Padj
        1    1    212 0.00060
        2    2     99 0.09000
        3    3     15 0.70000
        4    4     59 0.45600
        5    5    418 0.00001
        6    6    566 0.00000
        7    7    123 0.00000
        8    8    125 0.04000
        [B]> df2[/B]
          Gene Statistic    FDR
        1    1        28 0.0005
        2    2        17 0.0070
        3    3         3 0.8100
        4    4         8 0.2800
        5    5        48 0.0020
        6    6       111 0.0000
        [B]> merge(df1,df2,on=Gene)[/B]
          Gene Statistic    FDR Change    Padj
        1    1        28 0.0005    212 0.00060
        2    2        17 0.0070     99 0.09000
        3    3         3 0.8100     15 0.70000
        4    4         8 0.2800     59 0.45600
        5    5        48 0.0020    418 0.00001
        6    6       111 0.0000    566 0.00000
        edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

        Comment


        • #5
          Thanks again ECO. Yes, I like using R too and I have figured it out.

          Originally posted by ECO View Post
          Just need to tell merge what to do...(note I added two extra genes into df1)

          Code:
          [B]> df1[/B]
            Gene Change    Padj
          1    1    212 0.00060
          2    2     99 0.09000
          3    3     15 0.70000
          4    4     59 0.45600
          5    5    418 0.00001
          6    6    566 0.00000
          7    7    123 0.00000
          8    8    125 0.04000
          [B]> df2[/B]
            Gene Statistic    FDR
          1    1        28 0.0005
          2    2        17 0.0070
          3    3         3 0.8100
          4    4         8 0.2800
          5    5        48 0.0020
          6    6       111 0.0000
          [B]> merge(df1,df2,on=Gene)[/B]
            Gene Statistic    FDR Change    Padj
          1    1        28 0.0005    212 0.00060
          2    2        17 0.0070     99 0.09000
          3    3         3 0.8100     15 0.70000
          4    4         8 0.2800     59 0.45600
          5    5        48 0.0020    418 0.00001
          6    6       111 0.0000    566 0.00000
          edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          17 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          21 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          16 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          46 views
          0 likes
          Last Post seqadmin  
          Working...
          X