Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to extract Common Genes from 2 spreadsheets

    Now I have two Excel spreadsheets, parts of them are as below:

    Spreadsheet1

    Gene Change Padj
    1 212 0.0006
    2 99 0.09
    3 15 0.7
    4 59 0.456
    5 418 0.00001
    6 566 0

    Spreadsheet2

    Gene Statistic FDR
    1 28 0.0005
    2 17 0.007
    3 3 0.81
    4 8 0.28
    5 48 0.002
    6 111 0

    Using Excel, how can I create one new spreadsheet that only includes the
    Genes with both Padj and FDR are less than 0.05?

    And how can I do this in any other statsitical software, Like R, SPSS?
    Thanks very much for any idea for any question!

  • #2
    This is cake in R...

    Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

    Code:
    [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
    > df1[/B]
      Gene Change    Padj
    1    1    212 0.00060
    2    2     99 0.09000
    3    3     15 0.70000
    4    4     59 0.45600
    5    5    418 0.00001
    6    6    566 0.00000
    [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
    > df2[/B]
      Gene Statistic    FDR
    1    1        28 0.0005
    2    2        17 0.0070
    3    3         3 0.8100
    4    4         8 0.2800
    5    5        48 0.0020
    6    6       111 0.0000
    [B]> df3 <- merge(df1,df2)
    > df3[/B]
      Gene Change    Padj Statistic    FDR
    1    1    212 0.00060        28 0.0005
    2    2     99 0.09000        17 0.0070
    3    3     15 0.70000         3 0.8100
    4    4     59 0.45600         8 0.2800
    5    5    418 0.00001        48 0.0020
    6    6    566 0.00000       111 0.0000
    [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
      Gene Change  Padj Statistic   FDR
    1    1    212 6e-04        28 5e-04
    5    5    418 1e-05        48 2e-03
    6    6    566 0e+00       111 0e+00
    [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
    "","Gene","Change","Padj","Statistic","FDR"
    "1",1,212,6e-04,28,5e-04
    "5",5,418,1e-05,48,0.002
    "6",6,566,0,111,0

    Comment


    • #3
      Thanks ECO, I run the similar codes in R, too. And I can get the result like yours Just based on the Spreadsheets showed in this thread. Sorry for not telling more info about my data:But I have more Genes(37970) in Spreadsheet2 than Spreadsheet1 (17980). It supposed to have 17980 Genes left after merging [the 19990(37970-17980) Genes which only in Spreadsheet2 will not appear in the merged sheets]. Unfortunately, it messed up after merging. Any idea will be greatly appreciated.

      Originally posted by ECO View Post
      This is cake in R...

      Depending on how large the datasets are, read them into R. The example below (using read.table) is just for using the clipboard in MacOSX (highlight in excel, copy, then run that line). The equiv in windows is read.table("clipboard",sep='\t',header=TRUE).

      Code:
      [B]> df1 <- read.table(pipe("pbpaste"),header=TRUE)
      > df1[/B]
        Gene Change    Padj
      1    1    212 0.00060
      2    2     99 0.09000
      3    3     15 0.70000
      4    4     59 0.45600
      5    5    418 0.00001
      6    6    566 0.00000
      [B]> df2 <- read.table(pipe("pbpaste"),header=TRUE)
      > df2[/B]
        Gene Statistic    FDR
      1    1        28 0.0005
      2    2        17 0.0070
      3    3         3 0.8100
      4    4         8 0.2800
      5    5        48 0.0020
      6    6       111 0.0000
      [B]> df3 <- merge(df1,df2)
      > df3[/B]
        Gene Change    Padj Statistic    FDR
      1    1    212 0.00060        28 0.0005
      2    2     99 0.09000        17 0.0070
      3    3     15 0.70000         3 0.8100
      4    4     59 0.45600         8 0.2800
      5    5    418 0.00001        48 0.0020
      6    6    566 0.00000       111 0.0000
      [B]> df3[df3$Padj< 0.05 & df3$FDR < 0.05,][/B]
        Gene Change  Padj Statistic   FDR
      1    1    212 6e-04        28 5e-04
      5    5    418 1e-05        48 2e-03
      6    6    566 0e+00       111 0e+00
      [B]> write.csv(df3[df3$Padj< 0.05 & df3$FDR < 0.05,])[/B]
      "","Gene","Change","Padj","Statistic","FDR"
      "1",1,212,6e-04,28,5e-04
      "5",5,418,1e-05,48,0.002
      "6",6,566,0,111,0
      Last edited by byou678; 10-03-2011, 12:08 PM.

      Comment


      • #4
        Just need to tell merge what to do...(note I added two extra genes into df1)

        Code:
        [B]> df1[/B]
          Gene Change    Padj
        1    1    212 0.00060
        2    2     99 0.09000
        3    3     15 0.70000
        4    4     59 0.45600
        5    5    418 0.00001
        6    6    566 0.00000
        7    7    123 0.00000
        8    8    125 0.04000
        [B]> df2[/B]
          Gene Statistic    FDR
        1    1        28 0.0005
        2    2        17 0.0070
        3    3         3 0.8100
        4    4         8 0.2800
        5    5        48 0.0020
        6    6       111 0.0000
        [B]> merge(df1,df2,on=Gene)[/B]
          Gene Statistic    FDR Change    Padj
        1    1        28 0.0005    212 0.00060
        2    2        17 0.0070     99 0.09000
        3    3         3 0.8100     15 0.70000
        4    4         8 0.2800     59 0.45600
        5    5        48 0.0020    418 0.00001
        6    6       111 0.0000    566 0.00000
        edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

        Comment


        • #5
          Thanks again ECO. Yes, I like using R too and I have figured it out.

          Originally posted by ECO View Post
          Just need to tell merge what to do...(note I added two extra genes into df1)

          Code:
          [B]> df1[/B]
            Gene Change    Padj
          1    1    212 0.00060
          2    2     99 0.09000
          3    3     15 0.70000
          4    4     59 0.45600
          5    5    418 0.00001
          6    6    566 0.00000
          7    7    123 0.00000
          8    8    125 0.04000
          [B]> df2[/B]
            Gene Statistic    FDR
          1    1        28 0.0005
          2    2        17 0.0070
          3    3         3 0.8100
          4    4         8 0.2800
          5    5        48 0.0020
          6    6       111 0.0000
          [B]> merge(df1,df2,on=Gene)[/B]
            Gene Statistic    FDR Change    Padj
          1    1        28 0.0005    212 0.00060
          2    2        17 0.0070     99 0.09000
          3    3         3 0.8100     15 0.70000
          4    4         8 0.2800     59 0.45600
          5    5        48 0.0020    418 0.00001
          6    6       111 0.0000    566 0.00000
          edit: Looks like "on" isn't necessary...something else going on with your data? Merge is pretty smart by default...

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Recent Advances in Sequencing Analysis Tools
            by seqadmin


            The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
            Today, 07:48 AM
          • seqadmin
            Essential Discoveries and Tools in Epitranscriptomics
            by seqadmin




            The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
            04-22-2024, 07:01 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Today, 07:17 AM
          0 responses
          11 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 05-02-2024, 08:06 AM
          0 responses
          19 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-30-2024, 12:17 PM
          0 responses
          20 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-29-2024, 10:49 AM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Working...
          X