  • gene feature converter

    Hi guys,
    I have a GO file for my differentially expressed genes; it looks like this:

    FBgn00001 GO:0016301 [Name:****(annotation)]
    FBgn00002 GO:0016301 [Name:****(annotation)]
    FBgn00003 GO:0016301 [Name:****(annotation)]
    FBgn00004 GO:0003700 [Name:****(annotation)]
    FBgn00004 GO:0009651 [Name:****(annotation)]
    FBgn00004 GO:0006355 [Name:****(annotation)]
    FBgn00005 GO:0009556 [Name:****(annotation)]
    FBgn00005 GO:0005515 [Name:****(annotation)]
    FBgn00005 GO:0080019 [Name:****(annotation)]
    FBgn00005 GO:0016563 [Name:****(annotation)]
    FBgn00005 GO:0016627 [Name:****(annotation)]
    FBgn00006 GO:0003700 [Name:****(annotation)]
    FBgn00006 GO:0010018 [Name:****(annotation)]

    Now I want to use WEGO, so I need to convert it to one line per gene, like this:

    FBgn00001 GO:0016301
    FBgn00002 GO:0016301
    FBgn00003 GO:0016301
    FBgn00004 GO:0003700 GO:0009651 GO:0006355
    FBgn00005 GO:0009556 GO:0005515 GO:0080019 GO:0016563 GO:0016627
    FBgn00006 GO:0003700 GO:0010018

    I think this could be solved with a Perl script, but I can't write it myself since I am a beginner. Can someone help me out? A simple Perl script is good enough for me^^

  • #2
    Both files are tab-delimited.


    • #3
      This might help

      sed 's/\[.*\]//g' genes_file


      • #4
        This python script should work

        import csv
        reader = csv.reader(open("GO.txt","r"), delimiter="\t")
        new={}
        for row in reader:
        if row[0] not in new.keys():
        new[row[0]] = [row[1]]
        else:
        new[row[0]].append(row[1])


        with open("wego.txt","w") as f:
        for key, value in sorted(new.items()):
        f.write(key+"\t"+"\t".join(value)+"\n")
        Last edited by crazyhottommy; 10-18-2013, 10:26 AM.


        • #5
          I don't know why the indentation is messed up....


          • #6
            Originally posted by crazyhottommy View Post
            I don't know why the indentation is messed up....
            You need to use the "["CODE"]" tags (remove the quotes). Alternatively, switch to advanced mode and click the hash (#) button in the toolbar.

            Code:
            I'm in a code block
                and I can be indented to not muck up python


            • #7
              Test...


              Code:
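              import csv

              # Collect the GO terms for each gene ID (column 1 = gene, column 2 = GO term)
              new = {}
              with open("GO.txt", "r") as infile:
                  reader = csv.reader(infile, delimiter="\t")
                  for row in reader:
                      if len(row) < 2:
                          continue  # skip blank or incomplete lines
                      if row[0] not in new:
                          new[row[0]] = [row[1]]
                      else:
                          new[row[0]].append(row[1])

              # Write one tab-delimited line per gene: the ID followed by all of its GO terms
              with open("wego.txt", "w") as f:
                  for key, value in sorted(new.items()):
                      f.write(key + "\t" + "\t".join(value) + "\n")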
              Last edited by crazyhottommy; 10-19-2013, 04:57 AM.


              • #8
                This one line awk can do the trick...

                awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
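For comparison, the logic of the awk one-liner can be sketched in Python. One thing to keep in mind: awk's `for (i in a)` loop visits keys in an unspecified order, so the genes may not come out in the order they appear in the input. The sketch below keeps first-seen order instead. The function name `collapse_go_terms` and the sample lines are illustrative assumptions, not something from the posts above.

```python
def collapse_go_terms(lines):
    """Group GO terms by gene ID, keeping genes in first-seen order."""
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default field splitting
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Column 1 is the gene ID, column 2 the GO term; the annotation is ignored.
        grouped.setdefault(fields[0], []).append(fields[1])
    return ["\t".join([gene] + terms) for gene, terms in grouped.items()]


if __name__ == "__main__":
    # Made-up sample in the same layout as the input shown in the thread.
    sample = [
        "FBgn00004\tGO:0003700\t[Name:****(annotation)]",
        "FBgn00004\tGO:0009651\t[Name:****(annotation)]",
        "FBgn00006\tGO:0003700\t[Name:****(annotation)]",
    ]
    for out_line in collapse_go_terms(sample):
        print(out_line)
```

Because every gene is held in a dictionary until the end, this works even when the lines for one gene are scattered through the file, at the cost of keeping the whole table in memory.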


                • #9
                  I like the simplicity, thank you!


                  • #10
                    Originally posted by Ciaran View Post
                    This might help

                    sed 's/\[.*\]//g' genes_file

                    I like the simplicity of Linux commands, thank you!


                    • #11
                      Originally posted by crazyhottommy View Post
                      This one line awk can do the trick...

                      awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
                      I'm new to awk, but thanks anyway^^


                      • #12
                        Many people told me to learn Python instead of Perl; maybe I'll learn Python someday^^


                        • #13
                          Originally posted by jason_ARGONAUTE View Post
                          Many people told me to learn Python instead of Perl; maybe I'll learn Python someday^^
                          A Perl version? Okay, here's something that might work:

                          Code:
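                          perl -ane '
                            # assumes all lines for a gene are adjacent in the input
                            if ($gn ne $F[0]) {
                              # new gene: end the previous line (if any), then print the new ID
                              print(($gn ? "\n" : "") . $F[0]);
                            }
                            print "\t" . $F[1];
                            $gn = $F[0];
                            END {
                              print "\n";
                            }'
                          [The input delimiter can be changed with the -F option, e.g. -F '/\t/']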
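The Perl one-liner streams through the input and starts a new output line whenever the gene ID changes, so it relies on all lines for a gene being adjacent (as they are in the example input). A rough Python sketch of the same streaming idea uses `itertools.groupby`. The function name `merge_adjacent` and the sample lines are illustrative assumptions, and tab is used as the output separator because the original poster said the files are tab-delimited.

```python
from itertools import groupby


def merge_adjacent(lines):
    """Yield one output line per run of consecutive rows sharing a gene ID."""
    rows = (line.split() for line in lines)
    rows = (row for row in rows if len(row) >= 2)  # drop blank or short lines
    # groupby only merges *consecutive* rows with the same key, so the
    # input must already be grouped (or sorted) by gene ID.
    for gene, group in groupby(rows, key=lambda row: row[0]):
        yield "\t".join([gene] + [row[1] for row in group])


if __name__ == "__main__":
    # Made-up sample in the same layout as the input shown in the thread.
    sample = [
        "FBgn00005\tGO:0009556\t[Name:****(annotation)]",
        "FBgn00005\tGO:0005515\t[Name:****(annotation)]",
        "FBgn00006\tGO:0003700\t[Name:****(annotation)]",
    ]
    for out_line in merge_adjacent(sample):
        print(out_line)
```

Unlike the dictionary-based scripts earlier in the thread, this holds only one gene's terms in memory at a time. If the file is not grouped by gene, the same ID will appear on more than one output line, so sort the input first in that case.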
