  • fabio25
    Member
    • Aug 2008
    • 13

    retrieve gene name

    Hi everybody,
    I'm Fabio, and I wanted to ask whether anybody knows how to retrieve gene names from genomic locations. Up to now I've been using the UCSC Genome Browser, but it's quite tedious to retrieve the gene names one by one! Thanks a lot,
    Fabio
  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    Hey Fabio,

    Keep in mind I'm not a programmer, so I'm sure someone else here has a better solution! But it's pretty easy to retrieve gene names (or anything really) using the Table Browser at UCSC combined with some simple Perl. I've used the following subroutine to get info about any gene (from the "knownGene" table) given the chromosome, start and end position. It will undoubtedly need updating as it's a few years old, and could certainly be coded better (it uses LWP::Simple).

    Code:
    use strict;
    use warnings;
    use LWP::Simple;    # provides get()

    # Look up the first knownGene entry for chr:start-end.
    # Returns a hash reference, or undef if nothing was found.
    sub knownGene {
        my ($chr, $start, $end) = @_;
        my $location = "chr" . $chr . ":" . $start . "-" . $end;

        # Table Browser URL and hg16 assembly as originally posted; both will need updating.
        my $p = "http://genome.cse.ucsc.edu/cgi-bin/hgText?";
        my $q = "db=hg16&table=hg16.knownGene&phase=Get+all+fields&position=$location&submit=submit&";

        my $c = get($p . $q);
        return undef unless defined $c;    # the request failed

        foreach my $line (split /\n/, $c) {
            next if $line =~ /^\#/;        # skip comment/header lines

            if ($line =~ /^(\w+)\s+(\w+)\s+([-+])\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+([\w\,]+)\s+([\w\,]+)/) {
                return {
                    name       => $1,
                    chr        => $2,
                    strand     => $3,
                    txStart    => $4,
                    txEnd      => $5,
                    cdsStart   => $6,
                    cdsEnd     => $7,
                    exonCount  => $8,
                    exonStarts => $9,
                    exonEnds   => $10,
                };
            }
        }
        return undef;    # no data line matched
    }
    I can't do it right now, but it's pretty easy to adapt this to read in a list of "chr:XXXXXX-YYYYYY" data and output the genes. Hope that helps.
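
A minimal sketch of the "read in a list of chr:XXXXXX-YYYYYY" step, written in Python rather than Perl. The function name `parse_regions` and the pattern `REGION_RE` are hypothetical helpers, not part of the subroutine above; the sketch only turns region strings into (chromosome, start, end) tuples that a lookup routine could then consume.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern: matches strings such as "chr7:127471196-127495720".
# Thousands separators ("1,000") are tolerated and stripped.
REGION_RE = re.compile(r"^(chr\w+):(\d[\d,]*)-(\d[\d,]*)$")


def parse_regions(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) for every region line in the input."""
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        match = REGION_RE.match(text)
        if match is None:
            raise ValueError(f"not a chr:start-end region: {text!r}")
        chrom = match.group(1)
        start = int(match.group(2).replace(",", ""))
        end = int(match.group(3).replace(",", ""))
        if start > end:
            raise ValueError(f"start is after end: {text!r}")
        yield chrom, start, end
```

Note that the Perl subroutine adds the "chr" prefix itself, so the prefix would have to be dropped before passing these values to it.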


    • fabio25
      Member
      • Aug 2008
      • 13

      #3
      Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start learning it. Until now I have worked only in R and Bioconductor, but unfortunately I didn't find any package that handles ChIP-seq data properly. Sorry for the stupid question... where do you insert the Perl code?


      • kmay
        Member
        • Aug 2008
        • 29

        #4
        Dear Fabio,

        this question is more complex than it seems at first glance.
        With large numbers of regions from an NGS experiment, many of them won't fall into annotated regions. So is the gene name really what you want, or is it rather the transcript, or exon, or promoter, or UTR, or...?
        NGS is not always strand-specific, so you need to look at both the sense and the antisense strand, upstream and downstream.

        An easy way to get all this annotation for a BED file is RegionMiner.

        If you are interested in just the gene names overlapping with your regions, ECO's script might help.
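
To illustrate the strand point, here is a minimal Python sketch (not RegionMiner's method) that classifies a region relative to one transcript. The function name `relation_to_gene` is hypothetical, and the sketch assumes 0-based, half-open coordinates as in BED files.

```python
def relation_to_gene(region_start, region_end, tx_start, tx_end, strand):
    """Classify a region against one transcript on the same chromosome.

    Returns "overlap", "upstream" or "downstream". Upstream means before
    the transcription start as seen from the gene's own strand, so the
    answer flips for genes on the minus strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Half-open intervals overlap when each one starts before the other ends.
    if region_start < tx_end and tx_start < region_end:
        return "overlap"
    left_of_gene = region_end <= tx_start
    if strand == "+":
        return "upstream" if left_of_gene else "downstream"
    return "downstream" if left_of_gene else "upstream"
```

A region lying at lower coordinates than a gene is upstream of a plus-strand gene but downstream of a minus-strand one, which is why unstranded data has to be checked against both cases.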

        Cheers

        Klaus


        • fabio25
          Member
          • Aug 2008
          • 13

          #5
          Hi kmay,
          thank you very much for your help. I tried to use RegionMiner (Genomatix), but my BED file (raw data) was too big, around 60 MB, and the server told me that I cannot upload it. Then I uploaded the .wig file (analyzed by someone else) to the UCSC browser and downloaded it as a BED file, but the Table Browser didn't include the data points, only the chromosomal locations. Do you know what I can do?


          • kmay
            Member
            • Aug 2008
            • 29

            #6
            Fabio,

            before uploading the data, you have to cluster the raw data into regions of significant tag enrichment. Annotating the raw data will most likely give you almost every gene in the genome.
            You cannot upload all the raw tags to the online version for either visualization or annotation (and, as I said, the latter does not seem very useful to me). For that you would need to have GGA on site.
            Our clustering is available only on the GGA.
            However, you might give Shirley Liu's MACS a try and upload the cluster regions afterwards.
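
To make the idea of "clustering raw tags into regions" concrete, here is a deliberately naive Python sketch. It is neither the GGA clustering nor the MACS algorithm: it only merges overlapping or nearby tags and keeps clusters with at least `min_tags` tags, with no statistical test for enrichment. The function name `cluster_tags` and both parameters are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def cluster_tags(
    tags: Iterable[Interval], max_gap: int = 0, min_tags: int = 2
) -> List[Tuple[str, int, int, int]]:
    """Merge tags lying within max_gap of each other.

    Returns (chrom, start, end, tag_count) for every cluster that
    contains at least min_tags tags.
    """
    clusters = []
    current = None  # [chrom, start, end, count] of the cluster being built
    for chrom, start, end in sorted(tags):
        if current and chrom == current[0] and start <= current[2] + max_gap:
            # Tag touches the running cluster: extend it.
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            # Tag starts a new cluster: keep the old one if it is big enough.
            if current and current[3] >= min_tags:
                clusters.append(tuple(current))
            current = [chrom, start, end, 1]
    if current and current[3] >= min_tags:
        clusters.append(tuple(current))
    return clusters
```

The resulting cluster list is far smaller than the raw tag file, which is the point of clustering before any upload or annotation step.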

            Cheers

            Klaus


            • ECO
              --Site Admin--
              • Oct 2007
              • 1360

              #7
              Originally posted by fabio25
              Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start learning it. Until now I have worked only in R and Bioconductor, but unfortunately I didn't find any package that handles ChIP-seq data properly. Sorry for the stupid question... where do you insert the Perl code?
              Hey Fabio. Klaus is right, there are more comprehensive solutions out there, but they are costly, and rarely let you do the exact analysis you need.

              If you are interested in learning Perl (which will undoubtedly help you at some point), there are a ton of great resources out there for learning it for free, like here: http://www.perl.com/pub/a/2000/10/begperl1.html

              You'll need some sort of interpreter if you're working on Windows... ActivePerl is a good place to start. Good luck!


              • olus
                Member
                • Aug 2008
                • 22

                #8
                Hi Fabio,
                I have never used Galaxy for NGS data, but you could give it a try:
                Galaxy is a community-driven web-based analysis platform for life science research.
                gabriele bucci


                • fabio25
                  Member
                  • Aug 2008
                  • 13

                  #9
                  by fabio

                  Hi Gbucci,
                  thanks a lot for your advice. I tried working with it once, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?


                  • kmay
                    Member
                    • Aug 2008
                    • 29

                    #10
                    fabio,

                    can I ftp your data? I'll do a quick run on them and send you the results. Will take about 15 minutes.

                    if it helps...

                    Klaus


                    • dcfargo
                      Member
                      • Aug 2008
                      • 22

                      #11
                      If your organism is in Ensembl, you can use the BioMart tool to extract genes (or other elements) by location.


                      • fabio25
                        Member
                        • Aug 2008
                        • 13

                        #12
                        Hi dcfargo,
                        I did that, but it's not very precise: when I do it in R, it also returns the neighbouring genes. I'll probably have to try it on the website.


                        • fabio25
                          Member
                          • Aug 2008
                          • 13

                          #13
                          Hi kmay,
                          I would like to do that, but the data are not mine and I cannot send them.
                          However, I'm determined to find an open-source way to deal with these data; if I can't, I'll work with GGA, as you suggested before. Thank you very much.


                          • zee
                            NGS specialist
                            • Apr 2008
                            • 249

                            #14
                            Fabio,

                            Galaxy and the UCSC Table Browser should do exactly what you need. Use some basic logic before trying to do it all in one go. I would:

                            1) Choose a subset of my query data, e.g. grep -w "chr1" file.bed > chr1.mydata.bed
                            2) Go to the UCSC Table Browser
                            3) Select the Gene Table
                            4) Select the Union/Intersection option
                            5) Intersect chr1.mydata.bed with the Genes track
                            6) Output the intersection results in comma/tab-separated format
                            7) Import the file into MS Excel or some other spreadsheet program

                            If this works, then you just need to generalize it to your whole dataset rather than trying to do too many steps at once. This is only one possible solution, and there are probably more elegant open-source methods.
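
For anyone who prefers to do the intersection step locally, here is a minimal Python sketch of the same logic. It is not what the Table Browser runs internally; the function names `read_bed` and `intersect_with_genes` are hypothetical, and the gene list is assumed to have been downloaded separately as (chrom, txStart, txEnd, name) records.

```python
import csv
from collections import defaultdict


def read_bed(lines):
    """Yield (chrom, start, end) from the first three columns of BED lines."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comments and UCSC "track"/"browser" header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        yield row[0], int(row[1]), int(row[2])


def intersect_with_genes(regions, genes):
    """Yield (chrom, start, end, gene_name) for every region/gene overlap.

    genes is an iterable of (chrom, tx_start, tx_end, name) tuples in the
    same 0-based, half-open coordinates as the BED regions.
    """
    by_chrom = defaultdict(list)
    for chrom, tx_start, tx_end, name in genes:
        by_chrom[chrom].append((tx_start, tx_end, name))
    for chrom, start, end in regions:
        for tx_start, tx_end, name in by_chrom.get(chrom, ()):
            if start < tx_end and tx_start < end:
                yield chrom, start, end, name
```

Each hit can be written out with `csv.writer(..., delimiter="\t")` to get the tab-separated file of step 6. The nested loop is fine for one chromosome at a time; a whole genome would call for sorted intervals or an interval tree.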


                            • olus
                              Member
                              • Aug 2008
                              • 22

                              #15
                              Originally posted by fabio25
                              Hi Gbucci,
                              thanks a lot for your advice. I tried working with it once, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?
                              Hi Fabio,
                              when I deal with long lists of [chr\tstart\tend\tstrand] genomic coordinates, I use a Perl script much like the one ECO suggested. The script parses your file, reads in the coordinates, and passes them to the remote UCSC database using a MySQL query.
                              I'm quite sure there is a Bioconductor way of doing it, but I can't tell you more since I have never tried it. You may want to have a look at the BioC mailing list.
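
A minimal Python sketch of what such a query might look like (this is not the actual script). The function name `gene_symbol_query` is hypothetical, and the table and column names (knownGene, kgXref, geneSymbol, txStart, txEnd) are assumptions based on the UCSC schema that should be checked against the assembly in use. The sketch only builds a parameterised query; connecting to the database with a MySQL driver is left out.

```python
def gene_symbol_query(chrom, start, end):
    """Return (sql, params) selecting gene symbols that overlap a region.

    Coordinates are assumed to be 0-based and half-open, as in the UCSC
    tables. The %s placeholders follow the style used by common Python
    MySQL drivers, which fill them in safely from params.
    """
    if start >= end:
        raise ValueError("start must be smaller than end")
    sql = (
        "SELECT DISTINCT x.geneSymbol "
        "FROM knownGene AS g "
        "JOIN kgXref AS x ON x.kgID = g.name "
        "WHERE g.chrom = %s AND g.txStart < %s AND g.txEnd > %s"
    )
    # A gene overlaps the region if it starts before the region ends
    # and ends after the region starts.
    return sql, (chrom, end, start)
```

With a driver cursor this would typically be run as `cursor.execute(sql, params)`, once per line of the coordinate file.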

                              Ask if you need help with perl scripting.

                              My Best

                              G
                              gabriele bucci
