  • retrieve gene name

    Hi everybody,
    I'm Fabio, and I wanted to ask whether anybody knows how to retrieve gene names from genomic locations. Up to now I have been using the UCSC Genome Browser, but it is quite laborious to retrieve the gene names one by one! Thanks a lot,
    Fabio

  • #2
    Hey Fabio,

    Keep in mind I'm not a programmer, so I'm sure someone else here has a better solution! But it's pretty easy to retrieve gene names (or anything, really) using the Table Browser at UCSC combined with some simple Perl. I've used the following subroutine to get info about any gene (from the "knownGene" table) given the chromosome, start and end position. It will undoubtedly need updating as it's a few years old, and could certainly be coded better (it uses LWP::Simple).

    Code:
    use strict;
    use warnings;
    use LWP::Simple;    # exports get()

    # Return a hash ref for the first knownGene record found in
    # chr$chr:$start-$end, or undef if the request fails or nothing matches.
    sub knownGene {
        my ($chr, $start, $end) = @_;
        my $location = "chr" . $chr . ":" . $start . "-" . $end;

        my $p = "http://genome.cse.ucsc.edu/cgi-bin/hgText?";
        my $q = "db=hg16&table=hg16.knownGene&phase=Get+all+fields&position=$location&submit=submit&";

        my $c = get($p . $q);
        return unless defined $c;    # get() returns undef when the request fails

        foreach my $line (split /\n/, $c) {
            next if $line =~ /^\#/;  # skip header/comment lines

            # Columns: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
            # exonCount, exonStarts, exonEnds
            if ($line =~ /^(\w+)\s+(\w+)\s+([-+])\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+([\w\,]+)\s+([\w\,]+)/) {
                my %knowngene;
                $knowngene{'name'}       = $1;
                $knowngene{'chr'}        = $2;
                $knowngene{'strand'}     = $3;
                $knowngene{'txStart'}    = $4;
                $knowngene{'txEnd'}      = $5;
                $knowngene{'cdsStart'}   = $6;
                $knowngene{'cdsEnd'}     = $7;
                $knowngene{'exonCount'}  = $8;
                $knowngene{'exonStarts'} = $9;
                $knowngene{'exonEnds'}   = $10;
                return \%knowngene;  # first matching record wins
            }
        }

        return;    # no data line matched (undef in scalar context)
    }
    I can't do it right now, but it's pretty easy to adapt this to read in a list of "chr:XXXXXX-YYYYYY" data and output the genes. Hope that helps.
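The adaptation described above (reading a list of "chr:XXXXXX-YYYYYY" strings and printing the genes) could look roughly like this in Python. This is only a sketch under stated assumptions: the `GENES` table is made-up example data standing in for a downloaded annotation table, and the function names are hypothetical, not part of any UCSC tool.

```python
import re

# Hypothetical in-memory gene table: (name, chrom, txStart, txEnd).
# In practice these rows would come from a downloaded annotation table.
GENES = [
    ("geneA", "chr1", 1000, 5000),
    ("geneB", "chr1", 4500, 9000),
    ("geneC", "chr2", 200, 800),
]

REGION_RE = re.compile(r"^(chr\w+):(\d+)-(\d+)$")


def parse_region(text):
    """Split a 'chr:start-end' string into (chrom, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


def genes_in_region(region, genes=GENES):
    """Return names of genes whose span overlaps the region (closed intervals)."""
    chrom, start, end = parse_region(region)
    return [name for name, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and g_start <= end and g_end >= start]


def annotate(regions, genes=GENES):
    """Map every region string to the list of overlapping gene names."""
    return {region: genes_in_region(region, genes) for region in regions}


if __name__ == "__main__":
    for region, names in annotate(["chr1:4000-4600", "chr2:900-1000"]).items():
        print(region, ",".join(names) or "-")
```

Doing the overlap test locally against a table downloaded once avoids one web request per region, which matters when the list is long.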


    • #3
      Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start learning it. Until now I have worked only in R and Bioconductor, but unfortunately I didn't find any package that handles ChIP-seq data properly. Sorry for the stupid question... where do you put the Perl code?


      • #4
        Dear Fabio,

        This question is more complex than it seems at first glance.
        When you have a large number of regions from an NGS experiment, many of them won't fall into annotated regions. Then, is the gene name really what you want, or is it rather the transcript, exon, promoter, UTR, and so on?
        NGS is not always strand-specific, so you need to look at both the sense and antisense strands, upstream and downstream.

        An easy way to get all this annotation for a BED file is RegionMiner.

        If you are interested in just the gene names overlapping your regions, ECO's script might help.

        Cheers

        Klaus
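To make the strand point concrete, here is a small Python sketch of how a region can be labeled relative to a gene. It is an illustration only, not how RegionMiner works: the `Gene` record, the function names and the example coordinates are all hypothetical. "Upstream" is taken to mean before the transcription start, so for a minus-strand gene that is the side with the higher coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate, regardless of strand
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def relation(gene, chrom, start, end):
    """Return (label, distance) for a region relative to a gene.

    Returns None when the region is on a different chromosome.
    """
    if gene.chrom != chrom:
        return None
    if start <= gene.end and end >= gene.start:
        return ("overlap", 0)
    if end < gene.start:
        # Region lies at lower coordinates than the gene.
        label = "upstream" if gene.strand == "+" else "downstream"
        return (label, gene.start - end)
    # Region lies at higher coordinates than the gene.
    label = "downstream" if gene.strand == "+" else "upstream"
    return (label, start - gene.end)


def nearest_gene(genes, chrom, start, end):
    """Return (gene name, label, distance) for the closest gene, or None."""
    best = None
    for gene in genes:
        rel = relation(gene, chrom, start, end)
        if rel is None:
            continue
        if best is None or rel[1] < best[2]:
            best = (gene.name, rel[0], rel[1])
    return best
```

A region that overlaps no gene still gets an answer this way (nearest gene, side and distance), which is usually more informative than an empty result.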


        • #5
          Hi Kmay,
          thank you very much for your help. I tried to use RegionMiner (Genomatix), but my BED file (raw data) was too big, around 60 MB, and the server told me that I cannot upload it. Then I uploaded the .wig file (analyzed by someone else) to the UCSC browser and downloaded it as a BED file, but the Table Browser didn't include the data points, only the chromosomal locations. Do you know what I can do?


          • #6
            Fabio,

            Before uploading the data, you have to cluster the raw data into regions of significant tag enrichment. Annotating the raw data will most likely give you almost every gene in the genome.
            You cannot upload all raw data tags to the online version for either visualization or annotation (as said, the latter does not seem very useful to me). For that you would need to have GGA on site.
            Our clustering is available only on the GGA.
            However, you might give Shirley Liu's MACS a try and upload the cluster regions afterwards.

            Cheers

            Klaus
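To illustrate what "clustering raw tags into regions" means, here is a deliberately naive Python sketch. It is neither MACS nor the GGA clustering: it has no statistical model and simply merges tags that lie close together and keeps clusters with enough tags. The function name and the `max_gap` and `min_tags` thresholds are arbitrary assumptions chosen for the example.

```python
def cluster_tags(tags, max_gap=100, min_tags=3):
    """Merge nearby tags into regions.

    tags: iterable of (chrom, start, end) tuples.
    Returns a list of (chrom, start, end, n_tags) for clusters that
    contain at least min_tags tags.
    """
    regions = []
    current = None  # [chrom, start, end, count] of the cluster being built

    for chrom, start, end in sorted(tags):
        same_cluster = (
            current is not None
            and chrom == current[0]
            and start - current[2] <= max_gap
        )
        if same_cluster:
            # Extend the open cluster and count the tag.
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            # Close the previous cluster (if it is big enough) and open a new one.
            if current is not None and current[3] >= min_tags:
                regions.append(tuple(current))
            current = [chrom, start, end, 1]

    if current is not None and current[3] >= min_tags:
        regions.append(tuple(current))
    return regions
```

The output is a short list of regions rather than millions of tags, which is the kind of input an annotation step (or an upload limit) can handle.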


            • #7
              Originally posted by fabio25
              Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start learning it. Until now I have worked only in R and Bioconductor, but unfortunately I didn't find any package that handles ChIP-seq data properly. Sorry for the stupid question... where do you put the Perl code?
              Hey Fabio. Klaus is right, there are more comprehensive solutions out there, but they are costly and rarely let you do the exact analysis you need.

              If you are interested in learning Perl (which will undoubtedly help you at some point), there are a ton of great free resources out there for learning it, like this one: http://www.perl.com/pub/a/2000/10/begperl1.html

              You'll need some sort of interpreter if you're working on Windows; ActivePerl is a good place to start. Good luck!


              • #8
                Hi Fabio,
                I have never used Galaxy for NGS data, but you can give it a try:
                Galaxy is a community-driven web-based analysis platform for life science research.
                gabriele bucci


                • #9
                  by fabio

                  Hi Gbucci,
                  thanks a lot for your advice. I tried once to work with it, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?


                  • #10
                    fabio,

                    Can I FTP your data? I'll do a quick run on it and send you the results. It will take about 15 minutes.

                    if it helps...

                    Klaus


                    • #11
                      If your organism is in Ensembl, you can use the BioMart tool to extract genes (or other elements) by location.
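As a rough illustration, a by-location BioMart request can be written as a small XML query. The Python sketch below only builds that XML and does not contact any server. The dataset, filter and attribute names used here (`hsapiens_gene_ensembl`, `chromosome_name`, `start`, `end`, `ensembl_gene_id`, `external_gene_name`) are assumptions based on the human gene dataset and should be checked against the BioMart interface for your organism; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def biomart_region_query(chrom, start, end, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query string for genes in one genomic region."""
    # Ensembl names chromosomes "1", "2", ... without the UCSC "chr" prefix.
    if chrom.startswith("chr"):
        chrom = chrom[3:]

    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})

    # Region filters (assumed names, see the note above).
    for name, value in (("chromosome_name", chrom), ("start", start), ("end", end)):
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})

    # Columns to return (assumed names, see the note above).
    for attribute in ("ensembl_gene_id", "external_gene_name"):
        ET.SubElement(ds, "Attribute", {"name": attribute})

    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(biomart_region_query("chr1", 1000000, 1100000))
```

How BioMart decides which genes count as "in" the region is up to the service, so it is worth comparing a few results with the genome browser before trusting a whole batch.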


                      • #12
                        Hi dcfargo,
                        I did that, but it's not so precise. When I do it in R, it also retrieves the genes around my regions. I'll probably have to try it on the website.


                        • #13
                          Hi kmay,
                          I would like to do that, but the data are not mine and I cannot send them.
                          However, I'm determined to find an open-source way to deal with these data; if I can't, I'll work with GGA, as you suggested before. Thank you very much.


                          • #14
                            Fabio,

                            Galaxy and the UCSC Table Browser should do exactly what you need. Use some basic logic before trying to do it all in one go. I would:

                            1) Choose a subset of my query data, e.g. grep -w "chr1" file.bed > chr1.mydata.bed
                            2) Go to the UCSC Table Browser
                            3) Select the gene table
                            4) Select the union/intersection option
                            5) Intersect chr1.mydata.bed with the genes track
                            6) Output the intersection results in comma- or tab-separated format
                            7) Import the file into MS Excel or another spreadsheet program

                            If this works, then you just need to generalize it to your whole dataset rather than trying to do too many steps at once. This is only one possible solution, and there are probably more elegant open-source methods.
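For comparison, the same steps (subset one chromosome, intersect with genes, write a tab-separated file) can be sketched offline in Python. This is a minimal sketch, not a replacement for the Table Browser: the gene rows are assumed to be `(chrom, start, end, name)` tuples taken from a downloaded gene table, and all function names are hypothetical.

```python
import csv
import io


def read_bed(handle):
    """Yield (chrom, start, end) from BED text; BED starts are 0-based, ends exclusive."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def intersect(regions, genes, chrom):
    """Yield (chrom, start, end, gene) for regions on `chrom` that overlap a gene."""
    on_chrom = [g for g in genes if g[0] == chrom]
    for r_chrom, start, end in regions:
        if r_chrom != chrom:  # roughly what grep -w "chr1" does in step 1
            continue
        for _, g_start, g_end, name in on_chrom:
            if start < g_end and g_start < end:  # half-open interval overlap
                yield r_chrom, start, end, name


def write_tsv(rows, handle):
    """Write the intersection as tab-separated text with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "start", "end", "gene"])
    writer.writerows(rows)


if __name__ == "__main__":
    bed = io.StringIO("chr1\t100\t200\nchr2\t100\t200\n")
    genes = [("chr1", 150, 400, "geneA")]  # made-up example row
    out = io.StringIO()
    write_tsv(intersect(read_bed(bed), genes, "chr1"), out)
    print(out.getvalue(), end="")
```

Once this works for one chromosome, generalizing means looping over the chromosome names instead of passing a single one.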


                            • #15
                              Originally posted by fabio25
                              Hi Gbucci,
                              thanks a lot for your advice. I tried once to work with it, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?
                              Hi Fabio,
                              when I deal with a long list of [chr\tstart\tend\tstrand] genomic coordinates, I use a Perl script much like the one ECO suggested. The script parses your file, reads in the coordinates, and passes them to the remote UCSC database using a MySQL query.
                              I'm quite sure there is a Bioconductor way of doing it, but I can't tell you more since I have never tried it. You may want to look at the BioC mailing list.

                              Ask if you need help with Perl scripting.

                              My Best

                              G
                              gabriele bucci
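The approach described in this last post (parse [chr\tstart\tend\tstrand] lines, then send one query per region to the remote UCSC database) can be sketched in Python as follows. The sketch only prepares the SQL and its parameters and does not open a connection. The host and user in `UCSC_MYSQL` are what I understand to be UCSC's public read-only MySQL server, and the `knownGene` column names are assumed from that table's schema; verify both in the UCSC documentation. The function names are hypothetical.

```python
# Assumed connection details for UCSC's public MySQL server (verify before use).
UCSC_MYSQL = {"host": "genome-mysql.soe.ucsc.edu", "user": "genome"}

# %s placeholders follow the "format" parameter style used by common
# Python MySQL drivers; the driver fills them in safely at execute time.
OVERLAP_SQL = (
    "SELECT name, chrom, strand, txStart, txEnd "
    "FROM knownGene "
    "WHERE chrom = %s AND txStart < %s AND txEnd > %s"
)


def parse_coord_line(line):
    """Parse one 'chr<TAB>start<TAB>end<TAB>strand' line."""
    chrom, start, end, strand = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), strand


def overlap_query(chrom, start, end):
    """Return (sql, params) selecting transcripts that overlap the region.

    A transcript overlaps when it starts before the region ends
    and ends after the region starts.
    """
    return OVERLAP_SQL, (chrom, end, start)


def queries_for(lines):
    """Yield one (sql, params) pair per non-empty coordinate line."""
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, _strand = parse_coord_line(line)
        yield overlap_query(chrom, start, end)
```

With a MySQL driver installed, each pair would be passed to `cursor.execute(sql, params)` on a connection to the genome assembly's database; that step is left out here so the sketch runs without network access.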
