  • Converting DNA position to transcript position

    Hi Friends,

    I have a simple problem that many of you must have considered before me. I have a DNA position showing variation (~SNP) within an exon of a gene/transcript. Is there already a script out there to convert a DNA position to a "transcript position" given a GTF file? Would be really happy to use that script in that case.

    Thanks!
    Boel

  • #2
    I think you mean that you have a chromosomal position such as chr1:222222 and a DNA change A>T, and you want to know the coding sequence change relative to the start of the coding sequence (the ATG). If this isn't what you want, please give an example. If it is, one problem is that a gene may have more than one version of the coding sequence (isoforms), and you have to decide which isoform you want; that is probably why no tool does this automatically. I have done it myself based on Ensembl's exon definitions. I found errors in the UCSC browser, which is another place you can go. I wanted highly accurate, manually annotated exons, and Ensembl worked best for me. There are a lot of other issues that I won't go into. It is not as straightforward as it seems, and since most people have specific genes of interest, you usually have to prepare it yourself.



    • #3
      Hi husamia, and thanks for your reply.
      No, I am not interested in the coding consequence, just interested in the position in the transcript, in the mRNA sequence.

      For example, if the DNA position is chr1:30000 and it falls within gene X's first exon, then I want to know the corresponding position in the mRNA (position 1 if gene X starts at chr1:30000). If a gene has several isoforms, this will be reflected in my GTF file. It is a fairly simple mathematical exercise, just very nitty-gritty to do, so I wanted to hear if someone had a simple script. Thanks though.
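A minimal sketch of the arithmetic described above, in Python. It assumes 1-based inclusive exon coordinates (the GTF convention) for a single isoform; the function name and the example exons are hypothetical, not taken from any existing tool.

```python
def genomic_to_transcript_pos(pos, exons, strand="+"):
    """Return the 1-based position within the spliced transcript, or None.

    pos    -- 1-based genomic coordinate
    exons  -- list of (start, end) tuples, 1-based inclusive, one isoform
    strand -- "+" or "-"
    """
    # Walk exons in transcription order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript bases contained in the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return offset + within + 1
        offset += end - start + 1
    return None  # position is intronic or outside the transcript


# Hypothetical two-exon transcript of gene X.
exons = [(30000, 30099), (30500, 30599)]
print(genomic_to_transcript_pos(30000, exons, "+"))  # 1
print(genomic_to_transcript_pos(30500, exons, "+"))  # 101
print(genomic_to_transcript_pos(30599, exons, "-"))  # 1
print(genomic_to_transcript_pos(30200, exons, "+"))  # None
```

With several isoforms, the same function would be called once per transcript, each with its own exon list.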



      • #4
        I had to do this exact exercise myself (though going further, to the amino acid as husamia described). I wrote my own script but it is not simple. It makes use of the BioPerl module Bio::Coordinate::GeneMapper, which is meant for these types of transformations between coordinate spaces. But to use it everything must be a Bio::SeqFeature object. Since I was working in Arabidopsis I already had a Bio::DB::SeqFeature database of TAIR9 set up (back end for GBrowse). If you are conversant with some serious BioPerl I could offer some guidance.



        • #5
          Hi kmcarr,

          I'm looking into biopython, and there is some functionality there. Might cross over to BioPerl if I feel the need later on. Thanks a lot.



          • #6
            drop me an email @ joachim dot deschrijver at ugent dot be

            I have such a script ready in Perl that you could use



            • #7
              Ensembl's variant effect predictor may be of use, here. If you enter in a genomic position and allele(s) it will let you know the position in the cDNA and the protein (if there is one) and the amino acid change. Have a look at the example:



              It's available online, or through the API:



              Email us at [email protected] for more help.



              • #8
                Originally posted by Giulietta
                Ensembl's variant effect predictor may be of use, here. If you enter in a genomic position and allele(s) it will let you know the position in the cDNA and the protein (if there is one) and the amino acid change. Have a look at the example:



                It's available online, or through the API:



                Email us at [email protected] for more help.
                The link [http://uswest.ensembl.org/info/website/upload/var.html] gives a 404 error, but I think the correct link is [http://uswest.ensembl.org/Homo_sapie...oadVariations]



                • #9
                  Originally posted by husamia
                  Sorry about the broken link- we will endeavor to fix it.

                  The link at www.ensembl.org is working:



                  Try to change uswest to www (and go back to the UK site if it redirects you again!) The UploadVariations link you quote is not quite the one I was trying to point you to.

                  Cheers.



                  • #10
                    Originally posted by Boel
                    Hi kmcarr,

                    I'm looking into biopython, and there is some functionality there. Might cross over to BioPerl if I feel the need later on. Thanks a lot.
                    Hi Boel, could you share the Biopython functionality you used for converting genomic coordinates to transcript coordinates? I have a GFF file in which I would like to convert the genomic coordinates of UTR and CDS features to transcript coordinates, but I am having a hard time finding a script or function that can do this. Thanks!
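For feature intervals such as UTR or CDS records, one possible approach in plain Python (no Biopython) is to collect the exons per transcript and map both interval ends. This is only a sketch: it assumes GTF-style attributes (`transcript_id "..."`), so a GFF3 file would need the `Parent=` attribute parsed instead, and all function names and the example records are hypothetical.

```python
import re
from collections import defaultdict


def read_exons(gtf_lines):
    """Collect exon intervals and strand per transcript_id from GTF lines."""
    exons = defaultdict(list)
    strands = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match:
            tid = match.group(1)
            exons[tid].append((int(cols[3]), int(cols[4])))
            strands[tid] = cols[6]
    return exons, strands


def to_transcript(pos, exons, strand):
    """Map one 1-based genomic position to a 1-based transcript position."""
    offset = 0
    for start, end in sorted(exons, reverse=(strand == "-")):
        if start <= pos <= end:
            return offset + (pos - start if strand == "+" else end - pos) + 1
        offset += end - start + 1
    return None  # intronic or outside the transcript


def interval_to_transcript(start, end, exons, strand):
    """Map a genomic interval whose two ends lie in exons; None otherwise."""
    a = to_transcript(start, exons, strand)
    b = to_transcript(end, exons, strand)
    if a is None or b is None:
        return None
    return (min(a, b), max(a, b))


# Hypothetical minus-strand transcript with two exons.
gtf = [
    'chr1\tdemo\texon\t1000\t1099\t.\t-\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t1500\t1599\t.\t-\t.\tgene_id "g1"; transcript_id "t1";',
]
exons, strands = read_exons(gtf)
print(interval_to_transcript(1090, 1099, exons["t1"], strands["t1"]))  # (101, 110)
```

An interval that spans an intron still maps correctly here, because each end is converted independently against the spliced exon order.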



                    • #11
                      Ensembl VEP is your best bet for custom annotation (fast, robust, reliable, and easily automated).




                      • #12
                        Originally posted by m_two
                        As far as I understand from the documentation, Ensembl VEP requires variant information as input. The sites I would like to convert are not SNP positions but miRNA target sites, so I could not use VEP for that conversion.



                        • #13
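                          You basically need to subtract the position of the transcription start site (TSS) from the position of the variant. Note that plain subtraction gives the transcript position only within the first exon of a plus-strand gene; for later exons you also have to subtract the lengths of the intervening introns, and for minus-strand genes you count back from the transcript end. The TSS and exon coordinates are available in several places. The source I use is the UCSC Table Browser.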



                          The values for clade, genome, and assembly should be obvious. The remaining settings are:

                          group: Genes and Gene Predictions
                          track: RefSeq Genes
                          table: refGene
                          output format: all fields from selected table
                          output file: refGene_human (or whatever your organism is)
                          file type returned: gzip (speeds up download a lot)

                          Unzip the file and either load it into an SQL table set up with the refGene schema (click the button describe table schema for info) or programmatically search the unzipped text file for your gene to pull its TSS.

                          If you don't know databases, searching the plain text will be faster in the short run. But, if this is part of a major pipeline you will be running a lot, it would be worthwhile to become comfortable with a relational database system and embedding calls to that database inside your language of choice. That may sound like a major hurdle, but all the info you need is on the web. Message me, if you need help getting started to find the resources to learn this.
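As a rough illustration of the "search the plain text" route, the sketch below parses refGene rows in Python. It assumes the standard refGene column order and UCSC's 0-based, half-open coordinates; the function names and the demo record are made up for illustration. Keep in mind that subtracting the TSS alone gives the transcript position only within the first exon, so the exon list is pulled out as well.

```python
REFGENE_FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_refgene_line(line):
    """Turn one tab-separated refGene row into a dict with exons and TSS."""
    rec = dict(zip(REFGENE_FIELDS, line.rstrip("\n").split("\t")))
    starts = [int(s) for s in rec["exonStarts"].split(",") if s]
    ends = [int(e) for e in rec["exonEnds"].split(",") if e]
    # Convert 0-based half-open intervals to 1-based inclusive ones.
    rec["exons"] = [(s + 1, e) for s, e in zip(starts, ends)]
    # The TSS is the transcript start on "+" and the transcript end on "-".
    if rec["strand"] == "+":
        rec["tss"] = int(rec["txStart"]) + 1
    else:
        rec["tss"] = int(rec["txEnd"])
    return rec


def find_gene(lines, gene_name):
    """Yield every record (isoform) whose gene symbol matches gene_name."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blanks
        rec = parse_refgene_line(line)
        if rec["name2"] == gene_name:
            yield rec


# Made-up record; real rows would come from the downloaded file,
# e.g. via gzip.open(path, "rt") on the gzipped download.
demo = ["#bin\tname\tchrom\tstrand",
        "0\tNM_DEMO\tchr1\t-\t999\t1599\t1020\t1580\t2\t999,1499,\t1099,1599,"
        "\t0\tDEMO\tcmpl\tcmpl\t0,0,"]
for rec in find_gene(demo, "DEMO"):
    print(rec["name"], rec["strand"], rec["tss"], rec["exons"])
```

A gene with several RefSeq isoforms yields several records, so the caller still has to choose which transcript to measure against.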
