SEQanswers

  • UCSC RefSeq genes and hg19 coordinates

    Hello,
    I have a list of mRNA NM_ numbers.
    In the UCSC hg19 refGene table I can get exon and CDS coordinates for every NM_.

    However, when I pull out a subsequence from hg19 based on the refGene coordinates, the result does not seem to be correct for the reverse strand. Taking the reverse complement of the extracted exons doesn't work either.

    -------
    example:
    I have: NM_012345.3
    From UCSC I know that for NM_012345 the first CDS is between 50000 and 50100, strand "-", chr1.
    Then I use:
    Code:
    samtools faidx /path/hg19.fa chr1:50000-50100
    The result doesn't start with ATG (and it should).


    Where is the problem? I know that UCSC doesn't use the version suffix (NM_012345 instead of NM_012345.3), but that shouldn't matter.

    (hg19 is downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)
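A minimal sketch of looking up a versioned accession, assuming a local copy of UCSC's `refGene.txt` table dump (tab-separated, with `bin` in column 1 and `name` in column 2). The version suffix is stripped with POSIX parameter expansion before matching, since the `name` column holds `NM_012345` rather than `NM_012345.3`. The function name `refgene_lookup` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the refGene row(s) for an accession.
# Assumes a UCSC refGene.txt dump (tab-separated, column 2 = name)
# is given as the second argument.
refgene_lookup() {
  acc=$1           # e.g. NM_000572.2
  table=$2         # e.g. refGene.txt
  base=${acc%%.*}  # strip the version: NM_000572.2 -> NM_000572
  # Exact match on the name column; one accession may map to several rows.
  awk -F'\t' -v id="$base" '$2 == id' "$table"
}
```

Example: `refgene_lookup NM_000572.2 refGene.txt`. An accession can appear on more than one row (for example when a transcript aligns to several loci), so it is worth checking the row count before using the coordinates.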

  • #2
    NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.

    • #3
      Originally posted by dpryan:
      NM_012345 is on chromosome 13 (check the genome browser). I expect you're either reading something wrong or got the wrong refGene table.
      Heh, it was an abstract example; 012345 is just a placeholder, like abcdef.
      I tested ~3000 genes this way. About 1600 work fine, and they are on the "+" strand.
      ~1400 are on the "-" strand, and for those I can't get the correct mRNA or CDS with samtools faidx.

      • #4
        Ah, in the future, always give working examples.

        Remember that anything on the "-" strand should end in ATG (actually, CAT), rather than start with it.
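samtools faidx returns the "+" strand sequence, so for a "-" strand gene the start codon appears as CAT at the high-coordinate end of the CDS, and reading it in transcript orientation needs a reverse complement. A minimal POSIX-shell sketch, assuming `rev` and `tr` are available; the function names are hypothetical. Recent samtools releases also offer `faidx -i` (`--reverse-complement`) for the same purpose.

```shell
#!/bin/sh
# Hypothetical helpers for reading "-" strand features from samtools faidx output.

# Drop FASTA header lines and join the wrapped sequence lines into one line.
fasta_seq() {
  grep -v '^>' | tr -d '\n'
  echo
}

# Reverse-complement one sequence per input line.
# IUPAC codes other than N are reversed but not complemented.
revcomp() {
  rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Demo: a "+" strand CAT reads as ATG on the "-" strand.
printf 'CAT\n' | revcomp
```

With samtools this becomes, for example, `samtools faidx hg19.fa chr1:1000-1100 | fasta_seq | revcomp`.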

        • #5
          OK, a real example:
          I have gene IL10, NM_000572.2.
          Based on NM_000572 from UCSC I get:

          name: NM_000572
          chrom: chr1
          strand: -
          txStart: 206940947
          txEnd: 206945839
          cdsStart: 206941980
          cdsEnd: 206945780
          exonStarts: 206940947,206943173,206944251,206944700,206945615,
          exonEnds: 206942073,206943239,206944404,206944760,206945839,
          name2: IL10

          So the first CDS segment is from 206941980 to 206942073.
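That segment is the overlap of the first exon (in genomic order) with the interval [cdsStart, cdsEnd). The sketch below computes this overlap for every exon of one refGene row, assuming the fields are passed as arguments exactly as stored in the table (comma-separated lists with a trailing comma); `coding_exons` is a hypothetical name. Coordinates stay 0-based and half-open, and for a "-" strand gene the segments are printed in reverse so that the first line is the one containing the start codon.

```shell
#!/bin/sh
# Hypothetical helper: print the coding part of each exon of one refGene record.
# Arguments: strand cdsStart cdsEnd exonStarts exonEnds
# Output: one "start<TAB>end" line per coding segment (0-based, half-open),
# in transcript order (5' to 3').
coding_exons() {
  awk -v strand="$1" -v cs="$2" -v ce="$3" -v starts="$4" -v ends="$5" 'BEGIN {
    n = split(starts, s, ",")
    split(ends, e, ",")
    m = 0
    for (i = 1; i <= n; i++) {
      if (s[i] == "") continue                        # empty field after trailing comma
      lo = (s[i] + 0 > cs + 0) ? s[i] + 0 : cs + 0    # max(exonStart, cdsStart)
      hi = (e[i] + 0 < ce + 0) ? e[i] + 0 : ce + 0    # min(exonEnd, cdsEnd)
      if (lo < hi) { m++; a[m] = lo; b[m] = hi }      # exon overlaps the CDS
    }
    if (strand == "-") {
      for (i = m; i >= 1; i--) printf "%d\t%d\n", a[i], b[i]
    } else {
      for (i = 1; i <= m; i++) printf "%d\t%d\n", a[i], b[i]
    }
  }'
}

# IL10 (NM_000572) row quoted above:
coding_exons - 206941980 206945780 \
  206940947,206943173,206944251,206944700,206945615, \
  206942073,206943239,206944404,206944760,206945839,
```

For this row the five segments total 537 bp, i.e. 179 codons; checking that the summed length is divisible by three is a cheap sanity test.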

          Then I use:
          Code:
          samtools faidx hg19.fa chr1:206941978-206942075
          (I padded each side by 2 bases because UCSC is 0-based and hg19 is 1-based.)
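For reference, UCSC tables store intervals 0-based and half-open (the end coordinate is excluded), whereas samtools regions are 1-based and inclusive. So instead of padding both ends, the exact conversion is to add 1 to the start and keep the end unchanged. A minimal sketch; `ucsc_to_region` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: convert a UCSC 0-based, half-open interval [start, end)
# into a samtools 1-based, inclusive region string "chrom:start+1-end".
ucsc_to_region() {
  printf '%s:%d-%d\n' "$1" "$(($2 + 1))" "$3"
}

# Coding part of the first IL10 exon from the refGene row above:
ucsc_to_region chr1 206941980 206942073   # prints chr1:206941981-206942073
```

The region length is unchanged by the conversion (end minus start, 93 bp here), and the string can be passed straight to `samtools faidx hg19.fa "$(ucsc_to_region chr1 206941980 206942073)"`.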
          The output:
          GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT

          There's no ATG at the start, and there's a TAC in here :/
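The posted sequence can be checked directly. It covers 1-based positions 206941978 to 206942075 (98 bases), so the first coding base (0-based cdsStart 206941980, i.e. 1-based 206941981) is the 4th character. The sketch below extracts those three bases and reverse-complements them, assuming `rev`, `tr` and `cut` are available; the variable names are illustrative only.

```shell
#!/bin/sh
# Sequence returned above for chr1:206941978-206942075 (copied from the post).
seq=GTCTCAGTTTCGTATCTTCATTGTCATGTAGGCTTCTATGTAGTTGATGAAGATGTCAAACTCACTCATGGCTTTGTAGATGCCTTTCTCTTGGAGCT
region_start=206941978   # 1-based first position of the region
cds_start=206941980      # 0-based cdsStart from refGene

# The first coding base is cds_start + 1 (1-based); its offset in $seq
# is that position minus region_start, plus 1.
off=$((cds_start + 1 - region_start + 1))
end=$((off + 2))

# Three "+" strand bases at cdsStart, then the same codon read on the "-" strand.
codon=$(printf '%s\n' "$seq" | cut -c "$off-$end")
echo "$codon"                                             # TCA
printf '%s\n' "$codon" | rev | tr 'ACGTacgt' 'TGCAtgca'   # TGA
```

TCA on the "+" strand is TGA on the "-" strand, a stop codon: for a "-" strand transcript this end of the CDS is where translation stops, not where it starts.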

          • #6
            It's on the '-' strand, so you're grabbing the end, rather than the beginning

            • #7
              Originally posted by dpryan:
              It's on the '-' strand, so you're grabbing the end, rather than the beginning
              Heh, yes, I've just realised it.
              If the strand is "-", the start codon is at cdsEnd and the stop codon is at cdsStart! Very confusing!
              + 1 to experience
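The rule above can be captured in a small helper. This is a sketch assuming refGene's 0-based, half-open cdsStart/cdsEnd; `codon_regions` is a hypothetical name. It prints 1-based samtools regions, and for a "-" strand gene the bases at those regions still have to be reverse-complemented. It also assumes each codon lies inside a single exon, which is not guaranteed, since a start or stop codon can be split by an intron.

```shell
#!/bin/sh
# Hypothetical helper: print 1-based samtools regions for the start and stop
# codons of a refGene record. Arguments: chrom strand cdsStart cdsEnd
# (cdsStart/cdsEnd are 0-based, half-open as stored in refGene).
codon_regions() {
  chrom=$1 strand=$2 cds_start=$3 cds_end=$4
  first="$chrom:$((cds_start + 1))-$((cds_start + 3))"  # lowest three coding bases
  last="$chrom:$((cds_end - 2))-$cds_end"                # highest three coding bases
  if [ "$strand" = "-" ]; then
    # "-" strand: translation runs from cdsEnd down to cdsStart.
    printf 'start\t%s\nstop\t%s\n' "$last" "$first"
  else
    printf 'start\t%s\nstop\t%s\n' "$first" "$last"
  fi
}

# IL10 (NM_000572), "-" strand:
codon_regions chr1 - 206941980 206945780
```

For the IL10 row quoted in the thread, `samtools faidx hg19.fa chr1:206945778-206945780` would then be expected to return CAT, which reverse-complements to ATG.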
