Who let M's and R's in the Genome


  • Who let M's and R's in the Genome

    Just looking for other people's input on something.

    Has anyone noticed that there are two R and one M IUPAC codes on chromosome 3 in the reference genomes, in both NCBI36 and GRCh37? Maybe not surprisingly, they sit in the FHIT gene.

    Does this worry anyone for genome indexing or the like? It seems minor at 3 in 3 billion.
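
    For anyone who wants to check their own copy of a reference, here is a rough Python sketch that reports every symbol other than A, C, G, T or N. The function name `find_ambiguity_codes` and the toy records are made up for illustration; it assumes plain FASTA text read line by line and is not tuned for speed on a whole genome.

```python
from typing import Iterable, Iterator, Tuple

# Symbols treated as "ordinary" in a reference sequence.
STANDARD = set("ACGTN")


def find_ambiguity_codes(
    fasta_lines: Iterable[str],
) -> Iterator[Tuple[str, int, str]]:
    """Yield (sequence name, 1-based position, symbol) for each symbol
    that is not A, C, G, T or N (case-insensitive)."""
    name = ""
    pos = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: keep the first word of the header, reset the counter.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            pos = 0
            continue
        for ch in line:
            pos += 1
            if ch.upper() not in STANDARD:
                yield name, pos, ch.upper()


# Toy input, not real genome data.
toy = [">chr_toy demo", "ACGTRACG", "TTMNNACG"]
for hit in find_ambiguity_codes(toy):
    print(hit)
```

    On a real file you would pass an open file handle instead of the toy list, since a file object also iterates line by line.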

  • #2
    Yes I noticed that, they are quite odd... I just replace them with N's when doing analysis.
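
    One possible way to do that replacement, as a minimal sketch rather than the exact procedure used here: it assumes only the ten IUPAC ambiguity letters (R, Y, S, W, K, M, B, D, H, V) should be turned into N, keeps lowercase soft-masking as lowercase n, and leaves FASTA header lines alone. The function names are hypothetical.

```python
from typing import Iterable, Iterator

# IUPAC ambiguity letters other than N itself.
AMBIGUITY = "RYSWKMBDHV"

# Translation table: uppercase codes -> N, lowercase codes -> n.
_TO_N = str.maketrans(
    AMBIGUITY + AMBIGUITY.lower(),
    "N" * len(AMBIGUITY) + "n" * len(AMBIGUITY),
)


def mask_ambiguity_codes(seq: str) -> str:
    """Return seq with every IUPAC ambiguity letter replaced by N (or n)."""
    return seq.translate(_TO_N)


def mask_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mask sequence lines only; header lines (starting with '>') pass through."""
    for line in lines:
        yield line if line.startswith(">") else mask_ambiguity_codes(line)


# Toy example, not real genome data.
print(mask_ambiguity_codes("ACGTRACGm"))
print(list(mask_fasta_lines([">chr3 R test", "ACMRT"])))
```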
    SpliceMap: De novo detection of splice junctions from RNA-seq


    • #3
      Surprisingly, I noticed this yesterday too!
      Not expected, but no clue if it affects things downstream...
      --
      bioinfosm



      • #4
        I know in some cases they use N to mask sequences that are repeats.



        • #5
          The other letters in the code:

          m= a or c, as in amino

          r= g or a, as in purine

          n= a or g or c or t/u, unknown, or other
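
          As a small illustration of what these letters stand for, the sketch below holds the standard IUPAC nucleotide codes in a Python dictionary and lists the concrete sequences an ambiguous string could represent. The names `IUPAC_BASES` and `expand` are just for this example, and it assumes DNA (T rather than U). The number of expansions grows quickly, so this is only sensible for short strings.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one may represent (DNA).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "S": "CG",   # strong
    "W": "AT",   # weak
    "K": "GT",   # keto
    "M": "AC",   # amino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def expand(seq: str) -> list:
    """List every unambiguous sequence consistent with an IUPAC string.
    Raises KeyError for symbols that are not in IUPAC_BASES."""
    choices = [IUPAC_BASES[ch] for ch in seq.upper()]
    return ["".join(p) for p in product(*choices)]


print(expand("AMR"))  # ['AAA', 'AAG', 'ACA', 'ACG']
```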



          • #6
            Originally posted by Joann
            m= a or c, as in amino

            r= g or a, as in purine

            n= a or g or c or t/u, unknown, or other
            ah ha, good to know, thanks!



            • #7
              Wait! there's more...

              Please see Annex C, Appendix 2, Table 1, page 16 at, for example,

              http://www.noip.gov.vn/noip/resource.nsf/vwResourceList/B4F5E35FA26A8AA4472577360013F1D3/$FILE/Standards%20%E2%80%93%20ST25.pdf

              for a complete list of nucleotide letter symbols in use per a current international standard.

              See also
              "An extended IUPAC nomenclature code for polymorphic nucleic acids"
              doi:10.1093/bioinformatics/btq098
              Last edited by Joann; 01-21-2011, 02:09 PM. Reason: update



              • #8
                Interesting to know something apart from A,T,G,C,N.

                But UCSC has them as "N"s



                • #9
                  These additional letters are sometimes called 'ambiguity codes'. Back in the day when a 30X human genome sequence cost a billion dollars instead of several thousand, every piece of sequence information was much more precious. Knowing a position was a purine was better than calling it an N. The codes are also useful for reporting heterozygous genotype information as a single letter. The fact that they still occur in reference genomes is mostly just a nuisance for bioinformatics, and thus some resources such as UCSC convert them to N's. I believe human genome sequences retrieved via Ensembl may still contain them, though.
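
                  To make the single-letter genotype idea concrete, here is a minimal sketch that maps a pair of alleles to one IUPAC letter by inverting the two-base codes. The function name `genotype_to_code` is hypothetical, and it assumes diploid genotypes over A, C, G, T only.

```python
# Two-base IUPAC codes plus the four plain bases (DNA).
IUPAC_PAIRS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}

# Inverse lookup: unordered set of bases -> single-letter code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC_PAIRS.items()}


def genotype_to_code(allele1: str, allele2: str) -> str:
    """Return the IUPAC letter for a diploid genotype.
    Homozygous calls return the base itself; heterozygous calls
    return the matching two-base ambiguity code."""
    a, b = allele1.upper(), allele2.upper()
    if a not in "ACGT" or b not in "ACGT" or len(a) != 1 or len(b) != 1:
        raise ValueError(f"alleles must be single A/C/G/T bases: {allele1!r}, {allele2!r}")
    return CODE_FOR[frozenset((a, b))]


print(genotype_to_code("A", "G"))  # R (purine: A or G)
print(genotype_to_code("A", "C"))  # M (amino: A or C)
print(genotype_to_code("T", "T"))  # T (homozygous)
```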
