  • phastCons download from UCSC

    Hi, there,
    I want to do some analysis on a list of genomic regions in mouse and assess their conservation using phastCons scores. I'm a bit confused by the different tables available in the UCSC Table Browser. For mm9, there are

    1. Vertebrate Cons (phastCons30way) and

    2. Vertebrate El (phastConsElements30way)

    What's the difference between the two? Their table schema is also different.

    The Vertebrate Cons has 14 fields and I'm not sure which field has the phastCons score...

    The Vertebrate El has only 6 fields, with the last field being score, and I guess this is the phastCons score?

    Also, can I add or average scores from different regions?

    Thanks so much!

  • #2
    Output "data points" from the Table Browser to get conservation score data from the phastCons30way table.

    However, I think that you are better off downloading the mm9 phastCons scores directly from UCSC's website. Compression of data presented by the UCSC browser can introduce errors, particularly if you are aggregating scores over multiple regions. Data obtained directly from file downloads are not compressed or modified.

    Note also that phastCons and phyloP conservation scores are not generated for alignment gaps and unaligned nucleotides. You may want to filter those regions before aggregation.

    Another good place to ask these sorts of questions is the UCSC Genome mailing list.
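
    As a rough illustration of the aggregation step described above, here is a minimal Python sketch that averages scores over one region. It assumes the downloaded files are in fixedStep wiggle format (a `fixedStep chrom=... start=... step=...` header followed by one score per line, with 1-based positions). The function names `read_fixed_step` and `mean_score` are made up for this example. Bases in alignment gaps or unaligned sequence have no line in the file, so they are left out of the average instead of being counted as zero.

```python
import io


def read_fixed_step(handle):
    """Yield (chrom, position, score) from fixedStep wiggle text.

    Positions are 1-based, as in the wiggle format. Bases that fall
    between blocks (gaps, unaligned sequence) are simply absent.
    """
    chrom, pos, step = None, None, 1
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Header fields look like key=value, e.g. chrom=chr1 start=101
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
        else:
            yield chrom, pos, float(line)
            pos += step


def mean_score(handle, chrom, start, end):
    """Mean over scored bases in the 1-based closed interval [start, end].

    Returns (mean, n_scored); mean is None when no base has a score.
    """
    total, n = 0.0, 0
    for c, p, s in read_fixed_step(handle):
        if c == chrom and start <= p <= end:
            total += s
            n += 1
    return (total / n if n else None), n


# Toy input: positions 103 and 104 carry no score (e.g. an alignment gap)
example = io.StringIO(
    "fixedStep chrom=chr1 start=101 step=1\n"
    "0.5\n0.7\n"
    "fixedStep chrom=chr1 start=105 step=1\n"
    "0.9\n"
)
mean, n_scored = mean_score(example, "chr1", 100, 106)
print(mean, n_scored)
```

    Reporting the number of scored bases next to the mean makes it easy to see how much of a region actually had conservation data. For real files you would open the (usually gzip-compressed) download instead of the `io.StringIO` toy input; this sketch does not handle variableStep data.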

    • #3
      Originally posted by AlexReynolds
      Output "data points" from the Table Browser to get conservation score data from the phastCons30way table.

      However, I think that you are better off downloading the mm9 phastCons scores directly from UCSC's website. Compression of data presented by the UCSC browser can introduce errors, particularly if you are aggregating scores over multiple regions. Data obtained directly from file downloads are not compressed or modified.

      Note also that phastCons and phyloP conservation scores are not generated for alignment gaps and unaligned nucleotides. You may want to filter those regions before aggregation.

      Another good place to ask these sorts of questions is the UCSC Genome mailing list.

      I downloaded the scores directly from the site you mentioned. It's what I was looking for. Thanks a lot for that!

      Looking at the data points from the Table Browser, I only saw that the second-to-last column is sumData, which the description gives as "sum of the data points, for average and stddev calc". I think it's the sum of the individual nucleotide scores in that region. Do you know how they decide on the range over which sumData is calculated? I think the Table Browser stores the data in wiggle format, so my question is probably the same as asking how they decide where to split the genome into blocks when making the wiggle file.
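
      For what it's worth, if each table row carries running sums, rows can be pooled without going back to per-base scores. The Python sketch below assumes each row also has a count of scored bases and a sum of squared scores next to sumData. The names `validCount` and `sumSquares` are my assumption and should be checked against the table schema page. The function `pooled_stats` is made up for this example, and the n - 1 variance is a choice made here, not a statement about how UCSC computes its standard deviation.

```python
import math


def pooled_stats(rows):
    """Combine (validCount, sumData, sumSquares) tuples into one summary.

    Returns (mean, stddev, n). The field names are assumptions taken
    from the table schema, not confirmed in this thread.
    """
    n = sum(r[0] for r in rows)
    total = sum(r[1] for r in rows)
    total_sq = sum(r[2] for r in rows)
    if n == 0:
        return None, None, 0
    mean = total / n
    if n > 1:
        # Sample variance from running sums; clamp small negative rounding error
        var = max((total_sq - total * total / n) / (n - 1), 0.0)
    else:
        var = 0.0
    return mean, math.sqrt(var), n


# Toy rows: block A holds scores 0.3 and 0.7, block B holds 0.7 and 0.9
rows = [(2, 1.0, 0.58), (2, 1.6, 1.30)]
mean, sd, n = pooled_stats(rows)
print(mean, sd, n)
```

      Summing the counts and sums first, then dividing once, weights every scored base equally. Averaging the per-row means instead would give short and long blocks the same weight.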

      Thanks so much!!!
      Last edited by gene_x; 03-07-2013, 12:45 PM.

      • #4
        Also, what's the difference between Vertebrate Cons (phastCons30way) and Vertebrate El (phastConsElements30way)?

        • #5
          I'd recommend that you put these questions up on the UCSC Genome mailing list, which is run by UCSC staff who are more familiar with how the data are prepared before posting on the genome browser.

          • #6
            Originally posted by AlexReynolds
            I'd recommend that you put these questions up on the UCSC Genome mailing list, which is run by UCSC staff who are more familiar with how the data are prepared before posting on the genome browser.

            OK, I'll do that.

            • #7
              I figured it out: the Vertebrate El track contains the elements predicted to be conserved, basically regions with continuously high phastCons scores.
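
              To make that intuition concrete, here is a small Python sketch that finds runs of consecutive bases whose score stays at or above a cutoff. This is only a threshold-based stand-in: phastCons predicts its elements from the state path of its phylogenetic hidden Markov model, not by thresholding per-base scores, so the output will not reproduce the Vertebrate El track. The function name `high_score_runs`, the cutoff of 0.8 and the minimum length of 3 are arbitrary choices for this example.

```python
def high_score_runs(scores, start, threshold=0.8, min_len=3):
    """Return 1-based closed (start, end) runs where every score >= threshold.

    `scores` are consecutive per-base values beginning at position `start`.
    Runs shorter than `min_len` bases are discarded.
    """
    runs = []
    run_start = None
    for i, s in enumerate(scores):
        pos = start + i
        if s >= threshold:
            if run_start is None:
                run_start = pos  # a new run begins here
        elif run_start is not None:
            # The run ended at the previous base
            if pos - run_start >= min_len:
                runs.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the last base
    end_pos = start + len(scores)
    if run_start is not None and end_pos - run_start >= min_len:
        runs.append((run_start, end_pos - 1))
    return runs


scores = [0.1, 0.9, 0.95, 0.85, 0.2, 0.9, 0.9, 0.99, 1.0]
runs = high_score_runs(scores, start=1001)
print(runs)  # [(1002, 1004), (1006, 1009)]
```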
