  • Bisulfite seq format file, help!

    Hi everyone,
    I'm new in methylation analysis and I downloaded a public bisulfite seq data, but I cannot tell what the file format is .
    The file ends in a bs-call.basecall.

    The file contents is:
    chr1 131398 CC 12 0 -
    chr1 131399 CC 12 0 -
    chr1 131400 CC 13 0 +
    chr1 131401 CC 13 0 +
    chr1 131402 CG 2 11 +
    chr1 131403 CG 4 10 -
    chr1 131404 CA 13 0 +
    chr1 131407 CC 13 0 +
    chr1 131408 CC 13 0 +
    chr1 131409 CC 15 0 +
    chr1 131410 CA 15 0 +
    chr1 131412 CA 15 0 +


    Do you recognize this kind of file?

    Thanks a lot!
    Mimmy
    Last edited by Mimmy86; 10-25-2013, 07:20 AM.

  • #2
    It would help if you gave a URL. Particularly if this is a GEO dataset, there's probably a description somewhere. Having said that, it looks like: chromosome, position, context, unmethylated count, methylated count, strand. Context gives the nucleotide following the C in question (these days, you'd see CpG, CHG, or CHH rather than what you have).
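
A minimal Python sketch of reading one line under the column guess above. The field names and the `CytosineCall` / `parse_call` helpers are hypothetical, and the meaning of columns 4 and 5 is an assumption that the data submitters have not confirmed.

```python
from typing import NamedTuple


class CytosineCall(NamedTuple):
    """One line of the file, using the guessed column meanings."""
    chrom: str
    pos: int
    context: str       # the C plus the following base, e.g. "CG"
    unmethylated: int  # assumed meaning of column 4
    methylated: int    # assumed meaning of column 5
    strand: str


def parse_call(line: str) -> CytosineCall:
    """Split one whitespace-delimited line into the six assumed fields."""
    chrom, pos, context, unmeth, meth, strand = line.split()
    return CytosineCall(chrom, int(pos), context, int(unmeth), int(meth), strand)


call = parse_call("chr1 131402 CG 2 11 +")
print(call.context, call.unmethylated, call.methylated)  # CG 2 11
```

A line with more or fewer than six fields raises a `ValueError` on unpacking, which is a cheap way to notice if the guess about the layout is wrong for part of the file.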



    • #3
      From this GEO record: http://www.ncbi.nlm.nih.gov/geo/quer...?acc=GSM922329 comes the following tidbit:

      Supplementary_files_format_and_content: Methylation calls files for C's in the bsmap alignment files were generated using methratio.py
      Seems to be similar to what Mimmy86 is reporting.



      • #4
        Hi,
        Thanks for your reply!
        As you said, I took this data from GEO (GSM922329).
        I looked at the methratio.py manual in BSMAP and found an output file description that confused me.
        The description is:
        Output format: tab delimited txt file with the following columns:
        1) chromosome
        2) coordinate (1-based)
        3) strand
        4) sequence context (2nt upstream to 2nt downstream in Watson strand direction)
        5) methylation ratio, calculated as #C_counts / #eff_CT_counts
        6) number of effective total C+T counts on this locus (#eff_CT_counts)
           CT_SNP="no action", #eff_CT_counts = #CT_counts
           CT_SNP="correct", #eff_CT_counts = #CT_counts * (#rev_G_counts / #rev_GA_counts)
        7) number of total C counts on this locus (#C_counts)
        8) number of total C+T counts on this locus (#CT_counts)
        9) number of total G counts on this locus of reverse strand (#rev_G_counts)
        10) number of total G+A counts on this locus of reverse strand (#rev_GA_counts)
        11) lower bound of 95% confidence interval of methylation ratio, calculated by Wilson score interval for binomial proportion.
        12) upper bound of 95% confidence interval of methylation ratio, calculated by Wilson score interval for binomial proportion.
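
To make columns 5, 6, 11 and 12 of that description concrete, here is a rough Python sketch of the arithmetic it describes. This is not the methratio.py source: the function names are hypothetical, z = 1.96 is assumed for the 95% level, and the fallback for a zero #rev_GA_counts and the clamping of the ratio to [0, 1] are my own assumptions.

```python
import math


def effective_ct(ct_counts, rev_g=None, rev_ga=None):
    """#eff_CT_counts: #CT_counts as is, or scaled by rev_G/rev_GA when correcting."""
    if rev_g is None or rev_ga is None or rev_ga == 0:
        # Assumption: no reverse-strand information means no correction.
        return float(ct_counts)
    return ct_counts * (rev_g / rev_ga)


def methylation_ratio(c_counts, eff_ct):
    """Column 5: #C_counts / #eff_CT_counts."""
    return c_counts / eff_ct


def wilson_interval(ratio, n, z=1.96):
    """Wilson score interval for a binomial proportion observed over n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = min(max(ratio, 0.0), 1.0)  # assumption: clamp so the square root is defined
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


eff = effective_ct(13)              # CT_SNP="no action"
ratio = methylation_ratio(11, eff)  # 11 C reads out of 13 C+T reads
low, high = wilson_interval(ratio, eff)
print(round(ratio, 3), round(low, 3), round(high, 3))
```

The interval is computed on the effective count, so a locus corrected for a likely C/T SNP gets a wider interval than its raw coverage would suggest.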



        • #5
          If the description on GEO (...produced by methratio.py) and the actual format do not match, you should probably contact the authors directly (also so that they can update the description on GEO). Looking at the file, though, I am pretty sure that Devon's assessment is correct.



          • #6
            Thanks, I also think that Devon's assessment is correct. I tried to calculate the C methylation frequency using the 4th and 5th columns. Do you think I can continue with these frequencies?
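
For reference, the per-site calculation from the 4th and 5th columns might look like the Python sketch below. It assumes column 4 is the unmethylated read count and column 5 the methylated read count, which is only the guess made earlier in this thread; `methylation_frequency` is a hypothetical helper, and the rows are copied from the first post.

```python
def methylation_frequency(unmethylated, methylated):
    """Fraction of reads supporting methylation, or None when there is no coverage."""
    coverage = unmethylated + methylated
    if coverage == 0:
        return None
    return methylated / coverage


# Three rows from the original post: chrom, pos, context, col 4, col 5, strand
rows = [
    ("chr1", 131402, "CG", 2, 11, "+"),
    ("chr1", 131403, "CG", 4, 10, "-"),
    ("chr1", 131404, "CA", 13, 0, "+"),
]

for chrom, pos, context, unmeth, meth, strand in rows:
    freq = methylation_frequency(unmeth, meth)
    print(chrom, pos, context, strand, f"{freq:.3f}")
```

With these rows the two CG sites come out mostly methylated and the CA site unmethylated, which is the pattern you would expect in a mammalian genome and is at least consistent with the assumed column order.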
