  • BSmoothed data analysis

    Hello,
    I am a bioinformatician at the University of Pennsylvania and have just started analysing bisulfite sequencing (BS-seq) data. I have one library for condition A, paired-end sequenced in two different lanes, and likewise one library for condition B (i.e. 4 FASTQ files per condition).
    I used Bismark to align the reads and make the methylation calls, then BSmooth to smooth the data and call differentially methylated regions (DMRs). When I plot the smoothed data, I see a few discrepancies that I don't know how to interpret.
    1) I see straight lines (i.e. constant methylation %) in the smoothed plots of a lot of DMRs leading me to believe they are outliers of some kind. How do I know if these are real?
    2) I see a lot of variability in the methylation % between the same library sequenced in different lanes. Is this a consequence of BSmooth/Bismark? Has anyone seen something similar happen?
    3) Are there defined stringencies/cutoffs used for calling intergenic vs promoter DMRs at any point in the BSmooth pipeline?
    Any help would be much appreciated!

  • #2
    1) You'd need to show an example, though straight lines typically occur where there's no data. Always plot raw signals along with smoothed signals so you know how realistic the results are.
    2) You should really merge these before extracting methylation calls. Lower coverage leads to increased variability, and treating each lane as a separate sample improperly inflates N (the number of samples).
    3) Not that I know of, though perhaps someone else will reply with some.
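    A minimal Python sketch of the first two points, under stated assumptions. The reply recommends merging lanes before extracting methylation calls, i.e. at the alignment level; summing per-CpG counts from each lane's coverage output, as below, is only a rough stand-in (it cannot, for example, remove duplicate reads shared between lanes). The input layout assumed here is Bismark's tab-separated coverage format (chrom, start, end, methylation %, methylated count, unmethylated count). The helpers `merge_lanes`, `meth_and_se` and `region_support` are hypothetical and are not part of Bismark or BSmooth: the first pools counts per CpG, the second shows that the binomial standard error of a methylation call shrinks as coverage grows, and the third reports how many CpGs actually have data inside a region and the longest stretch without any, which is where a smoothed curve is only interpolating.

    ```python
    import math
    from collections import defaultdict


    def merge_lanes(*lanes):
        """Sum methylated/unmethylated read counts per CpG across lanes.

        Each lane is an iterable of tab-separated lines in the assumed Bismark
        coverage layout: chrom, start, end, meth_pct, n_meth, n_unmeth.
        """
        counts = defaultdict(lambda: [0, 0])
        for lines in lanes:
            for line in lines:
                chrom, start, _end, _pct, n_meth, n_unmeth = line.rstrip("\n").split("\t")
                key = (chrom, int(start))
                counts[key][0] += int(n_meth)
                counts[key][1] += int(n_unmeth)
        return dict(counts)


    def meth_and_se(n_meth, n_unmeth):
        """Methylation proportion and its binomial standard error."""
        n = n_meth + n_unmeth
        if n == 0:
            raise ValueError("CpG has no coverage")
        p = n_meth / n
        return p, math.sqrt(p * (1 - p) / n)


    def region_support(counts, chrom, start, end):
        """Number of covered CpGs and largest uncovered stretch (bp) in [start, end].

        A long uncovered stretch is where a smoothed curve can only interpolate,
        which may show up as a straight line in a plot.
        """
        covered = sorted(pos for (c, pos), (m, u) in counts.items()
                         if c == chrom and start <= pos <= end and m + u > 0)
        edges = [start] + covered + [end]
        largest_gap = max(b - a for a, b in zip(edges, edges[1:]))
        return len(covered), largest_gap


    # Toy input: two lanes of the same library.
    lane1 = ["chr1\t100\t100\t75.0\t3\t1", "chr1\t900\t900\t100.0\t2\t0"]
    lane2 = ["chr1\t100\t100\t50.0\t2\t2"]
    merged = merge_lanes(lane1, lane2)
    print(merged)  # {('chr1', 100): [5, 3], ('chr1', 900): [2, 0]}
    # Standard error at CpG 100 is smaller after pooling the lanes.
    print(meth_and_se(3, 1)[1] > meth_and_se(*merged[("chr1", 100)])[1])  # True
    # Only 2 covered CpGs in a 900 bp region, with an 800 bp gap between them.
    print(region_support(merged, "chr1", 100, 1000))  # (2, 800)
    ```

    A region flagged this way (few covered CpGs, long gaps) is worth plotting with its raw per-CpG values before trusting the smoothed line across it.
    
    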

    • #3
      1) I have attached an image file with one of the examples. If straight lines occur where there is no data, why do they show a high methylation %? Does that mean only a few points are driving the curve? Thanks for the raw-signal advice; I will plot that and see how it looks.

      2) I did think of this, but when I merge the lanes BSmooth gives me errors when computing its t-statistic: merging leaves me with only 2 files (one per condition), which is not enough for BSmooth to compute the t-statistic.
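      Why one sample per condition breaks a t-statistic can be shown with a generic pooled-variance two-sample t-statistic. This is a hypothetical Python sketch, not BSmooth's own implementation: the point it illustrates is that the denominator is a within-group variance estimated from replicates, so with one merged sample per condition there are zero degrees of freedom and nothing to estimate it from. For simplicity the sketch asks for at least two replicates per group.

      ```python
      import math
      import statistics


      def pooled_t(group_a, group_b):
          """Pooled-variance two-sample t-statistic for one CpG or region.

          Each group holds one methylation value per biological replicate.
          The within-group variance is estimated from the replicates, so this
          sketch requires at least two per group.
          """
          na, nb = len(group_a), len(group_b)
          if na < 2 or nb < 2:
              raise ValueError("need at least 2 replicates per group to estimate variance")
          va = statistics.variance(group_a)  # sample variance (n - 1 denominator)
          vb = statistics.variance(group_b)
          pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
          se = math.sqrt(pooled * (1 / na + 1 / nb))
          return (statistics.mean(group_a) - statistics.mean(group_b)) / se


      # Three hypothetical replicates per condition: the statistic is defined.
      print(round(pooled_t([0.80, 0.70, 0.75], [0.30, 0.40, 0.35]), 2))  # 9.8

      # One merged sample per condition: no variance can be estimated.
      try:
          pooled_t([0.75], [0.35])
      except ValueError as err:
          print(err)
      ```

      Passing the two unmerged lanes in as if they were replicates would make the call succeed, but the variance would then reflect only lane-to-lane technical noise from a single library, which is the improperly inflated N warned about in the previous reply.
      
      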

      Thanks!

      • #4
        If you only have two files then you shouldn't be using BSmooth. As it stands, any results you get are unreliable, and if you try to publish them they should be rejected. Your experiment can only provide pilot data for future experiments and nothing else.

        • #5
          Well, we do have 3 more replicates on the way, but here is the problem. All the replicates were pooled into a single library, which was then sequenced in multiple lanes (the reason for that is beyond me, but so it is), and this experiment was to make sure I could get the bioinformatics to work. So I now have (or will soon have) 5 datasets that all came from the SAME pooled library. Does it make sense to use BSmooth then? Or is this an exercise in futility, since all the replicate information was lost at the pooling step?

          • #6
            This is an exercise in futility.

            • #7
              Yeah, I thought the same, but they wanted me to try anyway. Thanks a lot!
