Unconfigured Ad

simonandrews · 10-15-2014, 12:53 AM

Originally posted by kentawan View Post

Hi all, are there any options that forces SeqMonk to use multiple processor cores?

I was running a intensity difference filter over a methylation call datasets from bismark. I was comparing two different individuals with a genome size of approx 1Gb. The entire comparison took two weeks and it gave me zero hits, haha. Just want to trial and error on this software so I was wondering any ways to force the analysis to use up more resources so that it is faster?

Also, is this the correct way of doing differentially methylated region analysis on SeqMonk?

Wow! I'm kind of impressed you waited that long for it to finish, but unfortunately, as you found, it wasn't really an appropriate test to use anyway, and if it took that long to run I guess you were using a huge number of probes, so even if you had found things which were somewhat significant they would have been extremely unlikely to survive multiple testing correction for that many tests.

The intensity difference filter is only useful in a case where:

You have a measure where the size of the measure and the noise are correlated - generally anything which is a direct manipulation of a sequence count
You expect that the majority of the data will not be changing such that it makes sense to look for outliers after creating a model over the rest of the data.

..and unfortunately neither of these really applies to methlyation data.

There are a few different ways to look at methlyation data depending on what you're looking for, but a typical recipe we'd use would be something like this:

Decide where you're going to measure. It almost never makes sense to analyse individual cytosines as the coverage for each will generally be poor and differences will never be significant. If you're targeting something specific like gene bodies, promoters or CpG islands then put probes over those, if not then tile probes of an appropriate size (our default is 3-5kb if we have reasonably good coverage - go larger if your coverage isn't great). Use whichever probe generator is appropriate to make probes over these regions.
Quantitate your data using the bisulphite quantitation pipeline using the probes you made in the step above. I'd suggest setting the minimum coverage depth and the measures per feature both to 1 to start with (this isn't the default at the moment, but it will be in the next release)
Decide on the minimum absolute methylation change you're prepared to care about. In well measured regions you might find that a change of 1% or less will be significant, but you want to consider the biological context and whether this is likely to mean anything. We'd probably normally start looking at a minimum 5-10% change.
Run a contingency based test (Filtering > Statistical Test > Chi Square > For/Rev) and select the two datasets you want to compare. Set the minimum difference to the value you selected in the last step. This will give you an initial hit list.
Select the hit list you got from the contingency test and then filter this using your current quantitation using the differences filter (Filter > Value difference > Individual probes). Select both of your datasets in both option lists and set the minimum difference to be the cutoff you selected before.

You should now have a list of probes which show a methylation change which is both statistically and biologically significant. If you did tiled probes then the next step would be to try to relate the positions you found to biological features to try to understand why they were selected.

There are many other ways to go about this, but hopefully this will at least get you started.

kentawan · 10-14-2014, 05:17 PM

Hi all, are there any options that forces SeqMonk to use multiple processor cores?

I was running a intensity difference filter over a methylation call datasets from bismark. I was comparing two different individuals with a genome size of approx 1Gb. The entire comparison took two weeks and it gave me zero hits, haha. Just want to trial and error on this software so I was wondering any ways to force the analysis to use up more resources so that it is faster?

Also, is this the correct way of doing differentially methylated region analysis on SeqMonk?

All guidance and criticisms are highly welcomed! Newbie here just transitioning between wet and dry lab.

BTW, my computing cluster is allocated 8 cores of [email protected]. Total memory was 31.4Gb.

fkrueger · 06-13-2014, 11:56 AM

Originally posted by kshankar View Post

I had a quick question about importing Bismark methylation extractor output into SeqMonk (as in Under File -> Import Data -> Bismark). Is this feature not included in version 27 anymore? I am sure we have used this feature in our previous publications with earlier versions. Is there a workaround?

thanks
--Kartik

The File -> Import Data -> Bismark import only ever worked with the vanilla Bismark output (pre-SAM/BAM). Files from the methylation extractor may be imported via the Generic Text Import filter. Once there, select col 3 as chromosome, col 4 as both start and end positions, and col 2 as strand (= methylation state).

kshankar · 06-13-2014, 08:40 AM

Importing Bismark txt file into Seqmonk

I had a quick question about importing Bismark methylation extractor output into SeqMonk (as in Under File -> Import Data -> Bismark). Is this feature not included in version 27 anymore? I am sure we have used this feature in our previous publications with earlier versions. Is there a workaround?

thanks
--Kartik

edere · 05-14-2014, 12:13 PM

Originally posted by simonandrews View Post

That's interesting - the seqmonk data should be pretty unambiguous since it will show all of the raw calls put out by the methylation extractor. Just to be sure have you done a simple read count in the seqmonk data track and confirmed that the counts for those regions really are 0? I know that internally we fixed a bug where some reads didn't show up, but I'm pretty sure that only affected the development version which isn't released.

There are plenty of options for how to turn a set of calls into methylation percentages (and there will be some new options in the next seqmonk release) but there shouldn't be any ambiguity over the presence/absence of reads.

Another sanity check would be to take the location of one of the disputed regions and then use something like samtools to extract the reads for that region from the BAM files produced by bismark. That would let you see if there should have been CpG calls in the region and if there were then we could track down at which subsequent point they disappeared and why.

Hi Simon,

I managed to track down the problem. It was something that I did that I forgot to take into account when comparing the analysis methods. I forgot that I used the --ignore argument when running the methylation extraction.

When I ran the methylation extraction again without the --ignore option and compared the calls in Seqmonk and methylkit, everything matched.

Thanks for your suggestions though.

Ed

simonandrews · 05-12-2014, 07:59 AM

Originally posted by edere View Post

When I randomly look at the methylRaw values can compare the locations of methylated or unmethylated cytosines and compare them with data loaded into Seqmonk, I’m finding that there are methylRaw values have data for locations that do not correspond with the data loaded into Seqmonk.

That's interesting - the seqmonk data should be pretty unambiguous since it will show all of the raw calls put out by the methylation extractor. Just to be sure have you done a simple read count in the seqmonk data track and confirmed that the counts for those regions really are 0? I know that internally we fixed a bug where some reads didn't show up, but I'm pretty sure that only affected the development version which isn't released.

There are plenty of options for how to turn a set of calls into methylation percentages (and there will be some new options in the next seqmonk release) but there shouldn't be any ambiguity over the presence/absence of reads.

Another sanity check would be to take the location of one of the disputed regions and then use something like samtools to extract the reads for that region from the BAM files produced by bismark. That would let you see if there should have been CpG calls in the region and if there were then we could track down at which subsequent point they disappeared and why.

edere · 05-12-2014, 07:53 AM

I’ve been working with a mRRBS dataset and have a bit of time using both methylKit and Seqmonk. I’ve used Bismark to align my bisulfite sequence reads, sorted the outputted SAM files by genomic location and then loaded the data into R using the read.bismark function in methylKit and looking only at the CpG context.

Using the same SAM files, I used Bismark’s methylation extractor to generate the CpG context OT and OB methylation calls, and then loaded those files into Seqmonk.

When I randomly look at the methylRaw values can compare the locations of methylated or unmethylated cytosines and compare them with data loaded into Seqmonk, I’m finding that there are methylRaw values have data for locations that do not correspond with the data loaded into Seqmonk.

Has anyone else experienced this and/or have a reason why methylKit and Seqmonk don’t have the same results?

Thanks,
Ed

simonandrews · 04-28-2014, 12:19 AM

Originally posted by Aspadia View Post

Hi Simon,

I had a look at the v28 development of Seqmonk and indeed I was able to visualize the heatmap without any filtering (even obs/exp = 0) which is great and useful for my application!
As I mentioned in my previous post I would like to substract interactions found for one dataset (one kind of linker combination) from interactions found for another dataset (another kind of linker combination).
Am I right that the interactions that seqmonk finds are only 'present' in the HiC heatmap and not in e.g. the genome view? Because in the genome view with the probes showing enrichment I can substract one dataset from the other but this is then not visualised in the HiC heatmap (or the report from the HiC heatmap). Is there any other way of getting a report for the interactions found besides the report from the heatmap?
Please let me know if it is possible to substract interactions for one dataset from another (and how)!

Thank you very much in advance!

You're right that the normal subtraction of probe lists won't be of any use for subtracting HiC interaction lists. At the moment there isn't a way to do this within SeqMonk. I've been planning a proper fix for this for a while - the idea would be that the calculation of interaction lists would be a separate process which would introduce another folder in the data view of interaction sets. These could then be put into views such as the heatmap view (meaning you don't have to re-calculate each time) and would also then be able to be combined and filtered in different ways. Unfortunately this is a lot of work as it messes quite deeply with the core structure of seqmonk and I just haven't had time to work on it.

For the time being therefore any subtraction would have to be done by exporting the lists from the heatmap view and then using an external script to match up the ids to find the common and different points.

Aspadia · 04-22-2014, 12:58 AM

Hi Simon,

I had a look at the v28 development of Seqmonk and indeed I was able to visualize the heatmap without any filtering (even obs/exp = 0) which is great and useful for my application!
As I mentioned in my previous post I would like to substract interactions found for one dataset (one kind of linker combination) from interactions found for another dataset (another kind of linker combination).
Am I right that the interactions that seqmonk finds are only 'present' in the HiC heatmap and not in e.g. the genome view? Because in the genome view with the probes showing enrichment I can substract one dataset from the other but this is then not visualised in the HiC heatmap (or the report from the HiC heatmap). Is there any other way of getting a report for the interactions found besides the report from the heatmap?
Please let me know if it is possible to substract interactions for one dataset from another (and how)!

Thank you very much in advance!

simonandrews · 03-28-2014, 03:29 AM

Sorry to take so long to reply to you - I'm a bit behind on my mails right now!

The statistical model for HiC in SeqMonk is based on the expectation that random interactions should be distributed in line with the level of fragment end coverage within each region, so that regions which have a large number of fragment ends in them should also be more likely to form part of any random interactions. This works well for methods where you expect fairly even fragment end coverage (such as normal HiC) but is less good where you have uneven distributions (any kind of intentional enrichment). In these cases the method still works, but if your enrichment is very strong then you may find that you get highly significant results from small numbers of interactions with generally depleted regions, which may not be biologically sensible.

If you want to view the heatmap without applying any filtering then you should be able to set the cutoffs to not filter (p<1 and diff >0). I might be mis-remembering but I may have changed some of this code recently to allow this kind of construction (I think the filters may have been restricted to showing only interactions with Obs/Exp > 1 whatever you selected). You could try the development snapshot at http://www.bioinformatics.babraham.a...28.0_devel.zip which has some fixes and other HiC improvements which might be useful.

Aspadia · 03-24-2014, 01:19 AM

ChIA-PET Hi-C maps

Hi Simon,

I have been using Seqmonk succesfully for analysis of ChIP-seq data and now I have ChIA-PET data to analyse. I have the reads mapped in a BAM file, I have seperate BAM files for the different linker combinations generated by ChIA-PET. I manage to visualise them in a Hi-C heatmap in Seqmonk but got a bit worried here because there is some statistics applied before the heatmap is generated. I think the statistics used for Hi-C experiments vs statistics for ChIA-PET experiments must be different as ChIA-PET is a ChIP (and therefore enriched for loci to begin with) derived method. I have been looking for the exact statistics but I cannot find them anywhere, only a brief mentioning in one of the tutorial video's. Can you tell me where I can find what statistics are applied? Is there any option to visualize the data in a heatmap without applying the statistics?
I would also like to substract interactions found for one linker combination from interactions found for another linker combination. Is this possible in Seqmonk?
Thank you very much for your help!

edere · 03-05-2014, 06:53 AM

I'm not exactly sure how to run the Chi-squared test in Seqmonk. When I add the two data stores that I want to compare, in this case they are my replicate sets for my control and treatment conditions, they get populated into the "Pairs" box, but I can't run the filter.

Clearly I haven't done something correctly. How do I set up the Chi-squared test to compare the my treatment replicates against the control replicates for each probe (cytosine)?

Originally posted by simonandrews View Post

In all of the statistical tests in SeqMonk you should be able to set the cutoff to 1 to generate a full list of hits regardless of significance. I just tried this in a few tests and whilst it worked OK in most of them, it failed for the replicate set stats in a few cases. Looking at the cases where it failed it was probes where all of the values in all cases were exactly the same, so there was no absolute difference and no variance. This is probably a corner case which produces an infinite value which doesn't get caught and I'll look into that.

Since you're trying to run this on RRBS data then you're much more likely to hit this condition since I guess it will be fairly common to have repeated methylation values (especially 0 or 100%). Quantitative tests such as the replicate set stats aren't really suitable for this kind of data unless you have huge numbers of replicates (central limit theorem and all that). We'd normally use the contingency based tests (Chi-Square in SeqMonk) to do a count based significance assessment, along with a subsequent filter for a sensible level of absolute difference.

simonandrews · 03-05-2014, 12:43 AM

Originally posted by edere View Post

Hi,
I've been using Seqmonk to process and visualize my RRBS data. The data I've imported has replicates for treatment and control conditions, and I want to export a list of all the probes (CpGs) and their q-values to plot the distribution of q-values. I've tried filtering my data using the Replicate Set Stats and setting the p-value cutoff to 1 but this generates a list of the probes with p-values < 1.

In all of the statistical tests in SeqMonk you should be able to set the cutoff to 1 to generate a full list of hits regardless of significance. I just tried this in a few tests and whilst it worked OK in most of them, it failed for the replicate set stats in a few cases. Looking at the cases where it failed it was probes where all of the values in all cases were exactly the same, so there was no absolute difference and no variance. This is probably a corner case which produces an infinite value which doesn't get caught and I'll look into that.

Since you're trying to run this on RRBS data then you're much more likely to hit this condition since I guess it will be fairly common to have repeated methylation values (especially 0 or 100%). Quantitative tests such as the replicate set stats aren't really suitable for this kind of data unless you have huge numbers of replicates (central limit theorem and all that). We'd normally use the contingency based tests (Chi-Square in SeqMonk) to do a count based significance assessment, along with a subsequent filter for a sensible level of absolute difference.

edere · 03-04-2014, 12:59 PM

Hi,

I've been using Seqmonk to process and visualize my RRBS data. The data I've imported has replicates for treatment and control conditions, and I want to export a list of all the probes (CpGs) and their q-values to plot the distribution of q-values. I've tried filtering my data using the Replicate Set Stats and setting the p-value cutoff to 1 but this generates a list of the probes with p-values < 1.

Is there anyway a full list of probes and q-values?

Thanks

simonandrews · 02-24-2014, 11:26 AM

Originally posted by Skissors View Post

Hi,

I used the RNA-seq pipeline in SeqMonk to process my data. My initial probe list was 26127 in size. So I performed a filter by statistical test using intensity difference. In the pop-up box where it indicates "Probes per sample", when I try to input 26127, it defaults to 13036. Is there any reason why it would default to this value?

I'm new to using SeqMonk and NGS in general, so I have much to learn.

Thanks for your help!

It should normally default to 1% of the probe set size, so even 13,000 is way too big. This value is the sample size the program uses to make up a custom distribution to judge the significance of each point - you need to make it considerably smaller than the total probe set size, and unless you have a good reason it's as well to leave it set to the default value which the program suggests.

Hope this helps

Topics	Statistics	Last Post
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, Today, 10:09 AM	0 responses 9 views 0 reactions	Last Post by SEQadmin2 Today, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, Yesterday, 08:59 AM	0 responses 17 views 0 reactions	Last Post by SEQadmin2 Yesterday, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 25 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM
DNA Methylation Study Reveals How Epigenetic Changes Pass Between Generations by SEQadmin2 Started by SEQadmin2, 06-02-2026, 11:40 AM	0 responses 21 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 11:40 AM

Unconfigured Ad

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News