**simonandrews** · 09-13-2011, 11:52 PM

If you're looking to compare modification enrichment in genomic features then there are a couple of ways to do this.

You could put probes over your feature of interest and then do an enrichment quantitation and compare either the means or the distributions between your two samples. This would tell you if one sample was more enriched than another on average. The problem with this approach is that you may well see overall differences in enrichment which come from technical effects (how well the ChIP worked) rather than biological. These effects should be global though, so you could, for example, compare enrichment in promoters vs exons.

Alternatively you could make a simpler comparison by counting the number of promoters which showed enrichment and then comparing those counts between your samples. In many cases a simple quantitation of corrected read counts will show a nice bimodal distribution where you can easily set a threshold to separate the enriched from non-enriched populations. You could then apply this to your two samples and compare the number of promoters which pass the filter. This might not work well if there isn't a clear distinction between enriched and non-enriched in your sample though.
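As a rough illustration of the counting approach described above, here is a minimal Python sketch. It is not SeqMonk functionality: the per-promoter values, the threshold and the function name `count_enriched` are all made up for the example, and in practice the threshold would be picked by looking at the distribution of your own quantitated values.

```python
def count_enriched(values, threshold):
    """Count probes whose quantitated value is at or above the threshold."""
    return sum(1 for v in values if v >= threshold)


# Hypothetical per-promoter quantitated values for two samples.
sample_a = [0.2, 0.4, 3.1, 3.5, 0.3, 2.9]
sample_b = [0.1, 0.5, 0.4, 3.2, 0.2, 0.3]

# Hypothetical cut-off, chosen by eye from the gap between the two populations.
threshold = 1.5

enriched_a = count_enriched(sample_a, threshold)
enriched_b = count_enriched(sample_b, threshold)
print(f"Sample A: {enriched_a} enriched promoters, sample B: {enriched_b}")
```

The same threshold is applied to both samples, so the comparison is only meaningful if both were quantitated and normalised in the same way.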

The probe trend plot probably isn't best suited to this kind of analysis. Its strength is in showing the pattern of enrichment to see if that changes, rather than judging the strength of enrichment, which is normally better handled by the conventional quantitation tools. If you do want to use the trend plot to do this then you will need to use the cumulative distribution plot, but beware that, as the docs you quoted state, this is susceptible to bias from extreme outliers since it just sums the counts across all probes and makes no distinction between them in the final plot.

**Neuromancer** · 09-14-2011, 12:53 AM

Thanks for this comprehensive answer!!
I'll try that and let you know what worked.

Many Thanks!

**simonandrews** · 09-22-2011, 02:12 AM

I've just released SeqMonk v0.17.0 onto our project's web site. This is the biggest release we've made for an awfully long time and has lots of improvements and new toys to play with. The biggest changes are:

- Support for HiC data sets, and a new HiC heatmap view to visualise them
- New program launchers (now with a proper native Windows exe) which will automatically configure optimal memory settings
- Support for gzipped data in all import filters
- A new MA plot view
- Support for very large annotation sets (millions of features)
- A z-score transformation option in the quantitation tools
- An option to match distributions exactly in the quantitation options
- A new statistical filter for pairwise comparison of data stores without the requirement for replicates

..plus many other smaller improvements and general tidying up. I'll hopefully be adding some more videos to our site in the near future to help illustrate the usage of some of the new tools.

**Neuromancer** · 09-26-2011, 12:35 AM

Dear Simon,

When I try to start SeqMonk v0.17.0 on my iMac, it simply does not start. When I looked in the console I saw the following error message:

9/26/11 10:27:00 AM [0x0-0x1b01b].SeqMonk[971] Could't parse physical memory from the output of top at /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk line 72.

However on my MacBook the v0.17.0 works fine...!
The iMac is a managed workstation (16GB RAM), so I'm using it with limited read/write access. Could that be a problem? Based on the error, I guess it has to do with the new automatic launcher configuring the memory settings...?

edit:
When I launched the seqmonk binary that is mentioned in the error message it says the following:

/Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk ; exit;
$ /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk ; exit;
Memory ceiling is 8192
Could't parse physical memory from the output of top at /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk line 72.

**simonandrews** · 09-26-2011, 12:43 AM

Sorry to hear this failed. Can you please try running the following command in a terminal and let me know what output you get:

top -l 1 -n 0

I thought top was always available on a mac, which may not be true, or it might be that the formatting is substantially different on some systems.

**Neuromancer** · 09-26-2011, 12:58 AM

Originally posted by simonandrews View Post

Sorry to hear this failed. Can you please try running the following command in a terminal and let me know what output you get:

top -l 1 -n 0

I thought top was always available on a mac, which may not be true, or it might be that the formatting is substantially different on some systems.

bash-3.2$ top -l 1 -n 0
Processes: 54 total, 2 running, 52 sleeping, 260 threads
2011/09/26 10:57:39
Load Avg: 0.11, 0.07, 0.06
CPU usage: 0.0% user, 25.0% sys, 75.0% idle
SharedLibs: 4944K resident, 12M data, 0B linkedit.
MemRegions: 6256 total, 543M resident, 12M private, 291M shared.
PhysMem: 599M wired, 664M active, 822M inactive, 2085M used, 14G free.
VM: 126G vsize, 1041M framework vsize, 46601(0) pageins, 0(0) pageouts.
Networks: packets: 19761/13M in, 11216/1895K out.
Disks: 40572/1282M read, 25716/882M written.

edit:
This is running on Snow Leopard, if that is of any help!

**simonandrews** · 09-26-2011, 01:03 AM

Ah OK. When you have that much memory some of the values are reported in GB rather than MB, so the parser fails to recognise the memory settings.

It should be an easy fix. I'll put out an updated version which fixes this.

In the meantime I think you can work round it by running:

/Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk -m 8000

..which should bypass the automatic memory calibration.
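To make the unit problem concrete, here is a minimal Python sketch of a parser that copes with mixed K/M/G suffixes on the `PhysMem` line. This is not the actual SeqMonk launcher code; the function name `parse_physmem_mb` and the choice to sum the `used` and `free` fields are assumptions made for illustration, and the format of `top` output may differ on other OS X versions.

```python
import re

# Conversion factors from the unit suffix used by top to megabytes.
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_physmem_mb(top_output):
    """Return used + free physical memory in MB from `top -l 1 -n 0` output."""
    for line in top_output.splitlines():
        if line.startswith("PhysMem:"):
            # Each field looks like "2085M used" or "14G free".
            matches = re.findall(r"(\d+(?:\.\d+)?)([KMG]) (\w+)", line)
            fields = {name: float(num) * UNIT_TO_MB[unit]
                      for num, unit, name in matches}
            return fields["used"] + fields["free"]
    raise ValueError("no PhysMem line found in top output")


# The PhysMem line from the output posted above.
line = "PhysMem: 599M wired, 664M active, 822M inactive, 2085M used, 14G free."
total_mb = parse_physmem_mb(line)
print(f"Physical memory: {total_mb:.0f} MB")
```

A parser which only expects an `M` suffix would skip the `14G free` field here, which matches the failure described above.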

**Neuromancer** · 09-26-2011, 01:11 AM

Great! That works! Thanks a lot!

**simonandrews** · 09-27-2011, 02:11 AM

I've just put out an update to SeqMonk (v0.17.1) which fixes the OSX launcher bug on systems with large amounts of RAM. It also fixes a crash in the HiC plot when using more than 45k probes and adds some more controls to the HiC plot view.

**kshankar** · 10-14-2011, 08:50 AM

I am trying to import a large file (~450-500 million Illumina single-end 36 bp reads) into SeqMonk. We have 48 GB of memory on the machine and have assigned 8 GB to SeqMonk. However, after ~330 million reads, we inevitably find 99% of memory being used up and the software slowing down considerably. Is there any way to increase the memory any further, perhaps in the latest Java environment? We are using JRE b1.6.0_24 and the latest SeqMonk (v0.17.1). BTW, the software is immensely useful. Great work, Simon.

**fkrueger** · 10-14-2011, 09:18 AM

The biggest problem with very large datasets is the initial data import, since all reads have to be held in memory temporarily until all the reads mapping to the displayed chromosomes can be cached onto the disk. Once a file has been cached I don't think that 450M reads would be a considerable problem to deal with (BS-Seq data is much larger than that). So the easiest option would probably be to split the file up into 2-4 smaller chunks, and then import the files individually. Once imported, you can then create a data group in SeqMonk and 'merge' the file parts into a single dataset (group) again.

The trouble with Java (according to Simon) is that if you allow it to use stupidly high amounts of RAM then it will spend ages on garbage collection while trying to free memory, thereby effectively making everything slower the more memory you give it to play with (I've got 16GB of memory on my machine and Simon wouldn't 'allow' me to use more than 8GB either). Splitting files up should definitely work though.
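One way to do the splitting is sketched below in Python. It assumes a plain-text mapped-read file with exactly one read per line and no header lines (a SAM file with `@` header lines would need the header copied into every chunk, which this sketch does not do). The function name `split_reads` and the chunk file names are made up for the example.

```python
import os
import tempfile


def split_reads(path, reads_per_chunk, out_dir):
    """Split a one-read-per-line file into numbered chunk files; return their paths."""
    chunk_paths = []
    chunk = None
    with open(path) as source:
        for line_number, line in enumerate(source):
            # Start a new chunk file every reads_per_chunk lines.
            if line_number % reads_per_chunk == 0:
                if chunk is not None:
                    chunk.close()
                name = f"chunk_{len(chunk_paths) + 1:03d}.txt"
                chunk_path = os.path.join(out_dir, name)
                chunk = open(chunk_path, "w")
                chunk_paths.append(chunk_path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths


# Small demonstration on a temporary 10-line file, split into chunks of 4 lines.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "reads.txt")
    with open(demo, "w") as handle:
        handle.writelines(f"read_{i}\n" for i in range(10))
    chunks = split_reads(demo, 4, tmp)
    sizes = []
    for chunk_file in chunks:
        with open(chunk_file) as handle:
            sizes.append(sum(1 for _ in handle))
print(sizes)
```

The file is streamed line by line, so the script itself needs very little memory regardless of the input size. Each chunk can then be imported separately and the resulting datasets combined in a data group.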

**simonandrews** · 10-14-2011, 11:02 AM

Originally posted by kshankar View Post

I am trying to import a large file with (~ 450 -500 million Illumina single 36 bp reads) into SeqMonk. We have 48 GB of memory on the machine and have assigned 8 GB for Seqmonk. However, after ~ 330 million reads, we inevitably find 99% of memory being used up and the software slowing down considerably. Is there any way to increase the memory any more, perhaps in the latest Java environment. We are using JRE b1.6.0_24 and the latest SeqMonk (v0.17.1). BTW, the software is immensely useful. great work Simon.

If you have a dataset with that many reads then I'm guessing that you've merged together several runs into a single file. Rather than merging outside the program, the way to do this is to import the files individually and then merge them together within SeqMonk by creating a Data Group. This will be hugely more memory efficient than trying to import everything from one file.

Basically the reason for this is that SeqMonk has an efficient caching mechanism which reduces the amount of data which needs to be held in memory. During normal operation only one chromosome's worth of data is in memory. Whilst loading in data however the program needs to temporarily store all of the data for one dataset in memory so it can sort it and write out the cache files. If all of your data comes in one dataset then it will all end up in memory whilst being loaded. If the data comes in smaller chunks then these can be cached separately which will reduce the overhead. As you've found, with 8GB RAM you'll start getting problems over about 250 million sequences in one data set, but if you split your file into 10 datasets of 50 million sequences each and then imported these you could handle this on a ~2GB machine.
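The figures above can be checked with a quick back-of-envelope calculation, sketched here in Python. It assumes that memory use during import scales linearly with the number of reads, which is a simplification, and it uses only the two numbers quoted in the post (8GB and 250 million sequences).

```python
GB = 1024 ** 3

# Figures quoted above: problems start at about 250 million reads with 8GB.
reads_at_limit = 250e6
memory_at_limit = 8 * GB

# Implied memory cost per read during import (assumes linear scaling).
bytes_per_read = memory_at_limit / reads_at_limit

# Estimated memory needed to import one 50 million read dataset.
chunk_reads = 50e6
chunk_gb = chunk_reads * bytes_per_read / GB

print(f"~{bytes_per_read:.0f} bytes per read, ~{chunk_gb:.1f} GB per 50M-read dataset")
```

Under that assumption a 50 million read dataset needs roughly 1.6GB while loading, which is consistent with the ~2GB figure once some headroom is allowed for the rest of the program.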

**simonandrews** · 10-14-2011, 11:03 AM

I really should read to the end of a thread before replying. I should have known Felix would have got there before me :-)

**simonandrews** · 11-22-2011, 02:43 AM

I've just put SeqMonk v0.18.0 up onto the project web site. This release greatly improves the tools for HiC analysis which were a little clunky in their initial incarnation. It also adds a specific RNA-Seq analysis pipeline which allows for simple analysis of RNA-Seq data at the level of transcripts rather than exons.

I've also made changes so that people on multi-CPU machines should see a noticeable decrease in data loading time, as well as making numerous other improvements throughout the program.

**kshankar** · 11-23-2011, 01:24 PM

Is there any way for SeqMonk to show the % methylation calls in the .txt file (coming out of Bismark's methylation_extractor)? The calls can be seen in IGV but not in SeqMonk. Is there any way to input this information?
