  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #31
    If you're looking to compare modification enrichment in genomic features then there are a couple of ways to do this.

    You could put probes over your feature of interest and then do an enrichment quantitation and compare either the means or the distributions between your two samples. This would tell you if one sample was more enriched than another on average. The problem with this approach is that you may well see overall differences in enrichment which come from technical effects (how well the ChIP worked) rather than biological. These effects should be global though, so you could, for example, compare enrichment in promoters vs exons.
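As a rough illustration of the promoters-vs-exons idea (a sketch only, not something SeqMonk does for you): if a technical effect scales all enrichment values in a sample by roughly the same factor, then the ratio of promoter enrichment to exon enrichment within each sample should be unaffected by it. The values and the function name below are made up, and the sketch assumes linear-scale (not log-transformed) quantitations.

```python
from statistics import mean

def relative_enrichment(promoter_values, exon_values):
    """Ratio of mean promoter enrichment to mean exon enrichment in one sample."""
    return mean(promoter_values) / mean(exon_values)

# Hypothetical linear-scale enrichment values, one per probe.
# Sample A is globally twice as strong as sample B (e.g. the ChIP worked better).
sample_a = {"promoters": [8.0, 10.0, 12.0], "exons": [2.0, 2.0, 2.0]}
sample_b = {"promoters": [4.0, 5.0, 6.0], "exons": [1.0, 1.0, 1.0]}

ratio_a = relative_enrichment(sample_a["promoters"], sample_a["exons"])
ratio_b = relative_enrichment(sample_b["promoters"], sample_b["exons"])

# The raw means differ two-fold, but the within-sample ratios agree.
print(ratio_a, ratio_b)
```

Here the raw promoter means differ (10 vs 5) while the within-sample ratios are the same, which is what you would expect if the difference were purely technical.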

    Alternatively you could make a simpler comparison by counting the number of promoters which showed enrichment and then comparing values between your samples. In many cases a simple quantitation of corrected read counts will show a nice bimodal distribution where you can easily set a threshold to separate the enriched from non-enriched populations. You could then apply this to your two samples and compare the number of promoters which pass the filter. This might not work well if there isn't a clear distinction between enriched and non-enriched in your sample though.
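The counting approach can be sketched outside SeqMonk like this. It is a minimal example with invented per-promoter values and a threshold picked by eye from the gap between the two populations; the names are hypothetical.

```python
def count_enriched(values, threshold):
    """Count how many probes have a quantitated value above the threshold."""
    return sum(1 for v in values if v > threshold)

# Hypothetical corrected read counts per promoter probe.
# Both samples show a low (non-enriched) and a high (enriched) population.
promoters_a = [0.2, 0.4, 0.3, 5.1, 4.8, 5.5, 0.1]
promoters_b = [0.3, 0.2, 4.9, 0.4, 0.1, 0.2, 0.3]

threshold = 2.0  # assumed cut-off, sitting in the gap between the populations

enriched_a = count_enriched(promoters_a, threshold)
enriched_b = count_enriched(promoters_b, threshold)
print(enriched_a, enriched_b)
```

The same threshold is applied to both samples, so this only makes sense if their distributions are comparable.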

    The probe trend plot probably isn't best suited to this kind of analysis. Its strength is in showing the pattern of enrichment to see if that changes, rather than judging the strength of enrichment, which is normally better handled by the conventional quantitation tools. If you do want to use the trend plot to do this then you will need to use the cumulative distribution plot, but beware that (as the docs you quoted state) this is susceptible to bias from extreme outliers, since it just sums the counts across all probes and makes no distinction between them in the final plot.
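To see why a summed profile is sensitive to outliers, here is a toy example with made-up counts. The per-bin median is included only as a contrast for illustration; it is not a claim about what the trend plot offers.

```python
from statistics import median

# Each row is one probe, each column one position bin across the feature.
# Hypothetical counts: three typical probes and one extreme outlier.
probes = [
    [1, 2, 1],
    [1, 2, 1],
    [1, 2, 1],
    [500, 1, 1],  # outlier probe with a huge count in the first bin
]

# Summing across probes, as a cumulative plot does.
summed = [sum(column) for column in zip(*probes)]

# A robust per-bin summary, shown only for comparison.
per_bin_median = [median(column) for column in zip(*probes)]

print(summed)
print(per_bin_median)
```

The summed profile peaks in the first bin purely because of one probe, whereas the typical probe actually peaks in the middle bin.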

    • Neuromancer
      Member
      • Aug 2011
      • 28

      #32
      Thanks for this comprehensive answer!!
      I'll try that and let you know how/what has worked.

      Many Thanks!

      • simonandrews
        Simon Andrews
        • May 2009
        • 870

        #33
        I've just released SeqMonk v0.17.0 onto our project's web site. This is the biggest release we've made for an awfully long time and has lots of improvements and new toys to play with. The biggest changes are:
        • Support for HiC data sets, and a new HiC heatmap view to visualise them
        • New program launchers (now with a proper native Windows exe) which will automatically configure optimal memory settings.
        • Support for gzipped data in all import filters
        • A new MA plot view
        • Support for very large annotation sets (millions of features)
        • A z-score transformation option in the quantitation tools
        • An option to match distributions exactly in the quantitation options
        • A new statistical filter for pairwise comparison of data stores without the requirement for replicates.


        ...plus many other smaller improvements and general tidying up. I'll hopefully be adding some more videos to our site in the near future to help illustrate the usage of some of the new tools.
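For anyone unfamiliar with the z-score transformation mentioned in the list above, the general idea is to subtract the mean and divide by the standard deviation, so values end up centred on 0. The sketch below shows the textbook calculation only. Whether SeqMonk uses the population or the sample standard deviation is not stated here, so the use of `pstdev` is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Centre values on their mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; a tool may use sample SD
    return [(v - mu) / sigma for v in values]

# Hypothetical quantitated values for eight probes (mean 5, population SD 2).
zs = z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(zs)
```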

        • Neuromancer
          Member
          • Aug 2011
          • 28

          #34
          Dear Simon,

          When I try to start SeqMonk v0.17.0 on my iMac, it simply does not start. When I looked in the console I saw the following error message:

          9/26/11 10:27:00 AM [0x0-0x1b01b].SeqMonk[971] Could't parse physical memory from the output of top at /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk line 72.

          However on my MacBook the v0.17.0 works fine...!
          The iMac is a managed workstation (16GB RAM), so I'm not using it with limited read/write access, could that be a problem? Based on the error, I guess it has to do with configuring memory settings by the new automatic launcher...?

          edit:
          When I launched the seqmonk binary that is mentioned in the error message it says the following:

          /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk ; exit;
          $ /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk ; exit;
          Memory ceiling is 8192
          Could't parse physical memory from the output of top at /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk line 72.
          Last edited by Neuromancer; 09-26-2011, 12:39 AM.

          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            #35
            Sorry to hear this failed. Can you please try running the following command in a terminal and let me know what output you get:

            top -l 1 -n 0

            I thought top was always available on a mac, which may not be true, or it might be that the formatting is substantially different on some systems.

            • Neuromancer
              Member
              • Aug 2011
              • 28

              #36
              Originally posted by simonandrews
              Sorry to hear this failed. Can you please try running the following command in a terminal and let me know what output you get:

              top -l 1 -n 0

              I thought top was always available on a mac, which may not be true, or it might be that the formatting is substantially different on some systems.
              bash-3.2$ top -l 1 -n 0
              Processes: 54 total, 2 running, 52 sleeping, 260 threads
              2011/09/26 10:57:39
              Load Avg: 0.11, 0.07, 0.06
              CPU usage: 0.0% user, 25.0% sys, 75.0% idle
              SharedLibs: 4944K resident, 12M data, 0B linkedit.
              MemRegions: 6256 total, 543M resident, 12M private, 291M shared.
              PhysMem: 599M wired, 664M active, 822M inactive, 2085M used, 14G free.
              VM: 126G vsize, 1041M framework vsize, 46601(0) pageins, 0(0) pageouts.
              Networks: packets: 19761/13M in, 11216/1895K out.
              Disks: 40572/1282M read, 25716/882M written.



              edit:
              runs on SnowLeopard, if that is of any help!

              • simonandrews
                Simon Andrews
                • May 2009
                • 870

                #37
                Ah OK. When you have that much memory some of the values are reported in GB rather than MB, so the parser fails to recognise the memory settings.

                It should be an easy fix. I'll put out an updated version which fixes this.
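To illustrate the kind of unit handling involved, here is a minimal sketch in Python. It is not the launcher's actual code, which isn't shown in this thread. It assumes the `PhysMem:` line format from the Snow Leopard output posted above and assumes that total memory can be taken as used plus free; the function name is hypothetical.

```python
import re

# Conversion factors to MB for the unit suffixes top can print.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

def parse_physmem_mb(top_output):
    """Return physical memory in MB (used + free) from `top -l 1 -n 0` output."""
    for line in top_output.splitlines():
        if line.strip().startswith("PhysMem:"):
            used = re.search(r"(\d+)([KMG]) used", line)
            free = re.search(r"(\d+)([KMG]) free", line)
            if used and free:
                return sum(
                    int(m.group(1)) * _UNIT_TO_MB[m.group(2)] for m in (used, free)
                )
    raise ValueError("Couldn't parse physical memory from the output of top")

# The PhysMem line from the output posted above: "free" is in G, the rest in M.
line = "PhysMem: 599M wired, 664M active, 822M inactive, 2085M used, 14G free."
total_mb = parse_physmem_mb(line)
print(total_mb)
```

A parser that only accepted an `M` suffix would fail on the `14G free` field, which matches the symptom reported on the 16GB machine.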

                In the meantime I think you can work round it by running:

                /Users/Shared/NGS/Programs/SeqMonk/SeqMonk.app/Contents/MacOS/seqmonk -m 8000

                ..which should bypass the automatic memory calibration.

                • Neuromancer
                  Member
                  • Aug 2011
                  • 28

                  #38
                  Great! That works! Thanks a lot!

                  • simonandrews
                    Simon Andrews
                    • May 2009
                    • 870

                    #39
                    I've just put out an update to SeqMonk (v0.17.1) which fixes the OSX launcher bug on systems with large amounts of RAM. It also fixes a crash in the HiC plot when using more than 45k probes and adds some more controls to the HiC plot view.

                    • kshankar
                      Member
                      • Jul 2010
                      • 12

                      #40
                      I am trying to import a large file (~450-500 million Illumina single 36 bp reads) into SeqMonk. We have 48 GB of memory on the machine and have assigned 8 GB to SeqMonk. However, after ~330 million reads, we inevitably find 99% of memory being used up and the software slowing down considerably. Is there any way to increase the memory any further, perhaps in the latest Java environment? We are using JRE b1.6.0_24 and the latest SeqMonk (v0.17.1). BTW, the software is immensely useful. Great work, Simon.

                      • fkrueger
                        Senior Member
                        • Sep 2009
                        • 627

                        #41
                        The biggest problem with very large datasets is the initial data import, since all reads have to be held in memory temporarily until all the reads mapping to displayed chromosomes can be cached onto the disk. Once a file has been cached I don't think that 450M reads would be a considerable problem to deal with (BS-Seq data is much larger than that). So the easiest option would probably be to split the file up into 2-4 smaller chunks, and then import the files individually. Once imported, you can then create a data group in SeqMonk and 'merge' the file parts into a single dataset (group) again.

                        The trouble with Java (according to Simon) is that if you allow it to use stupidly high amounts of RAM then it will spend ages on garbage collection while trying to free memory, thereby effectively making everything slower the more memory you give it to play with (I've got 16GB of memory on my machine and Simon wouldn't 'allow' me to use more than 8GB either). Splitting files up should definitely work though.
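If it helps, splitting can be done with a few lines of Python. This is only a sketch: it assumes a plain-text mapped-read file with one read per line and no header, and the function name and output naming scheme are invented. Formats with header lines (e.g. SAM) or with paired reads on adjacent lines would need extra care.

```python
import tempfile
from pathlib import Path

def split_by_lines(source, lines_per_chunk, out_dir):
    """Stream a one-record-per-line file into numbered chunk files."""
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    with source.open() as handle:
        for i, line in enumerate(handle):
            if i % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk records.
                if out is not None:
                    out.close()
                name = f"{source.stem}.part{len(chunk_paths) + 1}{source.suffix}"
                chunk_paths.append(out_dir / name)
                out = chunk_paths[-1].open("w")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths

# Small demonstration on a temporary 10-line file split into chunks of 4 lines.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "reads.txt"
    src.write_text("".join(f"read{i}\n" for i in range(10)))
    parts = split_by_lines(src, 4, Path(tmp) / "chunks")
    part_names = [p.name for p in parts]
    part_sizes = [len(p.read_text().splitlines()) for p in parts]

print(part_names, part_sizes)
```

Because the file is streamed line by line, the splitting step itself needs very little memory regardless of the input size.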

                        • simonandrews
                          Simon Andrews
                          • May 2009
                          • 870

                          #42
                          Originally posted by kshankar
                          I am trying to import a large file (~450-500 million Illumina single 36 bp reads) into SeqMonk. We have 48 GB of memory on the machine and have assigned 8 GB to SeqMonk. However, after ~330 million reads, we inevitably find 99% of memory being used up and the software slowing down considerably. Is there any way to increase the memory any further, perhaps in the latest Java environment? We are using JRE b1.6.0_24 and the latest SeqMonk (v0.17.1). BTW, the software is immensely useful. Great work, Simon.
                          If you have a dataset with that many reads then I'm guessing that you've merged together several runs into a single file. Rather than merging outside the program, the way to do this is to import the files individually and then merge them together within SeqMonk by creating a Data Group. This will be hugely more memory efficient than trying to import everything from one file.

                          Basically the reason for this is that SeqMonk has an efficient caching mechanism which reduces the amount of data which needs to be held in memory. During normal operation only one chromosome's worth of data is in memory. Whilst loading in data however the program needs to temporarily store all of the data for one dataset in memory so it can sort it and write out the cache files. If all of your data comes in one dataset then it will all end up in memory whilst being loaded. If the data comes in smaller chunks then these can be cached separately which will reduce the overhead. As you've found, with 8GB RAM you'll start getting problems over about 250 million sequences in one data set, but if you split your file into 10 datasets of 50 million sequences each and then imported these you could handle this on a ~2GB machine.
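The figures above can be checked with some back-of-envelope arithmetic. This is only a rough estimate inferred from the numbers quoted in this post: it assumes import memory scales linearly with the number of reads in a dataset and ignores the JVM's baseline usage.

```python
GB = 1024 ** 3

# Quoted figure: problems start at about 250 million reads with 8GB of RAM.
bytes_per_read = 8 * GB / 250_000_000

# Peak import cost of one 50-million-read chunk under the linear assumption.
chunk_gb = 50_000_000 * bytes_per_read / GB

print(round(bytes_per_read, 1), round(chunk_gb, 2))
```

That works out at roughly 34 bytes per read and about 1.6GB for a 50-million-read dataset, which is in line with the ~2GB figure quoted above.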

                          • simonandrews
                            Simon Andrews
                            • May 2009
                            • 870

                            #43
                            I really should read to the end of a thread before replying. I should have known Felix would have got there before me :-)

                            • simonandrews
                              Simon Andrews
                              • May 2009
                              • 870

                              #44
                              I've just put SeqMonk v0.18.0 up onto the project web site. This release greatly improves the tools for HiC analysis which were a little clunky in their initial incarnation. It also adds a specific RNA-Seq analysis pipeline which allows for simple analysis of RNA-Seq data at the level of transcripts rather than exons.

                              I've also made changes so that people on multi-CPU machines should see a noticeable decrease in data loading time, as well as making numerous other improvements throughout the program.

                              • kshankar
                                Member
                                • Jul 2010
                                • 12

                                #45
                                Is there any way for SeqMonk to show the % methylation calls in the .txt file (coming out of Bismark's methylation_extractor)? The calls can be seen in IGV but not in SeqMonk. Any way to input this information?
