Erange

  • Erange

    Dear all,

    I'm trying to get Ali Mortazavi and the Wold lab's software package (ERANGE) up and running for analyzing mRNA-Seq data. It's been a struggle, and I'm wondering if anyone has some tips for getting it going.

    Many thanks in advance.

    John

  • #2
    I was struggling to get it running recently as well, but I think I finally have it working now. At which part of the installation do you get stuck?


    • #3
      Also struggling with ERANGE...

      Not sure if this thread is still going, but I'll give it a shot.

      I'm also trying to get ERANGE (v2.1) off the ground and having some major problems.

      I'm running through the shell on Mac OS X 10.5.6 with Python 2.5.1. I installed all of the necessary prereqs, including Cistematic as per the instructions on the Wold lab site.

      For starters, I've been trying to test ERANGE on the Wold Liver sample dataset. I get stuck right at the beginning:

      python geneMrnacounts.py mouse proj/genome/SAMPLEDATA/mm9Liver1.uniqs.bed mm9Liver1.uniqs.count mm9Liver1.nomatch.bed
      geneMrnacounts.py: version 3.3
      Traceback (most recent call last):
        File "geneMrnacounts.py", line 16, in <module>
          from cistematic.genomes import Genome
      ImportError: No module named cistematic.genomes
      ..$ python
      Python 2.5.1 (r251:54863, Jul 23 2008, 11:00:16)
      [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>>

      The same missing "cistematic.genomes" module error comes up when I run other scripts as well. I've had some very talented Python folks take a look, and they're stumped too.

      Any advice on this? Has anyone run into similar problems?

      Any help would be VERY MUCH appreciated. THANKS!


      • #4
        Re: Also struggling with ERANGE...

        Hi,

        The error message means Python cannot find cistematic (cistematic.caltech.edu), a package that ERANGE requires for RNA-seq!

        If you've already downloaded cistematic (and the appropriate genome directories, e.g. H_sapiens or M_Musculus) and saved them into a directory such as /my/favorite/dir, then simply set the following (assuming bash syntax):

        export PYTHONPATH=/my/favorite/dir
        export CISTEMATIC_ROOT=/my/favorite/dir

        I hope this helps!

        Ali
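
A small check can confirm whether the two exports above took effect before re-running geneMrnacounts.py. This is a sketch, not part of ERANGE: it assumes, as in Ali's example, that PYTHONPATH and CISTEMATIC_ROOT both point at the directory that contains the `cistematic` package, and `check_cistematic_setup` is a hypothetical helper name. Run it with the same `python` you use for ERANGE, because whether the import succeeds depends on that interpreter's module search path (seeded from PYTHONPATH at start-up). It avoids Python-3-only syntax.

```python
import os


def check_cistematic_setup(env=None):
    """Return a list of likely problems with the cistematic setup (hypothetical helper)."""
    env = os.environ if env is None else env
    problems = []
    root = env.get("CISTEMATIC_ROOT")
    if not root:
        problems.append("CISTEMATIC_ROOT is not set")
    elif not os.path.isdir(os.path.join(root, "cistematic")):
        # assumes the package lives directly under CISTEMATIC_ROOT, as in the example
        problems.append("no cistematic/ directory under CISTEMATIC_ROOT=" + root)
    try:
        # resolved via sys.path, which PYTHONPATH populated when Python started
        import cistematic  # noqa: F401
    except ImportError:
        problems.append("cannot import cistematic; is its parent directory on PYTHONPATH?")
    return problems


if __name__ == "__main__":
    for problem in check_cistematic_setup():
        print("problem: " + problem)
```

An empty result means both the directory layout and the import look right, so `from cistematic.genomes import Genome` should get past the ImportError.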


        • #5
          Hi ALL,

          I want to practice ChIP-seq and ChIP-chip analysis, and for this I want to use the Wold lab's ERANGE, but I can't work out how to use it properly. If anybody has tried ERANGE before, please advise me on how to get started.

          Thanks in advance,

          ~Vivek


          • #6
            Hi All,
            I am new to ChIP-seq analysis and ERANGE. I am trying to run the findall.py script as
            "python /ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -revbackground"
            where unique.rds is the input RDS file and unique.region.txt is the output region file.
            The script runs for a very long time (20+ hrs) and then exits with this error:
            ------------------------------------------
            Traceback (most recent call last):
              File "/root/Desktop/genotypic/commoncode/findall.py", line 397, in <module>
                hitDict = mockRDS.getReadsDict(fullChrom=True, chrom=achrom, withWeight=True, doMulti=useMulti, findallOptimize=True)
            NameError: name 'mockRDS' is not defined
            ------------------------------------------

            Has anybody run into a similar error?
            It would be great if anybody could suggest how to get rid of this error, and any other precautions to take so the script runs without errors.

            With Thanks,
            Vivek


            • #7
              ERANGE performance, etc...

              I can see how my rather sparse documentation could lead people astray.

              Performance-wise, you need to make sure that you have 3 things under control:
              1) You need to allocate as much cache as possible to your RDS file. This is a sqlite parameter that needs to be set once per file, but it can be overridden in most of the scripts. It should be about 2/3 of the maximum amount of RAM that you want to use. If you have 2-4 GB, a value of 1 million would be appropriate. You can set it with the following command:

              python $ERANGEPATH/rdsmetadata.py myfavorite.rds -defaultcache 1000000

              2) You need to make sure that your RDS file is indexed (if findall.py tells you that the file is not indexed, just press Control-C and fix it). If you forgot to do so when loading the last lane, you can force it with the following command:

              python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index

              You can also combine 1 & 2 in one command:

              python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index -defaultcache 1000000

              3) sqlite can be *unbearably* slow over NFS. If you cannot store the RDS file on a local drive, then you need to force local caching. For ChIP-seq, this means explicitly giving the "-cache someValue" option, which triggers copying the file to a local temp drive (/tmp by default, but it can be redirected to any directory pointed to by the environment variable CISTEMATIC_TEMP). If someValue is below the defaultcache size, the value is ignored, but the file is still copied locally.
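
The local-caching step in item 3 amounts to copying the RDS file to local scratch space before SQLite opens it. The sketch below illustrates that idea; it is not ERANGE's code. `copy_rds_locally` is a hypothetical name, and the private per-job subdirectory is an added assumption meant to keep concurrent copies from clashing. Only CISTEMATIC_TEMP and the /tmp default come from the description above.

```python
import os
import shutil
import tempfile


def copy_rds_locally(rds_path, env=None):
    """Copy an RDS file from (possibly NFS) storage to local scratch; return the copy's path."""
    env = os.environ if env is None else env
    # CISTEMATIC_TEMP overrides the /tmp default, as described in item 3
    scratch_root = env.get("CISTEMATIC_TEMP", "/tmp")
    # assumption: a private subdirectory per run so two jobs never share a copy
    workdir = tempfile.mkdtemp(prefix="rds_", dir=scratch_root)
    local_path = os.path.join(workdir, os.path.basename(rds_path))
    shutil.copy2(rds_path, local_path)  # copies data and file metadata
    return local_path
```

Queries then run against the returned local path. Deleting `workdir` afterwards is left to the caller.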

              For RNA-seq, you really should do #1 and #2, and use the shell script runStandardAnalysisNFS.sh (or at least the command-line arguments in it).
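
The sizing rule in item 1 is simple arithmetic. SQLite's cache_size is counted in pages, so a RAM budget converts to a page count by taking 2/3 of it and dividing by the page size. The sketch assumes a 1024-byte page, the long-standing SQLite default; SQLite 3.12.0 and later default to 4096. `cache_pages_for_budget` and `apply_cache_size` are hypothetical helpers. `PRAGMA cache_size` only affects the current connection, so this does not reproduce whatever rdsmetadata.py's -defaultcache stores in the file.

```python
import sqlite3


def cache_pages_for_budget(ram_budget_bytes, page_size=1024):
    """Pages of SQLite cache that use about 2/3 of a RAM budget (page_size is an assumption)."""
    return (ram_budget_bytes * 2 // 3) // page_size


def apply_cache_size(conn, pages):
    """Set the page cache for this connection only and read the value back."""
    # a positive value is interpreted as a number of pages
    conn.execute("PRAGMA cache_size = %d" % int(pages))
    return conn.execute("PRAGMA cache_size").fetchone()[0]
```

With a 1.5 GiB budget (`3 * 2**29` bytes) this gives 1,048,576 pages. That is in line with the "1 million" suggested for a machine with 2-4 GB.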

              FYI, if everything is optimal, findall.py for ChIP-seq should finish within 30 minutes at most, and an RNA-seq analysis should take from a couple of hours to overnight, depending on the size of the dataset.
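
For item 2, the RDS file is an SQLite database, so a quick way to see whether it carries any indexes at all is to query SQLite's `sqlite_master` catalog. This is a generic SQLite sketch. It does not reproduce the test findall.py uses to decide that a file is indexed. The table and index names in the example are hypothetical, not the real RDS schema.

```python
import sqlite3


def list_indexes(conn):
    """Return (index_name, table_name) pairs defined in an open SQLite database."""
    return conn.execute(
        "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    ).fetchall()


# usage sketch: list_indexes(sqlite3.connect("myfavorite.rds"))
```

An empty list suggests the file was never indexed, and the `-index` command above is worth running.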

              For vix_z: you should run findall.py without the -revbackground option, since you did not specify a control file, e.g.

              python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -cache 1000000

              If you have a background (aka "control") library, then you can specify it this way:

              python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -control myControl.rds -listPeak -revbackground -cache 1000000

              I hope this helps!

              Ali
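
Ali's two findall.py invocations differ only in whether a control library is supplied, and -revbackground goes together with -control. Below is a sketch of a hypothetical helper that assembles the argument list that way. The flags and their order are copied from his examples. The function name and the default cache value of 1000000 mirror those examples and are not an ERANGE API.

```python
import os


def findall_args(erange_path, sample_rds, region_out, control_rds=None, cache=1000000):
    """Assemble a findall.py PEAK command line (hypothetical helper)."""
    args = ["python", os.path.join(erange_path, "findall.py"), "PEAK", sample_rds, region_out]
    if control_rds is not None:
        args += ["-control", control_rds]
    args.append("-listPeak")
    if control_rds is not None:
        # following the advice above: -revbackground only alongside a control library
        args.append("-revbackground")
    args += ["-cache", str(cache)]
    return args


# "/path/to/ERANGE" is a placeholder for your own install location
print(" ".join(findall_args("/path/to/ERANGE", "unique.rds", "unique.region.txt")))
```

The returned list can be passed to `subprocess.call` (or `subprocess.run` on Python 3) to launch the script.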


              • #8
                Hi Alim,

                Thanks a lot for your reply.
                It is really helpful to know about the importance of the cache in RDS files and how to handle it.
                Is RAM the only thing that makes findall.py run fast, or are there other ways to speed it up?

                I'd like to understand the procedure ERANGE uses for ChIP-seq analysis. Can you suggest some documentation that explains the algorithm?

                I am new to this field, any reply in this regard will be highly appreciated.

                With many thanks!

                ~Vix


                • #9
                  Hi Vix,

                  Honestly, compared to the RNA-seq pipeline, the ChIP-seq pipeline is pretty fast (but maybe I'm not objective about this!)

                  The basis of the original algorithm is still as described in the NRSF ChIP-seq Science paper from 2007. Just look for pileups of reads using a greedy algorithm, and check that they are enriched compared to the same region in the control. It's fundamentally a region caller rather than a "summit"-caller.

                  However, many of the details have changed and are continuing to change. As I come across more and more datasets, I've introduced various parameters to filter out false-positive peaks. Since I finally gave in and started reporting a summit for each peak, I've come across datasets with shifts so large that they require an explicit shift. Version 3.1 of ERANGE will introduce that any day now.

                  One thing I will say about the ChIP-seq capabilities of ERANGE is that calling regions and summits is a beginning, not an end. The other scripts in the package (which depend on Cistematic) are actually designed to find motifs in the regions, find the genes associated with the regions & do a GO analysis of these genes for example.

                  Ali
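
As a rough illustration of the idea Ali describes, here is a toy region caller for a single chromosome: it grows read pileups greedily, then checks enrichment against the same region in the control. It is not ERANGE's algorithm or code. The gap-based greedy extension, the library-size scaling, the +1 pseudocount and the thresholds are all assumptions chosen for clarity. Like the approach described above, it reports regions rather than summits.

```python
from bisect import bisect_left, bisect_right


def call_regions(sample, control, max_gap=50, min_reads=5, min_ratio=4.0):
    """Toy greedy region caller for one chromosome (illustration only).

    sample, control: read start positions as ints.
    Returns (start, stop, sample_reads, control_reads) tuples.
    """
    sample = sorted(sample)
    control = sorted(control)
    # scale control counts to the sample's sequencing depth
    scale = len(sample) / float(max(len(control), 1))
    regions = []
    i = 0
    while i < len(sample):
        # greedily extend the pileup while the next read is within max_gap bp
        j = i
        while j + 1 < len(sample) and sample[j + 1] - sample[j] <= max_gap:
            j += 1
        start, stop = sample[i], sample[j]
        n_sample = j - i + 1
        # control reads falling in the same interval [start, stop]
        n_control = bisect_right(control, stop) - bisect_left(control, start)
        enrichment = n_sample / (scale * n_control + 1.0)  # +1 pseudocount
        if n_sample >= min_reads and enrichment >= min_ratio:
            regions.append((start, stop, n_sample, n_control))
        i = j + 1
    return regions


sample = [100, 110, 120, 130, 140, 1000, 5000, 5010, 5020, 5030, 5040]
control = [5000, 5010, 5020, 5030, 5040, 9000]
print(call_regions(sample, control))  # [(100, 140, 5, 0)]
```

In the example, the pileup at 5000-5040 has as many reads as the matching control region, so it is rejected as not enriched. The lone read at 1000 is too small to count as a pileup.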


                  • #10
                    ERANGE error message

                    Hi all-

                    I've successfully used ERANGE 3.0.1 in the past for some RNA-seq analysis, but I'm now running into problems getting through the RunStandardAnalysis script.

                    My reads are Bowtie-aligned (single-end) and built into an appropriate RDS file. I'm working with the mouse genome.

                    When running RunStandardAnalysis.sh, the first few steps (geneMrnacounts.py, normalizeExpandedExonic.py) go without any problems. However, geneMrnaCountsWeighted.py starts off fine but then starts pouring out errors, as shown below:


                    /proj/genome/commoncode3.0.1/geneMrnaCountsWeighted.py: version 3.7
                    dataset Sample.bowtie.rds
                    metadata:
                    bowtie_mapped True
                    dataType RNA
                    genome mm9
                    rdsVersion 1.1
                    readsize 36

                    9756706 unique reads, 1097314 spliced reads and 3706354 multireads
                    default cache size is 2000000 pages
                    found index

                    1 read 100000 read 200000 read 300000 read 400000 read 500000
                    10 read 600000 read 700000 read 800000 read 900000 read 1000000
                    11 read 1100000 read 1200000 read 1300000 read 1400000 read 1500000 read 1600000 read 1700000 read 1800000 read 1900000
                    12 read 2000000 read 2100000 read 2200000 read 2300000 read 2400000
                    13 read 2500000 read 2600000
                    14 read 2700000 read 2800000 read 2900000
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    15 read 3000000 read 3100000 read 3200000 read 3300000 gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict
                    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
                    *** error: can't allocate region
                    *** set a breakpoint in malloc_error_break to debug
                    gid 12419 not in gidReadDict


                    These go on for pages and pages, with different values for "gid" as it goes on.

                    Has anyone seen this problem? I can't figure out what I've done differently compared to my past successful runs. Any assistance would be MUCH appreciated.

                    Thanks!


                    • #11
                      Hi,

                      I would take the error message to heart: malloc (the Unix memory allocation call) is failing, which implies that you are running out of memory. Could you (or someone else) be using significant amounts of memory at the same time as you are running ERANGE? Or are you running this on another machine with less RAM?

                      By the way, I highly recommend upgrading to ERANGE 3.1 to pick up some of the other bugs I have fixed over the last 3 months! You won't have to rebuild the RDS files or anything.

                      Ali
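
The numbers in the repeated malloc lines support this reading. On both macOS and Linux, error code 12 is ENOMEM ("Cannot allocate memory"), and each failed request, mmap(size=100663296), is exactly 96 MiB. Below is a small decoder for such lines; the helper name is hypothetical and it uses only the standard library.

```python
import errno
import os


def describe_mmap_failure(size_bytes, error_code):
    """Translate the numbers in a 'mmap(size=...) failed (error code=...)' line."""
    name = errno.errorcode.get(error_code, "unknown errno")
    reason = os.strerror(error_code)
    return "%s (%s): request of %.0f MiB" % (name, reason, size_bytes / float(2 ** 20))


# on Linux and macOS: ENOMEM (Cannot allocate memory): request of 96 MiB
print(describe_mmap_failure(100663296, 12))
```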


                      • #12
                        Thanks Ali -

                        I'm running ERANGE locally on a MacBook Pro with 4 GB of RAM, and not running ANYTHING else during the analysis. I've got the cache set to 2000000 for the RDS file.

                        This has been more than enough memory in the past, though these are more reads than I've tried previously.

                        I'll try the upgrade as well.

                        Thanks!


                        • #13
                          4 GB of RAM should be enough for the number of reads you have, especially if you have enough virtual memory (could you also be running low on disk space?)

                          If it won't work with this much RAM, then simply drop the cache size down to a smaller value (e.g. 1000000), which will free up some extra RAM.

                          Ali
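
Here is a rough estimate of why shrinking the cache frees memory. The page cache can grow to about cache_size × page_size bytes. Assume 1024-byte pages and ignore SQLite's per-page bookkeeping overhead, which adds to the total. Under those assumptions the 2,000,000-page cache from the previous post can approach about 1.9 GiB on a 4 GB machine, and the suggested 1,000,000 pages roughly halves that.

```python
def cache_bytes(pages, page_size=1024):
    """Rough page-cache size in bytes; ignores SQLite's per-page overhead."""
    return pages * page_size


GIB = float(2 ** 30)
for pages in (2000000, 1000000):
    # prints 1.91 GiB for 2,000,000 pages and 0.95 GiB for 1,000,000 pages
    print("%d pages -> %.2f GiB" % (pages, cache_bytes(pages) / GIB))
```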


                          • #14
                            Some queries for chip-seq analysis

                            Hi Alim,

                            Thanks for your previous help regarding chip-seq analysis.
                            I have a few more questions about it:
                            Can you tell me how the RAM requirement varies with the number of reads (or data size) in ChIP-seq analysis?
                            Does it keep increasing, or does it saturate, i.e. beyond some point more RAM no longer helps in processing the reads?
                            How can I view the contents of RDS files?
                            Also, can you suggest some links to sample data of various sizes that can be used for ChIP-seq analysis?

                            Looking for your reply.
                            With THANKS!!
                            Vix
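
On the question of viewing RDS files: the RDS file is an SQLite database (its cache is described above as a sqlite parameter), so generic SQLite tools can open it. Options include the `sqlite3` command-line shell (`.tables`, `.schema`) or a few lines of Python such as the sketch below. `summarize_sqlite` is a hypothetical helper, and the table name in the example is made up. The real RDS schema is not described in this thread.

```python
import sqlite3


def summarize_sqlite(conn, preview_rows=3):
    """Return {table: (row_count, first_rows)} for every table in an open database."""
    summary = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # names come from sqlite_master; double-quote them as SQL identifiers
        quoted = '"%s"' % table.replace('"', '""')
        count = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        rows = conn.execute("SELECT * FROM %s LIMIT %d" % (quoted, preview_rows)).fetchall()
        summary[table] = (count, rows)
    return summary


# usage sketch: summarize_sqlite(sqlite3.connect("myfavorite.rds"))
```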


                            • #15
                              Hey Ali-

                              Just wanted to say thanks for your help. Reducing the cache size seemed to help get through the analysis without memory errors, despite taking a bit longer.

                              I've got some additional datasets with many more reads that unfortunately can't be handled with 4 GB of RAM. Looks like I'm going to need to find a bigger box.

                              Thanks again for providing so much support.
