**florian** · 12-12-2008, 04:49 AM

I was struggling to get it running recently as well, but I think I finally have it working now. At which part of the installation do you get stuck?

**griffon42** · 02-20-2009, 04:49 PM

Also struggling with ERANGE...

Not sure if this thread is still going, but I'll give it a shot.

I'm also trying to get ERANGE (v2.1) off the ground and having some major problems.

I'm running it from the shell on Mac OS X 10.5.6 with Python 2.5.1. I installed all of the necessary prerequisites, including Cistematic, as per the instructions on the Wold lab site.

For starters, I've been trying to test ERANGE on the Wold lab's liver sample dataset. I get stuck right at the beginning:

python geneMrnacounts.py mouse proj/genome/SAMPLEDATA/mm9Liver1.uniqs.bed mm9Liver1.uniqs.count mm9Liver1.nomatch.bed
geneMrnacounts.py: version 3.3
Traceback (most recent call last):
File "geneMrnacounts.py", line 16, in <module>
from cistematic.genomes import Genome
ImportError: No module named cistematic.genomes
..$ python
Python 2.5.1 (r251:54863, Jul 23 2008, 11:00:16)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

The "cistematic.genomes" module seems to be missing when I try running other scripts as well. I've had some very talented Python folks take a look, and they're stumped too.

Any advice on this? Has anyone run into similar problems?

Any help would be VERY MUCH appreciated. THANKS!

**alim** · 02-23-2009, 01:57 PM

Re: Also struggling with ERANGE...

Hi,

The error message means that Python cannot find Cistematic (cistematic.caltech.edu), which is a required package for using ERANGE for RNA-seq!

If you've already downloaded Cistematic (and the appropriate genome directories, e.g. H_sapiens or M_Musculus) and saved them into a directory such as /my/favorite/dir, then simply set (assuming bash syntax):

export PYTHONPATH=/my/favorite/dir
export CISTEMATIC_ROOT=/my/favorite/dir
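For anyone hitting the same `ImportError`, the two settings above can be checked with a small diagnostic script. This is a minimal sketch, not part of ERANGE or Cistematic: the function name `diagnose_cistematic` is hypothetical, and it assumes that the `cistematic` package directory sits directly inside one of the `PYTHONPATH` entries, which is what `import cistematic.genomes` needs.

```python
import os


def diagnose_cistematic(env=None):
    """Return a list of likely problems with the Cistematic environment settings.

    `env` defaults to os.environ; pass a dict to check other settings.
    """
    env = os.environ if env is None else env
    problems = []

    root = env.get("CISTEMATIC_ROOT", "")
    if not root:
        problems.append("CISTEMATIC_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append("CISTEMATIC_ROOT is not a directory: " + root)

    # `import cistematic.genomes` needs a `cistematic` package directory
    # directly inside one of the PYTHONPATH entries.
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not entries:
        problems.append("PYTHONPATH is not set")
    elif not any(os.path.isdir(os.path.join(p, "cistematic")) for p in entries):
        problems.append("no PYTHONPATH entry contains a 'cistematic' directory")
    return problems


if __name__ == "__main__":
    for problem in diagnose_cistematic() or ["no problems found"]:
        print(problem)
```

Once it reports no problems, running `python -c "import cistematic.genomes"` with the same interpreter that runs ERANGE should succeed silently.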

I hope this helps!

Ali

**VIX_Z** · 03-17-2009, 01:28 AM

Hi ALL,

I want to practice ChIP-seq and ChIP-chip analysis, and for this I want to use the Wold lab's ERANGE. However, I can't figure out how to use it properly. If anybody has tried ERANGE before, please advise me on how to get started with it.

Thanks in advance,

~Vivek

**VIX_Z** · 03-30-2009, 09:46 PM

Hi All,
I am new to ChIP-seq analysis and ERANGE. I am trying to run the findall.py script as
"python /ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -revbackground"
where unique.rds is the input RDS file and unique.region.txt is the output region file.
The script runs for a very long time (20+ hours) and then exits with the following error:
------------------------------------------
Traceback (most recent call last):
File "/root/Desktop/genotypic/commoncode/findall.py", line 397, in <module>
hitDict = mockRDS.getReadsDict(fullChrom=True, chrom=achrom, withWeight=True, doMulti=useMulti, findallOptimize=True)
NameError: name 'mockRDS' is not defined
------------------------------------------

Has anybody run into a similar error?

It would be great if anybody could suggest how to get rid of this error, and any other precautions for running the script without errors.

With Thanks,

Vivek

**alim** · 04-14-2009, 08:18 AM

ERANGE performance, etc...

I can see how my rather sparse documentation could lead people astray.

Performance-wise, you need to make sure that you have 3 things under control:
1) You need to allocate as much cache as possible to your RDS file. This is an SQLite parameter that needs to be set once per file, but it can be overridden in most of the scripts. It should be about 2/3 of the maximum amount of RAM that you want to use. If you have 2-4 GB, a value of 1 million would be appropriate. You can set it with the following command:

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -defaultcache 1000000

2) You need to make sure that your RDS file is indexed (if findall.py tells you that the file is not indexed, just press Control-C and fix it). If you forgot to do so when loading the last lane, you can force it with the following command:

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index

You can also combine 1 & 2 in one command, i.e.

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index -defaultcache 1000000

3) SQLite can be *unbearably* slow over NFS. If you cannot store the RDS file on a local drive, then you need to force local caching. For ChIP-seq, this means explicitly giving the "-cache someValue" option to trigger copying to a local temp drive (/tmp by default, but it can be redirected to any location pointed to by the environment variable CISTEMATIC_TEMP). If someValue is below the defaultcache size, the value is ignored but the file is still copied locally.
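Since RDS files are SQLite databases, the three points above can be inspected with Python's built-in `sqlite3` module. This is a hypothetical helper module, not ERANGE code: all function names are made up, the index check assumes that `-index` creates ordinary SQLite indexes, and `copy_to_local_temp` only imitates the `-cache` behaviour described above (copy under `$CISTEMATIC_TEMP` or `/tmp`, and never go below the default cache). Use `rdsmetadata.py` to actually change the cache size or build the indexes.

```python
import os
import shutil
import sqlite3
import tempfile


def _connect_existing(path):
    # sqlite3.connect() silently creates a missing file, so check first.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return sqlite3.connect(path)


def rds_page_size(path):
    """SQLite page size in bytes; cache sizes are counted in these pages."""
    conn = _connect_existing(path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def suggested_cache_pages(ram_budget_bytes, page_size):
    """Roughly 2/3 of the RAM you want to spend, expressed in pages."""
    return ram_budget_bytes * 2 // (3 * page_size)


def explicit_indexes(path):
    """Names of explicitly created indexes (SQLite auto-indexes have sql NULL)."""
    conn = _connect_existing(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'index' AND sql IS NOT NULL ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def copy_to_local_temp(path, requested_cache, default_cache):
    """Imitate the described -cache behaviour for files on NFS.

    Copies the file under $CISTEMATIC_TEMP (or /tmp) and returns the local
    path plus the cache size actually used: a request below the default
    is ignored, but the copy still happens.
    """
    temp_root = os.environ.get("CISTEMATIC_TEMP", "/tmp")
    local_dir = tempfile.mkdtemp(dir=temp_root)
    local_path = shutil.copy2(path, local_dir)
    return local_path, max(requested_cache, default_cache)
```

For example, with 1024-byte pages and a budget of 1.5 GB, `suggested_cache_pages(1500000000, 1024)` gives 976562 pages, close to the "1 million" rule of thumb; an empty result from `explicit_indexes` would suggest the file still needs `-index`.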

For RNA-seq, you really should do #1 and #2, and use the shell script runStandardAnalysisNFS.sh (or at least the command-line arguments in it).

FYI, if everything is optimal, findall.py for ChIP-seq should be done within 30 minutes at most, and an RNA-seq analysis should take from a couple of hours to overnight, depending on the size of the dataset.

For vix_z: you should run findall.py without the -revbackground option, since you never specified a control file, e.g.

python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -cache 1000000

If you have a background (aka "control") library, then you can specify it this way:

python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -control myControl.rds -listPeak -revbackground -cache 1000000
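The rule of thumb here (pass `-revbackground` only together with `-control`) can be encoded in a small wrapper that builds the argument list. The function `findall_command` is hypothetical; it reuses only the arguments and flags from the two commands above, in the same order, and assumes `ERANGEPATH` is set in the environment as in these examples.

```python
import os
import shlex


def findall_command(erange_path, label, rds, out_regions, control=None,
                    cache=1000000):
    """Build a findall.py argument list in the order used above."""
    args = ["python", os.path.join(erange_path, "findall.py"),
            label, rds, out_regions]
    if control is not None:
        args += ["-control", control]
    args.append("-listPeak")
    if control is not None:
        # Per the advice above: -revbackground needs a control library,
        # so only add it when one was actually given.
        args.append("-revbackground")
    args += ["-cache", str(cache)]
    return args


if __name__ == "__main__":
    erange = os.environ.get("ERANGEPATH", ".")
    print(shlex.join(findall_command(erange, "PEAK", "unique.rds",
                                     "unique.region.txt")))
```

To run it rather than print it, the list can be passed straight to `subprocess.run(args, check=True)`, which avoids any shell quoting issues.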

I hope this helps!

Ali

**VIX_Z** · 04-14-2009, 09:09 PM

Hi Alim,

Thanks a lot for your reply....

It is really helpful to know about the importance of the cache in RDS files and how to handle it.
Is RAM the only thing that makes findall.py run fast, or are there other ways to speed it up?

I would like to understand the procedure ERANGE uses for ChIP-seq analysis. Can you suggest some documentation that explains the algorithm ERANGE uses for this purpose?

I am new to this field, so any reply in this regard will be highly appreciated.

With many thanks!

~Vix

**alim** · 04-16-2009, 02:36 PM

Hi Vix,

Honestly, compared to the RNA-seq pipeline, the ChIP-seq pipeline is pretty fast (but maybe I'm not objective about this!)

The basis of the original algorithm is still as described in the NRSF ChIP-seq Science paper from 2007: look for pileups of reads using a greedy algorithm, and check that they are enriched compared to the same region in the control. It's fundamentally a region caller rather than a "summit" caller.
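To make the idea concrete, here is a toy version of a greedy pileup caller with a control comparison. It illustrates the general approach described above and is not ERANGE's implementation: the function name, the parameters and defaults (`window`, `min_reads`, `min_enrichment`), the reads-per-million normalization and the pseudocount of 1 on the control are all assumptions chosen for the sketch, and a real caller adds further filters against false positives.

```python
from bisect import bisect_left, bisect_right


def call_regions(reads, control, sample_total, control_total,
                 window=100, min_reads=5, min_enrichment=3.0):
    """Toy greedy region caller for one chromosome.

    reads, control: read start positions in the sample and control.
    sample_total, control_total: library sizes used for normalization.
    Returns (start, end, sample_count, control_count, enrichment) tuples.
    """
    reads = sorted(reads)
    control = sorted(control)
    regions = []
    i = 0
    while i < len(reads):
        # Greedily extend the pileup while the next read starts within
        # `window` bp of the previous one.
        j = i
        while j + 1 < len(reads) and reads[j + 1] - reads[j] <= window:
            j += 1
        start, end, count = reads[i], reads[j], j - i + 1
        if count >= min_reads:
            # Control reads starting inside the same interval.
            ctrl = bisect_right(control, end) - bisect_left(control, start)
            # Reads per million in each library, pseudocount 1 on the control.
            sample_rpm = count * 1e6 / sample_total
            control_rpm = (ctrl + 1) * 1e6 / control_total
            enrichment = sample_rpm / control_rpm
            if enrichment >= min_enrichment:
                regions.append((start, end, count, ctrl, round(enrichment, 2)))
        i = j + 1
    return regions
```

With equal library sizes, a cluster of five sample reads and no control reads passes the 3-fold threshold, while an equally dense cluster that the control also covers does not, which is the "enriched compared to the same region in the control" check in miniature.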

However, many of the details have changed and are continuing to change. As I come across more and more datasets, I've introduced various parameters to filter out false-positive peaks. Since I finally gave in and started reporting a summit for peaks, I've come across datasets whose shifts are so large that they require an explicit shift. Version 3.1 of ERANGE will introduce that any day now.
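Why a shift matters for summits: reads from the two strands pile up on opposite sides of the bound site, so moving plus-strand reads downstream and minus-strand reads upstream by a fixed amount brings the two piles together. A toy sketch, assuming reads are given as (5' position, strand) pairs; the function name and the `shift` and `half_width` parameters are hypothetical, and this is not how ERANGE 3.1 computes summits.

```python
from collections import Counter


def summit_with_shift(reads, shift, half_width=0):
    """Position with the most shifted-read coverage, or None if no reads.

    reads: iterable of (position, strand) pairs, strand '+' or '-'.
    '+' reads are moved downstream by `shift` bp and '-' reads upstream;
    each shifted read then covers +/- half_width bp around its new center.
    """
    coverage = Counter()
    for pos, strand in reads:
        center = pos + shift if strand == "+" else pos - shift
        for p in range(center - half_width, center + half_width + 1):
            coverage[p] += 1
    if not coverage:
        return None
    best = max(coverage.values())
    # Leftmost position wins ties.
    return min(p for p, c in coverage.items() if c == best)
```

For reads at 950 and 960 on the plus strand and 1040 and 1050 on the minus strand, a shift of 50 puts the summit at 1000, between the two strand-specific piles, whereas no shift simply returns the leftmost read.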

One thing I will say about the ChIP-seq capabilities of ERANGE is that calling regions and summits is a beginning, not an end. The other scripts in the package (which depend on Cistematic) are actually designed to find motifs in the regions, find the genes associated with the regions & do a GO analysis of these genes for example.

Ali

**griffon42** · 04-24-2009, 09:24 AM

ERANGE error message

Hi all-

I've successfully used ERANGE 3.0.1 in the past for some RNA-seq analysis. I'm now running into some problems getting through the RunStandardAnalysis script.

My reads are Bowtie-aligned (single-end) and built into an appropriate RDS file. I'm working with the mouse genome.

When running RunStandardAnalysis.sh, the first few steps (geneMrnacounts.py, normalizeExpandedExonic.py) go without any problems. However, geneMrnaCountsWeighted.py starts off fine but then starts pouring out errors, as shown below:

/proj/genome/commoncode3.0.1/geneMrnaCountsWeighted.py: version 3.7
dataset Sample.bowtie.rds
metadata:
bowtie_mapped True
dataType RNA
genome mm9
rdsVersion 1.1
readsize 36

9756706 unique reads, 1097314 spliced reads and 3706354 multireads
default cache size is 2000000 pages
found index

1 read 100000 read 200000 read 300000 read 400000 read 500000
10 read 600000 read 700000 read 800000 read 900000 read 1000000
11 read 1100000 read 1200000 read 1300000 read 1400000 read 1500000 read 1600000 read 1700000 read 1800000 read 1900000
12 read 2000000 read 2100000 read 2200000 read 2300000 read 2400000
13 read 2500000 read 2600000
14 read 2700000 read 2800000 read 2900000
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
15 read 3000000 read 3100000 read 3200000 read 3300000 gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict

These go on for pages and pages, with different values for "gid" as it goes on.

Has anyone seen this problem? I can't figure out what I've done differently compared to my past successful runs. Any assistance would be MUCH appreciated.

Thanks!

**alim** · 04-24-2009, 09:36 AM

Hi,

So I would take the error message to heart that malloc (the Unix memory allocation call) is failing, which implies that you are running out of memory. Could you (or someone else) be using up significant amounts of memory at the same time as you are running ERANGE? Or are you running this on another machine with less RAM?
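The numbers in the log support this reading: error code 12 is `ENOMEM` on both macOS and Linux, and the failed request of 100663296 bytes is exactly 96 MiB. A small sketch using Python's standard `errno` and `os` modules to decode such a line; the helper name is hypothetical.

```python
import errno
import os


def describe_alloc_failure(error_code, requested_bytes):
    """Turn 'mmap(size=...) failed (error code=...)' numbers into text."""
    name = errno.errorcode.get(error_code, "unknown errno")
    mib = requested_bytes / 2**20
    return "%s (%s) while mapping %.0f MiB" % (name, os.strerror(error_code), mib)


if __name__ == "__main__":
    # Values copied from the malloc error lines in the log above.
    print(describe_alloc_failure(12, 100663296))
```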

By the way, I highly recommend upgrading to ERANGE 3.1 to pick up some of the other bugs I have fixed over the last 3 months! You won't have to rebuild the RDS files or anything.

Ali

**griffon42** · 04-24-2009, 11:28 AM

Thanks Ali -

I'm running ERANGE locally on a MacBook Pro with 4 GB of RAM, and not running ANYTHING else during the analysis. I've got the cache set to 2000000 for the RDS file.

This has been more than enough memory in the past... though these are more reads than I've tried previously.

I'll try the upgrade as well.

Thanks!

**alim** · 04-24-2009, 12:36 PM

4 GB of RAM should be enough for the number of reads you have, especially if you have enough virtual memory (could you also be running low on disk space?)

If it won't work with this much RAM, then simply drop the cache size down to a smaller value (e.g. 1000000), which will free up some extra RAM.
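A back-of-the-envelope sketch of why this helps. ERANGE reports the cache in pages ("default cache size is 2000000 pages" in the log above), so the page data alone can grow to pages × page size. Assuming 1024-byte pages (SQLite's default page size before version 3.12.0; read the real value from the file with `PRAGMA page_size`), 2,000,000 pages is about 1.91 GiB of a 4 GiB machine, and dropping to 1,000,000 pages frees roughly another 0.95 GiB. SQLite adds some per-page bookkeeping on top, so treat these as rough figures. The helper name is hypothetical.

```python
GIB = 2**30


def cache_headroom(total_ram_bytes, cache_pages, page_size=1024):
    """Bytes taken by a full page cache and bytes left for everything else.

    page_size=1024 is an assumption (SQLite's pre-3.12.0 default); read the
    real value from the RDS file with PRAGMA page_size.
    """
    cache_bytes = cache_pages * page_size
    return cache_bytes, total_ram_bytes - cache_bytes


if __name__ == "__main__":
    for pages in (2000000, 1000000):  # the original setting and the smaller one
        used, left = cache_headroom(4 * GIB, pages)
        print("%d pages: cache %.2f GiB, %.2f GiB left"
              % (pages, used / GIB, left / GIB))
```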

Ali

**VIX_Z** · 04-26-2009, 09:24 PM

Some queries for chip-seq analysis

Hi Alim,

Thanks for your previous help regarding ChIP-seq analysis.
I have a few more questions:

How does the RAM requirement vary with the number of reads (or data size) in ChIP-seq analysis?

Does it keep increasing, or does it saturate, so that adding more RAM no longer helps in processing the reads?

How can I view the contents of RDS files?

Also, can you suggest some links to sample data of various sizes that can be used for ChIP-seq analysis?

Looking forward to your reply.
With THANKS!!

Vix
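On viewing RDS files: since they are SQLite databases, any SQLite client can open them, and the `sqlite3` command-line shell's `.tables` and `.schema` commands are a quick start. Below is a minimal Python sketch that lists each table with its row count and a few sample rows; the function name is hypothetical and nothing here depends on ERANGE's actual table layout.

```python
import os
import sqlite3


def summarize_rds(path, sample_rows=3):
    """Map each table name to (row count, first few rows)."""
    if not os.path.isfile(path):  # avoid sqlite3 creating an empty file
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        tables = [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        summary = {}
        for table in tables:
            quoted = '"%s"' % table.replace('"', '""')  # safe identifier quoting
            count = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
            rows = conn.execute("SELECT * FROM %s LIMIT ?" % quoted,
                                (sample_rows,)).fetchall()
            summary[table] = (count, rows)
        return summary
    finally:
        conn.close()


if __name__ == "__main__":
    import sys
    for table, (count, rows) in summarize_rds(sys.argv[1]).items():
        print("%s: %d rows" % (table, count))
        for row in rows:
            print("   ", row)
```

Run it as `python summarize_rds.py myfavorite.rds`; on a large RDS file the `COUNT(*)` queries can take a while, especially over NFS.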

**griffon42** · 04-30-2009, 12:11 PM

Hey Ali-

Just wanted to say thanks for your help. Reducing the cache size seemed to help get through the analysis without memory errors, despite taking a bit longer.

I've got some additional datasets with way more reads that unfortunately can't be handled with 4 GB of RAM. Looks like I'm going to need to find a bigger box.

Thanks again for providing so much support.
