**frozenlyse** · 07-07-2010, 08:45 PM

Hi Simon - this looks pretty neat. I'm installing it now and will pester you with questions!

**frozenlyse** · 07-07-2010, 11:15 PM

The first problem I've overcome is a strange incompatibility between parallel python (python-pp version 1.5.7-1) and numpy when using the Ubuntu 10.04 repository versions. I solved this by installing version 1.6.0-RC5 of parallel python from here, and I am now up and running the included example using meme, Weeder, MDmodule and gadem.

Which version of parallel python are you developing with? It could be a bug specific to my system, as it hasn't had a clean install since Ubuntu 8.04.

**simonvh** · 07-08-2010, 12:08 AM

Hmm that's strange. I'm using version 1.5.7 of pp in combination with numpy version 1.4.1, and that works fine. Which version of numpy is in the Ubuntu repositories? Are you running Python 2.6?
Was it similar to this bug: http://www.parallelpython.com/compon...9/topic,413.0?

Let me know if using pp 1.6.0 resolves the issue.

**frozenlyse** · 07-08-2010, 12:20 AM

Yeah, that link is where I got the idea to install pp 1.6.0 (Ubuntu's numpy is only version 1.3.0; if I have more trouble I'll try upgrading that next), all using Python 2.6.

I've run into a few bugs in gimmemotifs that I'm fixing along the way, you should see a pull request on your github soon! (though I'm no python developer)

**frozenlyse** · 07-08-2010, 02:51 AM

OK, I've gotten it to successfully run the included example. What I had to do was remove the Ubuntu versions of numpy (and therefore matplotlib), scipy and parallel python, and install these from source:

numpy-1.4.1
scipy-0.8.0rc1
pp-1.5.7 (doesn't work with pp-1.6.0rc5)
matplotlib-0.99.3

It's now running on one of my .bed files output from MACS. I had to trim it down to a 3-column BED file to get it to work. What does gimmemotifs use the 4th column for?

But so far this looks pretty useful, thanks for releasing it.
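The trimming step described above can be sketched in a few lines of Python. This is only an illustration, assuming tab-separated BED input; the function name `trim_to_bed3` is hypothetical and not part of gimmemotifs or MACS.

```python
def trim_to_bed3(lines):
    """Keep only chrom, start and end from each tab-separated BED line.

    Header lines (track/browser) and comments are dropped, which is an
    assumption of this sketch rather than documented tool behaviour.
    """
    trimmed = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("not a BED line: %r" % line)
        trimmed.append("\t".join(fields[:3]))
    return trimmed


if __name__ == "__main__":
    example = [
        "track name=peaks",
        "chr1\t100\t350\tMACS_peak_1\t52.3",
        "chr2\t900\t1200\tMACS_peak_2\t48.1",
    ]
    for row in trim_to_bed3(example):
        print(row)
```

To use it on a real file, pass the open file handle in place of the example list and write the returned lines to a new .bed file.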

**simonvh** · 07-08-2010, 03:20 AM

Thanks for finding and fixing some of the bugs.

I will have a look at the input format. I should fix it so that any file in valid BED format is accepted. The fourth column is used to sort the peaks (we usually have the number of reads in there). This is for the benefit of MDmodule, which actually uses the ranking of the sequences in the motif search. However, if there is no numerical value in the fourth column, it should just be left unused, instead of choking on that input.
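The intended behaviour (sort on the fourth column when it is numeric, otherwise ignore it) could look roughly like the sketch below. This is not the actual GimmeMotifs code: the function names are hypothetical, and sorting from highest to lowest value is an assumption.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, score); score is None if column 4 is absent or non-numeric."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    score = None
    if len(fields) > 3:
        try:
            score = float(fields[3])
        except ValueError:
            score = None  # e.g. a peak name: leave the column unused
    return chrom, start, end, score


def rank_peaks(lines):
    """Sort peaks by column 4 (highest first, an assumption) only if every peak has a numeric value."""
    peaks = [
        parse_bed_line(l)
        for l in lines
        if l.strip() and not l.startswith(("#", "track", "browser"))
    ]
    if peaks and all(p[3] is not None for p in peaks):
        peaks.sort(key=lambda p: p[3], reverse=True)
    return peaks


if __name__ == "__main__":
    numeric = ["chr1\t10\t210\t15", "chr1\t500\t700\t120", "chr2\t40\t240\t60"]
    named = ["chr1\t10\t210\tpeak_a", "chr1\t500\t700\tpeak_b"]
    print(rank_peaks(numeric))  # ranked by read count
    print(rank_peaks(named))    # input order kept, column 4 ignored
```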

**krobison** · 07-08-2010, 06:27 AM

Please add an entry in the software wiki; otherwise you're stuck with what I put there!

**simonvh** · 07-08-2010, 11:09 PM

Ah, yes, that was on my to-do list, it's good to be reminded. Done

**simonvh** · 11-18-2010, 05:51 AM

I just wanted to let you know that GimmeMotifs has been accepted for publication in Bioinformatics:
doi: 10.1093/bioinformatics/btq636.

The installation procedure has been simplified, and packages for Ubuntu, Debian and Fedora are now available. If you need motif prediction for ChIP-seq data, give it a try and let me know what you think: http://www.ncmls.nl/bioinfo/gimmemotifs/.

**krespim** · 05-07-2014, 12:05 AM

Hi Simon,

First of all, thank you for the tool. I am now preparing to try it out, but since my data is a tad tricky I was wondering if you could give some hints on how best to set up the run.

The issue is that the peaks are not from ChIP-seq but from DamID-seq. This means that the motif is not necessarily located in the middle of the peak, and the peaks - if one can call them that - can be quite broad (from 100bp to >5kb). This is for a transcription factor, btw.

So the question is, do you have any recommendations for analysing data from this type of experiment (or similar)? At the moment I am selecting peaks shorter than 1kb to use as input.
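Selecting regions below a length cutoff, as described above, might be done along these lines. This is a minimal sketch assuming tab-separated BED lines; `filter_by_length` and its `max_len` parameter are hypothetical names, not GimmeMotifs options.

```python
def filter_by_length(bed_lines, max_len=1000):
    """Keep BED lines whose region (end - start) is shorter than max_len base pairs."""
    kept = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        if end - start < max_len:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    regions = [
        "chr1\t1000\t1100",   # 100 bp, kept
        "chr1\t5000\t5900",   # 900 bp, kept
        "chr2\t2000\t7500",   # 5.5 kb, dropped
    ]
    for row in filter_by_length(regions, max_len=1000):
        print(row)
```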

**simonvh** · 05-12-2014, 11:10 PM

This is indeed trickier than a typical ChIP-seq run, but most likely not impossible. Basically there are two important things here. The first is the fact that the motif is not located in the center of the peak. Most motif programs that are run by GimmeMotifs do not take the location of the motif in the sequence into account. However, by default GimmeMotifs truncates the input sequences to 200 basepairs. This is probably too strict in your case, so I would change the -w parameter to 1000 to use 1kb sequences for searching. Otherwise, even if your input sequences are 1kb, only 200bp would be used as input.
The second is the "peak" size. If you have enough regions smaller than 1kb, I would indeed use these for motif searching. You can always check the presence of the motif in the larger sequences later. Otherwise you can just use all regions as input, as GimmeMotifs will truncate the larger sequences. If there are enough sequences that contain a motif, this should not be that big of a problem.
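To illustrate what truncating to a fixed width means for broad regions, here is a small sketch. It assumes the window is taken around the center of each region and that shorter regions are left unchanged; both are assumptions for illustration and may not match what GimmeMotifs does internally. The `width` argument plays the role of the -w value.

```python
def center_window(start, end, width=200):
    """Return (start, end) of a window of `width` bp centered on the region.

    Regions that are already `width` bp or shorter are returned unchanged
    (an assumption of this sketch).
    """
    if end - start <= width:
        return start, end
    mid = (start + end) // 2
    new_start = mid - width // 2
    return new_start, new_start + width


if __name__ == "__main__":
    # A 1 kb region: with width=200 only the middle 200 bp would be searched,
    # with width=1000 the whole region is kept.
    print(center_window(1000, 2000, width=200))
    print(center_window(1000, 2000, width=1000))
    # A 5 kb region is cut down to 1 kb around its center.
    print(center_window(10000, 15000, width=1000))
```

With a 200bp window, a motif sitting near the edge of a 1kb region falls outside the searched sequence, which is why a larger width is suggested for this kind of data.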

**krespim** · 05-13-2014, 07:15 AM

Thanks a lot for the suggestions. After posing the question, I selected regions up to 500bp and also up to 1kb (always setting the -w parameter), and got similar motifs with both, which is comforting. pwmscan.py also came in handy.

Just another couple of things:

1. I looked at the manual but could not find a description of the output of pwmscan.py.

2. The results I have for my best motif look good from my interpretation of the report. Is this correct? Here are the results:
random
enrichment 6.00
p-value 0.00
ROC_AUC 0.703
MNCP 4.116

genomic_matched
enrichment 2.25
p-value 0.00
ROC_AUC 0.695
MNCP 1.808

The p-value=0 is the one that is bugging me.
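One possible explanation for a reported p-value of 0.00 is simply rounding: a very small p-value printed with two decimals shows up as zero. The sketch below illustrates this with a hypergeometric tail test on made-up counts. How the GimmeMotifs report actually computes enrichment and its p-value is not stated in this thread, so the test, the counts and the function name are all assumptions.

```python
from math import comb


def hypergeom_tail(k, n_fg, n_hits, n_total):
    """P(X >= k): chance of at least k motif-containing sequences among n_fg
    drawn from n_total sequences, of which n_hits contain the motif."""
    upper = min(n_fg, n_hits)
    numerator = sum(
        comb(n_hits, i) * comb(n_total - n_hits, n_fg - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_fg)


# Made-up counts: 30 of 100 foreground and 5 of 100 background sequences have the motif.
fg_hits, fg_size = 30, 100
bg_hits, bg_size = 5, 100

enrichment = (fg_hits / fg_size) / (bg_hits / bg_size)
p = hypergeom_tail(fg_hits, fg_size, fg_hits + bg_hits, fg_size + bg_size)

print("enrichment %.2f" % enrichment)
print("p-value %.2f" % p)   # rounds to 0.00 although p is not exactly zero
print("p-value %.2e" % p)   # scientific notation shows the actual magnitude
```

If rounding is indeed the cause, a p-value of 0.00 together with a clear enrichment would be a sign of a strongly significant motif rather than an error.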

**dzavallo** · 07-21-2016, 07:21 AM

Dear Simon,

We are contacting you as users of your GimmeMotifs pipeline.
We are trying to use the roc.py and cluster.py scripts with a file (PWMFILE) which is not derived from GimmeMotifs. Instead, the matrix I am trying to run is composed of results I got from other prediction scripts. The error message I get when trying to run the ROC script is:

command:
gimme roc -o kentaro_roc.pdf kentaro2_julio2016 nuevalista_junio2016.fasta 10000_random_promoters_1500pb_masked_not_E011.fasta

error:
failed to initialize cache
global name 'make_region' is not defined
Traceback (most recent call last):
  File "/tools/anaconda2/bin/gimme", line 469, in <module>
    args.func(args)
  File "/tools/anaconda2/lib/python2.7/site-packages/gimmemotifs/commands/roc.py", line 40, in roc
    for scores in s.best_score(fg_file):
  File "/tools/anaconda2/lib/python2.7/site-packages/gimmemotifs/scanner.py", line 270, in best_score
    for matches in self.scan(seqs, 1, scan_rc, cutoff=0):
  File "/tools/anaconda2/lib/python2.7/site-packages/gimmemotifs/scanner.py", line 355, in scan
    for result in it:
  File "/tools/anaconda2/lib/python2.7/site-packages/gimmemotifs/scanner.py", line 418, in _scan_sequences
    motif_digest = self.checksum[motif_file]
KeyError: 'kentaro2_julio2016.txt'

In a previous version of GimmeMotifs I was able to do this, but I noticed that after updating GimmeMotifs the input file (PWMFILE) is no longer recognized. I have attached the matrix so you can see where the error could be.

To try to bypass this problem, I started from the very beginning and ran the whole GimmeMotifs pipeline (including all predictors). However, at the step where I have to provide a FASTA file with the whole genome sequence to use as background, I failed to index the whole tomato genome (my samples are from this species). The error message I got this time is:

command:
gimme background -i SL_todoscrom.fa -f SL.fa -g 2.3 -n 1

error:
background: error: too few arguments

Thank you very much in advance for your help with this. Your comments and suggestions are more than welcome.

Best wishes,

GimmeMotifs: a ChIP-seq motif prediction pipeline
