
captainentropy · 11-02-2010, 03:29 PM

mudshark, if you include the "-advanced" option in the QuEST command you will be able to configure all of the parameters, such as bandwidth, region size, mappable genome fraction, ChIP enrichment, peak shift, etc. You can also control the peak-collapsing parameters.

I recommend using the advanced option and, at a minimum, changing the mappable genome fraction to something more accurate. The fraction is a function of the read length and the percentage of mappable (i.e. non-repetitive) sequence in your genome: the longer the read length, the larger the mappable fraction. The default in QuEST is 0.75, which (for hg18) corresponds to a read length of 26 nt. We are getting read lengths of 38 nt and longer, which increases the fraction to 0.82 and up. Using this number has resulted in a noticeable increase in the number of peaks called.
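A toy illustration of why the mappable fraction grows with read length: count the read start positions whose read-length substring occurs exactly once in the genome. This is a simplified sketch under stated assumptions (forward strand only, exact matches, one small toy sequence); it is not how QuEST or the published hg18 figures were computed, and the function name `mappable_fraction` is hypothetical.

```python
from collections import Counter


def mappable_fraction(genome: str, read_length: int) -> float:
    """Fraction of start positions whose read_length-mer occurs exactly once.

    Simplifying assumptions (hypothetical helper): forward strand only,
    exact matches only, genome given as a single string.
    """
    n_positions = len(genome) - read_length + 1
    if n_positions <= 0:
        return 0.0
    kmers = [genome[i:i + read_length] for i in range(n_positions)]
    counts = Counter(kmers)
    # A position is "mappable" if a read starting there could only come from there.
    unique = sum(1 for k in kmers if counts[k] == 1)
    return unique / n_positions
```

On the toy sequence "ACGTACGTGG" the fraction rises from 0.5 at 3 nt to about 0.71 at 4 nt and 1.0 at 5 nt, because the repeated "ACGT" stops being ambiguous once reads extend past it. A real genome would need an index-based approach rather than materializing every substring.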

mudshark · 11-02-2010, 04:59 AM

Originally posted by captainentropy

@mudshark. You've tried quite a few programs. More than I have. Since I use QuEST the most could you tell me why you think it doesn't perform well? Did you use default or advanced parameter settings? This paper found most peak-calling programs to have high agreement between high-value peaks and qPCR verification data http://www.plosone.org/article/info:...l.pone.0011471

Based on my prior-knowledge system I can estimate the performance of the tools very well. Basically, I know all the binding sites genome-wide without having to do ChIP mappings.

Of course, I am working in Drosophila, and QuEST, for example, has been 'optimized' for mouse/human, whatever that means. But in essence, QuEST has very low sensitivity and, given that low sensitivity, rather poor specificity COMPARED to other tools such as SICER, spp, and MACS.

That's my experience; other people might have a different one.

(And of course I find QuEST very poorly documented: what are the advanced parameters?)
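If a set of known binding sites is available, as in the prior-knowledge comparison described above, the sensitivity and precision of a peak caller can be estimated by simple interval overlap. This is a hypothetical helper, not mudshark's actual system. It uses precision (the fraction of peaks that hit a known site) as a practical stand-in for specificity, since true negatives are ill-defined genome-wide, and it assumes half-open (chrom, start, end) tuples.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate(peaks: List[Interval], known_sites: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of called peaks against known sites."""
    # Sensitivity: fraction of known sites recovered by at least one peak.
    hit_sites = sum(1 for s in known_sites if any(overlaps(s, p) for p in peaks))
    # Precision: fraction of peaks that fall on at least one known site.
    true_peaks = sum(1 for p in peaks if any(overlaps(p, s) for s in known_sites))
    sensitivity = hit_sites / len(known_sites) if known_sites else 0.0
    precision = true_peaks / len(peaks) if peaks else 0.0
    return sensitivity, precision
```

The all-pairs overlap check is quadratic, which is fine for illustration; for genome-wide peak sets an interval tree or a sorted sweep would be the usual choice.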

mudshark · 11-02-2010, 04:29 AM

Originally posted by Chema76

Hi,
In my hands PICS works, but you have to install the GSL library http://wiki.rglab.org/index.php?titl..._library_and_R

I have GSL installed, but PICS a) takes ages to run (using snowfall with 8 CPUs; multicore crashes), b) performs very, very poorly, and c) is very poorly documented.

I might try it again one day, but at present, no.

Chema76 · 11-02-2010, 02:36 AM

Thanks for using CSAR.

CSAR relies on permutations to obtain the FDR thresholds, so to get good results you should generate a large number of permuted values (several million).
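A generic sketch of how a permutation-based FDR threshold can be chosen: for each candidate score t, divide the expected number of null scores at or above t (per permutation) by the number of observed scores at or above t, and take the lowest t whose estimate is at or below the target FDR. This illustrates the general technique only; it is not CSAR's implementation, and `fdr_threshold` and its arguments are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def fdr_threshold(observed: Sequence[float], permuted: Sequence[float],
                  n_permutations: int, alpha: float = 0.05) -> Optional[float]:
    """Lowest observed score t with estimated FDR(t) <= alpha, or None.

    FDR(t) = (null scores >= t / n_permutations) / (observed scores >= t).
    `permuted` pools the scores from all n_permutations permutations.
    """
    obs = sorted(observed)
    null = sorted(permuted)
    for t in sorted(set(observed)):
        n_obs = len(obs) - bisect_left(obs, t)  # observed scores >= t (always >= 1)
        n_null = (len(null) - bisect_left(null, t)) / n_permutations  # expected false calls
        if n_null / n_obs <= alpha:
            return t
    return None
```

With few permuted values, the null tail at high scores is estimated from only a handful of points (or none), so estimates for stringent thresholds are unreliable; this is one reason to generate many permuted values.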

CSAR has two different normalization procedures. The first normalizes the dataset according to the number of reads sequenced; this is set with the parameter “norm”. The second normalizes the coverage distribution between sample and control.

You can disable the first normalization step by setting norm=-1. If you do want to scale your data according to the number of sequenced reads, it is advised to set norm > (number of sequenced reads for the control) * w, where w is the DNA fragment length.
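A minimal illustration of the two quantities mentioned above, assuming plain per-position coverage lists: rescaling coverage to a common sequencing depth (generic read-count normalization, not necessarily what CSAR does internally) and the suggested lower bound for norm, i.e. control reads times fragment length. Both function names are hypothetical.

```python
from typing import List


def scale_to_depth(coverage: List[float], library_reads: int, target_reads: int) -> List[float]:
    """Rescale per-position coverage as if the library had target_reads reads."""
    factor = target_reads / library_reads
    return [c * factor for c in coverage]


def min_norm_value(control_reads: int, fragment_length: int) -> int:
    """Lower bound from the advice above: norm > control reads * w."""
    return control_reads * fragment_length
```

For example, with a made-up 12 million control reads and w = 300 bp, norm would need to exceed 3.6 × 10^9.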

CSAR reports the width of the regions considered bound, so users can easily filter out regions that are too short if they want to.

mudshark · 11-02-2010, 02:13 AM

Originally posted by Chema76

Hi,
In my hands PICS works, but you have to install the GSL library http://wiki.rglab.org/index.php?titl..._library_and_R

We have developed an R package CSAR ( http://www.bioconductor.org/packages...html/CSAR.html )

It is described in
http://www.nature.com/nprot/journal/....2009.244.html

I just tried CSAR. It works out of the box (on Drosophila), which is great. I will keep it in my toolbox and do several tests. At first sight it seems a little too sensitive (high false-positive rate); maybe one should try different ways of defining thresholds.
Thanks for the tip!

(Sorry, I have to mention that I have a system that allows me to identify the 'true' false positives, not just the numerical ones.)

Edit: you allow very short regions (1 bp) to be considered bound. Given the experimental procedure this is a bit counter-intuitive, since most of the ChIPped DNA fragments are 200-300 bp anyway. In fact, if I filter the CSAR results by width alone, the overall result already looks much better.
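The width filter described above can be as simple as dropping regions narrower than a typical ChIP fragment. This sketch assumes the CSAR regions have been exported as tab-separated, BED-like lines (chrom, start, end, ...); the 200 bp default comes from the fragment size quoted above, and `filter_by_width` is a hypothetical name.

```python
from typing import Iterable, List


def filter_by_width(lines: Iterable[str], min_width: int = 200) -> List[str]:
    """Keep BED-like lines whose region (end - start) is at least min_width bp."""
    kept = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        if end - start >= min_width:
            kept.append(line)
    return kept
```

The threshold is worth varying (for example 100, 200, 300 bp) and checking against whatever validation data you have, rather than fixing it once.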

captainentropy · 11-01-2010, 05:53 PM

@mudshark. You've tried quite a few programs. More than I have. Since I use QuEST the most could you tell me why you think it doesn't perform well? Did you use default or advanced parameter settings? This paper found most peak-calling programs to have high agreement between high-value peaks and qPCR verification data http://www.plosone.org/article/info:...l.pone.0011471

The first one I tried, two years ago, was CisGenome, mainly because it installed on Windows and I had a really powerful workstation. I never got it to work. I built another computer, installed Linux (Mint) on it, and tried FindPeaks and QuEST. FindPeaks had too many problems just getting it running with a training set. QuEST, though, I had working almost immediately. I'm quite computer savvy (I was an engineering major in college until I switched to biochem), so I'm comfortable with Unix/Linux, at least more so than most people in my experience, and yet these programs and their documentation are often confusing.

@ETHANol. I've tried other programs for the very reasons you cite, some being better than others. The problem I have is figuring out by what metrics one program is better than another. The two publications I've read that compare multiple programs basically conclude they are all pretty similar in most regards but can perform better with some datasets than with others (here's the other: http://www.biomedcentral.com/1471-2164/10/618). But of course that is based on TFs with defined binding motifs. What of histone PTMs, TFs with non-canonical binding, other DNA-binding factors with no defined binding sequence, or factors that don't actually touch DNA but are part of a complex? Finding a program that works "best" smells a little like doctor shopping: decide on the result you want, then find a method and settings that validate that preconceived notion.

Those questions aren't directed at anyone in particular; they're just general questions.

My protein of interest doesn't have a defined binding sequence but rather a sequence context. Not all the peaks found with ChIP-seq match our predictions, nor do the bona fide sites always show up in the results. Frustrating. Hahaha, it wouldn't be science if it weren't.

ETHANol · 11-01-2010, 06:26 AM

Yes, it's a pity the documentation for most applications is so poor. For most programs it seems like the documentation took less than an afternoon to put together. It would be helpful if every app had a 'user group'/forum where you could ask about issues, or at least search for them. That being said, I got USeq to work and I'm not a computer guy at all. MACS I have a problem with: it returns my files with ".fa" appended to the chromosome names, which makes the results incompatible with any other downstream program, and that is a pain. In general you need to learn some basic UNIX commands, which should take very little time with proper instruction, or a little more without (as in my case). I think some fluency in R would help out a lot more.
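One workaround for the ".fa" suffix is to strip it from the chromosome column of the peak file before passing it downstream. A minimal sketch, assuming a tab-separated file with the chromosome name in the first column (as in BED-style output); renaming the reference sequences before alignment would avoid the problem altogether. The function name `strip_fa_suffix` is hypothetical.

```python
def strip_fa_suffix(line: str, suffix: str = ".fa") -> str:
    """Remove a trailing suffix from the first (chromosome) column of a tab-separated line."""
    chrom, sep, rest = line.partition("\t")
    # Guard against an empty suffix, where chrom[:-0] would wrongly return "".
    if suffix and chrom.endswith(suffix):
        chrom = chrom[: -len(suffix)]
    return chrom + sep + rest
```

Applied line by line, it leaves lines whose first column has no ".fa" suffix, such as comment lines, unchanged.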

To put it in perspective for biologists, it's like someone following a protocol from the materials and methods section of a paper and finding that it doesn't work for their experiment. There may be variables in your particular experiment that need to be fine-tuned, so you have to have a basic understanding of what is going on. The same goes for the available ChIP-seq apps.

One thing to keep in mind is that some peak callers work better with some datasets than others, so it's best to remain flexible and willing to use the tool that works best.

Chema76 · 11-01-2010, 06:10 AM

Hi,
In my hands PICS works, but you have to install the GSL library http://wiki.rglab.org/index.php?titl..._library_and_R

We have developed an R package CSAR ( http://www.bioconductor.org/packages...html/CSAR.html )

It is described in


http://www.nature.com/nprot/journal/v5/n3/abs/nprot.2009.244.html

mudshark · 11-01-2010, 05:53 AM

Hi captainentropy.

I agree with much of what you say. Many tools are poorly documented, but most importantly, many tools just do not perform well (some not at all) if one uses datasets, model organisms, etc. other than the ones the authors used.

My experience: FSeq and PICS do not work. CCAT, CisGenome, QuEST, and USeq do not perform well. MACS, spp, SISSR, and SICER seem best.

Of course, it all depends on the dataset. For example, I see that the total number of reads changes a lot of things: spp performance (false positives) seems to drop dramatically with an increasing number of mapped reads, whereas SISSR is robust, etc.

In the end, we should mainly care about the validity of the result. Too often the tools are abused as black-box systems: "I don't know what's going on, but at least I have peaks and a p-value". In that case I would rather recommend not touching any tool at all. Please! That's just spoiling science.

captainentropy · 10-29-2010, 10:53 AM

My $0.02. USeq is a mess: poor documentation of the commands and options, and NOT for a beginner. Galaxy: I finally got peak-calling to work on it. Terrible documentation. The screencasts, while probably helpful to some, do not, IMO, present the kind of cogent procedure a beginner needs when starting out with some Illumina sequence data in hand, be it unmapped sequence or reads mapped with Bowtie or ELAND. I haven't tried MACS as a standalone program, but I'm sure I could have gotten it to work faster than it took to figure out Galaxy.

I first tried FindPeaks a long time ago, but since it is Java-based I had all sorts of problems and just gave up (all of this is on Linux). Maybe the user community is better now, I don't know. CisGenome: tried it, couldn't get it to work properly either. IMO, if you want beginners to use your programs, write a detailed protocol for their use. Assume we're third graders and we don't speak the language very well. If you don't want beginners (i.e. people without advanced degrees in Computer Science or Bioinformatics) to use your software, then don't change a thing.

IMO, the easiest to use so far has been QuEST. While I wish the "tutorial" on the QuEST website (Sidow lab at Stanford) were better at explaining all the details of the different output files, it was MUCH easier to install and get performing peak-calling than all the others.

Like a lot of us who are (or were) beginners, we are probably starting with ChIP-seq data, i.e. ELAND/Bowtie-aligned files.

We want (no, need) easy-to-use tools to analyze the data and get us on the path of discovery and understanding. There is no reason we should have to get degrees in Computer Science to do this analysis.

mkatari · 09-28-2010, 09:58 AM

Thanks, that worked. It complained that it wasn't a standard name but it did it anyway.

ETHANol · 09-28-2010, 09:18 AM

I don't think it matters. I think you can use your favorite nomenclature, but I could be wrong. It will tell you that you entered something like a non-standard genome, but I don't think that will stop the program from processing your data.

Maybe it is important for viewing on IGB, so I'd use what IGB uses. TAIR8???

mkatari · 09-28-2010, 09:09 AM

Originally posted by ETHANol

Your reads need to be mapped before you put them into USeq.

Yes, I have already done that using Bowtie. But when I run the ChIP-seq program it still asks me for a reference genome version (option -v). UCSC doesn't have Arabidopsis, so I'm not sure what to put here.

-v Genome version (e.g. H_sapiens_Feb_2009, M_musculus_Jul_2007), see UCSC FAQ,
http://genome.ucsc.edu/FAQ/FAQreleases.

Thanks

ETHANol · 09-28-2010, 05:28 AM

Your reads need to be mapped before you put them into USeq.

mkatari · 09-27-2010, 08:16 PM

I would like to give USeq a try as well, but I am having trouble finding a versioned Arabidopsis database to use. Can someone please point me in the right direction?
Thanks
