
**apfejes** · 04-05-2009, 04:23 PM

Hi Kevin,

That does look like an eland Extended file: that row looks like it mapped to chr10, position 103778, on the reverse strand. FindPeaks can convert these into bed files or wig files, which can be viewed using the UCSC browser. I think MACS can as well.
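For anyone curious what such a conversion involves, here is a minimal Python sketch that turns one uniquely aligned read into a BED line. It is not FindPeaks' or MACS's code, and `eland_to_bed` is a hypothetical name. It ASSUMES the standard tab-separated eland_result column layout (read name, sequence, match code, three hit counts, chromosome file, 1-based position, strand F/R), and assumes the reported position is the leftmost base on the forward strand. Extended/export eland files use different columns and would need a different parser.

```python
# Hypothetical sketch: eland_result-style line -> BED line.
# ASSUMED column layout (tab-separated, 0-based index):
#   0 read name, 1 sequence, 2 match code (U0/U1/U2/R0/NM/QC...),
#   6 chromosome file (e.g. "chr10.fa"), 7 1-based position, 8 strand (F/R).

def eland_to_bed(line):
    """Return a BED line for a unique hit, or None if the read is skipped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] not in ("U0", "U1", "U2"):
        return None  # unaligned, repeat or malformed line
    name, seq = fields[0], fields[1]
    chrom = fields[6]
    if chrom.endswith(".fa"):
        chrom = chrom[:-3]  # "chr10.fa" -> "chr10"
    start = int(fields[7]) - 1  # BED starts are 0-based
    end = start + len(seq)      # BED ends are exclusive
    strand = "+" if fields[8] == "F" else "-"
    return "\t".join([chrom, str(start), str(end), name, "0", strand])
```

With a made-up 10 bp read reported at chr10, position 103778, reverse strand, `eland_to_bed("r1\tACGTACGTAC\tU0\t1\t0\t0\tchr10.fa\t103778\tR\t..")` gives `chr10  103777  103787  r1  0  -`, since BED coordinates are 0-based and half-open.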

Anthony

**kevinlu** · 04-07-2009, 09:53 AM

Anthony,
Thanks. I've been trying to use FindPeaks 3.3.1.1, starting off with the 22.test.eland file and the instructions you gave in the 3.2.2 manual (the only online documentation I could find, and a bit out of date) to run it through and display it on the genome browser. Unfortunately, when I load the wig output, the UCSC website keeps giving me this error message:

"Error File '22test_triangle_standard.wig' - track load error (track name='ct_22testduplicatesstandardlentriangle'):
Couldn't find size of chromosome 22 (note: chrom names are case sensitive)"

I went in and edited the wig file, changing "chrom=22" to "chrom=chr22", thinking it would help, but it didn't do anything. So frustrated.

**apfejes** · 04-07-2009, 10:08 AM

Hi Kevin,

First of all, I should let you know that the whole 3.3.x line is the "unstable" line towards version 4.0. I recommend getting the 3.3.1.8 version, which has a LOT of bugs fixed compared to 3.3.1.1, which I took off the FindPeaks web page a LONG time ago.

I strongly recommend running a more current version. You can get them here:

https://sourceforge.net/project/platformdownload.php?group_id=232586

If you'd like to be notified of new releases, I do announce them on the mailing list (https://sourceforge.net/mail/?group_id=232586), and you can subscribe at https://lists.sourceforge.net/lists/...ortr-findpeaks

To solve the problem you're seeing above, you'll probably want to use the flag "-prepend chr". The problem is that each fixedStep line has the name of the chromosome in it (which is the wig file standard), so you'd have to change all of the "fixedStep" lines throughout the file. Hence the -prepend option, which does it for you.
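To illustrate what a prepend step has to do, here is a rough Python sketch. It is not FindPeaks' implementation of `-prepend`, and `prepend_chr` / `prepend_chr_file` are hypothetical names. It rewrites the `chrom=` field on every `fixedStep`/`variableStep` declaration line (which is why editing a single line by hand isn't enough) and leaves the `track` line and the data values untouched.

```python
import re

# Matches the chrom= field of a wiggle declaration line, e.g. "chrom=22".
CHROM_FIELD = re.compile(r"\bchrom=(\S+)")

def prepend_chr(line, prefix="chr"):
    """Add `prefix` to the chrom= value of fixedStep/variableStep lines."""
    if not line.startswith(("fixedStep", "variableStep")):
        return line  # track lines and data values pass through unchanged

    def fix(match):
        name = match.group(1)
        # Leave names that already carry the prefix alone.
        if name.startswith(prefix):
            return match.group(0)
        return "chrom=" + prefix + name

    return CHROM_FIELD.sub(fix, line)

def prepend_chr_file(in_path, out_path, prefix="chr"):
    """Copy a wig file, fixing every declaration line along the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(prepend_chr(line, prefix))
```

For example, `prepend_chr("fixedStep chrom=22 start=1 step=10\n")` returns `"fixedStep chrom=chr22 start=1 step=10\n"`, while a line that already says `chrom=chr22` is returned as-is, so running it twice does no harm.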

I'll add that to the manual to make it clear that it's required in the test example.

Let me know if you run into any other problems, though. I really do try to keep on top of problems people find with the code - and I'm always happy to see it improve.

Anthony

**apfejes** · 04-07-2009, 10:14 AM

I should also add that the documentation is online in a wiki for 3.3/4.0:

http://vancouvershortr.wiki.sourceforge.net/FindPeaks4

You can also find it by googling FindPeaks4.

**kevinlu** · 04-07-2009, 11:30 AM

Worked like a charm. Thank you.

I have another data set that, when run through eland, (unfortunately) left unaligned reads in the file. You've outlined a quick way to get rid of them on Linux/Unix, but we don't have any of those machines in our lab. Do you know of another simple way to do this?

**apfejes** · 04-07-2009, 11:44 AM

We aim to please. (-;

As for removing the reads on a non-Linux/Unix system, I'm a little stumped. (I haven't really used Windows since ~2001.) I'm sure you could build an environment or get a Linux/Unix emulator going, although that seems a bit excessive.

If you have access to a Mac, the instructions should work the same way.

Although, personally, I'd just be tempted to download a live CD for Ubuntu or another distribution and use that to access and process the data. For the cost of burning a CD and the bandwidth, you'd probably get the biggest bang for your buck. The method for doing this is pretty easy, but you'd probably be best off if there's someone nearby to help with getting it set up, since things work a little differently under Linux than in Windows. It's not hard, just different, so this might not be an ideal solution either.

I've asked a couple of people in the lab if there's any way to do this in windows, and none of them seem to know off hand. There seem to be rumours of free grep (qgrep?) programs available, though.

**Chipper** · 04-08-2009, 11:11 AM

Anthony, why not just include a filter on U(012) in the preprocessing, or better yet allow direct use of .export files? It would probably increase runtime slightly, but it is plenty fast anyway.

**apfejes** · 04-08-2009, 11:34 AM

Hi Chipper,

Actually, FindPeaks does already support the export file, under the anachronistic name of "elandextended". I suppose I should probably just do a complete rename on that, at this point.

I'm now up to about 25kloc, so occasionally I forget to go back and change strings unless someone reminds me. (-;

As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.

**Chipper** · 04-08-2009, 12:16 PM

> Originally posted by apfejes
>
> Hi Chipper,
>
> As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.

Probably correct assumption, it's just that a lot of non-bioinformaticians want to do ChIP-seq...

Kevin, if your PC has Perl installed it can be fixed with a few lines; if not, install it and learn the basics, and your (sequencing) life will be easier. Just don't ask Anthony for advice on it.

**apfejes** · 04-08-2009, 12:26 PM

(=

Or you could install Python... but you probably still don't want to ask for my advice. I've only ever done a few simple scripts with it, like grepping and sorting files. (-;

Say, how about this script?

Code:

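```python
import re

# Raw strings stop the backslashes in Windows paths being read as escape
# sequences (in a plain string, "\f" in "\filename" is a form feed).
readfile = open(r'c:\input\filename.eland', "r")
writefile = open(r'c:\filtered_file.eland', "w")

# U0, U1 or U2 in the match-code column (which follows a tab) marks a unique hit.
Unique = re.compile(r"\tU[012]")

for line in readfile:
    # search() scans the whole line; match() would only test its start,
    # where the read name sits, and so would never find the match code.
    if Unique.search(line):
        writefile.write(line)

readfile.close()
writefile.close()
```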

I should mention that I haven't actually tested this script out... use at your own risk.
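One thing worth checking before trusting any version of this: in Python, `re.match()` only tests the start of a string, while `re.search()` scans the whole line. Because the match code sits in a later column than the read name, the difference decides whether anything gets through the filter. A quick check on a made-up, eland_result-style line (the line itself is hypothetical):

```python
import re

unique = re.compile(r"U[012]")
anchored = re.compile(r"\tU[012]")  # require the tab before the match code

# Hypothetical line: read name, sequence, match code, counts, chrom, pos, strand
line = "read1\tACGTACGTAC\tU0\t1\t0\t0\tchr10.fa\t103778\tR\t..\n"

# match() only tries position 0 ("read1..."), so it finds nothing.
print(unique.match(line))
# search() scans the line and finds the U0 match code.
print(unique.search(line).group(0))
# A read name containing "U1" fools the bare pattern but not the anchored one.
print(unique.search("runU1\tACGT\tNM\n") is not None, anchored.search("runU1\tACGT\tNM\n"))
```

This prints `None`, then `U0`, then `True None`, which is why anchoring on the tab (or on the preceding base, as in a `[GTAC]\tU[012]` pattern) is the safer choice.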

**vschulz** · 04-13-2009, 07:21 AM

An easy way to get some linux functionality for windows is to use UnxUtils, see

Native Win32 ports of some GNU utilities

http://unxutils.sourceforge.net/

This is easy to install and has very low overhead. Basically, you can run Unix commands (like `grep "U[012]" Input.eland > Input.um.eland`) in the DOS command window. You could also use Cygwin, but that has more overhead.

Vince

**kevinlu** · 04-13-2009, 07:35 PM

Anthony, thanks for the script. It's been edited a bit and works smoothly.
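The new script is below.

```python
#!/usr/bin/python
import os, re

# A list of paths, not a bare string: iterating over a plain string
# would loop over its individual characters.
files = ['F:\\path\\to\\files']
# A base, a tab, then U0/U1/U2: the match code right after the sequence column.
regex = re.compile(r"[GTAC]\tU[012]")

for filepath in files:
    # splitext() strips only the extension, even if folder names contain dots.
    outpath = "%s_out.txt" % os.path.splitext(filepath)[0]
    with open(filepath, 'r') as rfobj, open(outpath, 'w') as wfobj:
        for l in rfobj:
            if regex.search(l):
                wfobj.write(l)
```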

You can grep multiple files at once if desired. Just separate their paths using a comma.
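If editing the path list every time gets tedious, a variation can take the file names on the command line instead, e.g. `python filter_eland.py F:\data\lane1.eland F:\data\lane2.eland` (the script name, paths and the helpers `output_path` and `filter_file` are all hypothetical). It is a sketch using the same `[GTAC]\tU[012]` pattern as the script above, and it has not been tried on real eland output.

```python
#!/usr/bin/env python
"""Keep uniquely aligned reads (U0/U1/U2) from each eland file given as an argument."""
import os
import re
import sys

# Same pattern as above: a base, a tab, then U0/U1/U2.
UNIQUE = re.compile(r"[GTAC]\tU[012]")

def output_path(path):
    """reads.eland -> reads_out.txt, stripping only the file's own extension."""
    return os.path.splitext(path)[0] + "_out.txt"

def filter_file(path):
    """Copy matching lines to output_path(path) and return how many were kept."""
    kept = 0
    with open(path) as src, open(output_path(path), "w") as dst:
        for line in src:
            if UNIQUE.search(line):
                dst.write(line)
                kept += 1
    return kept

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("%s: kept %d reads" % (path, filter_file(path)))
```

Reporting the count per file gives a quick sanity check that the filter actually matched something.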

**apfejes** · 04-13-2009, 08:26 PM

Hi Kevin,

Thanks - that's much cleaner than what I'd done. As I said, I really haven't done much in Python before. That's a great resource for anyone else who's looking to do filtering on eland files.

Displaying ChIP-seq data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News