**SES** · 09-03-2012, 06:38 AM

You will need to figure out which field in the first file is the value you want in your wiggle file (I just assumed it was the 5th field below). If you have a file called "Chip-Seq.txt" you could do something like:

Code:

grep "chr11" Chip-Seq.txt | cut -f1-3,5 > Chip-Seq_chr11.wig

That will not create the header, but I would just type that in manually instead of writing a program for this one small job. You'll probably want to create separate files for each chromosome. The first file is oddly not sorted that way, so I'm unsure what format it is exactly.
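A minimal sketch of the per-chromosome split mentioned above, under the same assumption that the input is tab-separated and the 5th field is the value of interest. The sample rows, the output file names, and the contents of the track line are illustrative and not taken from the original file.

```shell
#!/bin/sh
# Hypothetical sample rows in the same 6-column layout as the thread's data.
printf 'chr11\t100\t125\tU0\t0\t-\nchr1\t200\t225\tU0\t0\t+\nchr11\t300\t325\tU0\t0\t+\n' > Chip-Seq.txt

# Write columns 1-3 and 5 into one file per chromosome in a single pass.
# Field 1 is used as-is, so "chr1" rows never end up in the "chr11" file.
awk -F'\t' -v OFS='\t' '{
    out = "Chip-Seq_" $1 ".wig"
    if (!(out in seen)) {
        # First row for this chromosome: start the file with a track line.
        print "track type=wiggle_0 name=\"" $1 "\"" > out
        seen[out] = 1
    }
    print $1, $2, $3, $5 > out
}' Chip-Seq.txt

cat Chip-Seq_chr11.wig
```

The track line here is only a placeholder; adjust it to whatever the viewer expects.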

**el_Davido** · 09-03-2012, 11:40 AM

Thanks for your answer, SES.

The thing is, it's not just a matter of re-formatting. When I sort the file of interest by chromosome and read position, it looks like this (sorry for not mentioning this before):

chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3007329 3007354 U0 0 +
chr1 3013242 3013267 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3053811 3053836 U0 0 +
chr1 3053930 3053955 U0 0 -
chr1 3053930 3053955 U0 0 -
chr1 3053930 3053955 U0 0 -

So the task is to recognize that chr1 3001356 to 3001381 is measured 4 times and to assign a corresponding score, and so on for the other positions.

Any advice on how to continue?

PS: I got the original file from http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE27158

**SES** · 09-03-2012, 11:56 AM

Thanks for the information. What do you want to do with the 4 entries for each position? From this snippet it looks like the scores are the same, so you could just remove the duplicate entries in the original file. Is this what you want? If so, you could just add one command to the original (assuming you are on a *nix machine).

Original file:

Code:

$ cat Chip-Seq.txt
chr1	3001356	3001381	U0	0	-
chr1	3001356	3001381	U0	0	-
chr1	3001356	3001381	U0	0	-
chr1	3001356	3001381	U0	0	-
chr1	3007329	3007354	U0	0	+
chr1	3013242	3013267	U0	0	-
chr1	3016493	3016518	U0	0	-
chr1	3016493	3016518	U0	0	-
chr1	3016493	3016518	U0	0	-
chr1	3053811	3053836	U0	0	+
chr1	3053930	3053955	U0	0	-
chr1	3053930	3053955	U0	0	-
chr1	3053930	3053955	U0	0	-

Command to create wiggle file:

Code:

$ uniq Chip-Seq.txt | grep -w "chr1" | cut -f1-3,5 > Chip-Seq_chr1_uniq.wig
$ cat Chip-Seq_chr1_uniq.wig
chr1	3001356	3001381	0
chr1	3007329	3007354	0
chr1	3013242	3013267	0
chr1	3016493	3016518	0
chr1	3053811	3053836	0
chr1	3053930	3053955	0

Note that I'm not sure this is what you want, so let us know if there is a different output you had in mind.

**Chipper** · 09-03-2012, 12:53 PM

The data is in BED format with one line per alignment. You can upload it to UCSC as a BED file, but to get it in wiggle format you need to use some type of peak finder. There are online tools that can use the SRA URL; I know cistrome.org accepts BED files and can produce wiggle files using the MACS peak finder.

**el_Davido** · 09-03-2012, 01:00 PM

For me as a layperson it seems that the number of reads does correspond to the signal intensity. Is that correct?

The file comes from a paper where they performed chromatin-immunoprecipitation for STAT5, followed by sequencing.

Is it a valid approach to count the number of individual reads?

Code:

chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3007329 3007354 U0 0 +
chr1 3013242 3013267 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -

Would become (in my view)

Code:

chr1 3001356 3001381 4
chr1 3007329 3007354 1
chr1 3013242 3013267 1
chr1 3016493 3016518 3

Thanks for your support so far, SES!

**SES** · 09-03-2012, 01:04 PM

Originally posted by el_Davido:

Is it a valid approach to count the number of individual reads?

You are welcome, but unfortunately, I am not sure this is the right thing to do with this data. You might want to follow up with the suggestion from @Chipper above. Good luck.

**Chipper** · 09-03-2012, 01:21 PM

You want to get the count of all reads surrounding a binding site; in your example you would only count reads with the same alignment position. Email the author and ask for the processed data if it is not available as a supplement to the article.
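One rough way to count reads in a neighborhood rather than at one identical position is to bin read starts into fixed-width windows. The sketch below does only that; the 1000 bp window and the file names are arbitrary assumptions, and a peak finder such as MACS does considerably more than this.

```shell
#!/bin/sh
# Sample reads from earlier in the thread (whitespace-separated, BED-like).
cat > reads.bed <<'EOF'
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3007329 3007354 U0 0 +
chr1 3013242 3013267 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3053811 3053836 U0 0 +
chr1 3053930 3053955 U0 0 -
chr1 3053930 3053955 U0 0 -
chr1 3053930 3053955 U0 0 -
EOF

WINDOW=1000   # assumed bin width in bp; not a value from the thread

# Assign each read to the window containing its start, count per window,
# then sort by chromosome and numerically by window start.
awk -v w="$WINDOW" -v OFS='\t' '{
    bin = int($2 / w) * w
    count[$1 OFS bin]++
}
END {
    for (k in count) {
        split(k, p, OFS)
        print p[1], p[2], p[2] + w, count[k]
    }
}' reads.bed | sort -k1,1 -k2,2n > Chip-Seq_binned.txt

cat Chip-Seq_binned.txt
```

With this window size the reads starting at 3053811 and 3053930 fall into the same bin and are counted together, which a count of identical positions would keep apart.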

**SES** · 09-03-2012, 05:57 PM

Originally posted by el_Davido:

For me as a layperson it seems that the number of reads does correspond to the signal intensity. Is that correct?

The file comes from a paper where they performed chromatin-immunoprecipitation for STAT5, followed by sequencing.

Is it a valid approach to count the number of individual reads?

Based on the comment by @Chipper, it sounds like you were spot on with your approach. Here is a small script that can produce this output.

Call this script "bed2wig.pl" (or whatever you want):

Code:

#!/usr/bin/env perl

use strict;
use warnings;
use Getopt::Long;

my $usage = "$0 -i BEDfile -o wiggle_file\n";
my $infile;
my $outfile;

GetOptions(
           'i|infile=s'  => \$infile,
           'o|outfile=s' => \$outfile,
           );

die $usage if !$infile or !$outfile;

open(my $in, '<', $infile) or die "\nERROR: Could not open file: $infile: $!\n";
open(my $out, '>', $outfile) or die "\nERROR: Could not open file: $outfile: $!\n";

my %mapped;

print $out "track type=wiggle_0 name=\"DHS WT naive\" color=0,0,0\n"; # change this line to whatever you need

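# Tally identical lines. The key is every column joined by "|", so reads at
# the same position on opposite strands are counted separately.
while (my $line = <$in>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    my @fields = split(/\s+/, $line);
    my $read = join("|", @fields);
    $mapped{$read}++;
}

# Order by chromosome name, then numerically by start position
# (a plain string sort would put 3001356 before 999).
my @sorted = sort {
    my @x = split(/\|/, $a);
    my @y = split(/\|/, $b);
    $x[0] cmp $y[0] or $x[1] <=> $y[1] or $a cmp $b;
} keys %mapped;

for my $key (@sorted) {
    my @pos = split(/\|/, $key);
    print $out join("\t", @pos[0..2], $mapped{$key}), "\n";
}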

close($in);
close($out);

Run the script like this:

Code:

perl bed2wig.pl -i Chip-Seq.txt -o Chip-Seq.wig

Original file (Chip-Seq.txt):

Code:

$ cat Chip-Seq.txt
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3007329 3007354 U0 0 +
chr1 3013242 3013267 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -

Output file (Chip-Seq.wig):

Code:

$ cat Chip-Seq.wig 
track type=wiggle_0 name="DHS WT naive" color=0,0,0
chr1	3001356	3001381	4
chr1	3007329	3007354	1
chr1	3013242	3013267	1
chr1	3016493	3016518	3

You'll certainly have to modify the script based on the exact format of your input and the output you need, but it should be a start for you.
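For comparison, the same tally can be sketched with standard command-line tools. This is an alternative under the same assumptions as the script (whitespace-separated columns, identical lines counted together), not a replacement for it; the output file name is illustrative.

```shell
#!/bin/sh
# Sample input from the post above.
cat > Chip-Seq.txt <<'EOF'
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3001356 3001381 U0 0 -
chr1 3007329 3007354 U0 0 +
chr1 3013242 3013267 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
chr1 3016493 3016518 U0 0 -
EOF

{
    # Same track line as in the Perl script; change it to whatever you need.
    printf 'track type=wiggle_0 name="DHS WT naive" color=0,0,0\n'
    # sort groups identical lines, uniq -c prefixes each distinct line with
    # its count, awk moves that count into the 4th column, and the last
    # sort orders rows by chromosome and numerically by start.
    sort Chip-Seq.txt | uniq -c |
        awk -v OFS='\t' '{ print $2, $3, $4, $1 }' |
        sort -k1,1 -k2,2n
} > Chip-Seq_counts.wig

cat Chip-Seq_counts.wig
```

Because whole lines are compared, two reads at the same coordinates on opposite strands would still produce two separate rows, just as in the Perl version.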

**el_Davido** · 09-04-2012, 01:02 PM

Thanks for your ideas.
I'll start with cistrome.org and see how far I get!

**Jim Robinson** · 09-05-2012, 07:15 AM

You can use igvtools to compute coverage from a bed file. I think bedtools can also do this.
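Coverage here means the number of reads overlapping each position. The following is only a sketch of that idea using sort and awk on a few hypothetical reads; it does not show how igvtools or bedtools are invoked, and those tools are the better choice for real data. The input rows and file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical overlapping reads (BED convention: 0-based start, end exclusive).
cat > overlap.bed <<'EOF'
chr1 100 125 U0 0 +
chr1 110 135 U0 0 -
chr1 110 135 U0 0 -
chr1 200 225 U0 0 +
EOF

# Sweep over read boundaries: emit +1 at each start and -1 at each end,
# sort the events by chromosome and position, then accumulate the depth
# and print every interval whose depth is above zero.
awk -v OFS='\t' '{ print $1, $2, 1; print $1, $3, -1 }' overlap.bed |
    sort -k1,1 -k2,2n |
    awk -v OFS='\t' '
        $1 != chrom { chrom = $1; depth = 0; prev = $2 }
        {
            if ($2 > prev && depth > 0) print chrom, prev, $2, depth
            depth += $3
            prev = $2
        }' > coverage.txt

cat coverage.txt
```

Each output row is chromosome, start, end, and the number of reads covering that stretch, so the region from 110 to 125 is reported with a depth of 3.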

**el_Davido** · 09-12-2012, 08:08 AM

My problem is solved, many thanks for your advice.

www.cistrome.org offers a GUI and is just the right thing if you only want to convert a couple of BED files.

Beginner Question: Generate WIG file
