**gene_x** · 02-26-2013, 02:02 PM

Originally posted by rkk

Hello,

I have a file like the following:

chr1 1234
chr1 2345
chr2 94837
chr2 73457

How can I split this data into two files?

chr1.txt

chr1 1234
chr1 2345

chr2.txt

chr2 94837
chr2 73457

Thanks in advance.

$ awk '$1 =="chr1"' file > file1
$ awk '$1 =="chr2"' file > file2

This should work in theory.

**gokhulkrishnakilaru** · 02-26-2013, 02:10 PM

Code:

awk '{print > $1".txt"}' input

**alexdobin** · 02-26-2013, 02:11 PM

A more "universal" way to do it:
awk '{print > $1 ".txt"}' Input.file.txt

**gene_x** · 02-26-2013, 02:21 PM

Good to learn an easier way to do this. Can you explain a bit how it works?

**rkk** · 02-26-2013, 02:43 PM

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

I am getting the above error...

**gene_x** · 02-26-2013, 02:45 PM

Originally posted by rkk

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

I am getting the above error...

Make sure you pay attention to single quotes, double quotes, brackets, etc. It worked for me.

**rkk** · 02-26-2013, 02:48 PM

$ head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$ awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

**gene_x** · 02-26-2013, 02:51 PM

Originally posted by rkk

$ head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$ awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

It worked for me. Not sure why it's not working for you.

**gokhulkrishnakilaru** · 02-26-2013, 02:55 PM

Originally posted by rkk

$ head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$ awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

Where are you running it?

Are you on a Linux server or in your Mac's terminal?

Try using nawk or gawk instead of awk.
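If switching interpreters is not an option, a commonly suggested workaround is to parenthesize the output file expression. The sketch below assumes the syntax error comes from an awk that parses the unparenthesized `$1 ".txt"` after `>` ambiguously, which nothing in this thread confirms. `split_by_first_column` is a hypothetical helper name used only for illustration.

```shell
# Split a whitespace-delimited file into one file per distinct first-column value.
# Assumption: the failing awk rejects an unparenthesized concatenation after ">".
split_by_first_column() {
  # The parentheses make the whole concatenation the redirection target.
  awk '{ print > ($1 ".txt") }' "$1"
}
```

Run it from the directory where the output files should be written, for example `split_by_first_column test.txt`.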

**gokhulkrishnakilaru** · 02-26-2013, 03:00 PM

Originally posted by gene_x

Good to learn an easier way to do this. Can you explain a bit how it works?

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column.

For each distinct value in column 1, awk prints the line to another file named after that value ($1).
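To make the per-line behavior explicit, here is a longer sketch that should behave the same way as the one-liner. The redirection target is an expression that awk evaluates again for every input line, so no separate grouping step is needed. The variable `out` and the function name `split_explained` are illustrative names, not something from the posts above.

```shell
# Same split as the one-liner, written out step by step.
split_explained() {
  awk '{
    out = $1 ".txt"   # target file name, rebuilt from column 1 on every line
    print $0 > out    # the whole line goes to the file its first column names
  }' "$1"
}
```

Within a single run, awk opens each output file once and keeps writing to it, so all lines sharing a first-column value end up in the same file.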

**gene_x** · 02-26-2013, 03:02 PM

Originally posted by gokhulkrishnakilaru

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column.

For each distinct value in column 1, awk prints the line to another file named after that value ($1).

I understand printing to another file named after the column value. What I don't get is where the separation based on the first column's contents happens.

**rkk** · 02-26-2013, 03:27 PM

It turns out I should run that command on Linux.

Now I have another issue.

I have a file like the following. I need to bin the first column into 100 bp regions and count the second-column values for each bin:
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this:

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this?

Thanks in advance.

**gokhulkrishnakilaru** · 02-26-2013, 04:01 PM

Originally posted by rkk

It turns out I should run that command on Linux.

Now I have another issue.

I have a file like the following. I need to bin the first column into 100 bp regions and count the second-column values for each bin:
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this:

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this?

Thanks in advance.

Do you already know your bins?

If not, what start and end values should the 100 bp bins be based on?

**rkk** · 02-26-2013, 04:04 PM

The command has to identify the min and max values in column 1 and then bin that range into 100 bp regions.
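One possible way to do this in awk is sketched below. It rests on several assumptions that the thread does not settle: bins are aligned to multiples of 100 (10101-10200, 10201-10300, ...) as in the example output rather than starting exactly at the minimum value, "count" means summing column 2, and empty bins between the lowest and highest occupied bin are printed with 0. `bin_counts` and the `size` variable are hypothetical names.

```shell
# Sum column 2 per fixed-width bin of column 1.
# Assumptions: bins are 1-based and aligned to multiples of "size";
# empty bins between the first and last occupied bin are reported as 0.
bin_counts() {
  awk -v size=100 '
    NF >= 2 {
      start = int(($1 - 1) / size) * size + 1   # first position of this bin
      sum[start] += $2
      if (!seen || start < lo) lo = start       # lowest occupied bin so far
      if (!seen || start > hi) hi = start       # highest occupied bin so far
      seen = 1
    }
    END {
      if (!seen) exit                            # no usable input lines
      for (s = lo; s <= hi; s += size)
        printf "%d-%d %d\n", s, s + size - 1, sum[s]
    }
  ' "$1"
}
```

Called as `bin_counts positions.txt` (the file name is a placeholder) on the twelve sample lines above, this should reproduce the four bins from the example output. The input does not need to be sorted, because the lowest and highest bins are tracked while reading.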


Awk command
