Awk command - SEQanswers

syfo replied

02-28-2013, 06:34 AM
Originally posted by francois.sabot View Post

What about a simple grep ?

grep 'chr1' FILE > chr1.txt
grep 'chr2' FILE > chr2.txt

A more generic grep solution could be something like

Code:

for i in $(cut -d" " -f1 input | sort -u); do grep -w "$i" input > "$i.txt"; done

But the awk alternative is better.
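A minimal sketch of the grep loop with the match anchored to the first column, assuming space-separated columns as in the thread. The sample data (including the chr10 line) is hypothetical and only there to show that a plain grep 'chr1' would also pick up chr10. The loop still reads the input once per chromosome name, which is one reason the awk version is preferable.

```shell
# Hypothetical sample data; chr10 is added to show the prefix problem
printf 'chr1 1234\nchr1 2345\nchr2 94837\nchr10 111\n' > input

for i in $(cut -d" " -f1 input | sort -u); do
    # Match the name only at the start of the line, followed by a space
    grep "^$i " input > "$i.txt"
done
```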
Leave a comment:
syfo replied

02-28-2013, 06:30 AM
OK good, thanks Alex for the clarification. Both commands should work then; I don't see any reason for an error either. Maybe try \awk instead of awk, in case awk is aliased to something else?

Rkk, let me know if there is any issue with my one-liner for your second task.
Leave a comment:
alexdobin replied

02-28-2013, 06:02 AM
Originally posted by syfo View Post

The other awk command that was posted almost at the same time has a space in the output file name after "$1". It should not change anything, but if you got an error, try it as quoted here.

The space in $1 ".txt" is perfectly valid and cannot cause any problems. In awk, strings are concatenated by writing them next to each other, optionally separated by spaces: http://www.gnu.org/software/gawk/man...atenation.html
Leaving the space out in this case does not cause a problem, but it is better practice to put a space between concatenated strings. For example, if you concatenate awk variables, you must separate them with a space: v3=v1 v2. Of course, v3=v1v2 will not work, because v1v2 is read as a single variable name.
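A related point, offered as a sketch and not as a diagnosis of the error rkk reported: the gawk manual describes an unparenthesized concatenation after > as ambiguous and recommends wrapping the target in parentheses, and some awk implementations seem to reject the unparenthesized form whether or not the space is there. The sample file below reuses the data from the original question.

```shell
# Sample data from the original question
printf 'chr1 1234\nchr1 2345\nchr2 94837\nchr2 73457\n' > input

# Parentheses make the redirection target a single, unambiguous expression
awk '{ print > ($1 ".txt") }' input
```

This writes chr1.txt and chr2.txt, the same files as the one-liner discussed above.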
Leave a comment:
syfo replied

02-28-2013, 02:56 AM
Originally posted by gokhulkrishnakilaru View Post

Code:

awk '{print > $1".txt"}' input

This is the correct and best answer to the original question of the thread. The other awk command that was posted almost at the same time has a space in the output file name after "$1". It should not change anything, but if you got an error, try it as quoted here.

As for the second problem, since you already know the resolution you want you don't need to compute min and max. Everything in one step:

Code:

awk '{bin[int(($1-1)/100)]+=$2} END{for (i in bin) print (i*100+1) "-" ((i+1)*100), bin[i]}' input

This line should give exactly the output you want. Pipe it to sort -n if needed, since the order of for (i in bin) is not guaranteed, and/or change the separator "-".
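For reference, here is a sketch of the single-step binning run against the sample data from the thread. It assumes bins of the form 10101-10200, so a position that is an exact multiple of 100 belongs to the lower bin; that is why 1 is subtracted before dividing. The output file name binned.txt is made up for the example.

```shell
# Sample data from the thread
printf '10175 1\n10179 1\n10189 1\n10191 1\n10201 1\n10243 1\n10249 1\n10262 1\n10313 1\n10414 1\n10485 1\n10499 1\n' > input

# Sum column 2 per 100 bp bin, then sort because for-in order is unspecified
awk '{ bin[int(($1 - 1) / 100)] += $2 }
     END { for (i in bin) print (i * 100 + 1) "-" ((i + 1) * 100), bin[i] }' input |
    sort -n > binned.txt

cat binned.txt
# 10101-10200 4
# 10201-10300 4
# 10301-10400 1
# 10401-10500 3
```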
Leave a comment:
gokhulkrishnakilaru replied

02-27-2013, 04:09 AM
Originally posted by francois.sabot View Post

What about a simple grep ?

grep 'chr1' FILE > chr1.txt
grep 'chr2' FILE > chr2.txt

Francois,

Grep is a handy tool, but you have to repeat that command for each chromosome in your first column. With awk, a single command does the whole task.

Time is precious, after all. No one wants to sit there typing a command for each chromosome; I certainly don't.
Leave a comment:
francois.sabot replied

02-27-2013, 02:12 AM
Originally posted by rkk View Post

Hello,

I have a file like the following

chr1 1234
chr1 2345
chr2 94837
chr2 73457

how can I split this data into two files

chr1.txt

chr1 1234
chr1 2345

chr2.txt

chr2 94837
chr2 73457

Thanks in advance.

What about a simple grep ?

grep 'chr1' FILE > chr1.txt
grep 'chr2' FILE > chr2.txt
Leave a comment:

gokhulkrishnakilaru replied

02-26-2013, 06:18 PM

Originally posted by rkk View Post

I should use that command in LINUX...

Now, I have another issue

I have a file like the following. I need to bin the first column into 100 bp regions and sum the second-column values for each bin.
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this..

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this..

Thanks in advance..

@rkk,

There are two possible solutions for your input.

Code:

Solution 1 (using the minimum and maximum values from column 1):

cat input
10175	1
10179	1
10189	1
10191	1
10201	1
10243	1
10249	1
10262	1
10313	1
10414	1
10485	1
10499	1

Code:

awk 'NR == 1 {max=$1 ; min=$1} $1 >= max {max = $1} $1 <= min {min = $1} END { print min"\t"max}' input | awk '{ print $1, i=$1+100;while(i++<$2) print i, i+=99}' > intermediate

Code:

cat intermediate

10175 10275
10276 10375
10376 10475
10476 10575

Now, consider the above intermediate file and run the following code

Code:

awk 'NR==FNR{
   C[NR]=$1 " " $2
   L[C[NR]]=0
   next
}
{
 for (t in C) {
    split(C[t],v," ")
    if($1>=v[1] && $1<=v[2])
       L[C[t]]+=$2
 }
}
END {
   for(i=1;i in C;i++)
       print C[i] " " L[C[i]]
}' intermediate input > output

Code:

cat output

10175 10275 8
10276 10375 1
10376 10475 1
10476 10575 2
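As a sketch only, the same min-anchored binning can be done without an intermediate file by reading the input twice. This variant assumes every bin is exactly 100 positions wide, so the bin boundaries differ slightly from the intermediate file above (10175-10274 instead of 10175-10275), although the counts for this sample come out the same. The output file name binned_from_min.txt is made up for the example.

```shell
# Sample data from the thread
printf '10175 1\n10179 1\n10189 1\n10191 1\n10201 1\n10243 1\n10249 1\n10262 1\n10313 1\n10414 1\n10485 1\n10499 1\n' > input

# First pass (NR == FNR): find the minimum of column 1.
# Second pass: sum column 2 into 100-wide bins counted from that minimum.
awk 'NR == FNR { if (NR == 1 || $1 < min) min = $1; next }
     { bin[int(($1 - min) / 100)] += $2 }
     END { for (i in bin) print min + i * 100, min + i * 100 + 99, bin[i] }' input input |
    sort -n > binned_from_min.txt

cat binned_from_min.txt
# 10175 10274 8
# 10275 10374 1
# 10375 10474 1
# 10475 10574 2
```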

###########################################

Code:

Solution 2 (Considering minimum value and nearest 100 and maximum value and nearest 100 from column1):

cat input
10175	1
10179	1
10189	1
10191	1
10201	1
10243	1
10249	1
10262	1
10313	1
10414	1
10485	1
10499	1

Code:

awk '{       
    min=$1<min||!min?$1:min
    max=$1>max||!max?$1:max
}      
END {
  s=int(min/100)*100
  e=int(max/100)*100+100
  print s " " s+100
  for(i=s+101;i<e;i+=100)
     print i " " i+99
}' input > intermediate

Code:

cat intermediate
10100 10200
10201 10300
10301 10400
10401 10500

Now, consider the above intermediate file and run the following code

Code:

awk 'NR==FNR{
   C[NR]=$1 " " $2
   L[C[NR]]=0
   next
}
{
 for (t in C) {
    split(C[t],v," ")
    if($1>=v[1] && $1<=v[2])
       L[C[t]]+=$2
 }
}
END {
   for(i=1;i in C;i++)
       print C[i] " " L[C[i]]
}' intermediate input > output

Code:

cat output

10100 10200 4
10201 10300 4
10301 10400 1
10401 10500 3

Leave a comment:

rkk replied

02-26-2013, 05:14 PM
Once the minimum value is identified, it should be rounded down to the nearest 100. For example, here the minimum is 10175, so the bins should start at 10100. Hope this helps.
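The rounding itself is a one-line calculation. A small sketch, with the shell variable names min and start chosen only for illustration:

```shell
min=10175

# int() truncates 101.75 to 101; multiplying by 100 gives the bin start
start=$(awk -v min="$min" 'BEGIN { print int(min / 100) * 100 }')

echo "$start"   # 10100
```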
Leave a comment:
gokhulkrishnakilaru replied

02-26-2013, 04:07 PM
Originally posted by rkk View Post

The command has to identify the min and max values in col1 and then bin them into 100 bp regions.

I am afraid then your bins would be like this

Code:

10175-10275 8
10276-10375 1
10376-10475 1
10476-10575 2
Leave a comment:
rkk replied

02-26-2013, 04:04 PM
The command has to identify the min and max values in col1 and then bin them into 100 bp regions.
Leave a comment:
gokhulkrishnakilaru replied

02-26-2013, 04:01 PM
Originally posted by rkk View Post

I should use that command in LINUX...

Now, I have another issue

I have a file like the following. I need to bin the first column into 100 bp regions and sum the second-column values for each bin.
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this..

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this..

Thanks in advance..

Do you already know your bins?

If not, what are your start values and end values to consider bins at 100bp?
Leave a comment:
rkk replied

02-26-2013, 03:27 PM
I should use that command in LINUX...

Now, I have another issue

I have a file like the following. I need to bin the first column into 100 bp regions and sum the second-column values for each bin.
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this..

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this..

Thanks in advance..
Leave a comment:
gene_x replied

02-26-2013, 03:02 PM
Originally posted by gokhulkrishnakilaru View Post

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column. For each distinct value in column 1, awk prints the line (print) to another file (>) named after that value ($1) plus ".txt".

I can understand printing to another file named after the column value. What I don't get is where the separation based on the first column's contents happens.
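One way to see where the separation happens is to spell the one-liner out. The sketch below behaves the same way but names the intermediate step; the variable outfile is introduced here only for illustration. The action block runs once per input line, so the target file name is recomputed from that line's first column every time, and each line lands in the file that matches its own $1.

```shell
# Sample data from the original question
printf 'chr1 1234\nchr1 2345\nchr2 94837\nchr2 73457\n' > input

awk '{
    outfile = $1 ".txt"   # recomputed for every line from its first column
    print $0 > outfile    # a file is truncated when first opened; later lines are added to it
}' input
```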
Leave a comment:
gokhulkrishnakilaru replied

02-26-2013, 03:00 PM
Originally posted by gene_x View Post

Good to learn an easier way to do this. Can you explain a bit how it works?

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column. For each distinct value in column 1, awk prints the line (print) to another file (>) named after that value ($1) plus ".txt".
Leave a comment:
gokhulkrishnakilaru replied

02-26-2013, 02:55 PM
Originally posted by rkk View Post

$head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

Where are you running it?

Are you on a Linux server or in your Mac's terminal?

Try using nawk or gawk instead of awk.
Leave a comment:
