**rflrob** · 10-18-2012, 12:44 PM

Assuming you want to keep the seq number, it could be done with a moderately simple Python script:

Code:

fh = open('file_name')
print(fh.readline().rstrip('\n'))  # Echo the header line, without adding a blank line
best_lines = {}  # id_base -> (highest fpkm seen so far, its seq number)
for line in fh:
    seq_id, fpkm = line.strip().split()  # seq_id avoids shadowing the builtin id()
    fpkm = float(fpkm)  # Turn into a number
    id_base, id_seqnum = seq_id.rsplit('_', 1) # Assume everything before _seq is the same

    if id_base not in best_lines:
        best_lines[id_base] = (fpkm, id_seqnum)
    else:
        if fpkm > best_lines[id_base][0]:
            best_lines[id_base] = (fpkm, id_seqnum)
fh.close()

for id_base in best_lines:
    fpkm, id_seqnum = best_lines[id_base]
    print("%s_%s %s" % (id_base, id_seqnum, fpkm))

This won't necessarily retain the original order of the file, but will deal with the possibility that, for instance, comp267138_c0_seq1 and comp267138_c0_seq2 aren't in adjacent lines.
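
If the original order does matter, here is a minimal sketch of one way to keep it. It assumes Python 3.7 or later, where a plain dict remembers insertion order, so each base id comes out in the order it first appeared in the file. The function name best_per_base and the sample rows are made up for illustration; they are not from the original file.

```python
def best_per_base(lines):
    """Return (full_id, fpkm) pairs, keeping the highest FPKM per base id.

    Results follow the order in which each base id first appears.
    Assumes dicts preserve insertion order (Python 3.7+).
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        seq_id, fpkm = line.split()
        fpkm = float(fpkm)
        id_base, id_seqnum = seq_id.rsplit('_', 1)
        # Reassigning an existing key keeps its original position in the dict
        if id_base not in best or fpkm > best[id_base][0]:
            best[id_base] = (fpkm, id_seqnum)
    return [(base + '_' + seqnum, fpkm) for base, (fpkm, seqnum) in best.items()]


# Hypothetical rows for illustration only (header line already removed)
rows = [
    "comp267138_c0_seq1 1.5",
    "comp100_c0_seq1 3.0",
    "comp267138_c0_seq2 4.25",
]
for full_id, fpkm in best_per_base(rows):
    print(full_id, fpkm)
```

Here comp267138_c0 is listed before comp100_c0 because it was seen first, even though its best line (seq2) comes later in the input. On older Python versions, collections.OrderedDict could be swapped in for the plain dict.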

**upendra_35** · 10-18-2012, 12:59 PM

Hi rflrob, it worked perfectly. I had been struggling to write something like this in Perl for a while but couldn't get it to work, and your script worked like a charm. Don't worry about the order of the IDs; I am not too worried about it as long as I can filter the columns. Thanks a lot again, man.

how do i filter rownames based on column value
