**shyam_la** · 06-13-2012, 04:43 PM

Did that too:

Traceback (most recent call last):
File "<stdin>", line 4, in <module>
IndexError: list index out of range

It's 100% still tab-delimited.

**shyam_la** · 06-13-2012, 04:45 PM

I:\Exome\Annotations>C:\Python27\Python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> inf = open("out2.txt")
>>> outf = open("out2mod.txt",'w')
>>> for line in inf:
... fields = line.strip().split()
... if len(fields) > 6:
... keyTuple = (fields[1],fields[2],fields[7],fields[12],fields[13],fields[16],fields[17])
... if keyTuple not in uniqueValues:
... uniq[keyTuple] = None
... outf.write(line)
... ^Z

Traceback (most recent call last):
File "<stdin>", line 4, in <module>
IndexError: list index out of range

PS: I typed the indentations correctly, even though they aren't showing here..

**Heisman** · 06-13-2012, 08:02 PM

Originally posted by shyam_la:

Yes, I have been using Excel to view my results. I have only one sample in so far, so there aren't multiple files to merge. Just one.

Just one list of mutations. I am experimenting with different tools and callers to build a pipeline at the moment. Pre-processing according to the Exome manual here, followed by MuTect from the Broad, gave excellent mutation calls. After annotation, the expected type of mutations (UV signature) was found in large numbers, and some of the genes expected to be mutated in this type of tumor were indeed found mutated. I think I have a viable pipeline to run things through once more sequences start coming in.

Anyway, story aside, here are a few lines as you asked:

1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033493 NM_033493.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2310
1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033492 NM_033492.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2337
1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033486 NM_033486.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2343
1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033487 NM_033487.ex.16 3 UTR_5_PRIME: 380 bases from TSS

There are columns A to U in there. If columns A, B, J, O, P, S, and T are the same, as in the first three lines of the example above, I want only one line to be retained and the remaining two to be discarded.

Thank you.

PS: Three columns are mostly empty; that's why you see fewer columns than A to U there.

So if a column is empty it doesn't give a delimiter? If you use ANNOVAR to annotate variants (which I highly recommend), it will create a .csv file that at least will have a comma for an empty column. Without delimiters it's trickier as column "O" in one row may correspond to column "N" in another, for example.
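To illustrate why empty columns matter when parsing, here is a minimal sketch. The row below is hypothetical (it is not taken from the file in this thread) and simply has two empty columns in the middle. Calling `split()` with no argument splits on runs of whitespace, so consecutive tabs collapse and the empty columns disappear, whereas `split("\t")` keeps one entry per column.

```python
# Hypothetical tab-delimited row with two empty columns (not from the thread's file).
row = "1\t1653142\tG\tA\t\t\tSNP\n"

# split() with no argument splits on runs of whitespace,
# so the two empty columns vanish and later columns shift left.
collapsed = row.strip().split()

# split("\t") keeps one entry per column, including the empty ones.
preserved = row.rstrip("\n").split("\t")

print(len(collapsed))  # 5
print(len(preserved))  # 7
```

With the collapsed form, the same index can point at different columns in different rows, which is the mismatch described above.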

**shyam_la** · 06-13-2012, 08:39 PM

It gives a delimiter: Excel and Notepad both display it correctly. But when I copy the four lines and paste them here, the gap vanishes.

But there is no character to represent a null entry, if that's what you mean.

**Heisman** · 06-13-2012, 08:46 PM

Then I'm curious why you don't just use Excel? It has a Remove Duplicates function where you can select which columns it considers.
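For anyone who would rather script the same idea, here is a rough Python 3 sketch of "remove duplicates on selected columns" using the standard `csv` module. It assumes a tab-delimited file with 21 columns (A to U) and uses columns A, B, J, O, P, S, T as the key. The helper names `remove_duplicates` and `make_row`, and the placement of the sample values in columns K and O, are made up for illustration.

```python
import csv
import io

# Zero-based positions of spreadsheet columns A, B, J, O, P, S, T.
KEY_COLUMNS = (0, 1, 9, 14, 15, 18, 19)

def remove_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep the first row seen for each distinct combination of key columns."""
    seen = set()
    kept = []
    for row in rows:
        # Treat columns missing from a short row as empty.
        key = tuple(row[i] if i < len(row) else "" for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def make_row(transcript, effect):
    """Build a hypothetical 21-column row (A to U) for the demo."""
    row = [""] * 21
    row[0], row[1] = "1", "1653142"  # columns A and B
    row[10] = transcript             # column K, not part of the key
    row[14] = effect                 # column O, part of the key
    return row

rows = [
    make_row("NM_033493", "SYNONYMOUS_CODING"),
    make_row("NM_033492", "SYNONYMOUS_CODING"),
    make_row("NM_033487", "UTR_5_PRIME"),
]
sample = "\n".join("\t".join(r) for r in rows) + "\n"

# delimiter="\t" keeps empty columns as empty strings.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
unique_rows = remove_duplicates(reader)
print(len(unique_rows))  # 2
```

The first two rows differ only in a non-key column, so only the first of them is kept. For a real file, the `io.StringIO` object would be replaced by an opened file.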

**shyam_la** · 06-13-2012, 11:00 PM

That is news! I never thought that would be possible. Will give it a shot tomorrow.

Thanks.

**Heisman** · 06-13-2012, 11:02 PM

Originally posted by shyam_la:

That is news! I never thought that would be possible. Will give it a shot tomorrow.

Thanks.

Yeah, in 2007 or 2010 (and maybe earlier versions) you can click on "Data" and then there is a "Remove Duplicates" button.

**shyam_la** · 06-14-2012, 08:57 AM

Yeah, in 2007 and above.. It worked!!

Thank you so much.

**ucpete** · 06-14-2012, 11:33 AM

In Python, and most programming languages, lists are indexed starting at zero. So if you want the first, second, and seventh items, you want list[0], list[1], and list[6]. You're referencing objects that don't exist.
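As a small illustration of the zero-based mapping, the sketch below converts the spreadsheet column letters mentioned earlier in the thread (A, B, J, O, P, S, T) into list indices. The helper `column_index` is a hypothetical name and only handles single letters A to Z.

```python
import string

def column_index(letter):
    """Return the zero-based list index for a single spreadsheet column letter A-Z."""
    if len(letter) != 1 or letter.upper() not in string.ascii_uppercase:
        raise ValueError("expected a single letter A-Z")
    return ord(letter.upper()) - ord("A")

key_letters = "ABJOPST"
key_indices = [column_index(c) for c in key_letters]
print(key_indices)  # [0, 1, 9, 14, 15, 18, 19]
```

So column T, the 20th column, is `fields[19]`, and a row needs at least 20 fields for that index to exist.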

**shyam_la** · 06-14-2012, 09:06 PM

Originally posted by ucpete:

In Python, and most programming languages, lists are indexed starting at zero. So if you want the first, second, and seventh items, you want list[0], list[1], and list[6]. You're referencing objects that don't exist.

Actually, I'm referencing the wrong objects, not objects that don't exist. It should still have executed, shouldn't it?? Just with wrong results..

Anyway, when Excel can do it with a couple of mouse clicks, who needs Python?? :P
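On whether the loop should still have run with the wrong indices: one possible explanation for the `IndexError`, sketched below, uses the fourth sample line exactly as it was pasted earlier in this thread. The paste has spaces where the file has tabs, but `split()` with no argument treats any whitespace the same way. That line passes the `len(fields) > 6` check yet yields only 17 tokens, so `fields[17]` does not exist. Whether this is the line that actually failed in the real file is an assumption.

```python
# Fourth sample line as pasted in the thread (tabs were lost in the paste).
line = ("1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033487 "
        "NM_033487.ex.16 3 UTR_5_PRIME: 380 bases from TSS")

fields = line.strip().split()
print(len(fields))  # 17, so valid indices are 0 through 16

message = None
try:
    fields[17]  # the highest index used in the posted keyTuple
except IndexError as err:
    message = str(err)

print(message)  # list index out of range
```

An index that is wrong but in range runs and quietly gives the wrong column; an index past the end of a short row raises the error instead.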
