
  • #16
    Did that too:

    Traceback (most recent call last):
    File "<stdin>", line 4, in <module>
    IndexError: list index out of range

    It's 100% still tab-delimited..



    • #17
      I:\Exome\Annotations>C:\Python27\Python
      Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
      Type "help", "copyright", "credits" or "license" for more information.
      >>> inf = open("out2.txt")
      >>> outf = open("out2mod.txt",'w')
      >>> for line in inf:
      ... fields = line.strip().split()
      ... if len(fields) > 6:
      ... keyTuple = (fields[1],fields[2],fields[7],fields[12],fields[13],fields[16],fields[17])
      ... if keyTuple not in uniqueValues:
      ... uniq[keyTuple] = None
      ... outf.write(line)
      ... ^Z

      Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      IndexError: list index out of range

      PS: I typed the indentations correctly, even though they aren't showing here..



      • #18
        Originally posted by shyam_la
        Yes, I have been using Excel to view my results. I have only one sample in so far. So, there aren't multiple files to merge.. Just one.

        Just one list of mutations. I am experimenting with the different tools and callers to get a pipeline at the moment. Using the Exome manual here for preprocessing and MuTect from Broad gave excellent mutation calls. After annotation, the type of mutations expected (UV signature) were found in huge amounts, and some of the genes expected to be mutated in this type of tumor were found mutated. I think I have a viable pipeline to run things through once more sequences start coming in.

        Anyway, story aside - few lines as you asked..

        1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033493 NM_033493.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2310
        1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033492 NM_033492.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2337
        1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033486 NM_033486.ex.18 3 SYNONYMOUS_CODING D/D gaC/gaT 40 1 2343
        1 1653142 G A SNP Hom CDK11B.1 CDK11B mRNA NM_033487 NM_033487.ex.16 3 UTR_5_PRIME: 380 bases from TSS


        There are columns A to U in there. If columns A, B, J, O, P, S and T are the same, as in the first three lines of the example above, I want only one line to be retained and the remaining two to be discarded.

        Thank you.

        PS: Three columns are mostly empty; that's why you see fewer than U columns there..

        So if a column is empty it doesn't give a delimiter? If you use ANNOVAR to annotate variants (which I highly recommend), it will create a .csv file that at least will have a comma for an empty column. Without delimiters it's trickier, as column "O" in one row may correspond to column "N" in another, for example.
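
A minimal sketch of the de-duplication rule from the quoted post (keep one line per unique combination of columns A, B, J, O, P, S, T). It assumes the file really is tab-delimited and that an empty column still produces a tab. The names `dedupe_rows` and `KEY_COLUMNS`, the letter-to-index mapping, and the sample rows are illustrative and not from the thread.

```python
import csv
import io

# Columns A, B, J, O, P, S, T as zero-based indices (A=0, B=1, ...).
KEY_COLUMNS = (0, 1, 9, 14, 15, 18, 19)


def dedupe_rows(rows, key_columns=KEY_COLUMNS):
    """Yield the first row seen for each combination of key-column values.

    A row that is too short for a key index contributes an empty string
    for that column, so short rows do not raise IndexError.
    """
    seen = set()
    for row in rows:
        key = tuple(row[i] if i < len(row) else "" for i in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


# Made-up three-column example, keyed on the first and third columns.
sample = "1\tG\tx\n1\tA\tx\n2\tG\tx\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
kept = list(dedupe_rows(reader, key_columns=(0, 2)))
```

For a real file, the same function could be fed from `csv.reader(open(path, newline=""), delimiter="\t")` and the surviving rows written back with `csv.writer`. Padding short rows is one design choice; skipping or reporting them would be equally reasonable.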



        • #19
          It gives a delimiter - as in, Excel and Notepad display it correctly. But when I copy the four lines and then paste here, the gap vanishes.

          But there is no character to represent a null entry, if that's what you mean.
          Last edited by shyam_la; 06-13-2012, 08:43 PM.



          • #20
            Then I'm curious why you don't just use Excel? It has a remove duplicates function where you can select which columns it considers.



            • #21
              That is news! I never thought that would be possible.. Will give it a shot tomorrow.

              Thanks.



              • #22
                Originally posted by shyam_la
                That is news! I never thought that would be possible.. Will give it a shot tomorrow.

                Thanks.

                Yeah, in 2007 or 2010 (and maybe earlier versions) you can click on "Data" and then there is a "Remove Duplicates" button.



                • #23
                  Yeah, in 2007 and above.. It worked!!

                  Thank you so much.



                  • #24
                    In Python, and most programming languages, lists are indexed starting at zero. So if you want the first, second, and seventh items, you want list[0], list[1], and list[6]. You're referencing objects that don't exist.
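
A small illustration of the zero-based indexing described above, using a made-up four-item list. The helper `column_index` is a hypothetical convenience for turning a single spreadsheet column letter into a list index. It is not something from the thread.

```python
import string


def column_index(letter):
    """Convert a single spreadsheet column letter (A-Z) to a zero-based index."""
    return string.ascii_uppercase.index(letter.upper())


# Made-up row with four items; valid indices are 0, 1, 2 and 3.
fields = ["1", "1653142", "G", "A"]

first = fields[0]                     # first item (column A)
fourth = fields[column_index("D")]    # fourth item (column D)

# Index 4 would be a fifth item, which this list does not have.
try:
    fields[4]
except IndexError as err:
    message = str(err)
```

With this mapping, column T corresponds to index 19, so a row needs at least 20 fields before `fields[19]` is safe to read.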



                    • #25
                      Originally posted by ucpete
                      In Python, and most programming languages, lists are indexed starting at zero. So if you want the first, second, and seventh items, you want list[0], list[1], and list[6]. You're referencing objects that don't exist.

                      Actually, I'm referencing the wrong objects, not objects that don't exist. It should still have executed, shouldn't it?? Just with wrong results..

                      Anyway, when Excel can do it with a couple of mouse clicks, who needs Python?? :P
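
On why the session in #17 raised an error instead of returning wrong results: an index past the end of a list always raises `IndexError`, and that session only checks `len(fields) > 6` before reading `fields[17]`. One possible contributor, assuming the file has empty tab-separated columns, is that `split()` with no argument splits on any run of whitespace, so empty columns disappear and the row gets shorter. The row below is made up for illustration, and `safe_get` is a hypothetical helper.

```python
# Made-up row: five tab-separated columns, two of them empty.
line = "1\t1653142\t\t\tSNP\n"

# split() with no argument treats a run of whitespace as one separator,
# so the two empty columns are dropped.
by_whitespace = line.strip().split()

# Splitting on "\t" keeps the empty columns as empty strings.
# rstrip("\n") removes only the newline, not any trailing tabs.
by_tab = line.rstrip("\n").split("\t")


def safe_get(fields, index, default=""):
    """Return fields[index], or default when the row is too short."""
    return fields[index] if index < len(fields) else default
```

Here `by_whitespace` has three items while `by_tab` has five, so the same index can point at different columns or at nothing at all, depending on how the line was split.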

