  • neavemj
    replied
    Hmm, yep, this might need a bit of digging. It does seem that the script requires headers that look like NCBI / UniProt ones, e.g.:

    <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>

    Perhaps when you go from phylip to fasta, this header information is lost? You could also open up your xml file and compare the hit information to an xml file that you know works.
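One way to make that comparison is to print the `Hit_id` and `Hit_def` values from each file side by side. The snippet below is a minimal sketch using only the Python standard library. The helper name `list_hits` and the embedded sample XML are illustrative assumptions; in particular, the second hit is a guess at what the problem entry might look like, not a copy of the real file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a BLAST XML file (assumed content, for illustration only).
SAMPLE_XML = """<BlastOutput>
  <Hit>
    <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
    <Hit_def>RecName: Full=Rhodopsin</Hit_def>
  </Hit>
  <Hit>
    <Hit_id>gnl|BL_ORD_ID|0</Hit_id>
    <Hit_def>tr_E9FSX5_Daphnia_pul_Cru_Bra</Hit_def>
  </Hit>
</BlastOutput>"""


def list_hits(xml_text):
    """Hypothetical helper: return (Hit_id, Hit_def) pairs for every <Hit>."""
    root = ET.fromstring(xml_text)
    return [
        (hit.findtext("Hit_id"), hit.findtext("Hit_def"))
        for hit in root.iter("Hit")
    ]


for hit_id, hit_def in list_hits(SAMPLE_XML):
    print(hit_id, "|", hit_def)
```

For a file on disk, `ET.parse(path).getroot()` can take the place of `ET.fromstring`. Running this on a working xml file and on the failing one should show quickly whether the headers differ in shape.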

    Good luck!

    Matt.


  • CsprsSassyHrly
    replied
    Thank you for your reply, Matt. I guess that's what I'm finding strange... The xml file is being generated using the same BLAST command line that I have used before and haven't had this issue... the only thing I am changing is the query and the database.

    The only thing that is really different is that the fasta file I turned into a database was converted from a phylip file into a fasta file first, while the other files I have turned into databases were downloaded as fasta files from Uniprot. I'll keep playing with it and see if I can figure it out!

    Thanks again,

    Irene


  • neavemj
    replied
    Hi Irene,

    That error is because Python is trying to get the second item in a list but the list only contains one item. Looking at the code (line 299), it appears that the script is trying to make a list from the hit definition by splitting it apart at the ">" symbol.

    As you can see from the output, that particular hit 'tr_E9FSX5_Daphnia_pul_Cru_Bra' does not contain a ">" symbol, and, therefore, the resulting list only contains this single item.

    Basically I think the input is just not in the correct format for this script. You could probably change the code a bit to get it to run but perhaps easiest would be to generate another input format? This help is provided in the script:

    # Expecting either this,
    # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
    # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
    # <Hit_accession>P56514</Hit_accession>
    # or,
    # <Hit_id>Subject_1</Hit_id>
    # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def>
    # <Hit_accession>Subject_1</Hit_accession>
    #
    # apparently depending on the parse_deflines switch
    #
    # Or, with a local database not using -parse_seqids can get this,
    # <Hit_id>gnl|BL_ORD_ID|2</Hit_id>
    # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis
    # thaliana chromosome 3, complete sequence</Hit_def>
    # <Hit_accession>2</Hit_accession>
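To make the failure mode concrete, here is a minimal sketch of the kind of split-then-index pattern described above. It is not the script's actual code. The helper `split_hit` is a hypothetical, more tolerant alternative, and the `" >"` separator it uses by default is an assumption.

```python
hit = "tr_E9FSX5_Daphnia_pul_Cru_Bra"

# No ">" in the string, so split() returns a one-item list.
parts = hit.split(">")

try:
    second = parts[1]  # asks for a second item that does not exist
except IndexError as err:
    message = str(err)
    print("-->", message)


def split_hit(defline, sep=" >"):
    """Hypothetical helper: split off the first hit without assuming sep is present."""
    first, _, rest = defline.partition(sep)
    return first, rest


# partition() never raises here; a missing separator just gives an empty remainder.
print(split_hit(hit))
print(split_hit("id1 first title >id2 second title"))
```

Using `str.partition` (or checking the list length before indexing) avoids the crash, but whether the resulting table is still meaningful depends on what the script expects to find in the header.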

    Cheers,

    Matt.


  • CsprsSassyHrly
    started a topic python blastxml_to_tabular.py

    Good afternoon, all. I am relatively new to bioinformatics and am running into an error when I try to convert my xml file to a tabular file. I have used this command line before and didn't have any problems with it, but for some reason, I just can't get these files to be converted.

    Command line used:
    python blastxml_to_tabular.py -o P_jeffreysii_agatoxin.tab -c ext P_jeffreysii_agatoxin.xml

    This is the error that pops up after I hit enter:
    Problem splitting multuple hits?
    'tr_E9FSX5_Daphnia_pul_Cru_Bra'
    --> list index out of range

    I have checked that I am in the right folder and that the xml-to-tabular converter script is in the same place as the xml file. There doesn't seem to be anything glaringly wrong with the database's fasta file, and the xml files do contain results. But I get this error on every xml file I try to convert that was run against this one database, except that each time a different sequence is named.

    I have googled the bananas out of this error and I have yet to find something that is helpful because they're all the "out of range" error but with different programs.

    Since I am pretty new, I'm hoping someone can help me understand what this error means and how I can fix it.

    Thank you in advance!

    Irene
