  • Blast problem

    Hi everybody, I am a bit desperate and hope someone can help me. I need to create a subset of the nr-database (for blastx) using a negative or positive gi list. There are several possibilities to do this (are there more??):

    1) read the multifasta nr-file and remove some entries; however, the file is almost 8 GB in size and this takes a lot of time (and you have to create the database afterwards)
    2) use blast+, which has a "-negative_gilist" option; however, another program's parser expects the old output format, which seems to differ from that of blast+
    3) formatdb has a -L option to create a subset of the database based on a file with a positive gi-list
    4) blastall has a -l option to perform the search based on a file with a positive gi-list, which should produce the same result
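For reference, option 1) can be sketched as a streaming filter, so the 8 GB file never has to be held in memory. This is only a minimal sketch under stated assumptions: headers are assumed to carry identifiers of the form `gi|<number>`, the gi list is assumed to hold one number per line, and the names `load_gi_list` and `filter_fasta` are made up for illustration.

```python
import re

# Matches every "gi|<digits>" identifier in a FASTA header line
# (assumption: nr-style headers such as ">gi|7290028|gb|...| description").
GI_PATTERN = re.compile(r"gi\|(\d+)")


def load_gi_list(lines):
    """Collect gi numbers (one per line) into a set, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(lines, gis, keep=False):
    """Yield the lines of the records selected by a gi list.

    keep=False treats `gis` as a negative list (matching records are dropped);
    keep=True treats it as a positive list (only matching records survive).
    """
    emit = False  # lines before the first header are never emitted
    for line in lines:
        if line.startswith(">"):
            # A header may list several gi numbers; one match is enough.
            found = any(gi in gis for gi in GI_PATTERN.findall(line))
            emit = found if keep else not found
        if emit:
            yield line
```

Both arguments can be open file handles, and the yielded lines can be written straight to a new FASTA file. The result still has to be formatted as a database afterwards, so this does not remove the second drawback mentioned in 1).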

    Options 3) and 4) seem to be what I need. Unfortunately, they don't work. The problem looks like this:

    my@computer:/tmp/blast/bin$ ls
    . blastclust drosoph.aa.phr fastacmd impala query.fa
    .. blastpgp formatdb makemat rpsblast
    bl2seq copymat drosoph.aa.psq formatdb.log megablast seedtop
    blastall drosoph.aa formatrpsdb .ncbirc
    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F -L subset
    [formatdb] FATAL ERROR: Unable to find

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F ./ -L subset
    [formatdb] FATAL ERROR: Unable to find ./

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F /tmp/blast/bin/ -L subset
    [formatdb] FATAL ERROR: Unable to find /tmp/blast/bin/

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F whatever -L subset
    [formatdb] FATAL ERROR: Unable to find whatever

    my@computer:/tmp/blast/bin$ ./blastall -p blastx -d drosoph.aa -l -i query.fa

    Searching[blastall] ERROR: query1[protein_gi:7290028]: Unable to open file
    [blastall] WARNING: query1[protein_gi:7290028]: Intersection of gilist and BLAST database ID's empty
    I tried the latest blast version (2.2.25) as well as some other ones, on Fedora and on Ubuntu. Can someone reproduce this behavior?

  • #2

    Both formatdb and blastall complained about the gi-list file. Can you double-check it? You may post the "ls -l" output here.


    • #3
      my@computer:/tmp/blast/bin$ ls -l
      total 94040
      -rwxr-xr-x 1 me me 8 2011-08-07 15:12
      my@computer:/tmp/blast/bin$ head
      The file contains only one line, which is a gi number. I tried setting permissions to 777 for this file, didn't help.


      • #4
        1) This is a strange problem. The file belongs to me/me, but the login is my. Do you know why? It should not contribute to your problem; I am just curious.
        2) Can you successfully run blast+ in this case? As to the output format, blast+ allows you to customize output fields. You may pursue this as an alternative.


        • #5
          1) Well, that's because I changed my true name and did it inconsistently.
          2) Is it possible to change the blast+ output in a way that a parser written for the plain-text blast output can read it? I would be really happy if this were the case. Otherwise I can't use it. (Anyway, I haven't tried a blast+ run with a negative gi-list yet; will do tomorrow.)


          • #6
            2) Yes. You can specify the fields in tab-delimited format. Check the blast+ manual.
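To illustrate what reading the tab-delimited format involves, here is a minimal sketch that assumes the default twelve columns of blast+ `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The `Hit` type and `parse_tabular` function are hypothetical names, and a custom field list would need a matching change. It does not turn the output into the old plain-text report.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of tabular output, assuming the default twelve columns."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Yield a Hit per data row, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        c = line.split("\t")
        yield Hit(
            c[0], c[1], float(c[2]),
            int(c[3]), int(c[4]), int(c[5]),
            int(c[6]), int(c[7]), int(c[8]), int(c[9]),
            float(c[10]), float(c[11]),
        )
```

A downstream tool that only needs hit coordinates and scores could consume these records directly; a tool that expects the full alignment text could not.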


            • #7
              Dear DZhang, thank you very much for your replies! But as far as I can see the program I use expects plain-text blast output, and not the tab-delimited format. And the plain-text blast+ output cannot be parsed.

              So I would like to use the old blast version, as it offers the option I need (according to the documentation, the -l parameter for blastall or the -L parameter for formatdb). Can someone reproduce my problem and make any suggestions?


              • #8
                Ok, we got it. It almost drove me crazy. Finally, my colleague found out by using the - since today my very favorite - command "strace".

                So, here is the solution: the .ncbirc file has to contain the following lines.

                Then the environment variable is properly set and

                formatdb -i drosoph.aa -F drosoph.gil -L subset
                works like a charm.
                Last edited by sammy07; 08-09-2011, 05:39 AM.


                • #9
                  sammy07, thank you for sharing the solution.


