  • Blast problem

    Hi everybody, I am a bit desperate and hope someone can help me. I need to create a subset of the nr database (for blastx) using a negative or positive gi list. There are several ways to do this (are there others?):

    1) read the multifasta nr-file and remove some entries; however, the file is almost 8 GB in size and this takes a lot of time (and you have to create the database afterwards)
    2) use blast+, which has a "-negative_gilist" option; however, another program's parser expects the old output format, which seems to differ from that of blast+
    3) formatdb has a -L option to create a subset of the database based on a file with a positive gi-list
    4) blastall has a -l option to perform the search based on a file with a positive gi-list, which should produce the same result
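
For option 1), a streaming filter avoids loading the 8 GB file into memory. The following is a minimal sketch, not a tested production tool: it assumes FASTA headers of the form ">gi|<number>|..." and a gi list with one number per line, and the function name and file names are hypothetical. Merged nr headers that carry several gi numbers on one line are only matched on the first gi.

```shell
#!/bin/sh
# Sketch: drop every FASTA record whose first gi is in a negative gi list.
# Assumption: headers look like ">gi|7290028|gb|...". Names are illustrative.
filter_fasta_by_gi() {
    gi_list=$1   # text file, one gi number per line
    fasta=$2     # multi-FASTA input
    awk '
        # First file: remember every gi number that should be removed.
        FILENAME == ARGV[1] { if ($1 != "") drop[$1] = 1; next }
        # Header line: the gi is the second "|"-separated field.
        /^>/ { split($0, f, "|"); keep = !(f[2] in drop) }
        # Print header and sequence lines only while the record is kept.
        keep
    ' "$gi_list" "$fasta"
}

# Example (hypothetical file names):
# filter_fasta_by_gi drosoph.gi.txt drosoph.aa > subset.aa
```

Removing the "!" in the header rule would turn this into a positive filter. As the post notes, the result still has to be indexed with formatdb afterwards.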

    Options 3) and 4) seem to be what I need. Unfortunately, they don't work. The problem looks like this:

    my@computer:/tmp/blast/bin$ ls
    . blastclust drosoph.aa.phr fastacmd impala query.fa
    .. blastpgp drosoph.aa.pin formatdb makemat rpsblast
    bl2seq copymat drosoph.aa.psq formatdb.log megablast seedtop
    blastall drosoph.aa drosoph.gi.txt formatrpsdb .ncbirc
    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F ./drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find ./drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F /tmp/blast/bin/drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find /tmp/blast/bin/drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F whatever -L subset
    [formatdb] FATAL ERROR: Unable to find whatever

    my@computer:/tmp/blast/bin$ ./blastall -p blastx -d drosoph.aa -l drosoph.gi.txt -i query.fa

    Searching[blastall] ERROR: query1[protein_gi:7290028]: Unable to open file drosoph.gi.txt
    [blastall] WARNING: query1[protein_gi:7290028]: Intersection of gilist and BLAST database ID's empty
    I tried the latest blast version (2.2.25) as well as some other ones, on Fedora and on Ubuntu. Can someone reproduce this behavior?

  • #2
    Hi,

    Both formatdb and blastall complained about drosoph.gi.txt. Can you double check the file? You may post "ls -l" output here.

    • #3
      my@computer:/tmp/blast/bin$ ls -l
      total 94040
      ...
      -rwxr-xr-x 1 me me 8 2011-08-07 15:12 drosoph.gi.txt
      ...
      my@computer:/tmp/blast/bin$ head drosoph.gi.txt
      7290028
      The file contains only one line, which is a gi number. I tried setting the permissions for this file to 777, but that didn't help.

      • #4
        1) This is a strange problem. The file belongs to me/me but the login is my. Do you know why? It should not contribute to your problem, but I am just curious.
        2) Can you successfully run blast+ in this case? As to the output format, blast+ allows you to customize output fields. You may pursue this as an alternative.

        • #5
          1) Well, that's because I changed my true name and did it inconsistently.
          2) Is it possible to change the blast+ output so that a parser written for the plain-text blast output can read it? I would be really happy if this were the case. Otherwise I can't use it. (Anyway, I haven't tried a blast+ run with a negative gi list yet; I will do so tomorrow.)

          • #6
            2) Yes. You can specify the fields in the tab-delimited format. Check the blast+ manual.
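
As an illustration of what such a blast+ call might look like, here is a hedged sketch. The wrapper function, the file names, and the particular choice of output fields are assumptions made for this example; the manual lists the full set of available fields. Format 6 is tab-delimited, so it does not help a parser that expects the legacy plain-text report.

```shell
#!/bin/sh
# Sketch: blastx from blast+ with a negative gi list and custom tabular output.
# Assumptions: blastx is on PATH; the arguments and field list are illustrative.
run_blastx_tabular() {
    query=$1    # nucleotide query FASTA
    db=$2       # protein database name, e.g. nr
    neg_gi=$3   # text file with gi numbers to exclude, one per line
    blastx -query "$query" -db "$db" \
        -negative_gilist "$neg_gi" \
        -outfmt "6 qseqid sseqid pident length evalue bitscore"
}

# Example (hypothetical file names):
# run_blastx_tabular query.fa nr exclude.gi.txt > hits.tsv
```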

            • #7
              Dear DZhang, thank you very much for your replies! But as far as I can see, the program I use expects plain-text blast output, not the tab-delimited format, and it cannot parse the plain-text output of blast+.

              So I would like to use the old blast version, as it offers the option I need (according to the documentation, the -l parameter for blastall or the -L parameter for formatdb). Can someone reproduce my problem and make any suggestions?

              • #8
                OK, we got it. It almost drove me crazy. Finally, my colleague found the cause using "strace", which as of today is my favorite command.

                So, here is the solution: the .ncbirc file has to contain the following lines.

                [BLAST]
                BLASTDB=/path/to/db
                Then the BLASTDB path is properly set and

                formatdb -i drosoph.aa -F drosoph.gil -L subset
                works like a charm.
                Last edited by sammy07; 08-09-2011, 05:39 AM.
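
The fix in #8 can be sketched as a small helper that writes the two lines quoted there. This is an illustration under assumptions: the function name and directory arguments are hypothetical, and where legacy BLAST looks for .ncbirc (the listing in the first post shows one in the working directory) should be verified locally. The commented lines show how strace could be used to see which paths a tool actually tries to open.

```shell
#!/bin/sh
# Sketch: write a minimal .ncbirc pointing legacy BLAST at a database directory.
# Assumption: both directory arguments are placeholders for your own paths.
write_ncbirc() {
    target_dir=$1   # directory that should hold the .ncbirc file
    db_dir=$2       # directory containing the database and the gi list
    printf '[BLAST]\nBLASTDB=%s\n' "$db_dir" > "$target_dir/.ncbirc"
}

# Example (commented out; needs the legacy BLAST binaries from the thread):
# write_ncbirc /tmp/blast/bin /tmp/blast/bin
# ./formatdb -i drosoph.aa -F drosoph.gil -L subset
#
# To inspect which files a failing command tries to open:
# strace -f -e trace=file ./formatdb -i drosoph.aa -F drosoph.gi.txt -L subset
```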

                • #9
                  sammy07, thank you for sharing the solution.
