  • yingeddi2008
    Junior Member
    • Oct 2013
    • 6

    Problems with blastdbcmd when entry IDs contain spaces

    I am still a rookie in this area, and this is the first task I was given: extract a list of 100% matched reads from a self-generated database. However, the read names are not formatted in the usual way, and I assume that is the cause of the problem I am encountering.

    Below is a list of my read names, taken from my entry_batch input file, ID.txt:

    'M00344:4:000000000-A5RU9:1:2119:17016:21751 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:6591:19854 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:11445:14212 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:22676:7504 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:13009:4084 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:14454:4004 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:11021:19828 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:14025:16724 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:25864:15172 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:13018:13673 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:5760:11441 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:24461:19844 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:17300:18233 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:4137:17412 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:2789:15268 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:25164:15029 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:16039:7681 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:8713:5016 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2116:13795:20195 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2116:6977:17108 2:N:0:10'

    I used the command below:
    $ blastdbcmd -db seqs.fasta -dbtype nucl -entry_batch ID.txt -out miseq.read.fasta

    Error messages:

    Error: 'M00344:4:000000000-A5RU9:1:1104:13049:19775: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:13044:19758: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:13062:19751: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:11099:18531: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:11118:18521: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17175:17791: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17452:17720: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:16737:13751: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:16726:13733: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:19339:9296: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17187:8943: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:14936:7801: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:21379:6845: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:23493:5643: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:26299:4746: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:23691:4053: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:15699:3766: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1103:18377:16637: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1103:16030:10176: OID not found


    I tried replacing the white space with \s and adding ' before and after each ID, but neither helped. The blastdbcmd program treats everything before the space as the ID. Does anyone have any idea how to do this? Or am I heading in totally the wrong direction?

    Eddi
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    Here's what I run to generate a BLAST database out of a FASTA file:
    Code:
    makeblastdb -in <input>.fasta -title 'Something Stringy' -taxid <org_taxid> -dbtype nucl -out <dbname_ID>
    It looks like you might be trying to query a database that doesn't exist (or hasn't been generated yet).

    However, if you have an NGS-amount of reads, it's probably better to use something other than BLAST for sequence matching. I'd recommend Bowtie2, but BWA seems to also be commonly used here.

    Here's the command I'd run to generate a Bowtie2 index:
    Code:
    bowtie2-build <input>.fasta <dbname_ID>


    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      Originally posted by yingeddi2008
      I tried replacing the white space with \s and adding ' before and after each ID, but neither helped. The blastdbcmd program treats everything before the space as the ID. Does anyone have any idea how to do this? Or am I heading in totally the wrong direction?
      Hi Eddi. This is partly by design: most tools treat everything up to the first space as the ID.
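
To make the "everything up to the first space" convention concrete, here is a minimal Python sketch. It is not taken from BLAST+ itself; `split_fasta_header` and `clean_batch_id` are hypothetical helper names, and whether a cleaned ID is then found by blastdbcmd still depends on how the database was built and on the BLAST+ version.

```python
from typing import Tuple


def split_fasta_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header into (identifier, description) at the first whitespace."""
    text = header[1:] if header.startswith(">") else header
    parts = text.strip().split(None, 1)
    if not parts:
        return "", ""
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def clean_batch_id(raw: str) -> str:
    """Drop surrounding quotes and keep only the text before the first whitespace."""
    parts = raw.strip().strip("'\"").split(None, 1)
    return parts[0] if parts else ""
```

Applied to a line from ID.txt, `clean_batch_id` keeps only the part before the space and drops the quotes, which is the form most tools would treat as the ID.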

      However, there are also some issues with blastdbcmd, and the exact version of BLAST+ matters; see my blog post:
      The blastdbcmd tool in the BLAST+ suite (replacing fastacmd in the C 'legacy' BLAST suite) lets you do a lot of clever things with a BLAST d...


      • yingeddi2008
        Junior Member
        • Oct 2013
        • 6

        #4
        blastdbcmd sucks

        Hi maubp,

        Thank you very much. I read through your blog post, and I think that is exactly the problem I have now. So is there no way to extract sequences from my own custom database?!

        For example, in my database, I have

        Code:
        >M00344:4:000000000-A5RU9:1:1101:17539:1069 1:N:0:14
        AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACAAAGGCGACGATGCGTAGCCGACCTGAGAGGGTGATCGGCG
        >M00344:4:000000000-A5RU9:1:1101:17556:1074 1:N:0:14
        AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATGCGTAGCCGAACTGAGAGGGGGATCGGC
        But when I run

        Code:
        $ blastdbcmd -db seq.fasta -entry all -outfmt "OID: %o     TITLE: %t"
        I got nothing back. I don't know whether there is an internal error or whether it won't recognize any IDs that are not in NCBI format. That is so unfortunate.
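
One way to check that the FASTA file itself is readable, independent of blastdbcmd, is to list its headers directly. The sketch below assumes a plain-text FASTA file; `list_fasta_titles` is a hypothetical helper, and the zero-based index only stands in for an OID (it is a counter, not a value read from any BLAST database).

```python
from typing import Iterable, Iterator, Tuple


def list_fasta_titles(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (index, identifier, full_title) for each FASTA header line."""
    index = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        title = line[1:].strip()
        parts = title.split(None, 1)
        identifier = parts[0] if parts else ""
        yield index, identifier, title
        index += 1


# Possible usage, assuming the file name from the post:
# with open("seq.fasta") as handle:
#     for index, identifier, title in list_fasta_titles(handle):
#         print(f"OID: {index}     TITLE: {title}")
```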


        Eddi

        Originally posted by maubp
        Hi Eddi. This is partly by design: most tools treat everything up to the first space as the ID.

        However, there are also some issues with blastdbcmd, and the exact version of BLAST+ matters; see my blog post:
        http://blastedbio.blogspot.co.uk/201...cbi-blast.html


        • yingeddi2008
          Junior Member
          • Oct 2013
          • 6

          #5
          Hi gringer,

          Thank you for your advice; I will try Bowtie2 or BWA. I have Illumina MiSeq data here, so maybe I should try something else.

          Eddi


          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Originally posted by yingeddi2008
            Thank you very much. I read through your blog post, and I think that is exactly the problem I have now.
            Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
            Originally posted by yingeddi2008
            So is there no way to extract sequences from my own custom database?!
            As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc).
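
As a rough illustration of extracting records straight from the FASTA file, here is a standard-library Python sketch (libraries such as Biopython offer their own FASTA parsers, which are not used here). The helper names are hypothetical, and it assumes each line of the ID file holds one read name, possibly quoted, with the ID being the text before the first space.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def load_wanted_ids(lines: Iterable[str]) -> Set[str]:
    """Collect IDs: strip quotes, keep the text before the first whitespace."""
    wanted = set()
    for raw in lines:
        parts = raw.strip().strip("'\"").split(None, 1)
        if parts:
            wanted.add(parts[0])
    return wanted


def extract_records(fasta_lines: Iterable[str],
                    wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield the records whose ID (first word of the header) is in `wanted`."""
    for header, sequence in read_fasta(fasta_lines):
        parts = header.split(None, 1)
        if parts and parts[0] in wanted:
            yield header, sequence


# Possible usage, assuming the file names from the first post:
# with open("ID.txt") as ids, open("seqs.fasta") as fasta, \
#         open("miseq.read.fasta", "w") as out:
#     for header, sequence in extract_records(fasta, load_wanted_ids(ids)):
#         out.write(f">{header}\n{sequence}\n")
```

Matching on the first word only means the trailing part of the read name (for example `1:N:0:14` versus `2:N:0:10`) is ignored, so paired reads that share the same first word would both be selected.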


            • yingeddi2008
              Junior Member
              • Oct 2013
              • 6

              #7
              Thank you.

              Hi maubp,

              Originally posted by maubp
              Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
              Who should I email? The NCBI guys?

              Originally posted by maubp
              As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc).
              I will try those then. Thank you.

              Eddi


              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                Originally posted by maubp
                Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
                blast-help at ncbi.nlm.nih.gov as listed here:
                The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
