  • easolvig
    Junior Member
    • Oct 2012
    • 2

    BLASTp output format problem

    Hello to all the members!

    This is my first post here on SEQanswers.

    Today I was updating the BLASTp application on the nodes of our grid to the latest version and, after running some jobs to test it, I noticed that the output files have a different number of lines depending on the output format (CSV versus tabular).

    I used the same database and query file for both runs; the only difference was the output format parameter:
    blastp -evalue 0.1 -db F10DRD -out test_output_f10drd_180.txt -outfmt '10 qseqid sseqid qstart qend evalue' -query f10drd_180.fas
    blastp -evalue 0.1 -db F10DRD -out test_output_f10drd_180.txt -outfmt '6 qseqid sseqid qstart qend evalue' -query f10drd_180.fas

    The output CSV file contained 1288845 lines, while the tabular file contained 1293150 lines.
    I replaced the \t characters with commas in the tabular file and compared the two outputs with diff. It showed that the tabular file contains all lines from the CSV, but has 4305 more.
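The comparison described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used in the post: the function name `extra_in_tabular` and the file names in the usage note are hypothetical, and it assumes that no field contains a tab or a comma, which should hold for the five columns requested here.

```shell
# Hypothetical helper: print lines that occur in the tabular output
# but not in the CSV output (after converting tabs to commas).
# Assumption: no field contains a tab or a comma.
extra_in_tabular() {
    csv_file=$1
    tsv_file=$2
    csv_sorted=$(mktemp)
    tsv_sorted=$(mktemp)

    # comm needs both inputs sorted in the same collation order
    LC_ALL=C sort "$csv_file" > "$csv_sorted"
    tr '\t' ',' < "$tsv_file" | LC_ALL=C sort > "$tsv_sorted"

    # -13 suppresses lines unique to the first file and lines common to both,
    # leaving only lines that appear in the second (tabular) file alone
    LC_ALL=C comm -13 "$csv_sorted" "$tsv_sorted"

    rm -f "$csv_sorted" "$tsv_sorted"
}
```

For example, `extra_in_tabular blast.csv blast.tsv | wc -l` would count the surplus lines. Sorting first means the result does not depend on the order in which hits were written.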

    I would like to ask if any of you noticed the same problem before.

    Thank you for your time and your answers!
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    I tried to replicate your results with BLAST 2.2.27+ using blastn and 1000 sequences:

    Code:
    formatdb -i contigs.fa -p F -o T
    
    blastn -query contigs.fa -db ./contigs.fa -evalue 0.1 -out blast.csv -outfmt '10 qseqid sseqid qstart qend evalue'
    blastn -query contigs.fa -db ./contigs.fa -evalue 0.1 -out blast.tsv -outfmt '6 qseqid sseqid qstart qend evalue'
    
    wc -l blast.*
    14858 blast.csv
    14858 blast.tsv
    The number of lines matched for me. I know this is a one-off, but comforting for me at least.

    Are you using version 2.2.27 ?
    Did you use "-parse_seqids" for makeblastdb? (or -o T for formatdb)
    Are the sequence IDs unique in your database file?
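One way to check the last question is sketched below. The function name `duplicate_fasta_ids` is hypothetical, and the sketch assumes that the sequence ID is the header text between `>` and the first whitespace.

```shell
# Hypothetical helper: print every sequence ID that occurs more than once
# in a FASTA file. Assumption: the ID is the first whitespace-delimited
# token on each header line.
duplicate_fasta_ids() {
    grep '^>' "$1" |
        sed 's/^>//' |
        awk '{print $1}' |
        LC_ALL=C sort |
        uniq -d
}
```

Running `duplicate_fasta_ids contigs.fa` prints nothing when all IDs are unique.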


    • easolvig
      Junior Member
      • Oct 2012
      • 2

      #3
      Thank you for your reply, Torst!

      Yes, -o T was used for formatdb, the IDs are unique and it is the 2.2.27 version. The problem was caused by something else.

      After spending days running several tests with different queries to find the source of this problem, I found that the test jobs that completed in less than ~3 hours produced the same output in both CSV and tabular format. This led me to ask the administrator for our computing grid’s error logs.

      I finally got the logs, and they revealed that the different output files were the result of an incorrectly set CPU limit assigned to our account. It was a recent change that we were unaware of. Now, after they corrected it, the test runs I made gave correct and identical results.
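A truncated run of this kind can often be caught without the grid logs by checking the CPU-time limit and the exit status of the job. The following is a minimal sketch under that assumption: the wrapper name `run_checked` is hypothetical, and how a given scheduler enforces and reports its limits may differ from what the shell shows.

```shell
# Show the CPU-time limit (in seconds, or "unlimited") that applies to
# this shell and to the processes it starts.
ulimit -t

# Hypothetical wrapper: run a command and complain loudly if it does not
# exit with status 0, so that partial output is not mistaken for a result.
run_checked() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        # In most shells a status above 128 means the process was
        # terminated by a signal (for example after exceeding a limit).
        echo "FAILED with exit status $status: output may be incomplete" >&2
    fi
    return "$status"
}

# Example usage (not executed here):
# run_checked blastp -evalue 0.1 -db F10DRD -query f10drd_180.fas \
#     -outfmt '6 qseqid sseqid qstart qend evalue' -out tabular.txt
```

Checking the exit status for every job makes an interrupted run visible immediately, instead of surfacing later as a line-count mismatch.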


      I am sorry for taking your time with this question.
      Last edited by easolvig; 10-13-2012, 03:27 AM.


      • Torst
        Senior Member
        • Apr 2008
        • 275

        #4
        Glad it worked out, and there wasn't a bug in BLAST+.
