Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • chriswymant
    replied
    Silent total failure on MacOS

    WARNING: the answers involving `sort -u --merge` fail silently and spectacularly on MacOS (for me at least). Details here https://www.biostars.org/p/144569/#325003

    Leave a comment:


  • rahulvrane
    replied
    Technically you dont need to go that far. Since blastn/blastp scores for latest blast+ packages is already sorted with top hit as the first hit, you can just use the following.

    Code:
    blastn -query transcripts.fa -db Targets.fa [other options you like ] -outfmt 7 -out blast_output.txt
    
    cat blast_output.txt |awk '/hits found/{getline;print}' | grep -v "#" > top_hits.txt
    Rahul
    Last edited by rahulvrane; 03-04-2015, 03:27 PM.

    Leave a comment:


  • kaps
    replied
    Hi,

    Can this -max_target_seqs 1 & also work for commandline blast if I need to output the best hits?

    kaps

    Leave a comment:


  • HSV-1
    replied
    You guys save my ass, thanks!

    Leave a comment:


  • karla_oliveira
    replied
    Hello everybody!

    I am in a kind of problem with my standalone blastp.
    I used parameters from NormSci and worked pretty good but there is one point remained in my case. I have 1000 proteins to compare to my query and at these commands, all of them are showed (even with the message "0 hits found"). This represents 998 of my 1000 but I just want to see the 2 which have hits.

    Does anyone have a tip for me?

    Thank you so much!

    Leave a comment:


  • rhinoceros
    replied
    Originally posted by dariober View Post
    Hi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:

    Code:
    sort -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -k1,1 --merge
    The first sort orders the blast output by query name then by the 12th column in descending order (bit score - I think), then by 11th column ascending (evalue I think).
    The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.

    Hope it helps (make sure it does what you want)...

    Dario
    I've lost count on how many times I've used this sort. However, I just realized that it doesn't work as intended. Sort -n compares according to string numerical value, so for example 1e-10 would be smaller than 2e-100. I believe the command below is 'correct'..

    Code:
    sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits
    Although the bitscore sort probably works fine with -k12,12nr too (which might make it somewhat faster perhaps??).

    Note, make sure that your locale settings recognize "." as a decimal separator, e.g.

    Code:
    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    Last edited by rhinoceros; 05-17-2014, 03:08 AM.

    Leave a comment:


  • carmeyeii
    replied
    @Dariober, one needs to also append the -t, parameter to your code snippet , since it is a comma-separated file and the default delimiter for sort is non-blank to blank.

    Code:
    sort -t, -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -t, -k1,1 --merge
    -Carmen
    Last edited by carmeyeii; 05-21-2013, 06:21 PM.

    Leave a comment:


  • Tsuyoshi
    replied
    Originally posted by NormSci View Post
    Hi Tsuyoshi,

    After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:

    blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &

    Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.

    Perhaps anyone who has more experience with blast+ can provide further insight.
    Dear NormSci,
    Thank you for your kind reply!
    I learned more useful commands in using blast+ from your reply, such as set the minimum identity percentage and especially the max_targe_seqs. It sounds convenient to use to output the only top match. I would like to try it ASAP.

    Alternatively, I managed to extract the top hit match from excel file using macro tools like
    ****************************************************************
    Sub tophitonly()
    Dim i%
    For i = [a65536].End(3).Row To 1 Step -1
    If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
    Cells(i, 1).EntireRow.Delete
    End If
    Next i
    End Sub
    ****************************************************************
    Since the queries in result.txt were in an ascending order based on the bit score and evalue, the macro worked well to pick the top hit one of each query. Meanwhile, I could count how many matches were obtained from one blast.
    Thank you.

    Leave a comment:


  • Tsuyoshi
    replied
    Originally posted by dariober View Post
    Hi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:

    Code:
    sort -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -k1,1 --merge
    The first sort orders the blast output by query name then by the 12th column in descending order (bit score - I think), then by 11th column ascending (evalue I think).
    The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.

    Hope it helps (make sure it does what you want)...

    Dario
    Dear Dario,
    Thank you very much for answering this question, and you helped me a lot to resolve the problems of python several days ago.

    In order to extract the best hit in results.txt, I have tried an alternative way in using macro tool of excel. The script for this purpose was written like
    ****************************************************************
    Sub tophitonly()
    Dim i%
    For i = [a65536].End(3).Row To 1 Step -1
    If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
    Cells(i, 1).EntireRow.Delete
    End If
    Next i
    End Sub
    ****************************************************************
    Anyway, thank you very much. And I would like to try your method in the following days.

    Leave a comment:


  • NormSci
    replied
    Hi Tsuyoshi,

    After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:

    blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &

    Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.

    Perhaps anyone who has more experience with blast+ can provide further insight.

    Leave a comment:


  • dariober
    replied
    Hi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:

    Code:
    sort -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -k1,1 --merge
    The first sort orders the blast output by query name then by the 12th column in descending order (bit score - I think), then by 11th column ascending (evalue I think).
    The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.

    Hope it helps (make sure it does what you want)...

    Dario

    Leave a comment:


  • how to output only one best-hit result using standalone blast

    Hello
    I am running standalone blast now to blastp a protein data set against a genome data.

    Although I have set the best-hits filtering algorithm, namely, using -best_hit_overhang, and the value I set was 0.25 (0.1~0/25 was recommended at http://www.ncbi.nlm.nih.gov/books/NB...stHits_filteri)

    Compared the results without -best_hit_overhang, indeed, the number of matches for each inquery was eliminated a lot, however, what I want is to output just one best hit result (according to the bit score and evalue).

    For example, the output result was like
    ***************************************************************
    gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
    gi|270118727|emb|CAT18807.1| gi|323451569|gb|EGB07446.1| 28.77 73 46 2 226 298 14 80 0.42 31.2
    gi|270118727|emb|CAT18807.1| gi|323445796|gb|EGB02232.1| 30.65 62 43 0 349 410 111 172 1.8 29.3
    gi|270118727|emb|CAT18807.1| gi|323449782|gb|EGB05667.1| 29.27 41 29 0 132 172 105 145 2.1 29.6
    gi|270118727|emb|CAT18807.1| gi|323448614|gb|EGB04510.1| 26.92 52 38 0 189 240 15 66 6.5 27.7
    ***************************************************************

    And I just want the top best one,
    ***************************************************************
    gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
    ***************************************************************

    Would anyone please help to tell me how to do that?

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 03-27-2024, 06:37 PM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-27-2024, 06:07 PM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
53 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
69 views
0 likes
Last Post seqadmin  
Working...
X