Silent total failure on MacOS
WARNING: the answers involving `sort -u --merge` fail silently and spectacularly on MacOS (for me at least). Details here https://www.biostars.org/p/144569/#325003
Header Leaderboard Ad
Collapse
how to output only one best-hit result using standalone blast
Collapse
Announcement
Collapse
SEQanswers June Challenge Has Begun!
The competition has begun! We're giving away a $50 Amazon gift card to the member who answers the most questions on our site during the month. We want to encourage our community members to share their knowledge and help each other out by answering questions related to sequencing technologies, genomics, and bioinformatics. The competition is open to all members of the site, and the winner will be announced at the beginning of July. Best of luck!
For a list of the official rules, visit (https://www.seqanswers.com/forum/sit...wledge-and-win)
For a list of the official rules, visit (https://www.seqanswers.com/forum/sit...wledge-and-win)
See more
See less
X
-
Technically you dont need to go that far. Since blastn/blastp scores for latest blast+ packages is already sorted with top hit as the first hit, you can just use the following.
Code:blastn -query transcripts.fa -db Targets.fa [other options you like ] -outfmt 7 -out blast_output.txt cat blast_output.txt |awk '/hits found/{getline;print}' | grep -v "#" > top_hits.txt
Last edited by rahulvrane; 03-04-2015, 03:27 PM.
Leave a comment:
-
Hi,
Can this -max_target_seqs 1 & also work for commandline blast if I need to output the best hits?
kaps
Leave a comment:
-
Hello everybody!
I am in a kind of problem with my standalone blastp.
I used parameters from NormSci and worked pretty good but there is one point remained in my case. I have 1000 proteins to compare to my query and at these commands, all of them are showed (even with the message "0 hits found"). This represents 998 of my 1000 but I just want to see the 2 which have hits.
Does anyone have a tip for me?
Thank you so much!
Leave a comment:
-
Originally posted by dariober View PostHi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:
Code:sort -k1,1 -k12,12nr -k11,11n blastout.txt | sort -u -k1,1 --merge
The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.
Hope it helps (make sure it does what you want)...
Dario
Code:sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits
Note, make sure that your locale settings recognize "." as a decimal separator, e.g.
Code:export LC_ALL=en_US.UTF-8 export LANG=en_US.UTF-8
Last edited by rhinoceros; 05-17-2014, 03:08 AM.
Leave a comment:
-
@Dariober, one needs to also append the -t, parameter to your code snippet, since it is a comma-separated file and the default delimiter for sort is non-blank to blank.
Code:sort -t, -k1,1 -k12,12nr -k11,11n blastout.txt | sort -u -t, -k1,1 --merge
Last edited by carmeyeii; 05-21-2013, 06:21 PM.
Leave a comment:
-
Originally posted by NormSci View PostHi Tsuyoshi,
After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:
blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &
Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.
Perhaps anyone who has more experience with blast+ can provide further insight.
Thank you for your kind reply!
I learned more useful commands in using blast+ from your reply, such as set the minimum identity percentage and especially the max_targe_seqs. It sounds convenient to use to output the only top match. I would like to try it ASAP.
Alternatively, I managed to extract the top hit match from excel file using macro tools like
****************************************************************
Sub tophitonly()
Dim i%
For i = [a65536].End(3).Row To 1 Step -1
If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
End Sub
****************************************************************
Since the queries in result.txt were in an ascending order based on the bit score and evalue, the macro worked well to pick the top hit one of each query. Meanwhile, I could count how many matches were obtained from one blast.
Thank you.
Leave a comment:
-
Originally posted by dariober View PostHi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:
Code:sort -k1,1 -k12,12nr -k11,11n blastout.txt | sort -u -k1,1 --merge
The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.
Hope it helps (make sure it does what you want)...
Dario
Thank you very much for answering this question, and you helped me a lot to resolve the problems of python several days ago.
In order to extract the best hit in results.txt, I have tried an alternative way in using macro tool of excel. The script for this purpose was written like
****************************************************************
Sub tophitonly()
Dim i%
For i = [a65536].End(3).Row To 1 Step -1
If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
End Sub
****************************************************************
Anyway, thank you very much. And I would like to try your method in the following days.
Leave a comment:
-
Hi Tsuyoshi,
After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:
blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &
Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.
Perhaps anyone who has more experience with blast+ can provide further insight.
Leave a comment:
-
Hi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast ouput (blastout.txt) with the code below:
Code:sort -k1,1 -k12,12nr -k11,11n blastout.txt | sort -u -k1,1 --merge
The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.
Hope it helps (make sure it does what you want)...
Dario
Leave a comment:
-
how to output only one best-hit result using standalone blast
Hello
I am running standalone blast now to blastp a protein data set against a genome data.
Although I have set the best-hits filtering algorithm, namely, using -best_hit_overhang, and the value I set was 0.25 (0.1~0/25 was recommended at http://www.ncbi.nlm.nih.gov/books/NB...stHits_filteri)
Compared the results without -best_hit_overhang, indeed, the number of matches for each inquery was eliminated a lot, however, what I want is to output just one best hit result (according to the bit score and evalue).
For example, the output result was like
***************************************************************
gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
gi|270118727|emb|CAT18807.1| gi|323451569|gb|EGB07446.1| 28.77 73 46 2 226 298 14 80 0.42 31.2
gi|270118727|emb|CAT18807.1| gi|323445796|gb|EGB02232.1| 30.65 62 43 0 349 410 111 172 1.8 29.3
gi|270118727|emb|CAT18807.1| gi|323449782|gb|EGB05667.1| 29.27 41 29 0 132 172 105 145 2.1 29.6
gi|270118727|emb|CAT18807.1| gi|323448614|gb|EGB04510.1| 26.92 52 38 0 189 240 15 66 6.5 27.7
***************************************************************
And I just want the top best one,
***************************************************************
gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
***************************************************************
Would anyone please help to tell me how to do that?
Latest Articles
Collapse
-
by seqadmin
Developments in sequencing technologies and methodologies have transformed the field of epigenetics, giving researchers a better way to understand the complex world of gene regulation and heritable modifications. This article explores some of the diverse sequencing methods employed in the study of epigenetics, ranging from classic techniques to cutting-edge innovations while providing a brief overview of their processes, applications, and advances.
Methylation Detect...-
Channel: Articles
05-31-2023, 10:46 AM -
-
Differential Expression and Data Visualization: Recommended Tools for Next-Level Sequencing Analysisby seqadmin
After covering QC and alignment tools in the first segment and variant analysis and genome assembly in the second segment, we’re wrapping up with a discussion about tools for differential gene expression analysis and data visualization. In this article, we include recommendations from the following experts: Dr. Mark Ziemann, Senior Lecturer in Biotechnology and Bioinformatics, Deakin University; Dr. Medhat Mahmoud Postdoctoral Research Fellow at Baylor College of Medicine;...-
Channel: Articles
05-23-2023, 12:26 PM -
-
by seqadmin
Continuing from our previous article, we share variant analysis and genome assembly tools recommended by our experts Dr. Medhat Mahmoud, Postdoctoral Research Fellow at Baylor College of Medicine, and Dr. Ming "Tommy" Tang, Director of Computational Biology at Immunitas and author of From Cell Line to Command Line.
Variant detection and analysis tools
Mahmoud classifies variant detection work into two main groups: short variants (<50...-
Channel: Articles
05-19-2023, 10:03 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 06-01-2023, 08:56 PM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
06-01-2023, 08:56 PM
|
||
Deep Sequencing Unearths Novel Genetic Variants: Enhancing Precision Medicine for Vascular Anomalies
by seqadmin
Started by seqadmin, 06-01-2023, 07:33 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
06-01-2023, 07:33 AM
|
||
Unveiling Genetic Associations Through Transcription Factor Binding Quantitative Trait Loci
by seqadmin
Started by seqadmin, 05-31-2023, 07:50 AM
|
0 responses
6 views
0 likes
|
Last Post
by seqadmin
05-31-2023, 07:50 AM
|
||
Exploring French-Canadian Ancestry: Insights into Migration, Settlement Patterns, and Genetic Structure
by seqadmin
Started by seqadmin, 05-26-2023, 09:22 AM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
05-26-2023, 09:22 AM
|
Leave a comment: