
**cstack** · 03-08-2018, 10:39 AM

I am not 100% clear on what additional data you are trying to pull out from the blastn output.

Could you post a sample of the blastn output, and maybe a rough schematic of what you want it to look like after processing?

**sombrajo** · 03-10-2018, 04:09 AM

This is an example of what my blastn output looked like:

```
8_124616757-124616839- gi|146149349|gb|AC197214.8| GCCTCCTTAGCGTAGTAGGTAGCACGTCAGTCTCATAATCTGAAGATTTCAACAACTGAGTGCCTCATTGCTCAAGGAGTGAA 180078 179996 Macaca mulatta 5e-30
8_124616757-124616839- gi|54908|emb|X04525.1| GCCTCCTTAGCGCAGTAGGTAGCGCGTCAGTCTCATAATCTGAAG 1 45 Mus musculus 3e-12
8_124616757-124616839- gi|51571999|gb|AC119854.7| GCCTCCTTAGCGCAGTAGGCAGCGCGTCAGTCTCATAATCTGAAGAT 74096 74142 Mus musculus 1e-11
8_124616757-124616839- gi|37651859|gb|AC101490.8| GCCTCCTTAGCGCAGTAGGCAGCGCGTCAGTCTCATAATCTGAAGAT 223791 223837 Mus musculus 1e-11
```

As I understand it, each row corresponds to one hit: a match was found in each of the accessions shown, followed by the exact matching DNA sequence and then the start and end positions of the match within the sequence represented by that accession. The problem is that I blasted against all mammalian accessions in the nucleotide database, so those accessions may correspond to genes, contigs or whole chromosomes submitted to NCBI, and some of them carry no information about their genomic location. For each hit, I just wanted to know where exactly it lies in the mammalian genome in terms of genomic coordinates, but I don't think BLAST keeps track of this, so there is no direct way to get it from the BLAST output. In the end I retrieved whatever information was available for each hit through its accession (esearch | efetch), and since some of the records, as I said, contain no chromosomal location, I gave up on that.
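For anyone following along, here is a minimal Python sketch of turning rows like the ones above into structured hits. It assumes the columns are qseqid, sseqid, aligned subject sequence, sstart, send, species name and evalue, in that order (inferred from the sample, not confirmed). It pulls the accession.version out of the old `gi|...|gb|ACC|` style IDs (that is what you would hand to esearch/efetch) and records the strand, since blastn reports minus-strand subject hits with sstart > send, as in the first row.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    accession: str
    start: int  # smaller of sstart/send (local to the subject sequence)
    end: int    # larger of sstart/send
    strand: str
    species: str
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse one row; the column order is an assumption based on the sample."""
    fields = line.split()
    query, sseqid = fields[0], fields[1]  # fields[2] is the aligned sequence
    sstart, send = int(fields[3]), int(fields[4])
    evalue = float(fields[-1])
    # Species names contain spaces, so take the fixed columns from both
    # ends and join whatever is left in the middle.
    species = " ".join(fields[5:-1])
    # IDs like gi|146149349|gb|AC197214.8| keep accession.version in field 4.
    parts = sseqid.split("|")
    accession = parts[3] if len(parts) > 3 else sseqid
    # Minus-strand subject hits are reported with sstart > send.
    strand = "-" if sstart > send else "+"
    start, end = sorted((sstart, send))
    return Hit(query, accession, start, end, strand, species, evalue)
```

Note that `start`/`end` are still positions within the accession's own sequence (a BAC clone, contig, etc.), not chromosome coordinates, which is exactly the limitation described above.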

**Markiyan** · 03-12-2018, 06:32 AM

My 5 cents:

1. It is better to have dedicated, filtered BLAST database(s) than to work with the complete nt/nr database, especially if you are searching against a small subset of sequences: filter the input nt.fasta file to match your criteria (e.g. genus/species name) and build the database with formatdb/makeblastdb.

2. If you want to extract the hit coordinates in the genome of your choice, then you either have to blast your sequences against a selected version of the complete genome sequence (e.g. whole human or mouse chromosomes), or preprocess the input FASTA file used to create the database and add the chromosome ID and start/stop to each FASTA ID in the BLAST db.
Example:

```
>[NCBI fasta header] chr=chr1 start=10000000 stop=10020000
```

Then this would give you the global coordinates of the subject in your genome of choice.
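Point 1 (a filtered database) could look roughly like the Python sketch below: stream the FASTA file and keep only records whose header mentions one of the wanted taxa. It assumes the species name appears in the header text; `filter_fasta` and `filter_file` are hypothetical helpers, and the result would then go through something like `makeblastdb -in filtered.fasta -dbtype nucl`.

```python
from typing import Iterable, Iterator, Sequence


def filter_fasta(lines: Iterable[str], keywords: Sequence[str]) -> Iterator[str]:
    """Yield the lines of every record whose header contains any keyword."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, based on its header line only.
            keep = any(k in line for k in keywords)
        if keep:
            yield line


def filter_file(src: str, dst: str, keywords: Sequence[str]) -> None:
    """Hypothetical wrapper: write the filtered records of src into dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_fasta(fin, keywords))
```

Streaming line by line keeps memory flat, which matters for a file the size of nt.fasta; plain substring matching is crude, so filtering on taxonomy IDs would be more robust if they are available.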

PS: If blast+ does not like `=` signs in the FASTA header, then use `:` or similar to separate the variable from the value.
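Point 2 plus the PS could be sketched in Python like this: append `chr:... start:... stop:...` tags to each header before building the database, then convert BLAST's local subject coordinates into chromosome coordinates. The tag format is just the convention suggested above, not an NCBI or BLAST standard. The sketch also assumes that `start` is the 1-based chromosome position of the subject's first base, that the subject is stored in the chromosome's forward orientation, and that the tags come back in the subject title (e.g. the `stitle` column of `-outfmt 6`). All function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches the hypothetical 'chr:<name> start:<n> stop:<n>' tags.
TAG = re.compile(r"\b(chr|start|stop):(\S+)")


def annotate_header(header: str, chrom: str, start: int, stop: int) -> str:
    """Append chromosome tags to a FASTA header line."""
    return f"{header.rstrip()} chr:{chrom} start:{start} stop:{stop}"


def parse_tags(title: str) -> Dict[str, str]:
    return dict(TAG.findall(title))


def to_global(title: str, sstart: int, send: int) -> Tuple[str, int, int]:
    """Map local subject coordinates (1-based) to chromosome coordinates."""
    tags = parse_tags(title)
    offset = int(tags["start"]) - 1  # local position 1 == tagged 'start'
    lo, hi = sorted((sstart, send))  # sstart > send on the minus strand
    return tags["chr"], offset + lo, offset + hi
```

For example, a hit at local positions 101..150 in a subject tagged `start:10000000` maps to chromosome positions 10000100..10000149.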

Retrieve genomic coordinates (locus start, locus end) from blastn hits?
