
**whataBamBam** · 11-15-2013, 12:23 AM

If I understand what you're after correctly, then I do the same using this:

cat BlastOutputFile | grep -B 3 -A 1 "# 1 hits found"

This works when BlastOutputFile is in -outfmt 7. Obviously, if you're using XML output or any other format, it'll be different.

**Kennels** · 11-15-2013, 01:52 AM

Originally posted by whataBamBam:

If I understand what you're after correctly, then I do the same using this:

cat BlastOutputFile | grep -B 3 -A 1 "# 1 hits found"

This works when BlastOutputFile is in -outfmt 7. Obviously, if you're using XML output or any other format, it'll be different.

Thanks bambam, but that's not what I'm after.

Yes, that will pull out those with 1 match, but what I'm after is those that have only one alignment 'section' (i.e. high-scoring segment pair).

E.g. an isoform could match only one target in the database, but half of it could align to the 5' end, then a gap, then the other half to the 3' end of the target.

I'm after hits that align without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).

**sphil** · 11-15-2013, 02:46 AM

Originally posted by Kennels:

I'm after hits that align without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).

So you mean you want to parse for full-length HSPs?

**Kennels** · 11-15-2013, 02:56 AM

Yes and no.
I'm also looking for partial alignments, but only those with a single HSP.

**atcghelix** · 11-15-2013, 11:33 AM

I usually use BioPerl for something like this. The following will get you close (the formatting of the output blast file generated by Bio::SearchIO::Writer::TextResultWriter is slightly different, but basically the same).

Usage would be:
perl script.pl --blast inputBlastFile.txt --out reducedBlastFile.txt

Code:

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $blastFile;
my $outFile;

GetOptions("blast=s" => \$blastFile,
           "out=s"   => \$outFile) || die "Couldn't get options with Getopt::Long.\n";
die "Usage: $0 --blast <in> --out <out>\n"
    unless defined $blastFile && defined $outFile;

# Read the BLAST report one query (result) at a time.
my $blastIO = Bio::SearchIO->new(-file => $blastFile,
                                 -format => 'blast'); #you might need to change this

# Write the kept results to the file given with --out.
my $searchIOWriter = Bio::SearchIO::Writer::TextResultWriter->new();
my $out = Bio::SearchIO->new(-writer => $searchIOWriter,
                             -file => ">$outFile");

while (my $result = $blastIO->next_result) {
    # Keep only queries with exactly one hit that has exactly one HSP.
    if ($result->num_hits() == 1) {
        my $hit = $result->next_hit();
        if ($hit->num_hsps() == 1) {
            print "A single HSP for a single hit found for " . $result->query_name() . "\n";
            $out->write_result($result);
        }
    }
}

**Kennels** · 11-16-2013, 02:44 AM

Thanks I'll give it a try.

**lindenb** · 11-16-2013, 03:18 AM

If you used the XML output of BLAST, you could use the following XSLT stylesheet, which lists every hit that has exactly one HSP:

Code:

<?xml version='1.0'  encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0' >
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit"/>
</xsl:template>
<xsl:template match="Hit">
<xsl:if test="count(Hit_hsps/Hsp) = 1">
<xsl:value-of select="Hit_id"/><xsl:text>	</xsl:text>
<xsl:value-of select="Hit_def"/><xsl:text>	</xsl:text>
<xsl:value-of select="Hit_accession"/><xsl:text>
</xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

usage:

Code:

$ xsltproc --novalid  stylesheet.xsl  blast.xml

**whataBamBam** · 11-17-2013, 05:06 PM

Originally posted by Kennels:

Thanks bambam, but that's not what I'm after.

Yes, that will pull out those with 1 match, but what I'm after is those that have only one alignment 'section' (i.e. high-scoring segment pair).

E.g. an isoform could match only one target in the database, but half of it could align to the 5' end, then a gap, then the other half to the 3' end of the target.

I'm after hits that align without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).

Ah sorry I get you. Yes that would be really useful. I'll have a look at the script posted and see what that's doing.

I think GMAP will do something like that. It isn't really designed to match pairs of transcript-length sequences (it's designed to map a transcript-length sequence to a genome), but I actually do use it to map transcripts to ESTs and it seems to work really well. I then filter the output based on the length of the alignment and can find the matches where the EST aligned along its whole length without any significant gaps. That is probably quite similar to what you're trying to do.

Possibly I really shouldn't be using GMAP to do this, so if someone can tell me that, great, because I actually do it all the time.

**Kennels** · 11-17-2013, 11:17 PM

Originally posted by atcghelix:

I usually use BioPerl for something like this. The following will get you close (the formatting of the output blast file generated by Bio::SearchIO::Writer::TextResultWriter is slightly different, but basically the same).

Just letting you know the script did what I wanted.
I just modified the script a little:

Code:

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $blastFile = $ARGV[0];     ## added one argument
my $outFile;

GetOptions("blast=s" => \$blastFile,
           "out=s"   => \$outFile) || die "Couldn't get options with Getopt::Long.\n";
die "Usage: $0 <blast table>\n" unless defined $blastFile;

my $blastIO = Bio::SearchIO->new(-file => $blastFile,
                                 -format => 'blasttable'); ### changed from 'blast'

my $searchIOWriter = Bio::SearchIO::Writer::TextResultWriter->new();
my $out = Bio::SearchIO->new(-writer => $searchIOWriter,
                             -file => '>outtest.txt');   ## full results still go here

while (my $result = $blastIO->next_result) {
    if ($result->num_hits() == 1) {
        my $hit = $result->next_hit();
        if ($hit->num_hsps() == 1) {
            print $result->query_name() . "\n";   ## just print the result ID
            $out->write_result($result);
        }
    }
}

Since my output was in -outfmt 6 from blastn, I changed the format to 'blasttable'.
Also, I just wanted the IDs of the queries with a single HSP, so I changed what is printed to stdout and redirected it to a file to get my list of IDs.
I ran it as:

Code:

parse_blastreport_single_hsp.pl Result.blastn6 > singleHSPs.txt

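For others: also make sure '-max_target_seqs' in blastn is set to more than 1 when generating the output. Otherwise every query is limited to one hit and the single-hit check in the script means nothing.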
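
For anyone without BioPerl, the same filter can probably be approximated directly on the tabular output. Below is a minimal sketch, assuming default -outfmt 6 columns (query ID in column 1, one row per HSP), so a query that appears on exactly one row has one hit with one HSP. The function name single_hsp_queries is made up for illustration, and this has not been checked against every BLAST version.

```shell
# single_hsp_queries FILE
# Print the query IDs (column 1) that occur on exactly one row of a
# tab-separated BLAST table. Assumes one row per HSP, as in -outfmt 6.
# Comment lines (as written by -outfmt 7) and blank lines are skipped.
single_hsp_queries() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }            # skip comments and blank lines
        !($1 in rows)   { order[++n] = $1 } # remember first-seen order
                        { rows[$1]++ }      # one row = one HSP
        END {
            for (i = 1; i <= n; i++)
                if (rows[order[i]] == 1) print order[i]
        }
    ' "$1"
}
```

After defining the function in the shell, it could be run as: single_hsp_queries Result.blastn6 > singleHSPs.txt. Unlike the Perl script, this only gives the list of IDs and does not write out the matching results.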

Many thanks!

how to identify blast hits with only 1 HSP (not limit the number of HSP)