Dear experts,
I am working on metagenomic datasets generated on an Illumina HiSeq. I normally annotate assembled contigs with BLAST (either BLASTn or BLASTx) using an e-value cutoff. I have relied only on the e-value to accept hits and haven't looked carefully at the other parameters. Today I extracted some hits with 100% identity. When I looked at the alignment length between the query sequences (over 10 kbp) and the reference sequences (viral genomes), it was much shorter than I expected: many hits had an alignment length of only 30 bp at 100% identity. To me, a 30 bp alignment is not long enough to count as a 100% identity match. So my question is: should I stop relying on the e-value alone and also set a cutoff for alignment length? Or are such short alignments common? Thank you in advance for your help.
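A minimal sketch of applying an alignment-length cutoff alongside the e-value, assuming the hits were saved as BLAST+ tabular output with the default `-outfmt 6` columns. The function name `filter_hits`, the thresholds (`max_evalue=1e-5`, `min_length=100`) and the two example rows are hypothetical illustrations, not recommended values.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_length=100, min_pident=0.0):
    """Yield hits that pass e-value, alignment-length and identity cutoffs.

    The default thresholds are hypothetical placeholders; tune them for your data.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        length = int(hit["length"])
        pident = float(hit["pident"])
        # A hit must pass all three cutoffs, not just the e-value.
        if evalue <= max_evalue and length >= min_length and pident >= min_pident:
            yield hit


if __name__ == "__main__":
    # Two made-up rows: a 30 bp perfect match and a 1,500 bp partial match.
    sample = (
        "contig_1\tvirus_A\t100.000\t30\t0\t0\t5001\t5030\t120\t149\t2e-06\t56.5\n"
        "contig_1\tvirus_B\t85.200\t1500\t222\t0\t100\t1599\t10\t1509\t0.0\t1650\n"
    )
    # Only the 1,500 bp hit passes the length cutoff.
    for hit in filter_hits(io.StringIO(sample)):
        print(hit["qseqid"], hit["sseqid"], hit["length"], hit["evalue"])
```

The length cutoff complements the e-value rather than replacing it: a short perfect match can still reach a low e-value, especially against a small database. If query coverage is more useful than raw length, the search can be rerun with a custom format such as `-outfmt "6 std qlen"` so that `length / qlen` can be computed per hit.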