  • blastn - qcovs field - or how to parse results based on % coverage of query sequence

    I have been playing around with blast+ (blastn), a local installation and various custom databases.

    I thought I had my workflow figured out, but some output is confusing me.

    Specifically the qcovs field. As per the blast manual, 'qcovs means Query coverage per subject', i.e. how much of my query is represented in an alignment. I assumed this to be a percentage value (maximum 100%), and I have used it for filtering.

    But now I have done a local blast using a genome db, where the qcovs value goes up to 400! So clearly it is not calculated in %! Which means my previous filtering is probably crap...

    I basically want to do the following:

    Blast a set of sequences against database 1. Filter the blast result for: a) % identity, b) alignment length and c) % of query sequence covered in the alignment.

    I am basically not interested in alignments that cover 100% of the query, as I am doing breakpoint/insertion mapping. So I want to filter these out and re-blast against database 2.
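A minimal sketch of the filtering step described above, assuming the search was run with `-outfmt "6 std qcovs"`, so that each tab-separated row holds the 12 standard columns followed by qcovs. The function name `filter_hits` and the threshold values are illustrative only and do not come from the original post.

```python
import csv
import io

# Column order assumed for: -outfmt "6 std qcovs"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qcovs"]


def filter_hits(handle, min_pident=90.0, min_length=50, max_qcovs=99):
    """Yield hits that pass the identity and length cut-offs
    but do NOT cover the full query (qcovs <= max_qcovs)."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if (float(row["pident"]) >= min_pident
                and int(row["length"]) >= min_length
                and int(row["qcovs"]) <= max_qcovs):
            yield row


# Made-up rows standing in for a real BLAST result file.
example = io.StringIO(
    "q1\ts1\t98.5\t120\t2\t0\t1\t120\t500\t619\t1e-50\t200\t100\n"
    "q2\ts1\t97.0\t80\t2\t0\t1\t80\t10\t89\t1e-30\t140\t40\n"
    "q3\ts2\t70.0\t80\t24\t0\t1\t80\t10\t89\t1e-05\t50\t40\n"
)
partial = [row["qseqid"] for row in filter_hits(example)]
print(partial)  # ['q2']
```

Addressing columns by name rather than by position makes it harder to mix up fields while filtering. The query IDs collected this way could then be used to pull the sequences to re-blast against database 2.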

    Any ideas?

  • #2
    Never mind... User error. I managed to mess up the columns while filtering the blast result...


    • #3
      Using what options will produce the qcovs?


      • #4
        Originally posted by okorist
        Using what options will produce the qcovs?
        Manuals tend to be useful..
        savetherhino.org


        • #5
          Indeed, using the -outfmt parameter, you can add any of the fields specified in the manual. From the manual:

          outfmt string 0

          alignment view options:
          0 = pairwise,
          1 = query-anchored showing identities,
          2 = query-anchored no identities,
          3 = flat query-anchored, show identities,
          4 = flat query-anchored, no identities,
          5 = XML Blast output,
          6 = tabular,
          7 = tabular with comment lines,
          8 = Text ASN.1,
          9 = Binary ASN.1,
          10 = Comma-separated values,
          11 = BLAST archive format (ASN.1)
          Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
          The supported format specifiers are:
          qseqid means Query Seq-id
          qgi means Query GI
          qacc means Query accession
          sseqid means Subject Seq-id
          sallseqid means All subject Seq-id(s), separated by a ';'
          sgi means Subject GI
          sallgi means All subject GIs
          sacc means Subject accession
          sallacc means All subject accessions
          qstart means Start of alignment in query
          qend means End of alignment in query
          sstart means Start of alignment in subject
          send means End of alignment in subject
          qseq means Aligned part of query sequence
          sseq means Aligned part of subject sequence
          evalue means Expect value
          bitscore means Bit score
          score means Raw score
          length means Alignment length
          pident means Percentage of identical matches
          nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
          gapopen means Number of gap openings
          gaps means Total number of gaps
          ppos means Percentage of positive-scoring matches
          frames means Query and subject frames separated by a '/'
          qframe means Query frame
          sframe means Subject frame
          btop means Blast traceback operations (BTOP)
          staxids means unique Subject Taxonomy ID(s), separated by a ';'(in numerical order)
          sscinames means unique Subject Scientific Name(s), separated by a ';'
          scomnames means unique Subject Common Name(s), separated by a ';'
          sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
          sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
          stitle means Subject Title
          salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
          qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
          When not provided, the default value is:
          'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
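As a rough illustration of how the format specifiers map onto output columns, the sketch below expands a specifier string (including the `std` keyword quoted above) into column names and parses one tabular line by name. The helper names `expand_fields` and `parse_line` and the sample line are hypothetical; only the specifier names and the `-outfmt` option come from the manual text.

```python
# The 12 default columns, i.e. what the keyword 'std' stands for.
STD_FIELDS = ("qseqid sseqid pident length mismatch gapopen "
              "qstart qend sstart send evalue bitscore").split()


def expand_fields(spec):
    """Expand a specifier string such as 'std qcovs qcovhsp' into column names."""
    fields = []
    for name in spec.split():
        fields.extend(STD_FIELDS if name == "std" else [name])
    return fields


def parse_line(line, fields):
    """Map one tab-separated result line onto its column names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    return dict(zip(fields, values))


spec = "std qcovs qcovhsp"
fields = expand_fields(spec)
outfmt_arg = f"6 {spec}"  # pass as: blastn ... -outfmt "6 std qcovs qcovhsp"

# Made-up result line with 12 standard columns plus qcovs and qcovhsp.
hit = parse_line(
    "q1\ts1\t98.5\t120\t2\t0\t1\t120\t500\t619\t1e-50\t200\t60\t30", fields)
print(hit["qcovs"], hit["qcovhsp"])  # 60 30
```

The column-count check is there to catch the kind of mix-up mentioned in #2, where the parser and the `-outfmt` string disagree about which fields are present.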


          • #6
            I think qcovs sums up the HSP lengths and divides the total by the query length. If there are repeats in your query, something bigger than 100% can show up, because the repeated HSPs are counted more than once. Is that your case?

            I have no solution for this problem; it seems complicated to program and filter the result.
            It will bias you towards bigger qcovs values, but I don't mind too much about it.
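One way to guard against double counting is to merge the overlapping query intervals (qstart, qend) of all HSPs for a query/subject pair before dividing by the query length. The sketch below shows that idea for coverage computed by hand; it is not a description of what BLAST does internally. The function name `merged_query_coverage` and the example coordinates are made up, and the query length has to be supplied separately since it is not among the default columns.

```python
def merged_query_coverage(hsps, query_length):
    """Percent of the query covered by at least one HSP.

    hsps: iterable of (qstart, qend) pairs, 1-based inclusive.
    """
    # Normalise so start <= end, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length


# Four HSPs on a 200 bp query, three of them hitting the same repeat region.
hsps = [(1, 100), (1, 100), (51, 150), (1, 100)]
naive = 100.0 * sum(e - s + 1 for s, e in hsps) / 200
print(naive)                             # 200.0
print(merged_query_coverage(hsps, 200))  # 75.0
```

Summing raw HSP lengths gives a value above 100% here, while the merged version cannot exceed 100% as long as all coordinates fall within the query.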

            I wonder about what qcovhsp does though.
