  • Blast against viral database vs NR

    Hi,

    I'm trying to blast(x) some contigs I've created. The problem is that I have a ton of them, so I'd like to reduce the search space by using a database of only viral sequences. I created such a database, but the results seem off. For example, when a sequence matches a plant virus, taking that sequence and running it through NCBI's web BLAST interface against NR returns plant sequences.

    My thinking is that blast returns whatever sequences the query is close to, and when checking against viruses there are only viral sequences so it returns whatever viral sequences there are. When I take the sequence to NR, there are many plant sequences that are much closer so it returns those. Hopefully that makes sense.

    The main problem with this is that I can't trust my results. If I get a virus that I'm interested in after searching the viral database, I have to use NR to make sure that it's actually correct. So my question is, can I do anything to make sure when I'm searching against the viral database my results are actually accurate?

  • #2
    BLAST will always give you the best match available in whichever database you query. That's how the algorithm works. E-values also depend on the size of the database searched, so they are not comparable between databases; bit scores are normalized for the scoring system but do not account for database size.
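
To make the database-size dependence concrete, here is a minimal sketch based on the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. The function name and all lengths below are hypothetical placeholders chosen for illustration, not the real sizes of any database, and the effective-length corrections that BLAST applies are ignored.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from E = m * n * 2**(-S').

    Ignores the effective-length (edge-effect) corrections BLAST applies.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes, for illustration only.
QUERY_LEN = 300            # residues in the query
VIRAL_DB_LEN = 10**8       # total residues in a small viral-only database
LARGE_DB_LEN = 10**11      # total residues in a much larger database

# The same alignment (same bit score) evaluated in both search spaces.
e_small = expect_value(50.0, QUERY_LEN, VIRAL_DB_LEN)
e_large = expect_value(50.0, QUERY_LEN, LARGE_DB_LEN)

print(f"viral-only E-value: {e_small:.2e}")
print(f"large-db E-value:   {e_large:.2e}")
print(f"ratio large/small:  {e_large / e_small:.0f}")
```

For a fixed bit score, the E-value scales linearly with database length, which is why a hit can look far more significant against a small custom database than against a large one.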

    If you only search against virus sequence, you will only get a virus as your best possible match. If you suspect plant contamination in your sequences, then you need to include both viral and plant to make any judgements about homology match.
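
One way to act on this is to search a single combined database (viral plus suspected host sequences) and look at which group each contig's best hit falls in. The sketch below assumes BLAST tabular output (`-outfmt 6`) with the default 12 columns; the contig names, subject IDs, and the `GROUPS` lookup table are all made up for illustration, and in practice the subject-to-group mapping would have to come from your own taxonomy annotation.

```python
import csv
import io


def best_hits(tabular_text):
    """Return {query_id: (subject_id, bit_score)} with the top-scoring hit per query.

    Expects default 12-column BLAST tabular output: column 1 is the query,
    column 2 the subject, column 12 the bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank, comment, or malformed lines.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def classify(best, subject_group):
    """Label each query by the group of its best hit ('unknown' if unmapped)."""
    return {q: subject_group.get(s, "unknown") for q, (s, _) in best.items()}


# Hypothetical hits from one search against a combined viral + plant database.
EXAMPLE = "\n".join([
    "contig1\tvirus_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t90.0",
    "contig1\tplant_B\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180.0",
    "contig2\tvirus_C\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-45\t170.0",
])
# Hypothetical subject-to-group lookup.
GROUPS = {"virus_A": "virus", "plant_B": "plant", "virus_C": "virus"}

labels = classify(best_hits(EXAMPLE), GROUPS)
for contig, group in sorted(labels.items()):
    print(contig, group)
```

Because every hit comes from the same search, the scores are ranked within one database, so a contig with a weak viral hit and a much stronger plant hit is labelled as plant rather than reported as viral.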

    Homology is not a yes or no outcome. It is a judgement call based upon the similarity of the alignments you get back. Searching against only viral sequence when you know or suspect you have non-viral query sequences will inherently bias your interpretation of what you have.

    So I would say you should be using the NR database, regardless of the overhead that adds to your computations. Homology determination by sequence similarity and alignment is entirely dependent on what you query your data against. Using only viral sequences would only make sense if you had a strong basis for assuming your search sequences were uncontaminated by anything other than virus.

    P.S. Your results against the viral-only database you made will actually be accurate, but they will be accurate for that specific query and reference set. Similarly, your results against NR will also be accurate and correct. BLAST and any homology matching are context dependent, always.
    Last edited by mbblack; 09-26-2014, 11:36 AM.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.

    • #3
      What is not clear from the original post is whether the query set is supposed to contain only viral sequences. If it is, then using the smallest suitable database should make the search more sensitive, since the same alignment gets a lower E-value in a smaller search space.

      • #4
        To complicate things, don't viruses sometimes have genes that are homologous to their hosts' genes?
