  • Shorash
    Member
    • Sep 2012
    • 18

    Reciprocal blast help

    I have two datasets (Dataset A and Dataset B) that I have searched against one another reciprocally with BLASTx. However, I am having difficulty identifying the contigs that are hits to one another in both datasets. I have been able to export the BLAST results into sequence tables, but I'm not sure how to identify the reciprocal top hits for a large number of contigs (160,000). Here is an example of how it looks in an Excel spreadsheet:

    Dataset A                Dataset B
    Contig123=Contig789_1    Contig789=111Contig123_1
    Contig456=72Contig221    Contig221=Contig456_3
    Contig777=43Contig954    Contig954=3Contig1561_1


    In the example above you can see that the hit names from each file have extra characters at the beginning, and sometimes at the end, which makes them hard to compare using Excel formulas. The first two rows are the ones I'm interested in extracting, as they hit the same contig in both datasets, unlike row 3, which does not match.

    Any help would be greatly appreciated!
    Last edited by Shorash; 07-15-2014, 08:40 PM.
  • bio_boris
    Member
    • Mar 2013
    • 14

    #2
    What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

    Some resources:

    http://seqanswers.com/forums/showthread.php?t=20652


    • someperson
      Junior Member
      • Jul 2013
      • 9

      #3
      Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
      I mostly use the tool proteinortho (current version proteinortho5) for reciprocal BLAST analyses.
      A strong advantage of this tool is that it not only detects orthologs via direct reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
      Link:

      http://www.biomedcentral.com/1471-2105/12/124


      • Shorash
        Member
        • Sep 2012
        • 18

        #4
        Originally posted by bio_boris
        What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

        Some resources:

        http://seqanswers.com/forums/showthread.php?t=20652

        I haven't managed to create any scripts or code. I've been manually looking at specific genes of interest, but it would be great to be able to do all of them at once.


        • Shorash
          Member
          • Sep 2012
          • 18

          #5
          Originally posted by someperson
          Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
          I mostly use the tool proteinortho (current version proteinortho5) for reciprocal BLAST analyses.
          A strong advantage of this tool is that it not only detects orthologs via direct reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
          Link:

          http://www.biomedcentral.com/1471-2105/12/124

          Great, thanks for that. I'll give this a try.


          • rhinoceros
            Senior Member
            • Apr 2013
            • 372

            #6
            Originally posted by Shorash
            how I can identify the reciprocal top hits of one another for a large number of contigs (160,000).

            I would go for tabular BLAST output and first sort for best hits. Then, depending on how you ran your BLASTs, you would have e.g. two best-hit-sorted output files with the query in the first column and the subject in the second. One option would be to cut columns 1-2, switch their order in one file, and then cat it with the other file. Then you'd sort the combined lines and only output those where uniq -c gives a count of 2, since a reciprocal pair now appears as the same line from both files. I'm sure there's an awk one-liner for this too.
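
A minimal Python sketch of that idea, not a tested pipeline. It assumes the default 12-column tabular BLAST layout (`-outfmt 6`, query in column 1, subject in column 2, bit score in column 12) and that every ID contains a core of the form `Contig` followed by digits, as in the spreadsheet example. The helper names `contig_id`, `best_hits` and `reciprocal_best_hits` are made up for illustration.

```python
import re


def contig_id(name):
    """Strip extra characters around an ID, e.g. '111Contig123_1' -> 'Contig123'.

    Assumption: the real ID always looks like 'Contig<digits>'.
    Names that do not match are returned unchanged.
    """
    match = re.search(r"Contig\d+", name)
    return match.group(0) if match else name


def best_hits(lines, normalize=contig_id):
    """Return {query: subject}, keeping the highest bit score per query.

    `lines` is any iterable of tab-separated BLAST rows (an open file works).
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comment lines and malformed rows
        query, subject = normalize(fields[0]), normalize(fields[1])
        score = float(fields[11])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    return sorted((a, b) for a, b in a_vs_b.items() if b_vs_a.get(b) == a)


# The three spreadsheet rows from the first post, after cleaning the hit names.
a_vs_b = {
    "Contig123": contig_id("Contig789_1"),
    "Contig456": contig_id("72Contig221"),
    "Contig777": contig_id("43Contig954"),
}
b_vs_a = {
    "Contig789": contig_id("111Contig123_1"),
    "Contig221": contig_id("Contig456_3"),
    "Contig954": contig_id("3Contig1561_1"),
}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
# [('Contig123', 'Contig789'), ('Contig456', 'Contig221')]
```

With real output you would pass two open files to `best_hits` (one per search direction) and feed the resulting dictionaries to `reciprocal_best_hits`. A dictionary lookup replaces the sort and uniq -c step, so 160,000 contigs should not be a problem.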
            savetherhino.org

