  • Inconsistent best reciprocal/bidirectional hit over 3 genomes: Where to put in venn?

    Hi everyone,

    I'm currently trying to compare multiple genomes (3; a 4th might come).
    To do that, I run a best reciprocal/bidirectional BLAST hit analysis at the protein level (I know, there are some shortcomings).

    Now I sometimes have this "problem":



    I understand the biology, that's not my question.
    The question is: Where do I put this "hit" in my venn diagramm?



    It obviously doesn't go in the middle.
    I could make it 3 different hits, going in each of the overlaps between 2 genomes.
    But...is that correct? Because you'd have the proteins from organism B (and C) both in the comparison to A and to C, each meaning it's "unique" to this comparison, which also isn't true.

    What do I do with this?
    I'm sure someone else must have figured it out, but I guess my Google skills are lacking a bit right now.

    If anyone could help me please....

  • #2
    Can you provide a clear definition/limits of what you consider a primary/secondary "hit"? Is the secondary "hit" referring to a case where two proteins share a domain but are otherwise unrelated? A protein like that may have to be considered unique to that organism.



    • #3
      Nope, easier, I mean simply orthologues.
      Let's say e.g. (because I currently have that) that all proteins are some sort of kinases, e.g. acetate kinase, and in organism A there had been a duplication event a long time ago, whereas organism B + C only have one of them.
      Due to that the acetate kinase A1 has high homology with B, which has high homology with C (due to the lack of any other orthologues in B + C; hypothetical case), which has high homology with A2, but lower homology with A1, due to the hypothetical evolutionary distance of these proteins.

      That doesn't make all these proteins 1 entity, so no idea where to put them in the diagramm :/.
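
      A minimal sketch of how such inconsistent groups could be flagged automatically, assuming the pairwise RBH results are already available as pairs of protein IDs written as "genome|protein" (that naming scheme and the function names are illustrative assumptions, not something from the original analysis). Proteins are grouped by following RBH links, and a group counts as consistent only if it has at most one protein per genome and every pair in it is a direct RBH:

```python
from collections import defaultdict
from itertools import combinations


def rbh_components(pairs):
    """Group proteins into connected components of the RBH graph."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first walk over everything reachable through RBH links.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        components.append(comp)
    return components, graph


def is_consistent(comp, graph):
    """True if the group has one protein per genome and all pairs are RBHs."""
    genomes = [protein.split("|", 1)[0] for protein in comp]
    if len(genomes) != len(set(genomes)):
        return False  # e.g. A1 and A2 ended up in the same group
    return all(b in graph[a] for a, b in combinations(comp, 2))


# Hypothetical data: the A1-B-C-A2 chain described above, plus one clean triplet.
example_pairs = [
    ("A|A1", "B|B1"), ("B|B1", "C|C1"), ("C|C1", "A|A2"),
    ("A|k", "B|k"), ("B|k", "C|k"), ("A|k", "C|k"),
]
components, graph = rbh_components(example_pairs)
inconsistent = [c for c in components if not is_consistent(c, graph)]
```

      In this toy input the chain A1-B-C-A2 comes out as one inconsistent group, while the clean triplet passes. What to do with the flagged groups is still a judgment call; the sketch only finds them.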



      • #4
        I think I've got the same sort of issue. I'm still trying to figure out what I've got, though.

        For the case you give, I think it may be reasonable to split it up into a triplet and a singleton - assuming the two proteins in organism A are indeed similar. If the only output from the work is a Venn diagram with counts, then this is straightforward in the sense that no further action need be taken; you don't need to decide which of protein A1 and A2 is the full match and which is 'unique'.

        In generating RBHs it seems to be standard to accept any hit scoring over 99% of the bit-score of the highest hit as an RBH. This caters to detecting near-identical duplications, which otherwise might be rejected due to random vagaries in the matching algorithm.
        This finds (in my case very slightly) more RBH in the pairwise comparison than otherwise. However I haven't yet looked at the effect of propagating all near-best matches to the case of more than two genomes.
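
        A rough sketch of that near-best rule, assuming the BLAST hits have already been parsed into (query, subject, bit_score) tuples for each search direction. The function names and the `fraction` parameter are made up for illustration, and the comparison uses "at least" rather than strictly "over" the threshold so that the top hit itself is always kept:

```python
from collections import defaultdict


def near_best_hits(hits, fraction=0.99):
    """Map each query to all subjects scoring >= fraction * its best bit-score."""
    best = defaultdict(float)
    for query, _subject, score in hits:
        best[query] = max(best[query], score)

    kept = defaultdict(set)
    for query, subject, score in hits:
        if score >= fraction * best[query]:
            kept[query].add(subject)
    return kept


def reciprocal_hits(hits_ab, hits_ba, fraction=0.99):
    """Pairs (a, b) where b is a near-best hit of a and a is a near-best hit of b."""
    forward = near_best_hits(hits_ab, fraction)
    reverse = near_best_hits(hits_ba, fraction)
    return {
        (a, b)
        for a, subjects in forward.items()
        for b in subjects
        if a in reverse.get(b, ())
    }


# Hypothetical scores: A1 and A2 are near-identical duplicates matching B1.
hits_ab = [("A1", "B1", 500.0), ("A2", "B1", 498.0), ("A1", "B2", 100.0)]
hits_ba = [("B1", "A1", 500.0), ("B1", "A2", 497.0)]

relaxed = reciprocal_hits(hits_ab, hits_ba, fraction=0.99)
strict = reciprocal_hits(hits_ab, hits_ba, fraction=1.0)
```

        With these made-up scores the relaxed rule keeps both A1-B1 and A2-B1, while the strict best-hit rule keeps only A1-B1, which is the "slightly more RBH" effect described above.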



        • #5
          triplet + singleton...well...the problem is then still that the Venn diagram doesn't properly reflect your data. While it makes logical sense, it is not what you have.

          The final answer for my data was: If your data doesn't fit your approach, or your approach doesn't fit your data, then one has to change.
          Venn diagram just isn't really suitable.

          I was looking around how to make a properly weighted Venn diagramm for >3 organisms (because we had 4 at the end), and it turned out that this is apparently not really possible (at least with nice circles, and there are no packages for it, etc).
          In the same thread on Stack Overflow (link) someone mentioned some alternative visualization methods, and we finally went for a weighted network graph, which for the 4 organisms only displays RBHs between all 4 organisms, RBHs between each pair, and separate notes for the amount of unique proteins, avoiding the problems with inconsistent hits, without mis-representing the data. It also gives a nice overview, and if you play a bit with colour gradients, then it also looks really nice (I'd show what my coworker did with it, but I'd fear that someone steals the design).

          I doubt though that it's nicely applicable for >4 organisms, because then you'll probably get spaghetti.
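
          The numbers behind such a graph could be collected roughly as follows. This is only a sketch under stated assumptions: protein IDs are written as "genome|protein", `proteins` lists every protein of every genome, and `rbh_pairs` holds the pairwise RBHs; none of these names come from the actual script. Edge weights here count all pairwise RBHs (including those that are also part of a full consensus group), the "core" counts groups with exactly one protein per genome where every pair is an RBH, and proteins without any RBH are tallied as unique. Inconsistent groups simply never reach the core count:

```python
from collections import Counter, defaultdict
from itertools import combinations


def genome_of(protein):
    """Extract the genome label from an ID of the assumed form 'genome|protein'."""
    return protein.split("|", 1)[0]


def network_summary(proteins, rbh_pairs):
    """Return (edge weights per genome pair, core group count, uniques per genome)."""
    genomes = sorted({genome_of(p) for p in proteins})
    partners = defaultdict(set)
    edge_weights = Counter()
    for a, b in rbh_pairs:
        partners[a].add(b)
        partners[b].add(a)
        edge_weights[tuple(sorted((genome_of(a), genome_of(b))))] += 1

    # Count core groups once, starting from the proteins of the first genome.
    core = 0
    for protein in proteins:
        if genome_of(protein) != genomes[0]:
            continue
        group = {protein} | partners[protein]
        # Require exactly one member per genome ...
        if sorted(genome_of(m) for m in group) != genomes:
            continue
        # ... and a direct RBH between every pair of members.
        if all(y in partners[x] for x, y in combinations(group, 2)):
            core += 1

    unique = Counter(genome_of(p) for p in proteins if not partners[p])
    return edge_weights, core, unique


# Hypothetical toy input with three genomes.
proteins = ["A|1", "A|2", "A|3", "B|1", "B|2", "C|1", "C|4"]
rbh_pairs = [("A|1", "B|1"), ("B|1", "C|1"), ("A|1", "C|1"), ("A|2", "B|2")]
edge_weights, core, unique = network_summary(proteins, rbh_pairs)
```

          The edge weights would drive line thickness or colour between genome nodes, the core count goes in the middle bubble, and the unique counts become the per-genome notes.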



          • #6
            Originally posted by bastianwur:
            triplet + singleton...well...the problem is then still that the Venn diagram doesn't properly reflect your data. While it makes logical sense, it is not what you have.
            Provided you're willing to accept the 99% of best hit as a RBH (which I already have in the creation of it) and the group fits within that, I think this doesn't damage the data. Whether the data is actually a useful reflection of the situation is rather a different matter - the issue is really that RBH don't capture the duplications in the first place.
            On the other hand, in your original post one idea you rejected was to "make it 3 different hits, going in each of the overlaps between 2 genomes". I agree that would be bad, because it would mask the duplication.

            Venn diagramm
            I know it's a typo, but I really like this


            ... alternative visualization ways, and we finally went for a weighted network graph, which for the 4 organisms only displays RBHs between all 4 organisms, RBHs between each pair, and separate notes for the amount of unique proteins, avoiding the problems with inconsistent hits, without mis-representing the data.
            I like this idea, but I'm having trouble 'seeing' it. I get the RBH pairs between each genome, but you say you are showing proteins in common between all somehow but can't show uniques. Did you mean the opposite way round?

            Here is a sketch I made of what I think you are describing. Please forgive the rough nature and general ugliness. Is a pretty version of that what you meant?
            [Attached image: weightedNetworkGraph_demo.png]

            I'm now wondering whether it's worth the pain of getting circos to render every RBH. It's kind of what it's for, and doing it would perhaps be interesting.



            • #7
              Originally posted by Loris:
              Provided you're willing to accept the 99% of best hit as a RBH (which I already have in the creation of it) and the group fits within that, I think this doesn't damage the data. Whether the data is actually a useful reflection of the situation is rather a different matter - the issue is really that RBH don't capture the duplications in the first place.
              That is true, yeah.
              Still don't want to bother with it.
              Let's say I'd consider it somehow valid, but I'd not be happy with it myself.

              Originally posted by Loris:
              I know it's a typo, but I really like this

              German spelling of diagram, I have an excuse.

              Originally posted by Loris:
              I like this idea, but I'm having trouble 'seeing' it. I get the RBH pairs between each genome, but you say you are showing proteins in common between all somehow but can't show uniques. Did you mean the opposite way round?
              No, I'm showing the uniques, but not the "inconsistent" hits.

              Originally posted by Loris:
              Here is a sketch I made of what I think you are describing. Please forgive the rough nature and general ugliness. Is a pretty version of that what you meant?
              Yes, like this, but with an additional bubble in the middle, showing the consensus between all 4.

              Originally posted by Loris:
              I'm now wondering whether it's worth the pain of getting circos to render every RBH. It's kind of what it's for, and doing it would perhaps be interesting.
              For how many, and how good are your genomes?
              If there are only 3, each a single chromosome, and you only want a rough idea of where the hits are, then you can use the Java application on www.hiveplot.net ...er...wait...use the Java Web Start link; the downloadable jar file doesn't work.
              Otherwise, I have my own custom script to plot the genomes against each other at higher resolution, because the hiveplot program didn't allow me to zoom in enough, and I wanted to look at specific re-arrangements.

