  • Acquiring contigs, de Bruijn graphs, velvet

    Hi guys. Could someone show me, with an example, how we identify that certain nodes in the graph represent a single contig?

  • #2
    I'm not completely clear on what you are asking, but I think you want to know how to decide when to merge distinct nodes in the de Bruijn graph into a single node. This is done when two nodes are unambiguously connected. In other words, if two nodes x and y are connected by an edge x -> y and neither branches at that edge (x has no other outgoing edge and y has no other incoming edge), then they can be merged. I've attached a simple diagram of this. In the diagram the connected blue nodes can be merged together, and so can the middle red nodes. In this case 19 nodes would be merged into 5 "contigs".
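
    The merge rule can be sketched in a few lines of Python. This is only a minimal illustration on a made-up toy graph, not Velvet's actual implementation; the function name `compact` and the node names are hypothetical, and the sketch ignores reverse complements.

```python
from collections import defaultdict

def compact(edges):
    """Group nodes into chains that can each be merged into one contig.

    edges: iterable of (u, v) directed edges between node names.
    A closed cycle with no branch has no starting node and is
    skipped by this sketch.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))

    def mergeable(u, v):
        # u -> v is unambiguous when it is u's only exit and v's only entry
        return succ[u] == {v} and pred[v] == {u}

    def only(s):
        return next(iter(s))

    chains = []
    for n in sorted(nodes):
        if len(pred[n]) == 1 and mergeable(only(pred[n]), n):
            continue  # n sits inside a chain that starts elsewhere
        chain = [n]
        # extend the chain while the next edge is unambiguous
        while len(succ[chain[-1]]) == 1 and mergeable(chain[-1], only(succ[chain[-1]])):
            chain.append(only(succ[chain[-1]]))
        chains.append(chain)
    return chains

# Toy graph: a linear stretch, a bubble (c -> d/e -> f), then a tail
toy = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
       ("d", "f"), ("e", "f"), ("f", "g")]
print(compact(toy))
```

    Here the 7 toy nodes collapse into 4 chains, because merging stops wherever a node has more than one way in or out.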

    Let me know if that isn't clear or if you are asking something else.
    Attached Files



    • #3
      "Let me know if that isn't clear or if you are asking something else."
      Thank you very much. That's exactly what I'm interested in: identifying and acquiring contigs, then using scaffolding to merge them and recover the original DNA.
      "In this case 19 nodes would be merged into 5 'contigs'."
      OK. If I understand correctly, the image below represents just one contig, because this graph fragment (although it is a full graph itself) can be unambiguously assembled into a single node with the following sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG. Or am I getting it wrong, and there are actually 4 contigs?
      Attached Files



      • #4
        Originally posted by bioinf
        OK. If I understand correctly, the image below represents just one contig, because this graph fragment (although it is a full graph itself) can be unambiguously assembled into a single node with the following sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG. Or am I getting it wrong, and there are actually 4 contigs?
        Not quite, since the graph contains a loop. Label the nodes as follows:

        x = TAGTCGAG
        y = GAGGCTTTAGA
        z = AGAGACAG
        w = AGATCCGAGATGAG

        Note that node y branches in both directions (to x/w on its left and to w/z on its right). This branch means that the graph cannot be unambiguously simplified further.

        In particular, the path of the assembly that you suggest is:

        x -> y -> w -> y -> z

        This is ambiguous as the following is also a valid assembly:

        x -> y -> w -> y -> w -> y -> z

        The second assembly is the same as the first except that it travels through the y/w loop twice. In general there is no way to know from the graph alone how many times to travel through the loop, so most assemblers will output 4 contigs here.
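
        As a rough sketch (not what any particular assembler does), the two walks can be spelled out in Python. The node labels are copied from the list above; the 3-base overlap between adjacent nodes is read off those labels and is an assumption, and `spell` and `walk` are made-up helper names. Note that w as labelled above is two bases longer than the matching stretch of the quoted sequence, so the one-loop string comes out at 40 bases rather than 38.

```python
# Node labels as given in this post; OVERLAP is inferred from the labels
NODES = {
    "x": "TAGTCGAG",
    "y": "GAGGCTTTAGA",
    "z": "AGAGACAG",
    "w": "AGATCCGAGATGAG",
}
EDGES = {("x", "y"), ("y", "w"), ("w", "y"), ("y", "z")}
OVERLAP = 3

def spell(path):
    """Return the sequence spelled by walking the named nodes in order."""
    for a, b in zip(path, path[1:]):
        if (a, b) not in EDGES:
            raise ValueError(f"no edge {a} -> {b}")
        if NODES[a][-OVERLAP:] != NODES[b][:OVERLAP]:
            raise ValueError(f"{a} and {b} do not overlap")
    seq = NODES[path[0]]
    for name in path[1:]:
        seq += NODES[name][OVERLAP:]  # append only the new bases
    return seq

def walk(loops):
    """Path from x to z that goes around the y/w loop `loops` times."""
    return ["x", "y"] + ["w", "y"] * loops + ["z"]

for loops in (1, 2):
    seq = spell(walk(loops))
    print(loops, len(seq), seq)
```

        Both walks pass every edge and overlap check, and each extra trip around the loop simply adds the same 19 bases, which is why the graph by itself cannot pick one of them.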
        Last edited by jts; 01-10-2011, 05:05 AM.



        • #5
          I see. I guess even mate-pairs can't help in such a graph. The only solution in this case is to know the length of the original DNA strand. Then we can deduce the number of times the repeat occurred.
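
          To illustrate the length idea, here is a small sketch using the node lengths from post #4. It assumes adjacent nodes overlap by 3 bases (inferred from the labels, not stated in the thread), and `loop_count` is a hypothetical helper name.

```python
def loop_count(total_len, lengths, overlap=3):
    """Number of trips around the y/w loop implied by a known total length.

    lengths: dict with the lengths of nodes x, y, z and w.
    Raises ValueError if no whole number of trips fits.
    """
    # x -> y -> z with no trip around the loop
    base = lengths["x"] + (lengths["y"] - overlap) + (lengths["z"] - overlap)
    # each trip adds one copy of w and one more copy of y
    per_loop = (lengths["w"] - overlap) + (lengths["y"] - overlap)
    if total_len < base:
        raise ValueError("length is shorter than the loop-free path")
    n, remainder = divmod(total_len - base, per_loop)
    if remainder:
        raise ValueError("length does not match a whole number of trips")
    return n

lengths = {"x": 8, "y": 11, "z": 8, "w": 14}
print(loop_count(59, lengths))
```

          With these lengths the loop-free path is 21 bases and each trip adds 19, so a known length of 59 would mean two trips.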

          What is generally done in such cases? What is the common approach?



          • #6
            It varies from assembler to assembler. Some will select the most likely number of copies of the repeat based on read pairs spanning the loop and the insert size distribution. Others will just build a scaffold of x and z and leave the sequence in between as a run of "N"s.
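
            Both strategies can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the method of any specific assembler: insert sizes are modelled as normally distributed, the numbers are toy values, and `best_copy_number` and `scaffold` are hypothetical names. The gap values (5 bases with no copy of the loop, 19 more per copy) follow from the node labels in post #4 with a 3-base overlap.

```python
def best_copy_number(outside, mean, sd, base_gap, per_loop, max_copies=10):
    """Pick the loop copy number that best explains pairs spanning the repeat.

    outside: for each read pair with one read in x and one in z, the number
             of fragment bases lying in x and z (outside the gap).
    mean, sd: insert size distribution, assumed normal here.
    base_gap: bases between x and z with zero copies of the loop.
    per_loop: bases added by each copy of the loop.
    """
    def log_likelihood(n):
        gap = base_gap + n * per_loop
        # normal log-density up to a constant, summed over all pairs
        return sum(-((o + gap - mean) ** 2) / (2 * sd ** 2) for o in outside)
    return max(range(max_copies + 1), key=log_likelihood)

def scaffold(left, right, gap_estimate):
    """Join two contigs with a run of N for the unresolved sequence."""
    return left + "N" * gap_estimate + right

# Toy data: three spanning pairs, insert size about 60 +/- 5 bases
copies = best_copy_number([12, 14, 13], mean=60, sd=5, base_gap=5, per_loop=19)
print(copies)
print(scaffold("TAGTCGAG", "AGAGACAG", 5 + copies * 19))
```

            The first function commits to a copy number; the second keeps only the distance estimate and leaves the bases between x and z unresolved.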



            • #7
              Now everything is clear. Thank you.



              • #8
                Simple model of De Bruijn graph

                Dear jts or bioinf,

                I have trouble understanding de Bruijn graphs because the concept of a graph is vague to me. What does it mean when we say k-mers act as edges and k-1 mers act as edges? Why are k-1 mers used instead of only k-mers? Do you have a simpler model of assembly using a de Bruijn graph? I would appreciate it if you could help.

                Thanks for your time,
                Scientist1



                • #9
                  Sorry, my first question is:
                  What does it mean when we say k-mer acts as edge and k-1 mers act as nodes?
