Hi guys. Could someone show me, with an example, how we identify that certain node(s) in the graph represent a single contig?
-
I'm not completely clear on what you are asking, but I think you want to know how to decide when to merge distinct nodes in the de Bruijn graph into a single node. This is done when two nodes are unambiguously connected. In other words, if two nodes x and y are connected by an edge and neither x nor y branches, then they can be merged. I've attached a simple diagram of this. In the diagram, the connected blue nodes can be merged together, and so can the middle red nodes. In this case, 19 nodes would be merged into 5 "contigs".
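To make the merging rule concrete, here is a minimal Python sketch. It assumes a toy graph in which nodes are (k-1)-mers and every distinct k-mer in the reads contributes one edge, ignores reverse complements and sequencing errors, and reads "unambiguously connected" as: a has b as its only successor and b has a as its only predecessor. The function names `build_graph` and `merge_unambiguous` and the two reads are made up for illustration and do not come from any particular assembler.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred

def merge_unambiguous(succ, pred):
    """Merge every chain of nodes joined by non-branching edges into one contig."""
    nodes = set(succ) | set(pred)

    def only_succ(a):
        s = succ.get(a, set())
        return next(iter(s)) if len(s) == 1 else None

    def only_pred(b):
        p = pred.get(b, set())
        return next(iter(p)) if len(p) == 1 else None

    def joins_previous(b):
        # True when b can be merged into the node that precedes it.
        a = only_pred(b)
        return a is not None and only_succ(a) == b

    contigs, seen = [], set()
    starts = sorted(n for n in nodes if not joins_previous(n))
    # The second pass over all nodes picks up isolated cycles, which have no start.
    for start in starts + sorted(nodes):
        if start in seen:
            continue
        seq, cur = start, start
        seen.add(cur)
        while True:
            nxt = only_succ(cur)
            if nxt is None or only_pred(nxt) != cur or nxt in seen:
                break
            seq += nxt[-1]          # consecutive nodes overlap by all but one base
            seen.add(nxt)
            cur = nxt
        contigs.append(seq)
    return contigs

# Two hypothetical reads that agree on ATGCGT and then diverge.
succ, pred = build_graph(["ATGCGTAC", "ATGCGTTG"], k=4)
print(merge_unambiguous(succ, pred))
```

In this toy input the node CGT has two successors, so the chain stops there: the seven nodes collapse into three contigs, one for the shared prefix and one for each branch.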
Let me know if that isn't clear or if you are asking something else.
-
Originally posted by bioinf:
Ok. If I'm getting it correctly, then the image below represents just one single contig, because this graph fragment (although it is a full graph itself) can be unambiguously assembled into just one node, namely the one with the following sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG. Or am I getting it wrong, and there are actually 4 contigs?
x = TAGTCGAG
y = GAGGCTTTAGA
z = AGAGACAG
w = AGATCCGAGATGAG
Note that node y branches in both directions (to x/w on its left and to w/z on its right). This branching means that the graph cannot be unambiguously simplified further.
In particular, the path of the assembly that you suggest is:
x -> y -> w -> y -> z
This is ambiguous as the following is also a valid assembly:
x -> y -> w -> y -> w -> y -> z
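To see both walks as actual sequences, here is a small Python sketch that spells out a path from the node sequences exactly as listed above. The 3-base overlap between consecutive nodes is my assumption, inferred from the shared GAG and AGA ends; `spell` and `walk` are illustrative helper names.

```python
NODES = {
    "x": "TAGTCGAG",
    "y": "GAGGCTTTAGA",
    "z": "AGAGACAG",
    "w": "AGATCCGAGATGAG",
}
OVERLAP = 3  # assumed: neighbouring nodes share 3 bases (GAG or AGA)

def spell(path, nodes=NODES, overlap=OVERLAP):
    """Concatenate the nodes on a path, collapsing the shared overlap."""
    seq = nodes[path[0]]
    for name in path[1:]:
        nxt = nodes[name]
        if seq[-overlap:] != nxt[:overlap]:
            raise ValueError(f"{name} does not overlap the sequence so far")
        seq += nxt[overlap:]
    return seq

def walk(loops):
    """x -> y, then `loops` passes through w -> y, then z."""
    return ["x", "y"] + ["w", "y"] * loops + ["z"]

for loops in (1, 2):
    seq = spell(walk(loops))
    print(loops, len(seq), seq)
```

Both calls succeed, which is the point: every overlap checks out for either path, so the graph alone gives no reason to prefer one over the other.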
The second assembly is the same as the first except it travels through the y/w loop twice. In general there is no way to know how many times to travel through the loop, so most assemblers will output 4 contigs here.
Last edited by jts; 01-10-2011, 05:05 AM.
-
I see. I guess even mate pairs can't help with such a graph. The only solution in this case is to know the length of the original DNA strand. Then we can deduce the number of times the repeat occurred.
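As a rough sketch of that length argument in Python: if the total length were known, the number of passes through the loop would follow from simple arithmetic. The node lengths are taken from the x, y, w, z sequences listed earlier in the thread; the 3-base overlap is an assumption, and `loops_from_length` is a made-up helper name.

```python
def loops_from_length(total_len, len_x=8, len_y=11, len_w=14, len_z=8, overlap=3):
    """Return the number of w -> y passes implied by the total length, or None."""
    base = len_x + (len_y - overlap) + (len_z - overlap)   # x -> y -> z, no loop
    unit = (len_w - overlap) + (len_y - overlap)           # one extra w -> y pass
    extra = total_len - base
    if extra < 0 or extra % unit != 0:
        return None                                        # length fits no walk
    return extra // unit

for length in (21, 40, 59, 50):
    print(length, loops_from_length(length))
```

With these numbers the loop-free walk has 21 bases and each pass adds 19, so only lengths of the form 21 + 19n are consistent with the graph.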
What is generally done in such cases? What is the common approach?
-
It varies from assembler to assembler. Some will select the most likely number of copies of the repeat based on read pairs spanning the loop and the insert size distribution. Others will just build a scaffold of x and z and leave the sequence in between as a run of "N"s.
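Here is a toy Python sketch of both strategies. It is not how any specific assembler does it: real tools score copy numbers against the full insert size distribution, whereas this stand-in just picks the copy number whose implied x-to-z gap is closest to the median gap estimated from read pairs. The function names, the gap estimates, and the 5-base and 19-base figures (derived from the node lengths with an assumed 3-base overlap) are all illustrative.

```python
from statistics import median

def pick_copy_number(gap_estimates, base_gap, unit_len, max_copies=10):
    """Choose the loop count whose implied gap is nearest the median estimate."""
    target = median(gap_estimates)
    return min(range(max_copies + 1),
               key=lambda n: abs(base_gap + n * unit_len - target))

def scaffold_with_ns(left, right, gap_len):
    """Join two contigs with a run of N's standing in for the unresolved gap."""
    return left + "N" * int(round(gap_len)) + right

# Hypothetical gap estimates (bases between the end of x and the start of z),
# each derived from one read pair with a mate in x and a mate in z.
gaps = [22, 25, 24]
copies = pick_copy_number(gaps, base_gap=5, unit_len=19)
print(copies)
print(scaffold_with_ns("TAGTCGAG", "AGAGACAG", median(gaps)))
```

The first function commits to a copy number and so produces a full sequence that may be wrong; the second keeps only what is certain (the order and rough spacing of x and z) and leaves the repeat unresolved.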
-
Simple model of De Bruijn graph
Dear jts or bioinf,
I am having trouble understanding the de Bruijn graph because the concept of a graph is vague to me. What does it mean when we say that k-mers act as edges and (k-1)-mers act as nodes? Why are (k-1)-mers used instead of only k-mers? Do you have a simpler model of assembly using a de Bruijn graph? I would appreciate it if you could help.
Thanks for your time,
Scientist1