Can someone please explain why we need the HashMap, which stores the ID of the first read where each k-mer was encountered? Isn't it sufficient to just walk the graph and write down the k-mers to rebuild the original sequence? What else is this HashMap used for?
-
I don't know where you got the word "HashMap" from - I think that is Java. Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics.
Are you going to put this presentation up somewhere?
-
I'm no biologist; I'm a programmer. A hash map is not tied to any specific language (Java, C++, etc.); it is a data structure that gives O(1) constant-time access to an element (on average). The article says that we keep information about the first occurrence of each k-mer in the hash map. What I don't get is why we would need this information for a traceback. I can assemble the sequence by just following the arcs and writing down the k-mers. Why would I need information about the reads represented by those k-mers after the graph has already been constructed? Does it mean that the hash map is needed only for the construction itself? (Question to anyone who might know.)
"Are you going to put this presentation up somewhere?"
Of course. This is my seminar presentation at the Uni.
"Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics."
It can't be used for usage statistics, since the hash map contains information about only the first read in which a given k-mer is found. There might be several reads with the same k-mer, but we have the location of only one such read at our disposal.
Intuitively, I think it is done to link up all the reads that share a given k-mer. The read set is processed one read at a time, and each k-mer is added to the hash map together with the ID of the first read where it was found. Any later attempt by another read to store the same k-mer is denied. Once all the information is stored, we walk through all the reads again. Each time a k-mer is taken from a read, it is looked up in the hash map, where we find the ID of the read in which it first appeared, so we can link the two reads. The same is done for every following read, which gives a one-to-many correspondence. That is what I assume from the paper, since it is not stated clearly there, but I can't present my assumptions on the slides.
Last edited by bioinf; 01-06-2011, 10:57 AM.
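The two-pass scheme described in this post can be sketched roughly as follows. This is only an illustration of the poster's reading, not Velvet's actual code: the function names (`first_occurrence_index`, `linked_reads`) and the toy reads are hypothetical, and reverse complements are ignored for brevity.

```python
def kmers(read, k):
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def first_occurrence_index(reads, k):
    """First pass: map each k-mer to (read_id, offset) of the first read containing it."""
    index = {}
    for read_id, read in enumerate(reads):
        for offset, kmer in enumerate(kmers(read, k)):
            # setdefault keeps an existing entry, so later reads are "denied".
            index.setdefault(kmer, (read_id, offset))
    return index


def linked_reads(reads, k, index):
    """Second pass: for each read, collect the other reads that own one of its k-mers."""
    links = {}
    for read_id, read in enumerate(reads):
        owners = {index[kmer][0] for kmer in kmers(read, k)}
        owners.discard(read_id)  # a read is not linked to itself
        links[read_id] = owners
    return links


# Hypothetical toy reads: read 1 overlaps read 0, read 2 overlaps nothing.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
index = first_occurrence_index(reads, 3)
links = linked_reads(reads, 3, index)
```

Here `index["GTA"]` points at read 0 even though read 1 also contains that k-mer, and `links[1]` contains read 0, which is the kind of read-to-read link the post is guessing at.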
-
Going back to the biological details: could you please explain how repeats in the DNA lead to gaps between contigs? Yes, the repeat copies are overlapped onto each other although they shouldn't be, but how does that lead to "gaps"? Since Velvet cuts all tips longer than 2k, whenever a repeat followed by a large portion of sequence is overlapped onto a k-mer that was found earlier, such a "tip" will be discarded.
Last edited by bioinf; 01-08-2011, 11:31 AM.
-
@bioinf: I am not sure I fully get your question, but here are my two cents. If there is a repeat, then either a node will be reported with a coverage higher than the expected coverage, or there will be a loop. In the latter case, the assembler, while making contigs, doesn't know the frequency of the repeat and hence cannot connect the contigs to the right and left of the repeat, so it reports them as two different contigs with a gap in between...
As far as the tips are concerned, I couldn't connect "tips" with "repeats", as I thought tips occur when there is a sequencing error at the end of a read. That has nothing to do with repeats.
Please do correct me if I am wrong, as I am also trying to understand the logic of Velvet.
Can you also post your presentation or email me?
- Parit
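A toy illustration of the point about repeats, under simplifying assumptions: the genome string below is made up, nodes are plain k-mers rather than Velvet's own node representation, and the helper names are hypothetical. A repeated k-mer collapses into one node with more than one successor, and the graph alone does not say which way to leave it.

```python
from collections import defaultdict


def de_bruijn(sequence, k):
    """Build a graph whose nodes are k-mers and whose edges join consecutive k-mers."""
    graph = defaultdict(set)
    for i in range(len(sequence) - k):
        graph[sequence[i:i + k]].add(sequence[i + 1:i + k + 1])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor; an unambiguous contig has to stop here."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical genome in which "GCTT" occurs twice.
genome = "ACGCTTAGGCTTCA"
graph = de_bruijn(genome, 3)
ambiguous = branching_nodes(graph)
```

In this example the node `CTT` has two successors (`TTA` and `TTC`), so a walk that reaches it cannot tell from the graph alone which branch belongs to which copy of the repeat. The sequence before the repeat, the repeat itself, and the sequence after it therefore come out as separate pieces.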
-
Hey guys,
was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?
Hope someone has an idea, thanks a lot!
Code:
src/readSet.c:34: fatal error: zlib.h: File or directory not found
compilation terminated.
Edit: Problem is solved, thanks a lot!
-
Originally posted by Jenzo:
Hey guys,
was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?
Hope someone has an idea, thanks a lot!
Code:
src/readSet.c:34: fatal error: zlib.h: File or directory not found
compilation terminated.
Edit: Problem is solved, thanks a lot!

So what was the solution?