
**bioinf** · 01-05-2011, 02:12 PM

Can someone please explain why we need the HashMap that stores the ID of the first read in which each k-mer was encountered? Isn't it sufficient to walk the graph and write down the k-mers to rebuild the original sequence? What else is this HashMap used for?

**Zigster** · 01-06-2011, 08:45 AM

I don't know where you got the word "HashMap" from - I think that is Java. Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics.

Are you going to put this presentation up somewhere?

**bioinf** · 01-06-2011, 10:20 AM

I'm not a biologist; I'm a programmer. A hash map is not tied to any specific language (Java, C++, etc.); it is a data structure that gives O(1) access to an element (in the average case, at least). The article says that we keep the information about the first occurrence of each k-mer in the hash map. What I don't get is why we would need this information for a traceback. I can assemble the sequence just by following the arcs and writing down the k-mers. Why would I need information about the reads that contain those k-mers once the graph is already constructed? Or is the hash map meant to be needed only for the construction itself? (Question to anyone who might know.)
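The point that the sequence can be read straight off the graph can be illustrated with a short Python sketch. It assumes the walk through the graph is already known as an ordered list of k-mers in which neighbours overlap by k-1 bases; `spell_path` is a hypothetical helper, not a Velvet function. Nothing in it needs read IDs, which fits the earlier reply that read/k-mer associations serve other purposes (paired-end resolution, read statistics).

```python
def spell_path(kmers):
    """Rebuild a sequence from an ordered walk of k-mers (hypothetical helper).

    Assumes consecutive k-mers overlap by k-1 characters, as they do
    along the arcs of a de Bruijn-style k-mer graph.
    """
    if not kmers:
        return ""
    seq = [kmers[0]]
    for prev, cur in zip(kmers, kmers[1:]):
        # An arc means: last k-1 bases of prev == first k-1 bases of cur.
        if prev[1:] != cur[:-1]:
            raise ValueError(f"{prev} -> {cur} is not a (k-1)-overlap")
        seq.append(cur[-1])  # each step along the walk adds one new base
    return "".join(seq)


path = ["ACG", "CGT", "GTT", "TTA"]
print(spell_path(path))  # ACGTTA
```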

> Are you going to put this presentation up somewhere?

Of course. This is my seminar presentation at the Uni.

> Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics.

It can't be used for usage statistics, since the hash map contains information about only the first read in which a given k-mer is found. Several reads may share the same k-mer, but we only know the location of one of them.
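The multiplicity argument in the previous paragraph can be checked with a toy example. This models the first-occurrence table as described in this thread, not Velvet's actual C data structures; `first_occurrence_table` and `kmer_counts` are hypothetical names. A plain dict that never overwrites keeps one read ID per k-mer, while a `collections.Counter` keeps how often each k-mer occurs.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping k-mers of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def first_occurrence_table(reads, k):
    """Map each k-mer to the ID of the first read containing it."""
    table = {}
    for read_id, read in enumerate(reads):
        for kmer in kmers(read, k):
            table.setdefault(kmer, read_id)  # later reads never overwrite
    return table


def kmer_counts(reads, k):
    """Count every occurrence of every k-mer across all reads."""
    return Counter(kmer for read in reads for kmer in kmers(read, k))


reads = ["ACGTA", "CGTAC", "GTACG"]
print(first_occurrence_table(reads, 3)["GTA"])  # 0 (only the first read is kept)
print(kmer_counts(reads, 3)["GTA"])             # 3 (present in all three reads)
```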

Intuitively, I think it is done to link up all the reads that share a k-mer. The read set is processed one read at a time, and each k-mer is added to the hash map together with the ID of the first read in which it was found. Any later attempt by another read to store the same k-mer is rejected. Afterwards, once everything is stored, we walk through all the reads again. Each time a k-mer of some read is retrieved, it is looked up in the hash map, where we find the ID of the read in which it was first seen, so we can link these reads. The same is done for the remaining k-mers, which gives a one-to-many correspondence. That's what I infer from the paper, since it is stated unclearly there, but I can't present my assumptions on the slides.
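The two-pass reading described in the previous paragraph can be sketched as follows. This encodes the poster's own interpretation of the paper, not Velvet's confirmed implementation; `link_reads` is a hypothetical name, and reverse complements are ignored.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping k-mers of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def link_reads(reads, k):
    """Two-pass sketch of the read-linking scheme (hypothetical).

    Pass 1: for each k-mer, remember the ID of the first read containing it.
    Pass 2: for every read, look up each of its k-mers and link the read to
    the read where that k-mer was first seen (a one-to-many relation).
    """
    first_seen = {}
    for read_id, read in enumerate(reads):
        for kmer in kmers(read, k):
            first_seen.setdefault(kmer, read_id)  # later reads are rejected

    links = defaultdict(set)  # first read ID -> later reads sharing a k-mer
    for read_id, read in enumerate(reads):
        for kmer in kmers(read, k):
            origin = first_seen[kmer]
            if origin != read_id:
                links[origin].add(read_id)
    return dict(links)


reads = ["ACGTA", "CGTAC", "TTTTT"]
print(link_reads(reads, 3))  # {0: {1}}
```

Read 1 shares `CGT` and `GTA` with read 0, so it is linked to read 0; read 2 shares nothing and stays unlinked.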

**bioinf** · 01-08-2011, 10:03 AM

Going back to the biological details: could you please explain how repeats in the DNA lead to gaps between contigs? Yes, the repeat copies get overlapped although they shouldn't be, but how does that lead to "gaps"? Since Velvet cuts all tips longer than 2k, whenever a repeat followed by a long stretch of sequence is overlapped onto a k-mer that was found earlier, such a "tip" will be discarded.

**parit** · 01-28-2011, 06:39 AM

@bioinf: I am not sure I fully get your question, but here are my two cents. If there is a repeat, then either there will be a node reported with a coverage higher than the expected coverage, or there will be a loop. In the latter case the assembler, while building contigs, doesn't know how many times the repeat occurs, so it cannot connect the contigs to the left and right of the repeat and therefore reports them as two separate contigs with a gap in between.
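The loop case can be reproduced on a toy graph. This is a minimal sketch under simplifying assumptions: a single known, error-free, single-stranded sequence; nodes are distinct k-mers and arcs join k-mers that are adjacent in the sequence; contigs are taken to be maximal non-branching paths (a standard unitig-style compaction, not necessarily how Velvet builds its contigs). Velvet's real graph also merges reverse complements and uses coverage, which this ignores. `kmer_graph` and `contigs` are hypothetical names.

```python
from collections import defaultdict


def kmer_graph(seq, k):
    """Nodes are distinct k-mers; an arc joins k-mers adjacent in seq."""
    succ, pred = defaultdict(set), defaultdict(set)
    for i in range(len(seq) - k):
        a, b = seq[i:i + k], seq[i + 1:i + k + 1]
        succ[a].add(b)
        pred[b].add(a)
    nodes = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return nodes, succ, pred


def contigs(nodes, succ, pred):
    """Spell maximal non-branching paths (isolated cycles are not handled)."""
    def simple(n):  # exactly one way in and one way out
        return len(pred[n]) == 1 and len(succ[n]) == 1

    result = []
    for start in sorted(nodes):
        if simple(start):
            continue  # chain interior; reached from the chain's start
        for nxt in sorted(succ[start]):
            path = [start, nxt]
            while simple(path[-1]):
                path.append(next(iter(succ[path[-1]])))
            # First k-mer, then one new base per further k-mer.
            result.append(path[0] + "".join(n[-1] for n in path[1:]))
    return result


# "ACGT" occurs twice, so both copies collapse onto the same nodes.
genome = "TTACGTCCACGTGG"
print(contigs(*kmer_graph(genome, 3)))
# ['ACGT', 'CGTCCACG', 'CGTGG', 'TTACG']
```

The repeat `ACGT` becomes a single `ACG -> CGT` path with two ways in and two ways out, and the stretch between the two copies becomes the loop `CGTCCACG`. Without knowing how often to go around the loop, the walk is cut at the branch points, so the flanks `TTACG` and `CGTGG` come out as separate contigs.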
As far as tips are concerned, I couldn't connect "tips" with "repeats", as I thought tips occur when there is a sequencing error at the end of a read. That has nothing to do with repeats.
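A miscalled last base producing a tip can be shown on a toy graph. As we understand the Velvet paper, a tip is removed when it is shorter than 2k and also satisfies a minority-count condition on its arc; this sketch applies only the length rule, measures length in k-mers, only looks at dead ends with no successors (errors at the 3' end of a read), and ignores reverse complements and coverage. `build_graph`, `find_tips` and `clip_tips` are hypothetical names.

```python
from collections import defaultdict


def build_graph(reads, k):
    """k-mer graph: an arc joins consecutive k-mers of the same read."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            a, b = read[i:i + k], read[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def find_tips(succ, pred, max_len):
    """Dead-end chains hanging off a fork and shorter than max_len k-mers."""
    tips = []
    for dead_end in [n for n in pred if not succ[n]]:
        chain, cur = [dead_end], dead_end
        while len(pred[cur]) == 1:
            (parent,) = pred[cur]
            if len(succ[parent]) > 1:  # reached the fork the chain hangs off
                if len(chain) < max_len:
                    tips.append(chain)
                break
            chain.append(parent)
            cur = parent
    return tips


def clip_tips(succ, pred, tips):
    """Delete every node of every tip together with the arcs touching it."""
    for chain in tips:
        for node in chain:
            for p in pred.pop(node, set()):
                succ[p].discard(node)
            for s in succ.pop(node, set()):
                pred[s].discard(node)


K = 3
genome = "ATCGGATTCAAGC"
reads = [genome, "ATCGGAA"]  # second read: last base miscalled (T -> A)
succ, pred = build_graph(reads, K)
tips = find_tips(succ, pred, max_len=2 * K)
print(tips)  # [['GAA']]
clip_tips(succ, pred, tips)
print(sorted(succ["GGA"]))  # ['GAT']
```

The erroneous k-mer `GAA` hangs off the fork at `GGA` as a one-k-mer dead end and is clipped, while the genuine end of the sequence (seven k-mers after the fork) is long enough to survive.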
Please do correct me if I am wrong, as I am also trying to understand the logic of Velvet.
Can you also post your presentation or email it to me?

- Parit

**Zigster** · 01-28-2011, 06:44 AM

Yes, please post it.

**boetsie** · 01-28-2011, 08:24 AM

For repeats, you can have a look at his (Daniel Zerbino's) dissertation:

http://www.ebi.ac.uk/training/ftp/PhDtheses/Daniel_Zerbino.pdf

See Chapter 4. Hope this makes it clearer.

Boetsie

**Zigster** · 03-08-2011, 07:48 AM

Is this presentation available?

**parit** · 03-10-2011, 01:56 AM

Dude seems to have vanished :O Hope the presentation went fine.

**Jenzo** · 05-19-2011, 11:47 PM

Hey guys,
was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?

```
src/readSet.c:34: fatal error: zlib.h: File or directory not found
compilation terminated.
```

Hope someone has an idea, thanks a lot!

Edit: Problem is solved, thanks a lot!

**nilshomer** · 05-20-2011, 09:11 AM

> Originally posted by Jenzo:
>
> Hey guys,
> was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?
>
> ```
> src/readSet.c:34: fatal error: zlib.h: File or directory not found
> compilation terminated.
> ```
>
> Hope someone has an idea, thanks a lot!
>
> Edit: Problem is solved, thanks a lot!

So what was the solution?

**Thorondor** · 05-20-2011, 09:27 AM

You can copy the `*.o` files in `third-party/zlib-1.2.3` from an older Velvet version. I am pretty sure that they did not change.

**dp05yk** · 05-20-2011, 10:53 AM

> Originally posted by nilshomer:
>
> So what was the solution?

I'm going to hazard a guess that they had to either install zlib or modify the makefile to link up correctly.

**Jenzo** · 05-20-2011, 11:15 AM

Daniel Zerbino wrote today:

> Dear all,
>
> my sincere apologies for the compilation bug which was lying in the recently updated code. I have just updated the repositories. Thanks to Sylvain Forêt for quickly correcting it.
> [...]
> Regards,
>
> Daniel

**Thorondor** · 05-21-2011, 01:44 AM

Yup Jenzo, I also got this email, but the Oases compilation bug (`src/readSet.c:34: fatal error: zlib.h: File or directory not found` / `compilation terminated.`) is still there. ;-)
