Hi,
I want to use the Celera Assembler (WGS) in my assembly pipeline so that I can compare its results to Phred / Phrap. I read that I should use Lucy to vector- and quality-trim my reads, but I am confused on one point.
What is the "sequence of the vector splice site"?
I am reading this: http://www.cbcb.umd.edu/research/CeleraAssembler.shtml
"Each vector file [one per vector] must be accompanied by a splice site file containing the sequence within the vector that is adjacent to the splice sites used in the project. In case your project uses an adapter it should be included in the splice file. ... The vector file must contain a single FASTA-formatted sequence representing the entire sequencing vector. The splice file contains 4 FASTA records corresponding to approximately 200 bp flanking either side of the splice site, presented in both the forward and reverse-complemented orientation."
Unfortunately I don't understand what this means. Specifically, what is the splice site file, and how do I identify the splice sites? Does this typically refer to the sequencing vector or the cloning vector (BAC)?
The project uses the pSMART-HCKan (AF532107) sequencing vector from the Lucigen CLONESMART Blunt Cloning Kit ... does that mean anything to anyone?
Should I just use the 200 bp on either side of the primer sites?
Sorry for the potentially very dumb question!
Dan.