Greetings, I am trying to assemble a small microbial genome of approximately 1.6 Mbp. We used Illumina HiSeq technology with 72 bp paired-end reads. As a first pass at assembly, I used the Velvet package. I also read through a few tutorials on pre-processing Illumina data.
Downloading and installation were quite simple.
I initially followed Nick Loman's suggestions for pre-processing sequences.
Overall, our read qualities displayed median Q-values of 38, with slight decreases towards the 5' ends of the sequences.
However, I found it quite strange that I was not seeing any contigs greater than 60 bp, and the N50 value was around 24. Below are the initial velveth and velvetg commands I used.
tonybert$ velveth run1velveth_01022012/ 31 -fasta -shortPaired COLLAPSED.fasta
tonybert$ velvetg run2velveth_01022012/ -ins_length 300 -exp_cov 227
I inspected the contigs.fa file and found only sequences of ~60 bp. Quite disappointing.
Since then, I have tried using the unfiltered raw reads, both paired and single end, and I keep getting the same results.
I have also tried adjusting the k-mer length to 21, as well as running the data through with no insert length or expected coverage specified.
If anyone has any ideas about what the issue might be, I would sincerely appreciate any comments.
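For anyone who wants to reproduce the check, here is a minimal sketch of how contig lengths and N50 can be counted in bases directly from a FASTA file such as contigs.fa. The function names are my own and are not part of Velvet. As far as I understand, Velvet's own log reports lengths in k-mers rather than bases, so counting from the file avoids that ambiguity.

```python
import sys


def read_contig_lengths(fasta_path):
    """Return the length in bases of every sequence in a FASTA file."""
    lengths = []
    current = 0
    in_record = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif in_record:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


if __name__ == "__main__" and len(sys.argv) > 1:
    contig_lengths = read_contig_lengths(sys.argv[1])
    print("contigs:", len(contig_lengths))
    print("longest:", max(contig_lengths, default=0))
    print("N50 (bp):", n50(contig_lengths))
```

Saved under a hypothetical name such as contig_stats.py, it would be run as `python contig_stats.py contigs.fa`.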