Hi everyone,
I am a newbie to sequencing and assembly. I posted before asking for help with my library and you guys were great, so thanks again for that. Now I am running Velvet de novo assemblies and have no idea whether I am doing it right or as well as possible. I have several questions about FastX processing and Velvet input, as well as about interpreting the quality of my output contigs.
I haven't found a FastX -> Velvet tutorial, so if there is one, I am sorry for wasting your time.
My data:
2x250 bp paired-end Illumina MiSeq data, 1.9 million reads per genome. The organism is an E. coli strain, so I assume my assembly should be roughly 5 Mbp.
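As a rough sanity check on sequencing depth, here is a minimal Python sketch of the coverage arithmetic. It assumes the 1.9 million figure counts individual reads (if it counts pairs, the depth doubles) and uses the k-mer coverage relation given in the Velvet manual, Ck = C * (L - k + 1) / L. The function names and the example k values are mine, for illustration only.

```python
def nucleotide_coverage(n_reads, read_length, genome_size):
    """Average number of reads covering each base: C = N * L / G."""
    return n_reads * read_length / genome_size


def kmer_coverage(nucleotide_cov, read_length, k):
    """Expected k-mer coverage: Ck = C * (L - k + 1) / L."""
    return nucleotide_cov * (read_length - k + 1) / read_length


# Assumed inputs: 1.9 million reads of 250 bp, ~5 Mbp genome.
depth = nucleotide_coverage(1_900_000, 250, 5_000_000)
print(f"nucleotide coverage: {depth:.0f}x")

# Example k values only; longer k means fewer k-mers per read.
for k in (131, 149):
    print(f"k={k}: expected k-mer coverage ~{kmer_coverage(depth, 250, k):.1f}")
```

Trimmed reads are shorter than 250 bp, so these numbers should be read as an upper bound.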
What I did so far:
- Assembly without FastX processing:
I used both read files and interleaved ("shuffled") them with the Perl script that ships with Velvet, then fed the shuffled file to velveth with -shortPaired (Velvet was compiled with 'MAXKMERLENGTH=151').
The output was:
- Expected coverage: 17.949313
- Estimated cutoff: 8.974657
- nodes: 148
- n50 of 234827
- max 537019
- total 5039428
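For reference, this is how I understand the n50 figure to be defined: the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size. The sketch below is my own illustration, not Velvet's code, and the toy lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if 2 * running >= total:
            return length
    return 0


# Toy example with made-up contig lengths.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

As far as I can tell from the Velvet manual, node lengths in stats.txt are given in k-mers, so k - 1 has to be added to get lengths in base pairs.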
This was basically my first attempt to get to know Velvet. Next I tried to improve my assembly by quality-processing my data. I trimmed 7 bp off the ends of my reads so that FastQC would rate them as high quality.
Then I repeated the assembly with those reads and k-mer lengths between 131 and 149 (the range that gave the lowest number of nodes). With the quality-trimmed reads, the best I get is 158 nodes and an n50 of 212407. Only the max contig length rose, to 650709.
Last, I applied all the FastX tools to filter, trim and clip my reads. However, I think that because filtering removed reads from each file independently, the shuffle script paired up the wrong reads, and I got very bad assemblies.
So here are the questions that came out of my attempts:
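To illustrate what I think goes wrong: once the two files are filtered separately, read n in one file is no longer the mate of read n in the other. A minimal Python sketch of re-pairing by read name before interleaving could look like the following. It assumes plain four-line FASTQ records and read names that differ only by a trailing /1 or /2 or by the comment after the first space; all function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def pair_key(header):
    """Read name shared by both mates: drop '@', the comment, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def interleave_surviving_pairs(fwd, rev, paired_out, single_out):
    """Write intact pairs interleaved to paired_out and orphans to single_out.

    Holds all reverse reads in memory, which should be fine for a few
    million reads but is an assumption about the data size.
    """
    rev_records = {pair_key(rec[0]): rec for rec in read_fastq(rev)}
    n_pairs = 0
    for rec in read_fastq(fwd):
        mate = rev_records.pop(pair_key(rec[0]), None)
        if mate is None:
            single_out.write("\n".join(rec) + "\n")
        else:
            paired_out.write("\n".join(rec) + "\n" + "\n".join(mate) + "\n")
            n_pairs += 1
    # Reverse reads whose forward mate was filtered out.
    for rec in rev_records.values():
        single_out.write("\n".join(rec) + "\n")
    return n_pairs


# Tiny in-memory demo: r1 lost its reverse mate, r3 lost its forward mate.
fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nAAAA\n+\nIIII\n")
rev = io.StringIO("@r2/2\nTTTT\n+\nIIII\n@r3/2\nCCCC\n+\nIIII\n")
paired, single = io.StringIO(), io.StringIO()
print(interleave_surviving_pairs(fwd, rev, paired, single), "intact pair(s)")
```

The interleaved file could then go to velveth as -shortPaired and the orphan file as -short, assuming I read the Velvet manual correctly.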
- Is it acceptable to de novo assemble unprocessed reads?
- If not, which quality enhancement methods are absolutely necessary?
- How can I apply the FastX filter tools and still run Velvet paired-end assemblies?
- How can I tell a good assembly from a bad assembly? By contig n50 or number of nodes? K-mer coverage?
I have found threads that cover aspects of what I am asking, but I am very unsure whether I am doing it right, and I want to do it right.
Thanks for your patience,
Illnoobina