Dear Friends,
I am working on the assembly of a 3 Gb mammalian genome using next-generation sequencing data from multiple platforms. I have Illumina HiSeq data (paired-end, ~160x sequencing depth) from multiple library runs. A second round of sequencing was performed on PacBio, with short (1 kb) and long (10 kb) libraries, as well as on Roche 454.
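For a sense of scale, mean sequencing depth is simply total sequenced bases divided by genome size, so ~160x over a ~3 Gb genome works out to roughly 480 Gb of Illumina sequence. The sketch below is a minimal illustration of that arithmetic; the function names are hypothetical and the genome size is the approximate 3 Gb figure from the post.

```python
# Hypothetical helpers: coverage arithmetic for a ~3 Gb genome (figure from the post).
GENOME_SIZE_BP = 3_000_000_000


def bases_required(depth: float, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Total sequenced bases needed for a given mean depth (depth * genome size)."""
    return int(depth * genome_size_bp)


def mean_depth(total_bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean depth achieved by a given number of sequenced bases."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    illumina_bases = bases_required(160)
    print(f"{illumina_bases / 1e9:.0f} Gb")  # 480 Gb
```

The same `mean_depth` helper can be applied to the PacBio and 454 yields to see how much long-read coverage each library contributes.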
Here is my sketch of the assembly process using the sequencing data:
Phase-1:
1. Assembly of Illumina sequence reads (multiple libraries) to build contigs using Ray, Velvet or ALLPATHS-LG
2. Assembly of 454 reads using Celera Assembler
3. Assembly of PacBio reads from each library using PacBio's HGAP, PBJelly and Quiver approach
Phase-2:
4. Error-correcting the long PacBio reads with the accurate Illumina reads by mapping
5. Assembly of the resulting high-quality consensus reads from (3) using Ray/Velvet/CA
6. Reference-based hybrid assembly of the Illumina and 454 reads using Ray (the reference here is the consensus sequence built from the PacBio reads)
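The idea behind step 4 (correcting long reads by mapping accurate short reads onto them) can be illustrated with a toy majority-vote pileup. This is a minimal sketch under strong assumptions, not how any real correction tool works: alignments are given as hypothetical `(offset, sequence)` pairs, they are assumed gapless, and only substitutions are corrected. Real PacBio errors are dominated by insertions and deletions, which need a gapped alignment and consensus step.

```python
from collections import Counter
from typing import List, Tuple


def correct_long_read(
    long_read: str,
    alignments: List[Tuple[int, str]],
    min_support: int = 2,
) -> str:
    """Toy substitution-only correction of a long read by short-read majority vote.

    alignments: hypothetical (start offset on the long read, short read sequence)
    pairs, assumed to be gapless. A position is replaced only when its most common
    short-read base has at least `min_support` supporting reads.
    """
    # One base-count column per long-read position (the "pileup").
    columns = [Counter() for _ in long_read]
    for start, seq in alignments:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1

    corrected = []
    for original, col in zip(long_read, columns):
        if col:
            base, count = col.most_common(1)[0]
            corrected.append(base if count >= min_support else original)
        else:
            # No short-read coverage: keep the original long-read base.
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    # The long read carries one substitution error at position 4 (T instead of A).
    reads = [(0, "ACGTA"), (2, "GTACG"), (3, "TACGA")]
    print(correct_long_read("ACGTTCGA", reads))  # ACGTACGA
```

The `min_support` threshold keeps weakly covered positions unchanged, which is why uncorrected stretches remain wherever Illumina coverage is low.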
Phase-3:
7. Hybrid assembly integrating (1), (2), (3) and (6) using Ray
Please advise me if you have better ideas or comments on this. Please also share your experience and approaches for assembling a large genome using multi-platform omics data.
Regards,
Raj