Hi All-
I'm new here, but I thought it would be good to let people know about Plantagora, a project that I've been part of for the past year. Its purpose is to find the best approaches to sequencing a new genome using next-gen sequencing and whole genome assembly. It is oriented towards plant genomes, but for the most part the information, tools, etc. apply to all species. Its inspiration was the realization that even with a lot of good sequencing coverage, it can still be difficult or impossible to come up with a good genome sequence.
For the Plantagora project, we created simulated reads modeling those from the Illumina or 454 sequencing platforms. The source of the sequences was primarily rice chromosome one, but we also used some whole plant genomes. We used several different assemblers, depending on the data, e.g. Newbler, ABySS, and SOAPdenovo. The resulting assemblies are evaluated using a very long list of metrics: some are statistics about the contigs and scaffolds, and others are derived by aligning the assembly to the original sequence to measure its fidelity.
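For anyone unfamiliar with the idea, here is a minimal Python sketch of the two ends of that pipeline: sampling simulated reads from a reference sequence, and computing one contig statistic. It is not the Plantagora code. The function names `simulate_reads` and `n50` are made up for illustration, the reads are error-free and uniformly placed (real Illumina or 454 simulations would model platform-specific errors and read lengths), and N50 is used only because it is a commonly reported contig statistic; whether it is among the metrics in the database is an assumption.

```python
import random


def simulate_reads(reference, read_length, coverage, seed=0):
    """Sample error-free, fixed-length reads uniformly from `reference`.

    Hypothetical helper: a real simulator would also model sequencing
    errors, variable read lengths, and paired ends.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    # Number of reads needed to reach the requested average coverage.
    n_reads = (coverage * len(reference)) // read_length
    last_start = len(reference) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append(reference[start:start + read_length])
    return reads


def n50(lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


reference = "ACGT" * 25                      # toy 100 bp "genome"
reads = simulate_reads(reference, read_length=10, coverage=5)
print(len(reads))                            # 50 reads of 10 bp = 5x coverage
print(n50([2, 3, 4, 5, 6]))                  # 5
```

The reads would then go to an assembler, and the lengths of the contigs it produces would be passed to something like `n50`.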
The results of these studies, of which there are thousands, are entered into a database that is available for download. There is also a graphing tool, so that you can generate custom graphs from the data. The tools used to create the data are also posted. All of this is more or less now available on our new website: plantagora.org (http://www.plantagora.org/). We hope people will make use of it, because that's what it's there for! The project was funded by NSF, but is now being taken over by the iPlant Collaborative, another NSF-funded project. It should be of great use to those considering a new genome sequencing project and to those of you working on whole genome assembly.