Hey all! I've got (what I consider) an interesting data processing question that I thought someone here may be able to help out with.
Study System: Single species, non-model (i.e., no reference genome from anything closely related) vertebrate with a genome of ~2.5Gb that shows large within-species karyotypic differences between geographic isolates.
Data: Low-coverage approach; 100bp PE DNA-Seq datasets with ~170M PF clusters per isolate genome.
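For a sense of scale, here is a back-of-the-envelope coverage estimate from those numbers. It is only a rough sketch: it assumes each PF cluster yields two 100bp reads (paired-end) and that every base survives QC, which will not quite be true in practice. The function name is just for illustration.

```python
def expected_coverage(clusters, read_length, genome_size, reads_per_cluster=2):
    """Nominal depth = total sequenced bases / genome size.

    Assumes every cluster gives `reads_per_cluster` full-length reads
    and that no bases are lost to trimming or filtering.
    """
    total_bases = clusters * reads_per_cluster * read_length
    return total_bases / genome_size


# 170M PF clusters, 100bp paired-end, ~2.5Gb genome (figures from the post)
coverage = expected_coverage(170e6, 100, 2.5e9)
print(f"{coverage:.1f}x")  # prints "13.6x"
```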
Study Question: Can we use low-coverage, isolate-specific DNA-seq libraries to identify genomic differences between individuals/populations in a non-model organism?
SeqAnswers Question: Because our cost-induced low coverage limits our ability to make a decent assembly (Ray worked okay, but inconsistently between samples), we were thinking that maybe there would be a way to work from the reads of each isolate directly to perform a sort of in silico subtraction? Then, a small scale assembly could be done on the isolate-specific reads.
To rephrase for clarity:
1) Sequence genome of isolate 1 and isolate 2 at low coverage
2) Perform normal read QC
3) Instead of assembling and reciprocal BLASTing (which our coverage precludes), how could we compare the reads from isolate 1 to the reads from isolate 2 and pull out the reads unique to each isolate? Since both isolates are the same species, those unique reads should be highly enriched for the chromosomal anomalies.
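To make the idea in step 3 concrete, here is a minimal toy sketch of one way the subtraction might work: count k-mers in each isolate, then keep the reads whose k-mers are mostly present in their own isolate but absent from the other. This is an assumption about how such a comparison could be done, not an established pipeline. The function names, the `min_count` filter (meant to discard k-mers seen only once, which are likely sequencing errors), and the `min_fraction` threshold are all hypothetical choices for illustration. An in-memory `Counter` would also not scale to a 2.5Gb genome; a dedicated k-mer counter would be needed for real data.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(read, k):
    """Yield canonical k-mers from a read, skipping any that contain non-ACGT bases."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def count_kmers(reads, k):
    """Count canonical k-mers across all reads of one isolate."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def isolate_specific_reads(reads, own_counts, other_counts, k,
                           min_count=2, min_fraction=0.5):
    """Keep reads where at least `min_fraction` of k-mers are seen
    `min_count`+ times in this isolate and never in the other isolate."""
    kept = []
    for read in reads:
        ks = list(kmers(read, k))
        if not ks:
            continue
        specific = sum(1 for km in ks
                       if own_counts[km] >= min_count and km not in other_counts)
        if specific / len(ks) >= min_fraction:
            kept.append(read)
    return kept


# Toy data: both isolates share one sequence; isolate 1 has an extra one.
shared = "ACGTTGCAAGGCTA"
extra = "TTTTTCCCCCGGGAT"
iso1 = [shared, shared, extra, extra]
iso2 = [shared, shared]

k = 5
c1, c2 = count_kmers(iso1, k), count_kmers(iso2, k)
only_in_1 = isolate_specific_reads(iso1, c1, c2, k)
only_in_2 = isolate_specific_reads(iso2, c2, c1, k)
```

In this toy case `only_in_1` holds the two copies of the extra read and `only_in_2` is empty. The retained reads are what would then go into the small-scale assembly. With real data I would expect the choice of k and the two thresholds to matter a lot, since uneven coverage between isolates could make shared regions look isolate-specific.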
Any thoughts here? There may be a tool for this and I just don't know the right search terms. It also may be bioinformatics sacrilege, and if so, accept my thousand pardons.
EDIT: I just wanted to add that we have tried to assemble these genomes and do our analyses in a more traditional way, but assembly quality wasn't consistent between samples. I add this as a gesture that I'm not milking SeqAnswers to come up with my whole project pipeline, but rather, to assist in looking for non-traditional methods when the others fall away.