Dear all,
I recently sequenced a plant genome on a HiSeq 2000 with 101 bp paired-end reads and assembled the reads into contigs. Because I sequenced only one insert library (insert size 400 bp), I ended up with a very large number of contigs. To build a better genome and recover more genes, I aligned the contigs to the genome of a related species using ABACAS and filled the gaps using GapFiller. The two species probably differ in some chromosomal regions, so the resulting scaffolds (one per chromosome) contain many Ns. I would therefore like to break the scaffolds back into contigs wherever more than 100 Ns separate two contiguous sequences. I am a biologist and have no knowledge of scripting. Could anyone tell me which software or script would work for my data? Many thanks.
Best Wishes,
yun
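One possible way to do the splitting described above is a short script in plain Python with no extra libraries. This is only a sketch under a few assumptions: the scaffolds are in a FASTA file, gaps are written as N or n, and "more than 100 Ns" is read as a run of at least 101. The function names, the `_contigN` naming of the output records, and the script and file names in the usage note are made up for illustration.

```python
import re
import sys

MIN_GAP = 101  # assumption: "more than 100 Ns" means a run of at least 101


def read_fasta(handle):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (fields[0] if fields else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_at_gaps(seq, min_gap=MIN_GAP):
    """Split seq at every run of min_gap or more Ns.

    Shorter runs of N are kept inside the pieces. Ns left at the very
    start or end of a piece are trimmed, and empty pieces are dropped.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = (p.strip("Nn") for p in gap.split(seq))
    return [p for p in pieces if p]


def break_scaffolds(in_path, out_path, min_gap=MIN_GAP, width=60):
    """Write each piece of each scaffold as its own FASTA record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for name, seq in read_fasta(src):
            pieces = split_at_gaps(seq, min_gap)
            for i, contig in enumerate(pieces, start=1):
                dst.write(f">{name}_contig{i}\n")
                # Wrap sequence lines at a fixed width.
                for pos in range(0, len(contig), width):
                    dst.write(contig[pos:pos + width] + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        break_scaffolds(sys.argv[1], sys.argv[2])
    else:
        print("usage: python break_scaffolds.py IN.fasta OUT.fasta",
              file=sys.stderr)
```

Saved as, say, break_scaffolds.py, it would be run as `python break_scaffolds.py scaffolds.fasta contigs.fasta`. Each scaffold is held in memory as one string, which should be fine for chromosome-sized sequences on an ordinary workstation, but it is worth comparing the total non-N length before and after to confirm that nothing was lost.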