I am trying to construct a semi-annotated transcriptome for an organism that doesn't have a sequenced genome.
I am using CLC Genomics Workbench and assembling my samples with the default parameters. This gives me a ton of contigs back, and of these I am only keeping contigs with over 500 hits. I then make consensus sequences from these contigs and BLAST them. Ideally this would be all I had to do, but I am finding that many of these contigs are not unique and produce redundant BLAST results. If I re-assemble the consensus sequences, many of them assemble onto one another.
Is there a better way to set up the original assembly so that serial assemblies aren't necessary? If anything I am saying doesn't make sense, please say so.
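To make the redundancy concrete, here is a minimal Python sketch of the kind of check I mean. It is not part of CLC; the function names and the example sequences are hypothetical. It only drops a consensus sequence when it is an exact substring of a longer one on either strand, so contigs that differ by mismatches or overlap only partially would still need an alignment-based approach.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


def collapse_redundant(contigs):
    """Keep only contigs that are not contained in a longer kept contig.

    contigs: dict mapping contig name -> sequence (hypothetical input format).
    Containment is tested as an exact substring match on both strands.
    """
    kept = {}
    # Process longest first so shorter contigs are compared against longer ones.
    ordered = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)
    for name, seq in ordered:
        forward = seq.upper()
        reverse = reverse_complement(forward)
        if any(forward in other or reverse in other for other in kept.values()):
            continue  # redundant: already represented by a kept contig
        kept[name] = forward
    return kept


# Made-up example sequences, for illustration only.
example = {
    "c1": "ATGCGTACGTTAGC",
    "c2": "CGTACG",            # exact substring of c1
    "c3": "GCTAACGTACGCAT",    # reverse complement of c1
    "c4": "TTTTGGGGCCCC",      # unrelated
}
unique = collapse_redundant(example)
print(sorted(unique))
```

In this toy case only c1 and c4 survive, which is the sort of collapsing I currently get by re-assembling the consensus sequences.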
Thanks,
Mike