Dear all,
I have a question about assembling shotgun metagenomic short reads (Illumina, paired-end, 2 x 150 bp).
I have two drinking water pipelines in two cities. For each pipeline, I selected four sampling sites. At each site, I sampled water in the summer, fall, winter, and spring of 2017. I then sequenced each sample individually to generate the short reads. I ran the assembly with MEGAHIT on all my samples at once:
$ megahit -1 $R1s -2 $R2s
I found that the result contains only one final.contigs.fa file instead of N contig files for the N paired-end samples. It seems that MEGAHIT treats all my samples as a single group and assembles them together. Will this cause misassembly? For instance, read A from one sample and read B from another sample might be merged into one contig because they are similar in sequence, even though they actually come from different bacterial species. Should this be a concern in my case? In other words, what would be the best strategy for assembling the reads from the two drinking water pipelines? Should I simply pool all my samples and assemble them at once, or should I assemble the samples from the two pipelines separately?
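To make the two options concrete, here is a rough dry-run sketch of what I mean. It only prints the MEGAHIT commands rather than running them. The file names, the `cityA_`/`cityB_` prefixes, the `_R1`/`_R2` naming scheme, and the output directory names are all hypothetical, and the helper functions are just for illustration; the only MEGAHIT options used are `-1`, `-2` (comma-separated file lists) and `-o` (output directory).

```shell
#!/usr/bin/env bash
# Dry run: build comma-separated read lists and print MEGAHIT commands.
# All file names and directory names below are hypothetical placeholders.

# join_by_comma a b c  ->  a,b,c
join_by_comma() {
    local IFS=','
    echo "$*"
}

# build_cmd OUTDIR R1_FILE...
# Prints a MEGAHIT command for the given R1 files. Each R2 name is derived
# by replacing the first "_R1" in the R1 name with "_R2" (assumed naming scheme).
build_cmd() {
    local outdir=$1
    shift
    local r1s=("$@")
    local r2s=("${r1s[@]/_R1/_R2}")
    echo "megahit -1 $(join_by_comma "${r1s[@]}") -2 $(join_by_comma "${r2s[@]}") -o $outdir"
}

# Two example samples per city, standing in for the full set.
cityA=(cityA_site1_summer_R1.fastq.gz cityA_site1_fall_R1.fastq.gz)
cityB=(cityB_site1_summer_R1.fastq.gz cityB_site1_fall_R1.fastq.gz)

# Option 1: co-assemble everything from both pipelines (what I did).
build_cmd coassembly_all "${cityA[@]}" "${cityB[@]}"

# Option 2: one assembly per pipeline.
build_cmd assembly_cityA "${cityA[@]}"
build_cmd assembly_cityB "${cityB[@]}"
```

A third option would be to call `build_cmd` once per sample, which would give N separate final.contigs.fa files.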
Thanks a lot for any suggestions and comments.