Hi All,
I have a single lane of Illumina HiSeq 2000 101-bp paired-end reads on a single genome (hop--est. genome size 2.8 Gb), along with two RNA-Seq experiments (same conditions as the genome seq), that I'm attempting to assemble using Velvet (in advance, please don't criticize the use of Velvet...I will be using other assembly packages in the future). All three "experiments" were done on different genotypes. I've run just the genome sequence data, with all ambiguities removed, as "single" reads and have successfully completed runs with Velvet. The assembly took approximately 35 hours, but, as expected, assembly with these settings was not great (only 1/3 of the genome covered, with N50 = 270). I've also run the RNA-Seq experiments as "single" reads and have seen velvetg run to completion in a similar amount of time.
I have now processed all reads (removed all ambiguities and trimmed), pulled out the orphaned reads left over from paired-end read processing, and combined those orphans into a single fastq.gz file. On Tuesday (April 2nd) I submitted all the processed paired-end read files (as "shortPaired" reads) along with the orphaned-read files (as "short" reads) from the genome sequence and the two RNA-Seq experiments on a machine with 1000 GB of RAM, and velvetg has been running ever since (96 hours). Running the "top" command on the UNIX machine showed significant changes in the amount of RAM used through the early parts of the assembly, but for the last 2 days usage has held steady at 640 GB of RAM with one processor running at 100%.
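For reference, here is a minimal sketch of the velveth/velvetg invocation I mean by the above. The commands are only printed (a dry run), not executed; the file names, k-mer length (31), and insert length (300) are placeholders, not my actual values:

```shell
# Hedged sketch of the described Velvet run; all file names and numeric
# parameters below are hypothetical placeholders.
cmd_velveth='velveth asm_dir 31 -fastq.gz -shortPaired genome_pe_interleaved.fastq.gz -shortPaired2 rnaseq_pe_interleaved.fastq.gz -short orphans.fastq.gz'
cmd_velvetg='velvetg asm_dir -ins_length 300 -exp_cov auto -cov_cutoff auto'

# Print the commands rather than running them, so they can be checked first.
echo "$cmd_velveth"
echo "$cmd_velvetg"
```

Note that stock Velvet is compiled with only two read categories (short/short2, shortPaired/shortPaired2), so mixing the genome library and both RNA-Seq libraries as separate channels would require recompiling with a larger `CATEGORIES` value.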
My question is this: is this length of time normal for velvetg when assembling such a large dataset, or has velvetg run into an infinite loop and will never complete?