I am using the "BWA for SOLiD" tool on Galaxy. It calls for two inputs:
1) "Reference Genome": I am using mrna.fa.gz - Human mRNA from GenBank (from the website http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). It is ~532MB.
2) "FASTQ file (Nucleotide-space recoded from color-space)": I am using a .fastq file of human transcriptome data. It is ~ 1.8GB.
I was told to go ahead and try running "BWA for SOLiD" with these inputs, but warned that it would most likely exceed the available resources and fail with a memory error.
I am wondering how I can prevent this (without resorting to cloud resources, etc.) and just use the normal public Galaxy platform. I have already reduced my .fastq file 10-fold from its original size (I randomly kept only 1 out of every 10 sequences) - see the sketch below for the kind of subsampling I mean.
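In case it helps to see what I did, here is a minimal sketch of that kind of random subsampling. It assumes an uncompressed FASTQ with standard 4-line records; the filenames and the 10% keep rate are just placeholders, not the exact script I ran.

```python
import random

# Keep roughly 1 out of every 10 reads from a FASTQ file.
# Assumes standard 4-line FASTQ records and an uncompressed input;
# filenames below are placeholders.
random.seed(42)  # fixed seed so the subsample is reproducible

with open("reads.fastq") as infile, open("reads_subsampled.fastq", "w") as outfile:
    while True:
        record = [infile.readline() for _ in range(4)]  # one FASTQ record
        if not record[0]:
            break  # end of file
        if random.random() < 0.1:  # keep ~10% of reads at random
            outfile.writelines(record)
```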
What is the most effective way for me to reduce the job's resource demands, and how can I do so without introducing additional bias? Should I further subsample my .fastq file by another 2- or 5-fold, or should I instead reduce my .fa reference, and if so, what is the best way to do that?
I am not too concerned about quality; this is for a quick course project, not for any publication!
I am also getting worried because it has already been ~2 hours since I submitted the "BWA for SOLiD" job to Galaxy and it is still "waiting to run", whereas the many other, smaller jobs I have run in the meantime never waited more than a few minutes to start. Approximately how long would such a job take on Galaxy, given the size of the inputs? I just don't know what to expect and am concerned about the time it will take.
Sorry for the long message. If you have any advice on any of these topics, I would be glad to hear it!
1) "Reference Genome": I am using mrna.fa.gz - Human mRNA from GenBank (from the website http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). It is ~532MB.
2) "FASTQ file (Nucleotide-space recoded from color-space)": I am using a .fastq file of human transcriptome data. It is ~ 1.8GB.
I was told to go ahead and try running "BWA for SOLiD" with these inputs, but that it would most likely exceed resources with a memory error.
I am wondering how I can prevent this (without having to reference cloud resources, etc), and just use the normal Galaxy platform. I have already reduced my .fastq file from its original size by 10-fold (I randomly kept only 1 out of every 10 sequences).
What is the most effective way for me to reduce the process? And how can I do so without introducing more biases? Should I further reduce my .fastq files by another 2 or 5 fold etc.? Or should I reduce my .fa file, and if so, what is the ideal way to accomplish this?
I am not concerned about quality. This is for a quick course project - not for any publication! )
I am feeling concerned because already, it has been ~2 hours since I submitted the "BWA for SOLiD" job to Galaxy, and it is still "waiting to run", whereas I have since run many other smaller jobs, and have never had to wait for my job to begin on Galaxy, except for a few minutes. Approximately how long would such a job take on Galaxy, given the size of the inputs? I just don't know what to expect, and am feeling concerned about time issues....
Sorry for a long message. If you have any advice on any of the topics, I would be glad to hear them!