**WhatsOEver** · 11-13-2014, 01:13 AM

Hi lre1234,
1) If you have "some bunch of genomes", why aren't you running the genomes in parallel instead of the chromosomes of each genome? This would give the same speed improvement without the need to split anything, wouldn't it?
2) I don't know if GenotypeGVCFs can handle these multiple files (though I doubt it), but file concatenation would be a very easy and straightforward approach here.

**lre1234** · 11-13-2014, 04:59 AM

Thanks. So I am looking at human genomes. I am able to run the files over multiple machines in a cluster and thought that breaking it up into smaller pieces and running many of them at the same time would speed everything up.

Your point #2 was really my question: can GenotypeGVCFs handle multiple files from the same individual? I have been looking around, but haven't been able to find an answer. Perhaps concatenating the files first would be the best approach.

**WhatsOEver** · 11-13-2014, 07:10 AM

Let's just assume the following:
You have 24 genomes with 24 chromosomes each, and you can use 24 nodes. Why would running 24 x 1 chromosome per node, one genome after another, be faster than running 1 whole genome (all 24 chromosomes) on each node? The former will actually be slower. Or do I misunderstand your question?
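The comparison in this post can be put into rough numbers. The following is only a sketch under stated assumptions: the per-chromosome run times are made-up relative values (not measurements), per-job startup overhead is ignored, and the split strategy is assumed to run in lockstep, meaning the next genome starts only after the slowest chromosome of the current one has finished.

```python
# Hypothetical relative run times for the 24 chromosomes of one genome.
# These are illustrative numbers only, not measurements.
chrom_cost = [24 - i for i in range(24)]  # 24, 23, ..., 1
n_genomes = 24
n_nodes = 24


def makespan_genome_per_node(costs, n_genomes, n_nodes):
    """Each node processes whole genomes, chromosome after chromosome."""
    rounds = -(-n_genomes // n_nodes)  # ceiling division
    return rounds * sum(costs)


def makespan_chrom_per_node_lockstep(costs, n_genomes, n_nodes):
    """One genome at a time, one chromosome per node (lockstep assumption)."""
    if len(costs) > n_nodes:
        raise ValueError("this sketch assumes at least one node per chromosome")
    # Every genome takes as long as its slowest chromosome.
    return n_genomes * max(costs)


whole = makespan_genome_per_node(chrom_cost, n_genomes, n_nodes)
split = makespan_chrom_per_node_lockstep(chrom_cost, n_genomes, n_nodes)
print("one genome per node:       ", whole)
print("one chromosome per node:   ", split)
```

Under these assumptions the split strategy is slower because nodes holding small chromosomes sit idle while the largest one finishes. With a cluster job queue that hands the next chromosome job to whichever node becomes free, most of that gap would disappear, so the total throughput for a batch of genomes ends up similar either way.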

**lre1234** · 11-13-2014, 07:29 AM

So essentially, I would be running all 24 chr simultaneously on the 1 machine, which should be faster than running the whole genome as a single job on 1 node. Also, the GATK docs say that there may be issues with using -nct to multithread the run.
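A per-chromosome split like the one described here could be scripted roughly as follows. This is a sketch that only builds the command lines and does not run them. It assumes GATK 3.x style arguments (`-T`, `-R`, `-I`, `-L`, `--emitRefConfidence GVCF`, `-o`), contig names `1` to `22`, `X`, `Y` (your reference may use `chr1` etc.), and hypothetical file names. Some GATK 3 releases need additional index options for GVCF output, so check the documentation for your version.

```python
def haplotypecaller_commands(bam, reference, sample,
                             chromosomes=None,
                             gatk_jar="GenomeAnalysisTK.jar"):
    """Build one HaplotypeCaller command (as an argv list) per chromosome.

    All paths and the jar name are placeholders supplied by the caller.
    """
    if chromosomes is None:
        # Assumed contig naming; adjust to match the reference in use.
        chromosomes = [str(i) for i in range(1, 23)] + ["X", "Y"]

    commands = []
    for chrom in chromosomes:
        commands.append([
            "java", "-jar", gatk_jar,
            "-T", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", chrom,                      # restrict this job to one contig
            "--emitRefConfidence", "GVCF",    # per-sample GVCF output
            "-o", f"{sample}.{chrom}.g.vcf",  # one output file per contig
        ])
    return commands


# Hypothetical inputs, for illustration only.
cmds = haplotypecaller_commands("sample1.bam", "reference.fasta", "sample1")
print(len(cmds), "jobs")
print(" ".join(cmds[0]))
```

Each argv list can then be submitted as its own cluster job, which avoids relying on `-nct` inside a single process.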

In the end, for 1 subject, I will have 24 GVCF files (1 for each chr). The question is about the next step, GenotypeGVCFs. Would I have to concatenate all of the individual GVCFs together first, or can I run GenotypeGVCFs on all of the individual files and it will be smart enough to match up the sample across them?
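If concatenation turns out to be the way to go, a minimal sketch of it might look like this. It assumes uncompressed text GVCFs from the same sample with identical headers, passed in reference contig order; the function name and file names are made up for the example. Any index file would need to be regenerated afterwards, and GATK 3 also ships a CatVariants utility intended for this kind of merge, so check its documentation before rolling your own.

```python
import os
import tempfile


def concat_gvcfs(paths, out_path):
    """Write the header of the first file, then the records of every file."""
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as handle:
                for line in handle:
                    if line.startswith("#"):
                        if i == 0:  # keep header lines from the first file only
                            out.write(line)
                    else:
                        out.write(line)


# Tiny demonstration with two made-up single-record files.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for chrom in ["1", "2"]:
        path = os.path.join(tmp, f"sample.{chrom}.g.vcf")
        with open(path, "w") as fh:
            fh.write("##fileformat=VCFv4.1\n")
            fh.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n")
            fh.write(f"{chrom}\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200\tGT:DP\t0/0:30\n")
        parts.append(path)

    merged = os.path.join(tmp, "sample.g.vcf")
    concat_gvcfs(parts, merged)
    with open(merged) as fh:
        merged_lines = fh.read().splitlines()

print(len(merged_lines), "lines in merged file")
```

The merged file then looks like a single-sample GVCF and can be passed to the genotyping step as one input per individual.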

**IonTom** · 11-14-2014, 03:32 AM

Have you thought about using Platypus for the task instead of HaplotypeCaller?

Centre for Human Genetics

http://www.well.ox.ac.uk/platypus

Both are haplotype-based callers, but Platypus is several times faster.

running HaplotypeCaller in small buckets

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News