Here is a reply from the GATK team:
Well, the thing to keep in mind is that if you merge all your BAMs together into one big file, processing that big BAM is going to be very computationally demanding. Also, note that the recalibrator processes read groups individually, so you will not get the "whole lane data" advantage you might expect from recalibrating multiple samples together.
Actually, let me take a step back and give you a quick run-down of what we do in-house.
Our setup is a bit complex because we have samples spread over multiple lanes, with multiple samples per lane. So when we get the FastQs, we separate out the read data by read group into individual files, so that after alignment we have one bam file per read group. We run dedup-realign-recal on each bam file, then merge the bams of read groups that belong to the same sample to produce one bam file per sample. Then we do another round of realign-recal on the sample bams as a form of cross-lane cleanup.
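For concreteness, the per-read-group pass described above might look roughly like this with GATK3-era tools (RealignerTargetCreator and IndelRealigner were later removed in GATK4). This is only a sketch: all file names, read-group IDs, and the `dbsnp.vcf` known-sites resource are placeholders, and the loop assumes the split produced BAMs named `rg*.bam`:

```shell
# Split the aligned BAM into one file per read group
# (%! expands to the read-group ID in samtools split)
samtools split -f '%!.bam' aligned.bam

# For each read-group BAM: dedup -> realign -> recal
for rg in rg*.bam; do
  base=${rg%.bam}

  # Mark duplicates with Picard
  java -jar picard.jar MarkDuplicates \
      I=$rg O=$base.dedup.bam M=$base.metrics.txt

  # Local realignment around indels (GATK3)
  java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
      -R ref.fasta -I $base.dedup.bam -o $base.intervals
  java -jar GenomeAnalysisTK.jar -T IndelRealigner \
      -R ref.fasta -I $base.dedup.bam \
      -targetIntervals $base.intervals -o $base.realn.bam

  # Base quality score recalibration
  java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
      -R ref.fasta -I $base.realn.bam \
      -knownSites dbsnp.vcf -o $base.recal.table
  java -jar GenomeAnalysisTK.jar -T PrintReads \
      -R ref.fasta -I $base.realn.bam \
      -BQSR $base.recal.table -o $base.recal.bam
done

# Merge the cleaned read-group BAMs belonging to one sample, then
# repeat the realign-recal steps on the merged BAM (cross-lane cleanup)
samtools merge sampleA.bam rgA1.recal.bam rgA2.recal.bam
```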
But if you don't have samples spread across different lanes, you don't need all of this. The simplest approach is probably to separate out the samples by their read group tags into individual BAM files and process them separately through dedup-realign-recal. I realize this contradicts what I said earlier; technically that earlier suggestion (processing everything together) is still an option, with some advantages such as having all samples aligned the same way. But if you plan to split up your samples before variant calling anyway, you might as well split everything earlier on and save yourself the compute cost of processing everything together.
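If each sample corresponds to a known read group, that per-sample separation can be done with samtools; the read-group ID and file names below are placeholders:

```shell
# Extract reads belonging to one read group into its own BAM;
# -r keeps only reads whose RG tag matches the given ID,
# -b writes BAM output
samtools view -b -r sample1_rg cohort.bam > sample1.bam

# Index the per-sample BAM so downstream GATK tools can use it
samtools index sample1.bam
```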