**francois.sabot** · 06-07-2011, 11:07 PM

A read group assigns an origin to a set of reads so that, during SNP/indel calling, a specific genotype can be attributed to that origin. Without this step you will get a set of SNPs, but you cannot assign them to a specific genotype. The AddOrReplaceReadGroups step is required by the GATK pipeline because it assumes you will call genotypes, not just SNPs. If you only need a raw set of SNPs, you can use the pileup format and VarScan's pileup2snp utility.
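For reference, the SAM format stores this origin in two places: each read group is declared once in the header as an `@RG` line (a required `ID`, plus fields such as `SM` for the sample/individual), and each alignment line can carry an optional `RG:Z:<ID>` tag pointing back to that declaration. The sketch below assumes plain-text SAM lines and uses hypothetical helper names (`parse_read_groups`, `read_group_of`, `reads_per_sample`); it only illustrates how a read in a merged file can be traced back to its individual. Real pipelines would use Picard, samtools or a library such as pysam.

```python
def parse_read_groups(header_lines):
    """Map read-group ID -> sample name (SM) from @RG header lines."""
    groups = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Header fields after the record type are TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        groups[fields["ID"]] = fields.get("SM", fields["ID"])
    return groups


def read_group_of(alignment_line):
    """Return the RG:Z tag value of a SAM alignment line, or None."""
    # The 11 mandatory SAM columns come first; optional TAG:TYPE:VALUE fields follow.
    for field in alignment_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return field[5:]
    return None


def reads_per_sample(sam_lines):
    """Count reads per individual in a merged SAM file (header + alignments)."""
    samples = parse_read_groups(l for l in sam_lines if l.startswith("@"))
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        # Reads without an RG tag cannot be attributed to any individual.
        sample = samples.get(read_group_of(line), "unassigned")
        counts[sample] = counts.get(sample, 0) + 1
    return counts


if __name__ == "__main__":
    seq, qual = "A" * 10, "I" * 10
    merged = [
        "@HD\tVN:1.6\tSO:unsorted",
        "@RG\tID:lane1.ind1\tSM:Indiv1",
        "@RG\tID:lane1.ind2\tSM:Indiv2",
        f"r1\t0\tchr01\t234550\t60\t10M\t*\t0\t0\t{seq}\t{qual}\tRG:Z:lane1.ind1",
        f"r2\t0\tchr01\t234550\t60\t10M\t*\t0\t0\t{seq}\t{qual}\tRG:Z:lane1.ind2",
        f"r3\t0\tchr01\t234551\t60\t10M\t*\t0\t0\t{seq}\t{qual}\tRG:Z:lane1.ind1",
    ]
    print(reads_per_sample(merged))  # {'Indiv1': 2, 'Indiv2': 1}
```

A read with no `RG:Z` tag falls into "unassigned", which is exactly the situation where the caller cannot tell which individual a SNP belongs to.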

**ocs** · 06-07-2011, 11:25 PM

Hello Francois,

Thank you for your quick answer. I get a glimpse of the idea, but your answer is not fully clear to me. By "origin", do you mean where the reads physically came from (e.g. chip, lane)? I know what SNP calling is (locating SNPs relative to a reference genome), but what is genotype calling? I imagine it is the sum of all SNPs, but I'm not sure. Even with that, I can't see what this step is useful for. My thinking is that the read groups are determined by the sequencing technology itself, since it knows which reads were sequenced on which lanes and chips. So I think of these groups as a constant that should not be changed, and this is actually my problem.

Thanks for any hints on this!

**francois.sabot** · 06-07-2011, 11:49 PM

The origin, in my case, can be either a lane or the name of the individual/organism. For example, you can have 10 individuals multiplexed (barcoded) in a single lane, each mapped separately and then assigned to a group (e.g. Indiv1, Indiv2...). All reads from a single individual then carry the same RG tag at the end of the SAM line. When you merge those 10 SAM files, each read is still tagged with its origin.
Then you ask, for example, the GATK genotyper to "call the genotype". This means SNPs are identified based on depth, quality, etc., and because each read can be attributed to a specific individual, the resulting VCF file can say "OK, Indiv1 has an A instead of a G at position chr01:234554".

This is genotype calling, i.e. assigning specific SNPs to a specific individual.
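As a toy illustration of "assigning SNPs to individuals", the sketch below groups the bases observed at one position by sample (as recovered from each read's RG tag) and makes a naive per-individual call from allele fractions. This is a deliberate simplification and not how the GATK genotyper works (GATK uses likelihood models that account for base and mapping quality); the function name `naive_genotypes` and the `min_depth`, `het_frac` and `hom_frac` thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def naive_genotypes(pileup, ref, min_depth=4, het_frac=0.2, hom_frac=0.8):
    """Toy per-sample genotype call at a single position (hypothetical thresholds).

    pileup: iterable of (sample, base) pairs, one per read covering the
    position; the sample comes from the read's RG tag via the @RG header.
    Returns {sample: genotype} such as 'G/G', 'G/A', 'A/A' or './.'.
    """
    bases_by_sample = defaultdict(Counter)
    for sample, base in pileup:
        bases_by_sample[sample][base] += 1

    calls = {}
    for sample, counts in bases_by_sample.items():
        depth = sum(counts.values())
        if depth < min_depth:
            calls[sample] = "./."  # too few reads to call anything
            continue
        # Most frequent non-reference base, if any.
        alts = [(n, b) for b, n in counts.items() if b != ref]
        if not alts:
            calls[sample] = f"{ref}/{ref}"
            continue
        alt_n, alt = max(alts)
        frac = alt_n / depth
        if frac >= hom_frac:
            calls[sample] = f"{alt}/{alt}"  # nearly all reads show the alternate base
        elif frac >= het_frac:
            calls[sample] = f"{ref}/{alt}"  # a mix of reference and alternate reads
        else:
            calls[sample] = f"{ref}/{ref}"  # alternate reads treated as noise
    return calls


# Position chr01:234554, reference base G.
pileup = ([("Indiv1", "A")] * 5
          + [("Indiv2", "G")] * 3 + [("Indiv2", "A")] * 3
          + [("Indiv3", "G")] * 2)
print(naive_genotypes(pileup, ref="G"))
# {'Indiv1': 'A/A', 'Indiv2': 'G/A', 'Indiv3': './.'}
```

Without read groups, all eleven reads would collapse into one pile and the only possible statement would be "there is an A/G SNP at chr01:234554", with no way to say which individual carries it.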

**ocs** · 06-08-2011, 03:04 AM

Hello Francois,

Thank you again for your answer. I now understand what a read group and genotype calling are. But the last part of my previous post is still unclear: I use the FASTQ files to align to the reference genome, but in the AddOrReplaceReadGroups step I give the same files as a read-group library. This seems redundant to me, doesn't it? Shouldn't the tool already have the read-to-read-group assignment? This is what still confuses me.

Thanks,
Oliver

**francois.sabot** · 06-08-2011, 10:45 PM

Yes, it looks redundant at first, but if you did not specify the RG tag during mapping (as BWA allows, for example), that information is not in the SAM file. So you need to add it, because a standard SAM header does not contain any reference to the origin of the reads.

If you had specified it, then there is no need to perform this step.
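If the reads were mapped without a read group, two things are missing: an `@RG` header line and an `RG:Z:` tag on every alignment. The plain-Python sketch below approximates the effect of AddOrReplaceReadGroups as discussed here: every read is assigned to one read group, and any existing read group is replaced. It is a hypothetical illustration (the function name and its parameters are assumptions), not Picard's implementation. In practice you would run Picard itself (its `RGID`, `RGLB`, `RGPL`, `RGPU` and `RGSM` arguments fill the same fields) or pass the read group at mapping time (e.g. `-R` for `bwa mem`), in which case this step is unnecessary.

```python
def add_or_replace_read_group(sam_lines, rg_id, sample, library, platform, unit):
    """Assign every read to a single read group (hypothetical sketch).

    Drops any existing @RG header lines and RG:Z tags, adds one new @RG
    line after the remaining header lines, and appends RG:Z:<rg_id> to each read.
    """
    rg_header = "\t".join([
        "@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}",
        f"PL:{platform}", f"PU:{unit}",
    ])
    header, body = [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if not line.startswith("@RG"):
                header.append(line)  # keep @HD, @SQ, @PG, ... as they are
            continue
        fields = line.split("\t")
        # Keep the 11 mandatory columns and every optional tag except an old RG.
        tags = [f for f in fields[11:] if not f.startswith("RG:Z:")]
        body.append("\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"]))
    return header + [rg_header] + body
```

For example, calling it with `rg_id="lane1.ind1"` and `sample="Indiv1"` on a file mapped without read groups yields a single `@RG\tID:lane1.ind1\tSM:Indiv1...` header line and an `RG:Z:lane1.ind1` tag on every read; running it a second time with a different ID replaces both rather than adding a second read group.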

**DZhang** · 06-09-2011, 07:51 AM

Hi Ocs,

If RG is not critical to your pipeline, you may use "VALIDATION_STRINGENCY=SILENT" to suppress the warning. I used this option a few months ago but am not sure if it still works; give it a try and report back. As I see it, Picard is under very active and rapid development.

Douglas

https://www.contigexpress.com

What exactly is AddOrReplaceReadGroups (picard tools) doing?

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News