**zee** · 08-23-2011, 07:12 AM

Hi Trickytank,

We did a similar study but looked at dbSNP concordance rather than the FASTQC quality profile.

I think your figure legends (from top to bottom) in figures 2 and 4 probably mean "After" calibration. Is that right?

It's quite an interesting observation. I wonder why GATK would do this.

**trickytank** · 08-23-2011, 04:20 PM

Originally posted by zee:

I think your figure legends (from top to bottom) in figures 2 and 4 probably mean "After" calibration. Is that right?

Thanks, I've fixed that now.

Do you have a link/article for your study?

**sparks** · 08-23-2011, 06:58 PM

Trickytank,
There could be a simple explanation. Novoalign can clip alignments, trimming them back to the best local alignment. This means a mismatch in the last few bases is likely to be clipped.
Novoalign's quality calibration works on alignments before clipping, so it won't show this effect.
Clipping is done to improve the accuracy of SNP calling. With dynamic programming algorithms like Smith-Waterman and Needleman-Wunsch, there are often suboptimal alignments that differ only slightly in score from the optimal alignment. This happens especially near the ends of alignments. For example, a true indel of 1 bp in the last few bp of a read may be aligned as mismatches. Clipping ensures there are enough matching bases after a SNP or indel for the alignment to be optimal.
Clipping can be turned off with the option -o FULLNW
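One way to check how much clipping a run actually produced is to count soft-clipped bases in the CIGAR column of the SAM output. The following is only a minimal sketch: the function name and the example CIGAR strings are hypothetical, and it assumes the clipping is reported as soft clips (`S`) in the CIGAR.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_ends(cigar):
    """Return (left, right) soft-clipped base counts for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)
    left = core[0][0] if core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (left, right)

# Hypothetical CIGAR strings, for illustration only.
for cigar in ["75M", "3S70M2S", "70M5S"]:
    print(cigar, soft_clipped_ends(cigar))
```

Tallying these counts over a whole file, with and without -o FULLNW, would show how often read ends are being trimmed.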

Colin

**trickytank** · 08-23-2011, 07:16 PM

Hey, thanks for that. By the sounds of it I would be better off not using:
-o FULLNW

I found that the number of SNP variants changes very little (fewer than 30 of ~100,000, conditioning on depth >4) when using GATK recalibration after Novoalign recalibration.
With BWA, GATK recalibration changes the SNP variants by around 1,000 to 2,000. I'm thinking of just not using GATK recalibration on Novoalign runs.
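For anyone wanting to repeat this kind of comparison, a rough sketch of counting how many SNP calls differ between two call sets is shown here. It assumes each call has been reduced to a (chromosome, position, alternate allele) tuple; the function name and the toy calls are hypothetical, not data from these runs.

```python
def compare_calls(before, after):
    """Count shared and unique SNP calls between two call sets."""
    before, after = set(before), set(after)
    return {
        "shared": len(before & after),
        "only_before": len(before - after),
        "only_after": len(after - before),
    }

# Toy call sets, for illustration only.
without_gatk = [("chr1", 1005, "A"), ("chr1", 2040, "T"), ("chr2", 310, "G")]
with_gatk = [("chr1", 1005, "A"), ("chr2", 310, "G"), ("chr2", 977, "C")]

print(compare_calls(without_gatk, with_gatk))
```

The number of changed variants is then the sum of the two "only" counts.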

**trickytank** · 08-23-2011, 09:00 PM

I'm going to try the -o FULLNW option to see if it removes what I have observed. I'll post my results here.

**trickytank** · 08-23-2011, 10:44 PM

To clarify, does this mean that by default Novoalign clips mismatches at the ends of reads which are not seen in the reference index?

**sparks** · 08-24-2011, 01:34 AM

Yes, mismatches near the ends of the alignment will be clipped so that the best local alignment is reported. This wouldn't seem right if all we had were SNPs, but if our sample includes indels and structural variations and these occur near the ends of the read, then they may get aligned as mismatches. This can then cause erroneous SNP calls. Clipping avoids this problem and improves the specificity of SNP and indel calls, but it may reduce sensitivity a bit.
It would be interesting to see the effect of clipping on dbSNP concordance. We haven't done this yet.
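As a rough sketch of how such a comparison could be set up, concordance can be taken as the fraction of called SNP sites that also appear in a set of known dbSNP sites. That definition is an assumption here, as are the function name and the toy sites below.

```python
def dbsnp_concordance(called, known):
    """Fraction of called SNP sites that are also present in the known set."""
    called = set(called)
    if not called:
        return 0.0
    return len(called & set(known)) / len(called)

# Toy (chromosome, position) sites, for illustration only.
known_sites = {("chr1", 1005), ("chr1", 2040), ("chr2", 310)}
clipped_calls = [("chr1", 1005), ("chr2", 310)]
fullnw_calls = [("chr1", 1005), ("chr2", 310), ("chr2", 977), ("chr3", 55)]

print(dbsnp_concordance(clipped_calls, known_sites))
print(dbsnp_concordance(fullnw_calls, known_sites))
```

Running this on call sets made with and without clipping would give one number per run to compare.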

**trickytank** · 09-05-2011, 10:14 PM

Using the -o FULLNW option

With the -o FULLNW option, the FastQC plots are no longer worrying.

Novoalign with recalibration and -o FULLNW option, before GATK recalibration BAM file:

[Image by trickytank, 2011-09-05]

Novoalign with recalibration and -o FULLNW option, after GATK recalibration BAM file:

[Image by trickytank, 2011-09-05]

I was under the impression that BAQ, as implemented in SAMtools, is designed to overcome the problem of misalignments caused by indels near the ends of reads, and shouldn't affect sensitivity as much as clipping at the alignment stage? (Local realignment around indels also seems like an alternative.)

**sparks** · 09-12-2011, 07:50 PM

Local realignment should help if you use -o FullNW, though I haven't looked into this. It would be interesting to see the effect on dbSNP concordance.
We added soft clipping before these tools were readily available.

Novoalign with GATK recalibration
