
**swbarnes2** · 11-11-2013, 04:42 PM

You are using vcfutils.pl from samtools?

Here's the relevant code from that program:

Code:

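# IUPAC ambiguity codes, keyed by REF followed by ALT
my %het = (AC=>'M', AG=>'R', AT=>'W', CA=>'M', CG=>'S', CT=>'Y',
           GA=>'R', GC=>'S', GT=>'K', TA=>'W', TC=>'Y', TG=>'K');

# $t[7] is the INFO column of the current VCF line
$q = $1 if ($t[7] =~ /FQ=(-?[\d\.]+)/);
if ($q < 0) {
    # negative FQ: pick REF or ALT depending on AF1
    $_ = ($t[7] =~ /AF1=([\d\.]+)/)? $1 : 0;
    $b = ($_ < .5 || $alt eq '.')? $ref : $alt;
    $q = -$q;
} else {
    # otherwise: look up the ambiguity code, fall back to N
    $b = $het{"$ref$alt"};
    $b ||= 'N';
}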

So if the FQ value is positive, and REF concatenated with ALT is a key in %het, then the matching IUPAC ambiguity code (W, K, and so on) gets put in the consensus. If FQ is negative, the consensus gets plain REF or ALT instead, depending on AF1.
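To make the two branches concrete, here is a minimal sketch that wraps the logic from the excerpt in a small function. `pick_base` is a hypothetical helper name, not something in vcfutils.pl, and it assumes FQ and AF1 have already been pulled out of the INFO column.

```perl
use strict;
use warnings;

# IUPAC codes for heterozygous pairs, copied from the excerpt above.
my %het = (AC=>'M', AG=>'R', AT=>'W', CA=>'M', CG=>'S', CT=>'Y',
           GA=>'R', GC=>'S', GT=>'K', TA=>'W', TC=>'Y', TG=>'K');

# pick_base is a hypothetical helper, not a function in vcfutils.pl.
# $fq and $af1 are assumed to be already parsed from the INFO column.
sub pick_base {
    my ($ref, $alt, $fq, $af1) = @_;
    if ($fq < 0) {
        # Negative FQ: a single allele is used; AF1 decides which one.
        return ($af1 < 0.5 || $alt eq '.') ? $ref : $alt;
    }
    # Non-negative FQ: look up the ambiguity code, fall back to N.
    return $het{"$ref$alt"} || 'N';
}

print pick_base('A', 'G', -30, 1.0), "\n";  # G
print pick_base('A', 'G', -30, 0.0), "\n";  # A
print pick_base('A', 'G',  25, 0.5), "\n";  # R
print pick_base('A', '.',  25, 0.0), "\n";  # N
```

The last call shows the fallback: "A." is not a key in %het, so the lookup fails and N is returned.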

**jmartin** · 11-12-2013, 12:01 PM

I am using vcfutils.pl vcf2fq to build a fastq from the vcf, but I only have a tenuous grasp of what the code is doing. Under what condition does an ambiguous base get used? Looking at that snippet, I'm not getting what $q is, so I wasn't able to follow why it goes into one branch over the other. I do get what is being assigned, and why, in both branches though.

I guess I basically need to know when an 'n' is used in the consensus output.

**swbarnes2** · 11-12-2013, 12:51 PM

Originally posted by jmartin:

I am using vcfutils.pl vcf2fq to build a fastq from the vcf, but I only have a tenuous grasp of what the code is doing. Under what condition does an ambiguous base get used? Looking at that snippet, I'm not getting what $q is, so I wasn't able to follow why it goes into one branch over the other. I do get what is being assigned, and why, in both branches though.

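$q is the FQ value, parsed out of the INFO column ($t[7]). With a single sample, FQ is negative at homozygous sites and positive at heterozygous ones, so the sign of $q decides which branch runs: negative gives you REF or ALT, positive gives you an ambiguity code or N.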
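If it helps to see the parsing in isolation, here is a small sketch that applies the same two regular expressions to a couple of INFO strings. `parse_fq_af1` is a hypothetical helper name, and the INFO strings are made up for illustration, not taken from a real VCF.

```perl
use strict;
use warnings;

# parse_fq_af1 is a hypothetical helper; it reuses the two regular
# expressions from the vcfutils.pl excerpt quoted earlier in the thread.
sub parse_fq_af1 {
    my ($info) = @_;
    my $fq  = ($info =~ /FQ=(-?[\d\.]+)/) ? $1 : undef;
    my $af1 = ($info =~ /AF1=([\d\.]+)/)  ? $1 : 0;
    return ($fq, $af1);
}

# Made-up INFO strings: one with a negative FQ, one with a positive FQ.
for my $info ('DP=20;AF1=1;FQ=-60', 'DP=18;AF1=0.5;FQ=45') {
    my ($fq, $af1) = parse_fq_af1($info);
    my $branch = $fq < 0 ? 'REF or ALT' : 'ambiguity code or N';
    print "$info => FQ=$fq, AF1=$af1, consensus gets: $branch\n";
}
```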

I guess I basically need to know when an 'n' is used in the consensus output.

Ah, that's different.

First:

Code:

if ($t[1] - $last_pos > 1) {
    $seq .= 'n' x ($t[1] - $last_pos - 1);
    $qual .= '!' x ($t[1] - $last_pos - 1);
}

So if for some reason there is a gap between consecutive positions in your all-sites vcf, it fills the skipped positions with n's (and with '!' in the quality string).
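Here is a minimal sketch of that padding step on its own. `pad_gap` is a hypothetical helper name, and the three records are made up; the arithmetic is the same as in the snippet above.

```perl
use strict;
use warnings;

# pad_gap is a hypothetical helper mirroring the snippet above.
# It returns the filler sequence and quality for the positions skipped
# between $last_pos and $pos (both 1-based VCF positions).
sub pad_gap {
    my ($last_pos, $pos) = @_;
    my $n = $pos - $last_pos - 1;
    return ('', '') if $n <= 0;
    return ('n' x $n, '!' x $n);
}

my ($seq, $qual, $last_pos) = ('', '', 0);
# Made-up records: positions 1, 2 and 6 are present; 3 to 5 are missing.
for my $rec ([1, 'A'], [2, 'C'], [6, 'T']) {
    my ($pos, $base) = @$rec;
    my ($fill_s, $fill_q) = pad_gap($last_pos, $pos);
    $seq  .= $fill_s . $base;
    $qual .= $fill_q . 'I';   # 'I' is just a placeholder quality character
    $last_pos = $pos;
}
print "$seq\n$qual\n";   # prints ACnnnT, then II!!!I
```

Because $last_pos starts at 0 in this sketch, a first record at position 10 would get nine leading n's.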

Or

Code:

else {
    $b = $het{"$ref$alt"};
    $b ||= 'N';
}

If REF concatenated with ALT isn't a key in that starting %het (for example, when ALT is '.'), then it puts an N instead.

Those seem to be the only two places in that script where N's or n's get used.

Note that indels get handled differently. They are not put into the consensus; instead, there is a window of lowercase letters around the putative indel.
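As a rough illustration of what a lowercase window looks like, here is a sketch under my own assumptions. `mask_indel` is a hypothetical helper, not the script's own routine, and the window size and exact boundaries here are assumed, so check vcfutils.pl itself for the real behaviour.

```perl
use strict;
use warnings;

# mask_indel is a hypothetical helper, not the script's own code.
# $pos is the 1-based indel position, $ref_len the length of REF,
# and $win the number of flanking bases to lowercase (an assumed value).
sub mask_indel {
    my ($seq, $pos, $ref_len, $win) = @_;
    my $beg = $pos - 1 - $win;
    $beg = 0 if $beg < 0;
    my $end = $pos - 1 + $ref_len + $win;
    $end = length($seq) if $end > length($seq);
    # Lowercase the window in place; bases outside it are untouched.
    substr($seq, $beg, $end - $beg) = lc(substr($seq, $beg, $end - $beg));
    return $seq;
}

print mask_indel('ACGTACGTACGT', 6, 2, 2), "\n";  # ACGtacgtaCGT
```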

**jmartin** · 11-12-2013, 01:58 PM

I get it now, thank you very much!


Understanding samtools mpileup consensus
