Unconfigured Ad

**kopi-o** · 11-11-2011, 12:39 PM

samtools view alignment.bam chr22:1000000-1234000

will give you the alignments for that region. You can then go on to extract the sequence by e.g. doing 'cut -f 10'.

**ardmore** · 11-11-2011, 12:54 PM

Follow up:
The question is if you select column 10, there are many different segments with the same position. For example:

Code:

61AGUAAXX100114:8:70:11288:13082	165	chr22	14430092	0	*	=	14430092	0	NNNNCNAGCNGAGCGNNTCTGGG
61AGUAAXX100114:8:70:11288:13082	1113	chr22	14430092	0	101M	=	14430092	0	NNNNNNNNNNNTNNNNNNNNANN
61AJDAAXX100113:7:59:10707:12713	113	chr22	14430092	0	101M	chr14	18863090	0	CCTCGCGGGACTGGTATGGGGAC

They have same start position 14430092 but different sub sequences.

Hopefully more details.

**kopi-o** · 11-11-2011, 01:25 PM

samtools view alignment.bam chr22:1000000-1234000 | cut -f 10

or

samtools view alignment.bam chr22:1000000-1234000 > subset.sam
cut -f 10 subset.sam

10 because the sequence is the tenth column (or "field", hence the -f flag in the SAM file.

**ardmore** · 11-11-2011, 01:31 PM

Should we consider the flag such as 165,1113 etc.?
Which one should be selected for the position 14430092?

Code:

NNNNCNAGCNGAGCGNNTCTGGG
NNNNNNNNNNNTNNNNNNNNANN
CCTCGCGGGACTGGTATGGGGAC

**kopi-o** · 11-11-2011, 01:34 PM

Ah, this is the same question as on BioStar:

Site not found

http://biostar.stackexchange.com/questions/14222/about-the-10th-column-of-the-output-sequence-from-samtools-view-command

We make Stack Overflow and 170+ other community-powered Q&A sites.

You got some good advice there from brentp.

**ardmore** · 11-11-2011, 01:41 PM

Well, I don't think he answered my question although I market it as an answer.
So I posted it here to see if anybody can help me. He just extract the sequence from the reference rather than the read.

**kopi-o** · 11-11-2011, 01:49 PM

OK, let me spell it out then.

brentp told you in one of his replies that you should ignore all the reads with mapping quality 0. The reads you show here have mapping quality 0 (MAPQ column: column 5 according to the SAM format specification). Thus, they are completely unreliable.

Regarding the SAM flags, it's really up to you if and how you want to filter based on those, for instance, if you want to require your reads to be paired or not. This page explains the flags: http://picard.sourceforge.net/explain-flags.html

EDIT: I see now that brentp has replied that this region is all Ns in the reference sequence. Thus, all of your alignments probably have MAPQ = 0, and these are just spurious alignments that for some reason got assigned to these position, but can't be trusted.

If it had been a "normal" genome region, you would likely get several variant sequences for it, because sequencing errors are not uncommon. You could then use some SNP calling/genotyping tool like samtools mpileup (or one of many others) to construct a consensus sequence based on your reads.

**swbarnes2** · 11-11-2011, 01:55 PM

You really need to read the documentation, becase a lot of your questions would be answered there.

For starters, that first read, the one with the flag of 165, didn't actually map to that position. sam format specs call for unmapped reads to be given the chromosome and position of their mapped mates.

165 = 128+32+4+1 That means the read was in the second fastq, its mate was in the reverse direction as compared to the reference, the read itself did not map, and it's from a paired-end read.

So that's why its sequence isn't what it should be.

Really, you need to read this:

Ten Simple Rules for Getting Help from Online Scientific Communities

Ten Simple Rules for Getting Help from Online Scientific Communities

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002202

Internet,Archives,Web-based applications,Computer software,Careers,Grammar,Human learning,Scientists

You need to espeically read all of rule 6.

and note:

One of the most impolite behaviors toward an online community is asking a question in multiple places at the same time. “Cross-posting”, as this practice is called, can make two distinct online communities work through a solution for you when only one is needed; this is an abuse of forum members' time.

**ardmore** · 11-11-2011, 01:57 PM

Thank you very much. Your statements are much clearer than his. Could you please give me a quick command sample to get a sequence based on my reads? I am not strong on it.

Code:

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

**gringer** · 11-12-2011, 08:04 AM

FWIW, if you're working with bowtie and colour-space, note the following about the SAM file results:

Bowtie: Manual

http://bowtie-bio.sourceforge.net/manual.shtml#decoding-colorspace-alignments

Note that in -S/--sam mode, the decoded nucleotide sequence is printed for alignments, but the original color sequence (with A=blue, C=green, G=orange, T=red) is printed for unaligned reads without any reported alignments.

This means that unaligned sequences will be double-encoded colour-space, while aligned sequences will closely match the reference.

I'm currently investigating whether cleaning colour-space sequences using bowtie and a reference genome is actually useful for the purpose of doing an assembly based on the decoded bases (instead of, or in addition to, using SAET).

**nickloman** · 11-12-2011, 11:22 AM

I have a strong suspicion reading this and other posts of ardmore that we are helping him/her with his course-work!

Topics	Statistics	Last Post
UC San Diego Bioengineers Map Gene Function in Human Stem Cells by SEQadmin2 Started by SEQadmin2, Yesterday, 10:26 AM	0 responses 11 views 0 reactions	Last Post by SEQadmin2 Yesterday, 10:26 AM
New Analysis Splits Leukemia Into 16 Epigenomic Subgroups by SEQadmin2 Started by SEQadmin2, 07-09-2026, 10:04 AM	0 responses 25 views 0 reactions	Last Post by SEQadmin2 07-09-2026, 10:04 AM
Genome-Wide CRISPR Screen Uncovers Unlikely Psoriasis Target by SEQadmin2 Started by SEQadmin2, 07-08-2026, 10:08 AM	0 responses 16 views 0 reactions	Last Post by SEQadmin2 07-08-2026, 10:08 AM
Engineered Protein Motor Takes Its First Steps Along DNA Track by SEQadmin2 Started by SEQadmin2, 07-07-2026, 11:05 AM	0 responses 33 views 0 reactions	Last Post by SEQadmin2 07-07-2026, 11:05 AM

Unconfigured Ad

To get the sequence from a given BAM file

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News