Unconfigured Ad

lh3 · 05-13-2009, 09:49 AM

The latest SVN parses the header @SQ lines. You need to use the "view" command to do this.

jkbonfield · 05-13-2009, 08:09 AM

Feature request

I find it cumbersome that samtools import *requires* an external index file to tell it what reference sequences are present. This is even true if there is a header in the .sam file.

For my own sanity I knocked up this simple, albeit slow, perl script:

Code:

#!/usr/bin/perl -w

# Reads a sam file and generates an index suitable for "samtools import".

use strict;

#--- Build index
my %ref; #key=name, value=length
while (<>) {
    my @F=split(" ", $_);
    my $rname = $F[2];

    next if $rname eq "*";
    my $end = $F[3] + length($F[9]);

    $ref{$rname} = $end if (!exists($ref{$rname}) || $ref{$rname} < $end);
}

#--- Output input to stdout
foreach (sort keys %ref) {
    print "$_\t$ref{$_}\n";
}

It generates a file suitable for passing as the 1st argument after "samtools import". Admittedly it doesn't know the true ending of the reference sequence, but for de-novo assemblies the reference is just the consensus anyway so it's faked.

Note that the above is an estimate. It doesn't consider the fact that the sequences may have CIGAR string possibly altering the mapped coordinates of the reference, but it's a start at least.

zee · 05-07-2009, 11:58 PM

Originally posted by lh3 View Post

Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.

I have checked out the 'latest' version from SVN:

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.3-16 (r267)

This seems to be a later version, would it contain the fix?

lh3 · 05-04-2009, 11:00 AM

Bwa and samtools are two separated packages. I will start a new thread about questions related to bwa.

I answer your question here. I guess you are using reads shorter than 30bp. BWT based algorithms are generally slow for enumerating multiple hits and very short reads are more likely to have many hits. Therefore, BWA's 'sampe' and 'samse -n' are inefficient for these short reads. Probably non-BWT algorithms are better for such alignment. If you really want to align very short reads, you may consider use smaller -n with samse and smaller -o (say 200) with sampe. Using smaller -o will make pairing less effective, though.

The memory of bwa is almost independent of the number of input reads and the speed is linear in that. You can separate input into smaller batches.

gcrdb · 05-04-2009, 10:37 AM

Hi,
The bwa aln is fast (few hours),
but somehow bwa samse(or sampe) is slow.
This program have been running for almost a week:
bwa samse -n 255 mouse_genome.fa data.sai data.fq >data.sam

My data.fq has about 80 million reads. Is that to much reads for samse(pe)? Should I split them and merge?

lh3 · 05-04-2009, 10:19 AM

Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.

nilshomer · 05-04-2009, 09:00 AM

I have also had a lot of problems when viewing locations with many insertions and deletions. I have been trying to find the bug in the source code, but so far no luck.

Nils

zee · 05-04-2009, 02:12 AM

I was wondering if I could get some help on the SAM specification.

I get the error:

samtools: bam_lpileup.c:115: tview_func: Assertion `l == tv->n_pre' failed

When I page the viewer to my reads containing indels. A sample of my data is

suis_indels_30749_30917_a0b7e 147 Streptococcus_suis 30782 150 60M7D15M * 0 0 CAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCA YX[QRQZETFQLZT[FQMX]_``\SZ```_`^``_[``^XZ```^``__``X]T[```_^`_````````````` NM:i:7 MD:Z:60M7D15M
suis_indels_30862_31024_a3384 131 Streptococcus_suis 30801 150 41M7D34M * 0 0 AGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTT `````H^_^^_```Z]_``````]``^]``_]```]]X````]_]^````]Z_URYT]]\\]_X``_^][U^``\ NM:i:7 MD:Z:41M7D34M
suis_indels_30874_31073_a388e 131 Streptococcus_suis 30813 150 29M7D46M * 0 0 CGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAA ```^`\[`````````````_X][U]RIST\`XDDLIHGK]J\XMFVER^]`````[POHLGTU]^_Z``]]]KV NM:i:7 MD:Z:29M7D46M
suis_indels_30890_31082_a9685 131 Streptococcus_suis 30829 150 13M7D62M * 0 0 TAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAAATTGAATGAAGCTGGT `]__`ZOY^``ZUV``]TTY`_``````^LU```_`_``XV[````^\`\\^UU[`XHOPKITNEUNIRRNDWW` NM:i:7 MD:Z:13M7D62M
suis_indels_16281_16438_b9354 131 Streptococcus_suis 16224 150 6M1D69M * 0 0 AAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAG`[``````_`_I_]O[__`__Y__VFMY\````^Z```^Z\U``_``````]S`\Y^[_]^```^]`_\Y^ZQXH NM:i:2 MD:Z:6M1D69M
suis_indels_2007371_2007505_bf820 147 Streptococcus_suis 2007364 150 71M3D4M * 0 0 GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAAAATTATCTGTGGAATTTT MRT^\_]RZ__`]]YP[^`ZT`T]VH``^]^TS]]``]^^````````_W_`_\\```_````````````\^`^ NM:i:3 MD:Z:71M3D4M
suis_indels_30783_30934_c09ed 147 Streptococcus_suis 30799 150 43M7D32M * 0 0 TAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCT U]PRKMJTRKIK___`Y`\]`\^```ZXVYMZX]`_```Z``_``WWQ]_````_]``]````````````Z_W` NM:i:7 MD:Z:43M7D32M
suis_indels_16171_16358_c4122 147 Streptococcus_suis 16227 150 3M1D72M * 0 0 TGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAGAGGNFMU``_`_`NTPV]^__^]YYQLX^ZK[^[[[^^``]RUX``Z]Y\Z````````^``_`]U\`^_```]```` NM:i:1 MD:Z:3M1D72M
suis_indels_16164_16353_ce374 147 Streptococcus_suis 16222 150 8M1D67M * 0 0 CCAAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTG[]]^[]\X[^`\ZZX`^_^\_``````\`_JL[\R]X```_`````_``_Y[]`````[\````Z```````^_H NM:i:2 MD:Z:8M1D67M
suis_indels_30842_31012_d14fb 131 Streptococcus_suis 30781 150 61M7D14M * 0 0 CCAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTC ```_`_`````]`````_```^_VUW^```_^_ZXZT^`````__]YNXTEKV`_YXNRW\\REHDDLSGMFURD NM:i:7 MD:Z:61M7D14M
suis_indels_30721_30932_d9806 147 Streptococcus_suis 30797 150 45M7D30M * 0 0 TATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCAT PPJ[TFRJ``````TVU`__`^^`_]_`````GFPZTWY\VX`_````__``````^`````````````````` NM:i:7 MD:Z:45M7D30M
suis_indels_30785_30945_e17b9 147 Streptococcus_suis 30810 150 32M7D43M * 0 0 CGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGA TFWOOKDUNQZ`^RU^^`STTSVW^_``````^`_V``_``````````\[[`XR_```_W\]O_``__`_T^`` NM:i:7 MD:Z:32M7D43M
suis_indels_30737_30928_e6336 147 Streptococcus_suis 30793 150 49M7D26M * 0 0 GGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATT LMRUPOTQPY[V]]^``]TS\^`\^`UXX^S^```_^Z``\\``_```^Y^Q]```````V\]^``^``K^```` NM:i:7 MD:Z:49M7D26M

And I'm using novoalign format where I've modified the novo2sam.pl script to produce a 'correct' CIGAR format. Most samtools operation work except for the tview

mhc · 05-01-2009, 08:51 AM

Understanding samtools pileup output

Hi,

I'm having trouble trying to parse the samtools output. In the example below, at position 60, I have 108 reads. As I understand it, 8 reads terminate (since there are 8 '$'s), and there are 2 new reads (marked by the '^') on the next line.
So the next line - line 61 - should have 108-8+2=102 reads.
Instead, it has 99.
What am I missing here?
This is the 40th line of input with 40bp reads, and this is the first instance where '$' appears. Other lines seem to work out fine.

seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.
+7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@B
B<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,
.,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@

lh3 · 04-28-2009, 11:58 AM

The first thing you may want to try is:

samtools view -h aln.bam | less -S

emucaki · 04-27-2009, 01:12 PM

Hi, I'm a novice geneticist who is interested in using the 1000 Genome project data available on NCBI and I can't quite figure how to obtain sequence information from the BAM file, SAMTools' website is little help. I am wondering if anyone knows a good place to get information for this kind of work.

(Offtopic, anyone know why the 1000 Genome project has a log-in but no register option?)

nilshomer · 04-23-2009, 11:43 PM

I have a working patch to view ABI SOLiD color space using samtools (http://samtools.sourceforge.net/) text viewer. For example, some of the features using output from BFAST (in SAM format), which includes the "CS" and "CQ" tags, are:

- option to display colors instead of nucleotides.
- option to color bases/colors based on color. This is similar if you want to color bases based on the given base.
- option to color bases/colors based on color quality.
- the "." (dot) option when displaying color space will only show those colors that were corrected during alignment (i.e. the color errors).
- option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).

PM me and I can supply you with a source version.

lh3 · 04-21-2009, 11:37 AM

You are invoking pileup with "-c" and you should also read this page:

Consensus/Indel Calling

http://samtools.sourceforge.net/cns0.shtml

A read base "*" means a deletion. The second line at "chr1 1949878" shows indel call. In principle this is not part of pileup.

gcrdb · 04-20-2009, 03:46 PM

lh3,
Thanks for quick response, pileup is working now!
Here is some questions about pileup format when I look at them at first time:
(1) what is a "*" in read bases , which is not documented in "http://samtools.sourceforge.net/pileup.shtml".
(2) Is it okay for a base in the same position pile-up twice ? (chr1 1949878 occur twice in my first pileup output)
thanks,
Below is the piece pile-up output I found the problem:
chr1 1949878 A A 142 0 60 55 C$....,,,...,,.C..,.,,..,...,,,.,.,.+1C,,,,..,.,........,^F
,^], &5,2IIII5I<II+%5=I+II(II8@CII3*I0I+IIII,I$@IIAAIIDI@I*5
chr1 1949878 * */+C 38 38 * +C 13 4 30 8
chr1 1949879 A A 150 0 60 54 ....,G,...,,.-1G..-1G.,.,,..,...,,,.,.,.,,,,..,.,........,,
, II*IIII;I.II&(&1I3II%9III:II7&I&@&IIII5I"6IIIIG.I33I$6
chr1 1949879 * -G/* 481 481 -G * 6 9 30 9
chr1 1949880 G G 25 0 60 55 .$A$..,A,..A,,*.*A,A,,A.,A..,,,.,A,.,,,,.+1A.,.,........A,,
^], +(,.III)8-II8%D.I0II#,I@5III.$I+I,IIII$I2EIIIIIIIIIE&?/
chr1 1949880 * */+A 350 350 * +A 24 3 19 9
chr1 1949881 A A 162 0 60 53 ..,,,...,,....,.,,..,...,,,.,.,.,,,,..,.,........G,,, 6DI
II3I$II81D)I%II+.III'III$I2I+IIII+II<II46IIAHI.2IB

lh3 · 04-20-2009, 11:03 AM

should be: ../samtools pileup -t ex1.fa.fai ex1.sam or ../samtools pileup ex1.bam. I have added a Makefile.

Note that there is a companion format called BAM which is the binary representation of SAM. Most of samtools commands work on BAM only. I know having two formats is a bit confusing, but this is necessary for faster parsing.

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 25 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 42 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 48 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News