Unconfigured Ad

**maubp** · 02-14-2011, 04:07 AM

Originally posted by seqsyd View Post

can any 1 tell me please how can i install SAMtools on my windows operating system...Help Please...thanks

Have you tried the provided Windows binaries on sourceforge? i.e. samtools-0.1.12a_i386-win32.zip

[I'd suggest getting hold of a Linux (or Mac OS X) machine if you can, since not that many of the NGS tools will work on Windows]

**ElMichael** · 04-15-2011, 06:43 AM

optional fields in SAM format

Could anybody advise me the best way to parse optional fields in SAM file? I'm interested in tags X0 and XA, and the problem is that they do not have a permanent place in the output string (e.g., sometimes 13th or 14th, 18th, 19th or 20th). Certainly, I can do it myself in perl, but such parsing would take too much time and resources.
What I'm looking for is:
1) either a way to control output tags (their number and order)
2) or (better) some bioperl module, like blast parser, that could parse it effectively.
thanks!

**lh3** · 04-15-2011, 06:48 AM

For perl, very easy and efficient:

perl -ne 'print "$1\n" if /XA:i

\d+)/'

**ElMichael** · 04-15-2011, 06:52 AM

Originally posted by lh3 View Post

For perl, very easy and efficient:

perl -ne 'print "$1\n" if /XA:i

\d+)/'

It's exactly what I want to avoid - checking each line with IF condition.

**lh3** · 04-15-2011, 06:57 AM

This is fast. If present, the ordering of tags are fixed. You can

if /X0:i

\d+).*XA:Z

\S+)/

to grep out everything with a single pass. You need to read and split after all. Regex matching should be of similar speed, if not faster. The bad implementation is to loop through the optional tags and then try to find matching.

**ElMichael** · 04-15-2011, 10:57 AM

Thank you, Heng!

**Strand SI** · 04-21-2011, 08:39 AM

I converted some ELAND export files to SAM and now tried to make BAM files out of those, but it is complaining about @SQ headers not being there?!
Does the export2sam.pl tool not create a SAM file format that is correct??

**oudacontrol** · 06-10-2011, 03:37 PM

I am trying to view a portion of a sorted, indexed bam file and I am getting the error:
[sam_header_read2] 26 sequences loaded.
[main_samview] random alignment retrieval only works for indexed BAM files.

It is already sorted and indexed.... I have no idea how to resolve this.

**jnfass** · 06-20-2011, 01:56 PM

@oudacontrol: You should post the command you used. It seems like the command may have been mis-formatted, but there's no way to tell if you don't include it.

**Bruins** · 08-03-2011, 05:48 AM

Dear all,

I'm had the same problem as oudacontrol above and Jeckow earlier in this thread.

Trying to get samtools view to extract one chromosome from a sorted and indexed bam file fails:

Code:

[sam_header_read2] 84 sequences loaded.
[main_samview] random alignment retrieval only works for indexed BAM files.

It took me a long while to figure out what caused it, so I thought I'd post my simple solution for poor sobs like me...

Code:

samtools sort in.bam in.sorted

Code:

samtools index in.sorted.bam

Code:

samtools view -bh -t human_g1k_v37.fa.fai -o in.sorted.chr9.bam in.sorted.bam 9

As it turned out, I should not have used the -t option. Without it, view finished as it should have, leaving me with a nice chr9-only bam file! (at least, it's a 279M file, I'll have to check later if trackster or igv will load it

)

Cheers,
Bruins

**Hena** · 08-05-2011, 01:04 AM

Does anyone have nice perl snippet to calculate the end point of the read in a sam file?

**pengchy** · 10-20-2011, 04:26 AM

maq2sam-long problems

Dear all,

Originally posted by xmluo View Post

I used maq2sam-long to convert maq output to sam format, but all pairing information is missed in the results: MRNM is "*", and MPOS and ISIZE are 0. Can you recommend how to get these information? Thanks, Mei

I run into the same questions.

Another question is about the statistics for the results of out.map and out.bam.

1. how can I calculate the mapping rate?
95%(1903176 / 2000000) from maq or 95.68%(1820937 / 1903176) from the converted bam?

2. how can I calculate the unique mapped pair-end reads number?
I have conceived a method: according to the sam format spcification, the TAG "H0:i:1 H1:i:0", give the Number of perfect hits (H0) and Number of 1-dierence hits (H1), so we can deduce that one pair is uniquely mapped if and only if any one end's H0:i:1 or "H0:i:0 and H1:i:1", can it work?

3. how to get the MRNM, MPOS and ISIZE?
In my case, the ISIZE is normal. I found that the flag value in the second column is normal too. The mate read also given at another line, so we can fill the MRNM and MPOS columns accordingly. can it work?

Code:

##suppose we got the output file of "maq map" is out.map. then
##got out.sm
samtools-0.1.18/misc/maq2sam-long out.map  > out.sam
##got the sorted bam file out.bam
samtools-0.1.18/samtools view -ut reference.fa.fai out.sam | samtools  sort - out
##got the statistics results
##maq
maq-0.7.1/maq.pl statmap nohup.out > out.map.stat
samtools flagstat out.bam > out.bam.stat

-bash-3.2$ more out.map.stat 

-- == statmap report ==

-- # single end (SE) reads: 0
-- # mapped SE reads: 0 (/ 0 = NA%)
-- # paired end (PE) reads: 2000000
-- # mapped PE reads: 1903176 (/ 2000000 = 95.15%)
-- # reads that are mapped in pairs: 1756127 (/ 1903176 = 92.27%)
-- # Q>=30 reads that are moved to meet mate-pair requirement: 2962 (/ 1756127 = 0.16%)
-- # Q<30 reads that are moved to meet mate-pair requirement: 203102 (11.56%)

-bash-3.2$ more out.bam.stat 
1903176 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1820937 + 0 mapped (95.68%:nan%)
1903176 + 0 paired in sequencing
951588 + 0 read1
951588 + 0 read2
1673888 + 0 properly paired (87.95%:nan%)
1738698 + 0 with itself and mate mapped
82239 + 0 singletons (4.32%:nan%)
1738698 + 0 with mate mapped to a different chr
1471758 + 0 with mate mapped to a different chr (mapQ>=5)

Any suggestions are appreciated, Thank you all.

pengchy

**pengchy** · 10-21-2011, 05:48 PM

Hi,
I have found the solution.
1. the mapping rate is 1820937/2000000

2. the unique mapped reads can be got from:
for the single end: "H0:i:1 H1:i:A" or "H0:i:0 H1:i:1" or "H0:i:0 H1:i:0" ##here, "A" stand for any number
for the pair end: any end has above tags.

3. Yes, it is.

**dtc** · 10-15-2012, 12:21 AM

I have the same problem now, do you have any solution now?
Thanks Alexey

**Hena** · 10-15-2012, 12:39 AM

Originally posted by dtc View Post

I have the same problem now, do you have any solution now?
Thanks Alexey

I did my own version of this in the end using CIGAR string. However it does not take into account 'X', '=', 'N' or 'P' tags.

Code:

sub last_pos {
  my @read = @_;
  local $_;
  my ($match,$del) = (0,0);
  while ($read[$CIGAR] =~ m/(\d+)M/g) {
    $match += $1;
  }
  while ($read[$CIGAR] =~ m/(\d+)D/g) {
    $del += $1;
  }
  return $read[$POS] + $match + $del -1;
}

So counting the match/mismatch and deletion tags should give the number of nucleotides it will span in reference.

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 21 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 39 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 46 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News