**jeales** · 04-14-2014, 01:40 PM

Index scf000046.bam ('samtools index scf000046.bam') and then see what 'samtools idxstats scf000046.bam' reports.
Are there reads reported for all the scaffolds, or just for scaffold000046?

**toreoe** · 04-15-2014, 03:58 AM

I extracted another scaffold, scaffold00008, from a different bam file and indexed scaffold00008.bam as suggested. samtools idxstats reported reads only for the target scaffold and none for the others:

scaffold00001 21966007 0 0
scaffold00002 18670670 0 0
scaffold00003 16657927 0 0
scaffold00004 15265287 0 0
scaffold00005 14680965 0 0
scaffold00006 14166276 0 0
scaffold00007 14112451 0 0
scaffold00008 13353334 2289885 55567
scaffold00009 13059713 0 0

scaffold00008 loaded properly in IGV, something I couldn't do with my previous extracted file. So it might be that there was something wrong with my original bam file.

Anyway, it seems this is simply how the samtools view extraction works: the subset bam keeps every scaffold from the original bam in its header. And it works fine.
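For anyone who wants to run the same check on a header with many scaffolds, here is a minimal sketch that picks out the contigs with a non-zero mapped count from idxstats text. It assumes the usual four whitespace-separated columns (name, length, mapped, unmapped); the function name `contigs_with_reads` is made up for this example and is not part of samtools.

```python
def contigs_with_reads(idxstats_text):
    """Return {contig_name: mapped_count} for contigs with at least one mapped read."""
    result = {}
    for line in idxstats_text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a 4-column idxstats row
        name, _length, mapped, _unmapped = fields
        if int(mapped) > 0:
            result[name] = int(mapped)
    return result


# Three rows copied from the idxstats output above
example = """\
scaffold00007 14112451 0 0
scaffold00008 13353334 2289885 55567
scaffold00009 13059713 0 0
"""
print(contigs_with_reads(example))
```

If only the target scaffold comes back, the extraction worked and the other header entries are just empty placeholders.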

Thank you very much for your help, jeales.

**jeales** · 04-15-2014, 06:06 AM

That's good. If there were reads from other scaffolds, an awful lot of my work would be wrong as well!

I don't have a good solution for trimming the header to include only the scaffolds/chromosomes that have reads in the bam, other than manual editing.

But it's not really doing any harm as long as the counts are zero.

**dpryan** · 04-15-2014, 06:32 AM

The header is left as-is for technical reasons. In BAM files (and internally in samtools, and likely picard too) the chromosome/contig field of an alignment is just a numeric index into the list in the header. Trimming the header would therefore mean altering every read as it is output, which would be annoying, though I guess not that hard to implement (it would affect performance, though). This is also why people need to be careful when they reheader BAM files: changing the order of the contigs/chromosomes in the header will make all of the alignments wrong.
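To make that concrete, here is a toy model in plain Python, not real BAM parsing. The header is a list of names and each read stores only an index into it; `names_for` and `trim_header` are hypothetical helpers written for this illustration.

```python
def names_for(read_ref_ids, header):
    """Resolve each read's numeric reference index to a contig name."""
    return [header[ref_id] for ref_id in read_ref_ids]


def trim_header(read_ref_ids, header):
    """Drop unused contigs from the header and renumber every read to match."""
    used = sorted(set(read_ref_ids))
    remap = {old: new for new, old in enumerate(used)}
    new_header = [header[old] for old in used]
    new_reads = [remap[ref_id] for ref_id in read_ref_ids]
    return new_reads, new_header


header = ["scaffold00007", "scaffold00008", "scaffold00009"]
reads = [1, 1, 1]  # three reads, all pointing at header entry 1

# Reordering the header without touching the reads silently relabels them.
reordered = ["scaffold00008", "scaffold00007", "scaffold00009"]
print(names_for(reads, header))
print(names_for(reads, reordered))

# Trimming is only safe if every read is rewritten with its new index.
new_reads, new_header = trim_header(reads, header)
print(new_reads, new_header)
```

The second print shows the reads now claiming to be on scaffold00007, which is the reheader pitfall. The last line shows that a trimmed header needs every stored index rewritten, here from 1 to 0.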

**jeales** · 04-15-2014, 07:18 AM

Very informative, thanks.
So if all the alignments in a bam are mapped to the 10th chromosome/contig in the header, their chromosome/contig value will be 10 (or probably 9, if you start counting at 0), irrespective of the chromosome name?

**dpryan** · 04-15-2014, 07:23 AM

Exactly. As you surmised, everything is 0-based internally, so the value is 9. For those curious, the value is signed, so a value of -1 is used for unmapped reads.
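As a small illustration of that last point: the BAM specification stores the reference index as a little-endian signed 32-bit integer, so it can be decoded with Python's `struct` module. This is a sketch on hand-made bytes, not on a real BAM record, and `decode_ref_id` is a name invented for the example.

```python
import struct


def decode_ref_id(raw4):
    """Interpret 4 little-endian bytes as a signed 32-bit reference index."""
    (ref_id,) = struct.unpack("<i", raw4)
    return ref_id


# The 10th contig in the header is stored as index 9.
print(decode_ref_id(struct.pack("<i", 9)))

# All bits set decodes to -1 when read as signed: no reference.
print(decode_ref_id(b"\xff\xff\xff\xff"))

# The same bytes read as unsigned would give a huge, meaningless index.
print(struct.unpack("<I", b"\xff\xff\xff\xff")[0])
```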

Extracting a subset of bam files

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News