**drio** · 12-08-2010, 09:14 AM

For those cases, MAPQ should be 0, so you can filter them out prior to performing your downstream analysis.
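
A minimal sketch of that filter on plain SAM text, assuming the standard layout with MAPQ in the fifth tab-separated column. The function name `filter_mapq` and the example records are made up for illustration.

```python
def filter_mapq(sam_lines, min_mapq=1):
    """Keep header lines and alignments with MAPQ (5th column) >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith('@'):
            # header lines carry no MAPQ, pass them through
            kept.append(line)
            continue
        fields = line.rstrip('\n').split('\t')
        if int(fields[4]) >= min_mapq:
            kept.append(line)
    return kept

# made-up records, for illustration only
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U",
    "read2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R",
]
for line in filter_mapq(example):
    print(line)
```

On a BAM file, `samtools view -q 1` applies the same threshold without any scripting.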

**SWP** · 12-08-2010, 09:19 AM

Thank you...

**dawe** · 12-08-2010, 10:04 AM

Originally posted by SWP

Thank you...

You can also try GibbsAM, which should help assign multi-mapped reads to a single location. I'm still evaluating it, so I can't say anything beyond what the paper reports.
If you want to give it a try, I have some scripts to convert bwa output into GibbsAM input (and then back to BAM).
Also, the mapq=0 rule does not always hold: I've found a number of tags aligned in multiple places (and carrying the XT:A:R tag) but with mapq > 0.
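
To check how often this happens in your own data, something along these lines should work on SAM text. It is only a sketch: `repeats_with_mapq` is a hypothetical helper and the records are invented.

```python
def repeats_with_mapq(sam_lines):
    """Return names of reads tagged XT:A:R whose MAPQ is still above 0."""
    names = []
    for line in sam_lines:
        if line.startswith('@'):
            continue  # skip header lines
        fields = line.rstrip('\n').split('\t')
        mapq = int(fields[4])
        optional = fields[11:]  # optional TAG:TYPE:VALUE fields
        if 'XT:A:R' in optional and mapq > 0:
            names.append(fields[0])
    return names

# invented records: read2 is a repeat with MAPQ 0, read3 a repeat with MAPQ 12
example = [
    "read1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U",
    "read2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R",
    "read3\t0\tchr1\t300\t12\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R",
]
print(repeats_with_mapq(example))
```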

d

**SWP** · 12-08-2010, 02:39 PM

Yeah, I'd love to have the scripts and give it a shot.

**dawe** · 12-09-2010, 12:47 AM

First of all, you have to run bwa samse with the -n X option, with X high enough (I used 255). Then you can use this Python script

Code:

#!/usr/bin/python

import sys
import pysam

samfile = pysam.Samfile(sys.argv[1], 'rb')
sep = "_"

for x in samfile.fetch():
  # skip unmapped reads
  if x.flag & 4: continue
  buf = []
  # primary alignment: chrom, position (0-based in pysam), strand
  readChrom = samfile.references[x.rname]
  if '_random' not in readChrom and '_hap' not in readChrom:
    strand = '+'
    if x.flag & 16:
      strand = '-'
    buf.append(sep.join([readChrom, str(x.pos), strand]))

  # alternative hits from the XA tag: chrom,[+-]pos,CIGAR,NM;...
  XAList = [Tag for Tag in x.tags if Tag[0] == 'XA']
  if len(XAList):
    for p in XAList[0][1].split(';'):
      a = p.split(',')
      if '_random' in p or '_hap' in p: continue
      if len(a) <= 1: continue
      # the sign of the position encodes the strand
      if int(a[1]) >= 0:
        buf.append(sep.join([a[0], a[1][1:], '+']))
      else:
        buf.append(sep.join([a[0], a[1][1:], '-']))
  # one line per read: name, tab, comma-separated chrom_pos_strand hits
  if len(buf):
    print(x.qname + '\t' + ','.join(buf))

to convert your BAM into a "map" file.

Code:

$ python bam2map.py filein.bam > filein.map
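
If you want to see what the XA handling does without installing pysam, here is a stand-alone sketch of just that step. It assumes the XA value has the usual bwa layout (chrom,[+-]pos,CIGAR,NM; repeated), and `xa_to_hits` is a name I made up for this example, not part of the script above.

```python
def xa_to_hits(xa_value, sep='_'):
    """Turn an XA value (chrom,[+-]pos,CIGAR,NM;...) into chrom_pos_strand strings."""
    hits = []
    for entry in xa_value.split(';'):
        parts = entry.split(',')
        if len(parts) < 4:
            continue  # skips the empty string left after the trailing ';'
        chrom, signed_pos = parts[0], parts[1]
        # the sign in front of the position encodes the strand
        strand = '-' if signed_pos.startswith('-') else '+'
        hits.append(sep.join([chrom, signed_pos.lstrip('+-'), strand]))
    return hits

# invented XA value with two alternative hits
print(xa_to_hits("chr1,+100,36M,0;chr2,-200,36M,1;"))
# → ['chr1_100_+', 'chr2_200_-']
```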

You can use filein.map with get_unique_file.pl from GibbsAM (part C). My code depends on pysam and prints its output to stdout (and sorry if the code is ugly).
Beware, GibbsAM is rather slow.
Once GibbsAM has finished and you have your final output (results.txt), you can run this

Code:

import sys
import pysam

samfile = pysam.Samfile(sys.argv[1], 'rb')
gfile = open(sys.argv[2], 'r')
outfile = pysam.Samfile("-", 'wh', header = samfile.header)

def cigar2tuple(cigarString):
  cdict = {'M':0, 'I':1, 'D':2, 'N':3, 'S':4, 'H':5, 'P':6}
  outlist = []
  pn = 0
  for n in range(len(cigarString)):
    l = cigarString[n]
    if l in 'MIDNSHP':
      outlist.append((cdict[l], int(cigarString[pn:n])))
      pn = n + 1
  return tuple(outlist)

def correctNM(Taglist, newNM):
  newTagList = []
  for x in Taglist:
    if x[0] == 'NM':
      newTagList.append(('NM', int(newNM)))
    else:
      newTagList.append(x)
  return newTagList

def fixTags(Taglist):
  newTagList = []
  for x in Taglist:
    if x[0] == 'XT':
      newTagList.append(('XT', 'U'))
    elif x[0] == 'XA':
      continue
    else:
      newTagList.append(x)
  newTagList.append(('GF','M'))
  return newTagList
  
mappings = {}

for line in gfile:
  t = line.strip().split()
  if t[-1] == 'a':
    mappings[t[0]] = t[1] + ',' + t[3] + t[2]

for x in samfile.fetch():
  try:
    d = mappings[x.qname]
  except KeyError:
    outfile.write(x)
    continue
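  # reads without an XA tag have no alternative hit to move to
  XAList = [Tag for Tag in x.tags if Tag[0] == 'XA']
  if not XAList:
    outfile.write(x)
    continue
  XATag = XAList[0][1]
  x.tags = fixTags(x.tags)
  try:
    # match the whole "chrom,+pos" field, not just a substring of it
    theString = [xa for xa in XATag.split(';') if xa.startswith(d + ',')][0]
    t = theString.split(',')
    x.rname = samfile.references.index(t[0])
    # XA positions are 1-based, pysam positions are 0-based
    x.pos = int(t[1][1:]) - 1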
    x.cigar = cigar2tuple(t[2])
    if  t[1][0] == '-':
      x.flag = x.flag | 16
    else:
      x.flag = x.flag & ~16
    x.tags = correctNM(x.tags, t[3])
  except IndexError:
    # the read is already in the correct position
    outfile.write(x)
    continue
#  temp = [Tag for Tag in x.tags if Tag[0] != 'XA']
#  x.tags = temp
  if not x.mapq:
    x.mapq = 1
  outfile.write(x)
outfile.close()

to transform your results.txt into a SAM file. Just issue:

Code:

$ python gam2bam.py filein.bam results.txt > results.sam

where filein.bam is your original BAM file (from bwa).
Note that these scripts are somewhat tailored to human genomes: they exclude the _random and _hap chromosomes (it's a long story, but this is because GibbsAM doesn't support those chromosomes). If you are analysing another genome, you may need to modify the first script. I'm in touch with the GibbsAM developers and am waiting for this flaw to be fixed.
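
One way to make that easier to adapt is to pull the exclusion test into a small function with a configurable list of substrings. This is only a sketch: `keep_chrom` and its `excluded` parameter are hypothetical names, not something the scripts above already define.

```python
def keep_chrom(name, excluded=('_random', '_hap')):
    """Return True if the reference name contains none of the excluded substrings."""
    return not any(tag in name for tag in excluded)

# human (hg19-style) names with the default exclusions
print(keep_chrom('chr1'))
print(keep_chrom('chr1_gl000191_random'))
print(keep_chrom('chr6_cox_hap2'))

# for another genome, pass your own list (or an empty tuple to keep everything)
print(keep_chrom('scaffold_12', excluded=()))
```

In the first script the two `'_random' ... '_hap'` tests would then become calls to this function, on `readChrom` and on `a[0]` respectively.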

HTH
d

**Thomas Doktor** · 12-09-2010, 03:26 AM

Unless I've misunderstood something, if you want the uniquely aligned hits, you'd want the reads which had just 1 best hit reported. BWA reports that as a custom tag in the SAM output, "X0:i:n" where n is the number of best hits. To get the unique hits you could simply do a grep:
$ grep -P "X0:i:1\t" hits.sam > unique.hits.sam

Does anyone else have any input?

EDIT
You could also just go for the XT:A:U tag which BWA reports for the unique alignments.
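
For anyone who prefers a script over grep, here is a rough Python equivalent of both criteria on SAM text. The helper names are invented for this sketch and the records are made up. Comparing whole fields plays the same role as the trailing `\t` in the grep pattern: X0:i:10 does not count as X0:i:1.

```python
def optional_fields(sam_line):
    """Return the optional TAG:TYPE:VALUE fields (column 12 onward)."""
    return sam_line.rstrip('\n').split('\t')[11:]

def unique_by_x0(sam_line):
    """True if bwa reported exactly one best hit (X0:i:1)."""
    return 'X0:i:1' in optional_fields(sam_line)

def unique_by_xt(sam_line):
    """True if bwa flagged the alignment as unique (XT:A:U)."""
    return 'XT:A:U' in optional_fields(sam_line)

# made-up records: read1 is unique, read2 has ten best hits
example = [
    "read1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tX0:i:1\tX1:i:0",
    "read2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tX0:i:10\tX1:i:0",
]
unique = [line for line in example if unique_by_x0(line)]
print([line.split('\t')[0] for line in unique])
```

Like the grep, this drops header lines, since they have no optional fields to match.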

samse repetitive hits question
