Seqanswers Leaderboard Ad

**ndsimons** · 01-05-2023, 04:02 PM

For posterity:
import argparse
import pysam
import os

# Define the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("bam_file", help="Path to the BAM file")
parser.add_argument("nucleotide1", help="The nucleotide to count at position 1 (e.g., T)")
parser.add_argument("pos1", help="Position 1 (e.g., chr11:128839353)")
parser.add_argument("nucleotide2", help="The nucleotide to count at position 2 (e.g., C)")
parser.add_argument("pos2", help="Position 2 (e.g., chr11:128839368)")
args = parser.parse_args()

# Split the positions into chromosome and position
pos1_chrom, pos1_pos = args.pos1.split(":")
pos2_chrom, pos2_pos = args.pos2.split(":")

# Convert the position values to integers
pos1_pos = int(pos1_pos)
pos2_pos = int(pos2_pos)

# Open the BAM file
bam_file = pysam.AlignmentFile(args.bam_file, "rb")

# Initialize a counter for the number of reads
count = 0

# Iterate through the reads in the BAM file
for read in bam_file.fetch():
# Check if the read is on the same chromosome as pos1 and pos2
if read.reference_name == pos1_chrom and read.reference_name == pos2_chrom:
# Check if the read overlaps with either pos1 or pos2
if (read.reference_start <= pos1_pos < read.reference_end) or (read.reference_start <= pos2_pos < read.reference_end):
# Get the positions of the bases in the reference genome that correspond to the bases in the read
ref_positions = read.get_reference_positions(full_length=True)
# Initialize a flag for whether the read has the specified nucleotides at both positions
has_nucleotides_at_both_positions = True
# Check if the read has the specified nucleotide1 at pos1
if pos1_pos in ref_positions:
has_nucleotides_at_both_positions = has_nucleotides_at_both_positions and (read.query_sequence[ref_positions.index(pos1_pos)] == args.nucleotide1)
else:
has_nucleotides_at_both_positions = False
# Check if the read has the specified nucleotide2 at pos2
if pos2_pos in ref_positions:
has_nucleotides_at_both_positions = has_nucleotides_at_both_positions and (read.query_sequence[ref_positions.index(pos2_pos)] == args.nucleotide2)
else:
has_nucleotides_at_both_positions = False
# If the read has the specified nucleotides at both positions, increment the counter
if has_nucleotides_at_both_positions:
count += 1

# Print the count
print(count)

# get read depth
read_depth = bam_file.count(pos1_chrom, pos1_pos, pos1_pos+1)

# print read depth
print(read_depth)

# Close the BAM file
bam_file.close()

To run:
python phaseCount.py test.bam C chr11:128839352 C chr11:128839367
where test.bam is subset to include reads overlapping BOTH sites (can be generate with bedtools intersect -wa -a original.bam -b sites.bed -F 1.0 > test.bam)

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 13 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

count reads spanning multiple variants from BAM file

Comment

Latest Articles

ad_right_rmr

News