Hi group:
I am interested in finding substitutions in a read downstream of a particular site.
In detail:
Target region: chr2:1200-2000
CRISPR site: chr2:1220-1240
First, find the reads that have NO substitution in 1220-1240. For each such read (say read A), I want to count the substitutions downstream of the site, i.e. between 1240 and 2000, and tabulate them by type (for example, A>T is found in 1200 reads, G>A is found in 200 reads, etc.).
Second, find the reads that DO have a substitution in 1220-1240 and check whether they also have substitutions downstream. If so, I want to know which types and how many. For example, if read B has a substitution in 1220-1240, I want to count the number of substitutions between 1240 and 2000 in read B.
Case where there is a substitution in 1220-1240:
1220--------------------1240-------------------------------------------------|
|------------A------------|-------------------------------A/G------G/T-------|
|--------G-------AT------|---------------------A/T-----------------G/T------|
Case where there is no substitution in 1220-1240:
1220--------------------1240-------------------------------------------------|
|-------------------------|-------------------------------G/A------------------|
|-------------------------|-------------T/G------------------------------------|
|-------------------------|---------------------A/T----------------------------|
What I can do so far:
Using pysam, I can separate the reads that do and do not have a substitution in 1220-1240 into two files.
How can I identify and count the substitutions in the 1240-2000 region?
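To make the goal concrete, here is a rough sketch of the tabulation I have in mind. It is not a finished solution. It assumes the coordinates are 0-based and half-open, that each read is given as its sequence plus (query_pos, ref_pos, ref_base) tuples in the shape pysam's get_aligned_pairs(with_seq=True) returns (which needs MD tags in the BAM), and that a substitution type is counted once per read. The function names, group labels and toy reads are made up for illustration, and the sketch does not check that a read actually covers 1220-1240.

```python
from collections import Counter

# Assumed 0-based, half-open intervals on chr2; shift by one if yours are 1-based.
CRISPR_SITE = (1220, 1240)
DOWNSTREAM = (1240, 2000)


def read_substitutions(query_seq, aligned_pairs):
    """Return [(ref_pos, ref_base, read_base), ...] for the mismatches in one read.

    aligned_pairs holds (query_pos, ref_pos, ref_base) tuples, the shape that
    pysam's AlignedSegment.get_aligned_pairs(with_seq=True) returns.
    """
    subs = []
    for qpos, rpos, ref_base in aligned_pairs:
        if qpos is None or rpos is None or ref_base is None:
            continue  # insertion, deletion or soft clip: not a substitution
        read_base = query_seq[qpos].upper()
        if read_base != ref_base.upper():
            subs.append((rpos, ref_base.upper(), read_base))
    return subs


def in_interval(pos, interval):
    """True if pos lies in the half-open interval (start, end)."""
    return interval[0] <= pos < interval[1]


def tabulate(reads):
    """Count, per group, how many reads carry each downstream substitution type.

    reads is an iterable of (query_seq, aligned_pairs). Reads are grouped by
    whether they have any substitution inside CRISPR_SITE.
    """
    counts = {"site_sub": Counter(), "no_site_sub": Counter()}
    for query_seq, pairs in reads:
        subs = read_substitutions(query_seq, pairs)
        hit = any(in_interval(pos, CRISPR_SITE) for pos, _, _ in subs)
        group = "site_sub" if hit else "no_site_sub"
        # A set, so each type is counted at most once per read.
        types = {f"{ref}>{alt}" for pos, ref, alt in subs
                 if in_interval(pos, DOWNSTREAM)}
        counts[group].update(types)
    return counts


# Two made-up reads spanning the site boundary (positions 1238-1241).
toy_reads = [
    # No mismatch in the site, A>T downstream at 1241.
    ("ACGT", [(0, 1238, "A"), (1, 1239, "C"), (2, 1240, "G"), (3, 1241, "a")]),
    # C>T inside the site at 1239, A>G downstream at 1241.
    ("ATGG", [(0, 1238, "A"), (1, 1239, "c"), (2, 1240, "G"), (3, 1241, "a")]),
]

counts = tabulate(toy_reads)
for group, table in counts.items():
    print(group, dict(table))
```

With pysam, I would expect to feed this with something like (r.query_sequence, r.get_aligned_pairs(with_seq=True)) for each read r from fetch("chr2", 1200, 2000), skipping reads without a stored sequence. Counting every substitution instead of once per read would mean replacing the set with a list.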
Any ideas?
Thanks,
Adrian