**rboettcher** · 02-05-2013, 07:45 AM

Hi HSV-1,

You can use bowtie and pass the sequence on the command line with option -c TCGTGACGGTA,
then report all alignments via option -a.
In this case you also need to allow only perfect matches via -v 0.

See http://bowtie-bio.sourceforge.net/manual.shtml for more details.

Regards

**Richard Finney** · 02-05-2013, 11:18 AM

Here's a fast C solution ... the reported positions are 0-based (so add one for 1-based coordinates).

This takes about 2 minutes on my 2000 bogomips machine for hg19 ...

______ begin code _______
// this program finds short patterns and reverse complemented patterns in *.fa fasta (genome) files
// how to compile: gcc -Wall -O3 -o patmatch patmatch.c
// how to use : for i in *.fa; do cat $i | ./patmatch; done

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

// put your pattern in pat1 and the reverse complement in pat2 ...
char pat1[] = "TCGTGACGGTA"; // use upper case - NB: this should be a parameter but you can do it
char pat2[] = "TACCGTCACGA"; // reverse complement of pat1

#define MAXCHROMSIZE 254235640 // maximum chromosome size - edit as necessary - you will need to fix this IF you're using a whole genome FASTA
char chr[MAXCHROMSIZE + 50];

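int main(void)
{
long int i,len,n;
long int patlen = (long int)strlen(pat1); // pat1 and pat2 must have the same length
char header[5012] = "";
char s[5012];
char *spot = &chr[0];

memset(chr,0,sizeof(chr));
while (fgets(s,sizeof(s),stdin)) // fgets instead of gets, which is unsafe and was removed in C11
{
s[strcspn(s,"\r\n")] = 0; // strip the line ending that fgets keeps
if (s[0] == '>') { strcpy (header,s); continue;}
for (i=0;s[i];i++) s[i] = toupper((unsigned char)s[i]);
n = i; // length of this sequence line
if ((spot-chr)+n > MAXCHROMSIZE) { fprintf(stderr,"sequence longer than MAXCHROMSIZE\n"); return 1; }
memcpy(spot,s,n); // append the line to the chromosome buffer
spot += n;
}
len = spot-chr;
for (i=0;i+patlen<=len;i++) // stop where a full pattern no longer fits
{
// compare exactly patlen bases, so the scan never reads past the end of either pattern
if (memcmp(chr+i,pat1,(size_t)patlen) == 0) printf("F %s at %ld\n",header,i);
if (memcmp(chr+i,pat2,(size_t)patlen) == 0) printf("R %s at %ld\n",header,i);
}
return 0;
}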
_______ end code ______
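
The program above expects the reverse complement to be typed into pat2 by hand. As a rough sketch (not part of the original post), it could be derived from pat1 instead. The function names complement_base and reverse_complement are made up for illustration, and the sketch assumes an upper-case A/C/G/T pattern; any other character is copied unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of one upper-case base; anything else (e.g. N) is returned unchanged. */
char complement_base(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return b;
    }
}

/* Writes the reverse complement of pat into out.
   out must have room for strlen(pat) + 1 bytes. */
void reverse_complement(const char *pat, char *out)
{
    size_t n = strlen(pat);
    assert(out != NULL);
    for (size_t k = 0; k < n; k++)
        out[k] = complement_base(pat[n - 1 - k]); /* walk pat backwards */
    out[n] = '\0';
}
```

Calling reverse_complement(pat1, buf) with a buffer of at least sizeof(pat1) bytes would fill buf with the string to use in place of the hard-coded pat2.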

Example for hg19 fastas ...

-bash-3.00$ for i in *.fa; do cat $i | ./patmatch; done
R >chr10 at 111870831
F >chr11 at 36061863
R >chr11 at 77190239
R >chr12 at 119747880
R >chr14 at 81206117
R >chr14 at 95419269
R >chr16 at 11844841
F >chr16 at 78553508
R >chr17 at 45266108
F >chr1 at 17428420
F >chr1 at 52442586
F >chr2 at 25065131
R >chr2 at 53867779
R >chr2 at 114616666
F >chr3 at 55121176
F >chr4 at 1412897
R >chr4 at 136465390
R >chr5 at 84661141
R >chr5 at 103058499
F >chr7 at 2875412
R >chr9 at 75467056
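
The original question asks how often the sequence occurs rather than where. As a minimal sketch (an assumption about what is wanted, not something from the posts), the same scan can tally hits instead of printing positions. The function name count_occurrences is hypothetical, and overlapping matches are counted separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how many times pat occurs in seq, overlapping occurrences included.
   Both strings are expected to use the same case. */
long count_occurrences(const char *seq, const char *pat)
{
    size_t n = strlen(seq), m = strlen(pat);
    long count = 0;
    assert(m > 0); /* an empty pattern has no meaningful count */
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(seq + i, pat, m) == 0)
            count++;
    return count;
}
```

Calling it once with the pattern and once with its reverse complement, then adding the two results, would give a count over both strands; a palindromic pattern would be counted twice that way.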

**maasha** · 02-06-2013, 12:04 AM

This can be done with Biopieces (www.biopieces.org) like this:

Code:

read_fasta -i genome.fna | patscan_seq -p tcgtgacggta | write_bed -o out.bed -x

How to count the repeating times of a certain 11bp long sequence
