Hi, I have a related task:
I have a tab-delimited list of primers of a known gene:
<name 1> <sequence 1>
<name 2> <sequence 2>
etc.
We know on which chromosome the gene is, of course.
I would like to run the entire list and get a BED file of the primer locations:
chr"n" Start Stop <name 1> 0 +or- Start Stop
chr"n" Start Stop <name 2> 0 +or- Start Stop
etc.
The BED file is used to show the primers on a custom annotation track in the Alamut program. I am looking for a way to do this with a batch of primers instead of copying and pasting them in a dialog box one by one. I tried BLAT but that requires primers of at least 20 nt length, which we don't always have.
Thank you!
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Originally posted by dariober View PostIf you want to search both strands than do not use the --noreverse flag and for palindromic patterns, by definition, you will get the same position twice (on + and -). But it seems this is not what you wanted in your previous post.
You would have to convert your bam file to bed and pass it to intersectBed as the b-file with -c flag. This will count for each feature in the a-file the number of hits with the b-file. But why not to use coverageBed that it is designed for this purpose?
Dario
Leave a comment:
-
Originally posted by owwa View PostActually, I need to search both + and - strand for palindromic patterns. Could you please help again?
By the way, how about intersectBed for the read counts. Thanks!
Dario
Leave a comment:
-
Originally posted by dariober View PostHi- glad you made it work! So with palindromic patterns you want to search only one strand, right? You can use the --noreverse flag:
Code:echo ">ref1 NNNGCTTCTACTG >ref2 AGAAGCNNNN NNNCCGG" | fastaRegexFinder.py -f - -r 'CCGG' --noreverse ref2 13 17 ref2_13_17_for 4 + CCGG
By the way, if you want to count how many reads in your bam file overlap the positions of your "tag" you could use bedtools coverageBed:
Code:fastaRegexFinder.py -f myseq.fa -r 'CCGG' --noreverse | coverageBed -abam myaln.bam -b - -counts > tag.count.bed
Actually, I need to search both + and - strand for palindromic patterns. Could you please help again?
By the way, how about intersectBed for the read counts. Thanks!Last edited by owwa; 07-04-2013, 04:53 PM.
Leave a comment:
-
Originally posted by owwa View PostThe script works well. Thanks!
if there is a fix for palindromic sequence, it will be better! for example, when I use the palindromic sequence tag CCGG, it return repeated records:
CL1.Contig1_243 size 1223 gap 0 0% 311 315 CL1.Contig1_243 size 1223 gap 0 0%_311_315_for 4 + CCGG
CL1.Contig1_243 size 1223 gap 0 0% 311 315 CL1.Contig1_243 size 1223 gap 0 0%_311_315_rev 4 - CCGG
CL1.Contig1_243 size 1223 gap 0 0% 1056 1060 CL1.Contig1_243 size 1223 gap 0 0%_1056_1060_for 4 + CCGG
CL1.Contig1_243 size 1223 gap 0 0% 1056 1060 CL1.Contig1_243 size 1223 gap 0 0%_1056_1060_rev 4 - CCGG
CL1.Contig2_243 size 879 gap 0 0% 311 315 CL1.Contig2_243 size 879 gap 0 0%_311_315_for 4 + CCGG
CL1.Contig2_243 size 879 gap 0 0% 311 315 CL1.Contig2_243 size 879 gap 0 0%_311_315_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 513 517 CL2.Contig1_243 size 1389 gap 0 0%_513_517_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 513 517 CL2.Contig1_243 size 1389 gap 0 0%_513_517_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 622 626 CL2.Contig1_243 size 1389 gap 0 0%_622_626_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 622 626 CL2.Contig1_243 size 1389 gap 0 0%_622_626_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 867 871 CL2.Contig1_243 size 1389 gap 0 0%_867_871_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 867 871 CL2.Contig1_243 size 1389 gap 0 0%_867_871_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 874 878 CL2.Contig1_243 size 1389 gap 0 0%_874_878_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 874 878 CL2.Contig1_243 size 1389 gap 0 0%_874_878_rev 4 - CCGG
Code:echo ">ref1 NNNGCTTCTACTG >ref2 AGAAGCNNNN NNNCCGG" | fastaRegexFinder.py -f - -r 'CCGG' --noreverse ref2 13 17 ref2_13_17_for 4 + CCGG
By the way, if you want to count how many reads in your bam file overlap the positions of your "tag" you could use bedtools coverageBed:
Code:fastaRegexFinder.py -f myseq.fa -r 'CCGG' --noreverse | coverageBed -abam myaln.bam -b - -counts > tag.count.bed
Last edited by dariober; 07-04-2013, 12:08 AM.
Leave a comment:
-
Originally posted by dariober View PostHello,
I'm not sure why you mention the bam file... If you want to get the position of a pattern in a fasta file I happen to write a while ago a simple python script (attached) which searches a fasta file for a regular expression and returns the position of the matches in bed format. (You need python2.7 or any python with argparse module)
After unzipping fastaRegexFinder.py.gz you would do:
Code:## Default: Search tag on forward and reverse strand, case insensitive python fastaRegexFinder.py -r 'GCTTCT' -f reference.fasta ref1 3 9 ref1_3_9_for 6 + GCTTCT ref2 0 6 ref2_0_6_rev 6 - AGAAGC ## Where example.fasta looks like: cat example.fasta >ref1 NNNGCTTCTACTG >ref2 AGAAGCNNNN
Dario
if there is a fix for palindromic sequence, it will be better! for example, when I use the palindromic sequence tag CCGG, it return repeated records:
CL1.Contig1_243 size 1223 gap 0 0% 311 315 CL1.Contig1_243 size 1223 gap 0 0%_311_315_for 4 + CCGG
CL1.Contig1_243 size 1223 gap 0 0% 311 315 CL1.Contig1_243 size 1223 gap 0 0%_311_315_rev 4 - CCGG
CL1.Contig1_243 size 1223 gap 0 0% 1056 1060 CL1.Contig1_243 size 1223 gap 0 0%_1056_1060_for 4 + CCGG
CL1.Contig1_243 size 1223 gap 0 0% 1056 1060 CL1.Contig1_243 size 1223 gap 0 0%_1056_1060_rev 4 - CCGG
CL1.Contig2_243 size 879 gap 0 0% 311 315 CL1.Contig2_243 size 879 gap 0 0%_311_315_for 4 + CCGG
CL1.Contig2_243 size 879 gap 0 0% 311 315 CL1.Contig2_243 size 879 gap 0 0%_311_315_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 513 517 CL2.Contig1_243 size 1389 gap 0 0%_513_517_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 513 517 CL2.Contig1_243 size 1389 gap 0 0%_513_517_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 622 626 CL2.Contig1_243 size 1389 gap 0 0%_622_626_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 622 626 CL2.Contig1_243 size 1389 gap 0 0%_622_626_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 867 871 CL2.Contig1_243 size 1389 gap 0 0%_867_871_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 867 871 CL2.Contig1_243 size 1389 gap 0 0%_867_871_rev 4 - CCGG
CL2.Contig1_243 size 1389 gap 0 0% 874 878 CL2.Contig1_243 size 1389 gap 0 0%_874_878_for 4 + CCGG
CL2.Contig1_243 size 1389 gap 0 0% 874 878 CL2.Contig1_243 size 1389 gap 0 0%_874_878_rev 4 - CCGGLast edited by owwa; 07-03-2013, 11:40 PM.
Leave a comment:
-
Originally posted by dariober View PostHello,
I'm not sure why you mention the bam file... If you want to get the position of a pattern in a fasta file I happen to write a while ago a simple python script (attached) which searches a fasta file for a regular expression and returns the position of the matches in bed format. (You need python2.7 or any python with argparse module)
After unzipping fastaRegexFinder.py.gz you would do:
Code:## Default: Search tag on forward and reverse strand, case insensitive python fastaRegexFinder.py -r 'GCTTCT' -f reference.fasta ref1 3 9 ref1_3_9_for 6 + GCTTCT ref2 0 6 ref2_0_6_rev 6 - AGAAGC ## Where example.fasta looks like: cat example.fasta >ref1 NNNGCTTCTACTG >ref2 AGAAGCNNNN
Dario
thank you so much for your kind help! iI will try it.
I just want to extract the read count from the BAM file for the position where has the tag GCTTCT or some other tags,Last edited by owwa; 07-03-2013, 04:42 AM.
Leave a comment:
-
Originally posted by owwa View PostI have a BAM file and also the related refference genome (fasta), and now I need to extract all the position for some tags, such as "GCTTCT" in the genome, and then make a custom bed file for these tags as the following:
chr1 127471196 127472363 +
chr1 127472363 127473530 +
chr1 127473530 127474697 +
chrx 127474697 127475864 +
chr2 127475864 127477031 -
chr3 127477031 127478198 -
chr5 127478198 127479365 -
chr7 127479365 127480532 +
chr7 127480532 127481699 -
Your help will be greatly appreciated!
I'm not sure why you mention the bam file... If you want to get the position of a pattern in a fasta file I happen to write a while ago a simple python script (attached) which searches a fasta file for a regular expression and returns the position of the matches in bed format. (You need python2.7 or any python with argparse module)
After unzipping fastaRegexFinder.py.gz you would do:
Code:## Default: Search tag on forward and reverse strand, case insensitive python fastaRegexFinder.py -r 'GCTTCT' -f reference.fasta ref1 3 9 ref1_3_9_for 6 + GCTTCT ref2 0 6 ref2_0_6_rev 6 - AGAAGC ## Where example.fasta looks like: cat example.fasta >ref1 NNNGCTTCTACTG >ref2 AGAAGCNNNN
DarioAttached Files
Leave a comment:
-
how to create a custom bed file
I have a BAM file and also the related refference genome (fasta), and now I need to extract all the position for some tags, such as "GCTTCT" in the genome, and then make a custom bed file for these tags as the following:
chr1 127471196 127472363 +
chr1 127472363 127473530 +
chr1 127473530 127474697 +
chrx 127474697 127475864 +
chr2 127475864 127477031 -
chr3 127477031 127478198 -
chr5 127478198 127479365 -
chr7 127479365 127480532 +
chr7 127480532 127481699 -
Your help will be greatly appreciated!Tags: None
Latest Articles
Collapse
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 02:44 PM
|
0 responses
7 views
0 likes
|
Last Post
by seqadmin
Yesterday, 02:44 PM
|
||
Started by seqadmin, 10-11-2024, 06:55 AM
|
0 responses
14 views
0 likes
|
Last Post
by seqadmin
10-11-2024, 06:55 AM
|
||
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
110 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
117 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
Leave a comment: