**cmbetts** · 12-05-2014, 10:51 AM

Something like that should be pretty easy to do in any scripting language that has FASTQ-parsing libraries (or, heck, maybe by manually inspecting the FASTQ, since there are only 6 primers, if you're not keen on programming).

Example R code (only because that's what I'm comfortable with):

Code:

library(ShortRead)                      # Bioconductor package for FASTQ manipulation
fq_data <- readFastq("reads.fastq.gz")  # read in the FASTQ data
base_info <- sread(fq_data)             # get just the base calls (a DNAStringSet)
# get the first 20 bp of each read; reads shorter than 20 bp are kept whole
first20 <- subseq(base_info, start = 1, end = pmin(20L, width(base_info)))
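For anyone not using R, a rough Python equivalent of the read-and-slice steps might look like this. It is a minimal sketch that assumes plain 4-line FASTQ records (no wrapped sequence lines); the names `iter_sequences`, `read_fastq_sequences` and `first_bases` are made up for this example and do not come from any library.

```python
import gzip
from typing import Iterable, Iterator, List


def iter_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the base calls of each record from the lines of a 4-line FASTQ file."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        next(it, None)  # quality line, not needed for primer hunting
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a 4-line FASTQ record: " + header.strip())
        yield seq


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Open a FASTQ file (treated as gzipped if the name ends in .gz) and yield its sequences."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from iter_sequences(handle)


def first_bases(seqs: Iterable[str], n: int = 20) -> List[str]:
    """First n bp of each read; reads shorter than n bp are kept whole."""
    return [s[:n] for s in seqs]
```

Usage would be something like `first20 = first_bases(read_fastq_sequences("reads.fastq.gz"))`, which holds all the prefixes in memory, much like the R version does.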

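Then you could do something like:

Code:

# frequency of the different 20 bp sequences (10 most common)
head(sort(table(as.character(first20)), decreasing = TRUE), 10)
# per-position base counts, from which a consensus sequence can be read off
consensusMatrix(first20)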
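In Python, the tallying and consensus steps could be sketched with `collections.Counter`. The helpers `top_prefixes`, `position_counts` and `consensus` are hypothetical names for this example, and taking the majority base at each position (ties going to the base seen first) is this sketch's own choice of consensus rule, not something from Biostrings.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def top_prefixes(prefixes: Sequence[str], n: int = 10) -> List[Tuple[str, int]]:
    """The n most frequent prefixes with their counts (like a sorted table() in R)."""
    return Counter(prefixes).most_common(n)


def position_counts(prefixes: Sequence[str]) -> List[Dict[str, int]]:
    """Base counts at each position, similar in spirit to a consensus matrix."""
    width = max((len(p) for p in prefixes), default=0)
    counts = [Counter() for _ in range(width)]
    for p in prefixes:
        for i, base in enumerate(p):
            counts[i][base] += 1
    return [dict(c) for c in counts]


def consensus(prefixes: Sequence[str]) -> str:
    """Majority base at each position; ties go to the base seen first."""
    return "".join(max(c, key=c.get) for c in position_counts(prefixes))
```

With 6 primers in the mix, a single consensus over all reads will blur them together, so the frequency table is probably the more useful of the two; the consensus is handier once reads have been grouped by primer.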

**maubp** · 12-11-2014, 05:00 AM

IIRC, some FASTQ quality control pipelines will spot and report possible primer sequences.

**JenBarb** · 12-11-2014, 05:20 AM

maubp,
I would love to find a tool that will spot and report possible primers. Can you be more specific?

I tried sorting and tallying the sequences, but I am not finding them this way. Which tool are you referring to?

Thanks a bunch.
Jen

**maubp** · 12-11-2014, 05:25 AM

E.g. FastQC reports overrepresented sequences, which ought to catch your primers:

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/9%20Overrepresented%20Sequences.html
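The idea behind an overrepresented-sequences check can be approximated in a few lines: count identical sequences and flag any that make up more than 0.1% of all reads, which is the threshold the FastQC help page above describes for raising a warning. This is a simplified sketch, not FastQC's code; the function name `overrepresented` is made up, and it ignores FastQC's own details such as how it limits which sequences it tracks.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, percent of reads) for every exact sequence above min_fraction."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return []
    hits = [(s, c, 100.0 * c / total) for s, c in counts.items() if c / total > min_fraction]
    # most abundant first, as in a report
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Because this counts whole reads, primers followed by varying sequence may not stand out; passing in just the first 20 bp of each read (as in the R example earlier in the thread) is one way around that.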

**Brian Bushnell** · 04-09-2015, 06:42 PM

If you have not solved this problem yet, there's another option, using BBTools:

reformat.sh in=reads.fq out=trimmed.fq ftr=19
This will trim all but the first 20 bases (all bases after position 19, zero-based).
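To make the zero-based `ftr` arithmetic concrete: keeping everything up to and including position 19 leaves positions 0 through 19, i.e. 20 bases. A hypothetical Python helper (`force_trim_right` is not a BBTools function) applying the same rule to a sequence and its quality string:

```python
from typing import Tuple


def force_trim_right(seq: str, qual: str, ftr: int) -> Tuple[str, str]:
    """Drop every base after zero-based position ftr, keeping the first ftr + 1 bases."""
    keep = ftr + 1
    # trim the quality string identically so the FASTQ record stays consistent
    return seq[:keep], qual[:keep]
```

For example, `force_trim_right("ACGT" * 6, "I" * 24, 19)` returns two 20-character strings, while reads that are already 20 bp or shorter come back unchanged.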

kmercountexact.sh in=trimmed.fq out=counts.txt fastadump=f mincount=10 k=20 rcomp=f
This will generate a file containing the counts of all 20-mers that occurred at least 10 times, in a 2-column format that is easy to sort in Excel. For example:

Code:

ACCGTTACCGTTACCGTTAC	100
AAATTTTTTTCCCCCCCCCC	85

...etc. If the primers are 20bp long, they should be pretty obvious.
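What kmercountexact.sh does with those settings can be approximated as: slide a window of length k along each read, count each k-mer exactly as it appears (no merging with its reverse complement, matching `rcomp=f`), and keep those seen at least `mincount` times. The Python below is a rough sketch of that idea, not BBTools' implementation; `count_kmers` and `to_two_column` are hypothetical names, and skipping k-mers that contain N is an assumption of this sketch. Since the reads were already trimmed to 20 bp, each read contributes at most one 20-mer.

```python
from collections import Counter
from typing import Iterable, List, Tuple

VALID_BASES = set("ACGT")


def count_kmers(seqs: Iterable[str], k: int = 20, mincount: int = 10) -> List[Tuple[str, int]]:
    """Count every k-mer as-is and keep those seen at least mincount times, most frequent first."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):  # empty range for reads shorter than k
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    kept = [(kmer, c) for kmer, c in counts.items() if c >= mincount]
    return sorted(kept, key=lambda kc: kc[1], reverse=True)


def to_two_column(rows: List[Tuple[str, int]]) -> str:
    """Tab-separated k-mer/count lines, the same 2-column layout as the example output."""
    return "".join(f"{kmer}\t{count}\n" for kmer, count in rows)
```

Sorting by count in the script saves the trip to Excel; with 20 bp primers, the top few lines should be the candidates.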

**JenBarb** · 05-27-2015, 05:54 AM

Hi Brian,
How should I cite your tool in a manuscript I am preparing? Do you have a reference, or should I use your website?
Thanks,
Jen

**Brian Bushnell** · 05-27-2015, 08:47 AM

Hi Jen,

My tools are all still unpublished, so please just cite my name and website. Thanks!

-Brian

Pull out unknown primers from fastq file?
