**cmbetts** · 12-05-2014, 10:51 AM

Something like that should be pretty easy to do in any scripting language that has a FASTQ parsing library (or, heck, maybe by manually inspecting the FASTQ, since there are only 6 primers, if you're not keen on programming).

Example R pseudocode (only because that's what I'm comfortable with):
library("ShortRead")  # load the library for FASTQ manipulation
fq_data <- readFastq("reads.fastq.gz")  # read in the FASTQ data
base_info <- sread(fq_data)  # get just the base calls (a DNAStringSet)
first20 <- substring(as.character(base_info), 1, 20)  # first 20 bp of each read, as plain strings

Then you could do something like:
sort(table(first20), decreasing = TRUE) to see the frequency of the different 20 bp sequences
consensusMatrix(DNAStringSet(first20)) to get per-position base counts, from which you can read off consensus sequences
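
For anyone who would rather not use R, here is a minimal Python sketch of the same tally, using only the standard library. It assumes a well-formed FASTQ with exactly four lines per record (no wrapped sequences); the function name `count_read_prefixes` and its `prefix_len` parameter are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_read_prefixes(fastq_path, prefix_len=20):
    """Tally the first `prefix_len` bases of every read in a FASTQ file."""
    # Pick gzip or plain-text reading based on the file extension.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # Assumes 4 lines per record; the sequence is the 2nd line of each.
        for seq_line in islice(handle, 1, None, 4):
            counts[seq_line.strip()[:prefix_len]] += 1
    return counts
```

Calling `count_read_prefixes("reads.fastq.gz").most_common(10)` would then list the ten most frequent 20 bp read starts, which is where primers should show up if they sit at the 5' end of the reads.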

**maubp** · 12-11-2014, 05:00 AM

IIRC, some FASTQ quality control pipelines will spot and report possible primer sequences.

**JenBarb** · 12-11-2014, 05:20 AM

maubp,
I would love to find a tool that will spot and report possible primers. Can you be more specific?

I tried sorting and tallying up the sequences, but I am not finding them that way. Which tool are you referring to?

Thanks a bunch.
Jen

**maubp** · 12-11-2014, 05:25 AM

e.g. FastQC reports overrepresented sequences, which ought to spot your primers:

Overrepresented Sequences

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/9%20Overrepresented%20Sequences.html
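
To make the idea concrete, here is a rough Python sketch of what "overrepresented" means: any sequence that accounts for at least some share of all the reads. This is not FastQC's actual implementation; the function name `overrepresented` and the 0.1% default are assumptions for illustration, the default being chosen to mirror the warning threshold described on the FastQC help page linked above.

```python
from collections import Counter


def overrepresented(sequences, min_fraction=0.001):
    """Return (sequence, count, fraction) for sequences above a share of the total."""
    counts = Counter(sequences)
    total = sum(counts.values())
    hits = []
    for seq, n in counts.most_common():
        fraction = n / total
        if fraction < min_fraction:
            break  # most_common() is sorted by count, so the rest are rarer still
        hits.append((seq, n, fraction))
    return hits
```

Feeding it the first 20 bases of each read, rather than whole reads, should make primers stand out even when the downstream sequence differs from read to read.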

**Brian Bushnell** · 04-09-2015, 06:42 PM

If you have not solved this problem yet, there's another option, using BBTools:

reformat.sh in=reads.fq out=trimmed.fq ftr=19
This will trim all but the first 20 bases (all bases after position 19, zero-based).

kmercountexact.sh in=trimmed.fq out=counts.txt fastadump=f mincount=10 k=20 rcomp=f
This will generate a file containing the counts of all 20-mers that occurred at least 10 times, in a 2-column format that is easy to sort in Excel. For example:

Code:

ACCGTTACCGTTACCGTTAC	100
AAATTTTTTTCCCCCCCCCC	85

...etc. If the primers are 20bp long, they should be pretty obvious.
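
If Excel is not handy, the sorting step can be sketched in a few lines of Python. This assumes the counts file looks like the two-column example above (k-mer, whitespace, count); the function name `top_kmers` is hypothetical, and any line that does not match that shape is simply skipped.

```python
def top_kmers(counts_path, n=10):
    """Read a 2-column (k-mer, count) text file and return the n most frequent k-mers."""
    rows = []
    with open(counts_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank or malformed lines (anything that is not "kmer count").
            if len(fields) != 2 or not fields[1].isdigit():
                continue
            rows.append((fields[0], int(fields[1])))
    # Highest counts first; ties keep their order from the file.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

With six primers in the library, `top_kmers("counts.txt", n=6)` would be a reasonable first look, though it is worth printing a few more rows in case one primer amplified poorly.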

**JenBarb** · 05-27-2015, 05:54 AM

Hi Brian,
How should I cite your tool in a manuscript that I am preparing? Do you have a reference, or should I cite your website?
Thanks,
Jen

**Brian Bushnell** · 05-27-2015, 08:47 AM

Hi Jen,

My tools are all still unpublished, so please just cite my name and website. Thanks!

-Brian

Pull out unknown primers from fastq file?
