paired end fastq format in bwa

**dawe** · 12-09-2010, 12:26 PM

Originally posted by Protaeus:

In some examples I've read of using bwa to analyze paired-end data, a FASTQ file is supplied for each member of the pair (in other words, R1.fastq and R2.fastq). Will bwa handle paired-end data that is in a single FASTQ? The reads are denoted with /1 and /2.

AFAIK, no, it won't. You could split the reads into two separate files, I guess with:

Code:

$ grep -A2 ^@*1 filein.fq > reads_1.fq
$ grep -A2 ^@*2 filein.fq > reads_2.fq

d

**kmcarr** · 12-09-2010, 02:24 PM

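Not quite. First, FASTQ records are four lines long, so you have to collect the matched line and the 3 that follow it (-A3). Second, your regular expression means "match 0 or more '@' characters at the beginning of a line, followed by a 1 (or 2)". You need to specify an "@" followed by 0 or more of any character (.*). You are also not anchoring the 1 or 2 to the end of the line. Finally, you need to enclose the regular expression in quotes, otherwise the shell may try to expand the * as a filename glob. With GNU grep it is also worth adding --no-group-separator, because -A otherwise prints a "--" line between non-adjacent matches, and that line is not valid FASTQ. To get what you intended it should be:

Code:

$ grep --no-group-separator -A3 '^@.*1$' filein.fq > reads_1.fq
$ grep --no-group-separator -A3 '^@.*2$' filein.fq > reads_2.fq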

There is, however, a hidden gotcha in this method. @, 1 and 2 are all valid characters in the quality string if the FASTQ is Sanger (or Illumina prior to 1.5). This means your grep could match a quality line and then write it and the next three lines out as a FASTQ block. That will cause whatever program tries to parse the output to puke (from personal experience).
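To make the gotcha concrete, here is a toy interleaved FASTQ (made-up reads, not from any real run) in which the first mate's quality line happens to start with "@" and end with "1". It is a minimal sketch assuming Sanger (Phred+33) qualities, run against the same /1 pattern as above:

```shell
# Toy interleaved FASTQ with made-up reads. The first mate's quality line,
# "@IIIIIIII1", is legal Sanger (Phred+33) quality but looks like a /1 header.
cat > toy.fq <<'EOF'
@read1/1
ACGTACGTAC
+
@IIIIIIII1
@read1/2
TTGCATTGCA
+
IIIIIIIIII
EOF

# Count the lines matched by the /1 pattern: prints 2
# (the real header plus the quality line), not 1.
grep -c '^@.*1$' toy.fq

# With -A3 the extra match drags in three more lines: 7 lines instead of 4.
grep -A3 '^@.*1$' toy.fq | wc -l
```

Because the output is no longer a multiple of 4 lines, every record after the bogus one is misaligned for any downstream FASTQ parser.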

In a random FASTQ file of ~20M reads I found 511 quality strings that were matched by these grep patterns. It is an incredibly small fraction, to be sure, but it only takes one to screw up your FASTQ file.
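One way around the gotcha, offered as a sketch rather than a tested tool, is to let awk keep track of the record position and test only the header line of each record, so quality strings are never pattern-matched at all. It assumes unwrapped 4-line records and read names ending in /1 or /2 (other header styles would need a different test); the function name split_interleaved is hypothetical, and the output names reads_1.fq and reads_2.fq are the ones used above.

```shell
# split_interleaved (hypothetical helper): split an interleaved FASTQ into
# reads_1.fq and reads_2.fq. Assumes 4-line records and /1, /2 name suffixes.
split_interleaved() {
  awk '
    NR % 4 == 1 {                       # first line of each 4-line record
      if      ($0 ~ /\/1$/) out = "reads_1.fq"
      else if ($0 ~ /\/2$/) out = "reads_2.fq"
      else {
        # Anything else means the file is not laid out as assumed; stop.
        print "unexpected header at line " NR ": " $0 > "/dev/stderr"
        exit 1
      }
    }
    { print > out }                     # header, sequence, "+", quality
  ' "$1"
}
```

Usage: split_interleaved filein.fq. Like grep, awk streams the file in a single pass, but since it only inspects every fourth line, a quality string such as "@II1" cannot be mistaken for a header.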

**maubp** · 12-09-2010, 03:14 PM

For the reasons kmcarr gives (and other issues like this), personally I'd use a simple script using Biopython, BioPerl or similar rather than grep.

**dawe** · 12-09-2010, 03:28 PM

I wrote the wrong grep expression, my bad. In fact, for most operations I usually grep for @XXXX, where XXXX is my machine ID... Also, bwa doesn't use qualities for alignment (so it will work with -A1 or -A3).
Nevertheless, I believe grep is much faster than any BioPerl/Biopython script.
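Whether the split is done with grep, awk or a Biopython/BioPerl script, a cheap sanity check before alignment is to confirm that both files list the same read names in the same order since, as far as I know, bwa sampe pairs reads by their position in the two files. A single bogus block from the grep gotcha would shift every later record in that file, so the check should flag it even though bwa ignores qualities. This is a minimal POSIX-shell sketch, assuming unwrapped 4-line records and /1, /2 suffixes; fastq_names and check_pairs are hypothetical helper names.

```shell
# fastq_names (hypothetical helper): print line 1 of every 4-line record
# with any trailing /1 or /2 removed.
fastq_names() {
  awk 'NR % 4 == 1 { sub(/\/[12]$/, ""); print }' "$1"
}

# check_pairs (hypothetical helper): return 0 if both files name the same
# reads in the same order, 1 if they differ, 2 if temp files can't be made.
check_pairs() {
  tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
  fastq_names "$1" > "$tmp1"
  fastq_names "$2" > "$tmp2"
  if cmp -s "$tmp1" "$tmp2"; then status=0; else status=1; fi
  rm -f "$tmp1" "$tmp2"
  return "$status"
}
```

Usage: check_pairs reads_1.fq reads_2.fq && echo "mates in sync". Writing the names to temporary files and comparing them with cmp keeps memory use flat even for files with tens of millions of reads.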

d

**barak** · 11-09-2013, 11:42 PM

Hi. I just found this post in the GATK forum: http://gatkforums.broadinstitute.org...o-fastq-format
Essentially, you can use BWA with interleaved BAM files containing both reads of each pair. I know that was not exactly the question, but it is related, and hopefully it will save someone time, as it did for me.
