
**BAMseek** · 08-30-2011, 07:29 PM

Could you post the first few lines of the file? Is there something in the description line that indicates where to split them? Or are paired reads listed one right after the other?

**Balat** · 08-30-2011, 07:39 PM

The paired reads are listed as first mate read followed by second mate read.

@HWI-ST945:92:d059facxx:8:1101:1567:2217 1:N:0:TGACCA
@HWI-ST945:92:d059facxx:8:1101:1567:2217 2:N:0:TGACCA

**ocs** · 08-31-2011, 03:36 AM

Assuming you have the standard fastq file format with quality scores:

Code:

@test1.1
acgt
+test1.1
1234
@test1.2
acgt
+test1.2
1234

Then this should work. It's quick and dirty, and there may be more sophisticated solutions, but nevertheless:

Code:

sed -ne '1~8H;2~8H;3~8H;4~8H;${g;s/^\n//;p}' y.fastq > y_1.fastq
sed -ne '5~8H;6~8H;7~8H;8~8H;${g;s/^\n//;p}' y.fastq > y_2.fastq

If the file only contains header lines like the ones you posted, it's simpler:

Code:

sed -ne '1~2p' x.fastq > x_1.fastq
sed -ne '2~2p' x.fastq > x_2.fastq

Both solutions assume that the two mates of each pair are listed consecutively, first mate then second mate.
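That assumption can be checked before splitting. The following is only a minimal sketch, assuming Illumina-style headers like the ones Balat posted, where the read name is everything before the first space. The function name check_interleaved and the sample file are made up for illustration.

```shell
# Build a tiny interleaved FASTQ (two pairs) in a scratch directory.
# Read names and sequences are invented for this example.
workdir=$(mktemp -d)
cat > "$workdir/y.fastq" <<'EOF'
@pair1 1:N:0:TGACCA
ACGT
+
IIII
@pair1 2:N:0:TGACCA
TTTT
+
HHHH
@pair2 1:N:0:TGACCA
GGGG
+
IIII
@pair2 2:N:0:TGACCA
CCCC
+
HHHH
EOF

# In every block of 8 lines, the read name on line 1 must equal the read
# name on line 5. Exits non-zero on the first mismatch or on a line count
# that is not a multiple of 8.
check_interleaved() {
  awk '
    BEGIN { bad = 0 }
    NR % 8 == 1 { name = $1 }
    NR % 8 == 5 && $1 != name {
      printf "line %d: %s does not match %s\n", NR, $1, name
      bad = 1
      exit
    }
    END {
      if (!bad && NR % 8 != 0) {
        print "line count is not a multiple of 8"
        bad = 1
      }
      exit bad
    }
  ' "$1"
}

check_interleaved "$workdir/y.fastq" && echo "interleaved: ok"
```

Headers that mark the mate with a /1 or /2 suffix would need the suffix stripped before the comparison.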

**swbarnes2** · 08-31-2011, 08:41 AM

You could also grep for the lines that contain the 1:N:0 pattern, together with the three lines following each match (grep -A 3). You may have to get rid of the '--' separator lines that grep puts between the groups (though bwa and samtools don't seem to mind them).
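A minimal sketch of that grep approach, assuming GNU grep and headers of the form shown by Balat. The file names and sample reads are made up. The pattern is anchored on the leading @ and the space before the mate number, so that a "+" line repeating the header cannot match.

```shell
# Tiny interleaved FASTQ (two pairs); contents are invented for illustration.
workdir=$(mktemp -d)
cat > "$workdir/in.fastq" <<'EOF'
@pair1 1:N:0:TGACCA
ACGT
+
IIII
@pair1 2:N:0:TGACCA
TTTT
+
HHHH
@pair2 1:N:0:TGACCA
GGGG
+
IIII
@pair2 2:N:0:TGACCA
CCCC
+
HHHH
EOF

# -A 3 prints each matching header plus the three lines after it.
# grep writes a "--" line between non-adjacent groups; the second grep
# drops lines that consist of exactly "--".
grep -A 3 '^@.* 1:N:0' "$workdir/in.fastq" | grep -v '^--$' > "$workdir/in_1.fastq"
grep -A 3 '^@.* 2:N:0' "$workdir/in.fastq" | grep -v '^--$' > "$workdir/in_2.fastq"

wc -l < "$workdir/in_1.fastq"
wc -l < "$workdir/in_2.fastq"
```

The filter would also drop a sequence or quality line that is exactly "--", which can only happen for 2-base reads.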

**dcfargo** · 08-31-2011, 08:57 AM

With one read per line and the mates on alternating lines:

awk '0 == (NR + 1) % 2' infile > end1 &
awk '0 == (NR + 2) % 2' infile > end2 &

**BAMseek** · 08-31-2011, 05:42 PM

Yet another solution. To add to dcfargo's solution, if the file (infile) is indeed in fastq format (4 lines per record, as shown by ocs), then this should work too:

awk '0 == ((NR+4) % 8)*((NR+5) % 8)*((NR+6) % 8)*((NR+7) %8)' infile > end1 &
awk '0 == (NR % 8)*((NR+1) % 8)*((NR+2) % 8)*((NR+3) %8)' infile > end2
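The same selection can be written as a single range test on the line's position within each block of 8, which also allows both files to be written in one pass. This is a sketch of a variant, not the commands from the post above. The variable names out1 and out2 and the sample data are made up.

```shell
# Tiny interleaved FASTQ (two pairs); contents are invented for illustration.
workdir=$(mktemp -d)
cat > "$workdir/infile" <<'EOF'
@pair1 1:N:0:TGACCA
ACGT
+
IIII
@pair1 2:N:0:TGACCA
TTTT
+
HHHH
@pair2 1:N:0:TGACCA
GGGG
+
IIII
@pair2 2:N:0:TGACCA
CCCC
+
HHHH
EOF

# (NR - 1) % 8 is the 0-based position inside each 8-line block:
# positions 0-3 are the first mate, positions 4-7 the second mate.
awk -v out1="$workdir/end1" -v out2="$workdir/end2" '
  { if ((NR - 1) % 8 < 4) print > out1; else print > out2 }
' "$workdir/infile"

head -n 1 "$workdir/end1"   # prints @pair1 1:N:0:TGACCA
head -n 1 "$workdir/end2"   # prints @pair1 2:N:0:TGACCA
```

Reading the input once may matter for large files, since the two-command version scans the whole file twice.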

**Balat** · 08-31-2011, 05:50 PM

Thank you all. Yes, the file is in fastq format with 4 lines per read. I was able to split my fastq file using both the sed and the awk commands.

**ocs** · 08-31-2011, 11:48 PM

Just out of curiosity, I tried a bit more with sed and came up with simpler solutions (for those who are interested). My initial solution is quite complicated.

This also works:

Code:

sed -ne '1~8p;2~8p;3~8p;4~8p' x.fastq > x_1.fastq
sed -ne '5~8p;6~8p;7~8p;8~8p' x.fastq > x_2.fastq

Even simpler:

Code:

sed -ne '1~8{N;N;N;p}' x.fastq > x_1.fastq
sed -ne '4~8{N;N;N;p}' x.fastq > x_2.fastq

Also nice to see some awk solutions! Always exciting to see how things work in awk.

**robp** · 08-23-2013, 10:07 AM

That's a very concise solution! However, I think that the commands should be:

Code:

sed -ne '1~8{N;N;N;p}' x.fastq > x_1.fastq
sed -ne '5~8{N;N;N;p}' x.fastq > x_2.fastq

For the second command I've replaced the 4 with a 5. This is because sed counts lines from 1, so line 4 is the quality line of the first mate (offset 3 from the start of the record), not the header of the second mate, which is on line 5.
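The difference is easy to see on a small file. A minimal sketch, assuming GNU sed (the first~step address is a GNU extension). The file names and sample reads are made up for illustration.

```shell
# One interleaved pair is enough to show the off-by-one.
workdir=$(mktemp -d)
cat > "$workdir/x.fastq" <<'EOF'
@pair1 1:N:0:TGACCA
ACGT
+
IIII
@pair1 2:N:0:TGACCA
TTTT
+
HHHH
EOF

# 4~8 starts each group on the quality line of the first mate.
sed -ne '4~8{N;N;N;p}' "$workdir/x.fastq" > "$workdir/x_2_wrong.fastq"
# 5~8 starts each group on the header of the second mate.
sed -ne '5~8{N;N;N;p}' "$workdir/x.fastq" > "$workdir/x_2.fastq"

head -n 1 "$workdir/x_2_wrong.fastq"   # prints IIII
head -n 1 "$workdir/x_2.fastq"         # prints @pair1 2:N:0:TGACCA
```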

**skycreative** · 06-20-2016, 06:11 PM

This is so helpful and effective! Many thanks!

**tahia** · 09-22-2016, 07:55 AM

I think grep is the easy option if read1 and read2 are not consecutive:

grep -A3 -P "1:N:" --no-group-separator in.fastq >in_1.fastq
grep -A3 -P "2:N:" --no-group-separator in.fastq >in_2.fastq

You can adapt the pattern to the way your read names mark the mate (/1, _1 or 1:N:#:#).
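As an illustration of adapting the pattern, here is a sketch for read names that end in /1 and /2, with the mates not listed consecutively. It assumes GNU grep (for --no-group-separator). The file names and sample reads are made up.

```shell
# Non-interleaved layout: both first mates, then both second mates.
workdir=$(mktemp -d)
cat > "$workdir/in.fastq" <<'EOF'
@r1/1
ACGT
+
IIII
@r2/1
GGGG
+
IIII
@r1/2
TTTT
+
HHHH
@r2/2
CCCC
+
HHHH
EOF

# Match header lines that start with @ and end with the mate suffix,
# then take the three lines after each match.
grep -A3 --no-group-separator '^@.*/1$' "$workdir/in.fastq" > "$workdir/in_1.fastq"
grep -A3 --no-group-separator '^@.*/2$' "$workdir/in.fastq" > "$workdir/in_2.fastq"

grep -c '^@r' "$workdir/in_1.fastq"   # prints 2
grep -c '^@r' "$workdir/in_2.fastq"   # prints 2
```

A quality line that happens to start with @ and end with /1 would also match, so it may be worth checking the line counts of the two outputs. The mates only stay in matching order across the two files if they appear in the same order in the input.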

Best,


split fastq file
