**simonandrews** · 04-06-2011, 11:42 PM

Not tested, and assumes no blank lines in your files, but this should work:

Code:

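#!/usr/bin/perl
use warnings;
use strict;

# Merge together two FastQ files by joining each read 2 onto the end of its read 1
# Usage is merge_fastq.pl [read1 file] [read2 file] [outfile]

my ($in1,$in2,$out) = @ARGV;

die "Usage is merge_fastq.pl [read1 file] [read2 file] [outfile]\n" unless (defined $out);

open (IN1,'<',$in1) or die "Can't open $in1: $!";
open (IN2,'<',$in2) or die "Can't open $in2: $!";
open (OUT,'>',$out) or die "Can't write to $out: $!";

my $count = 0;
while (1) {
  ++$count;
  my $line1 = <IN1>;
  my $line2 = <IN2>;

  # Stop at the end of the input, but complain if only one file has run out
  unless (defined $line1 and defined $line2) {
    warn "Input files have different numbers of lines\n" if (defined $line1 or defined $line2);
    last;
  }

  if ($count % 2) {
    # Header and '+' lines: keep the copy from read 1
    print OUT $line1;
  }
  else {
    # Sequence and quality lines: join read 2 onto the end of read 1
    chomp $line1;
    print OUT $line1,$line2;
  }

}

close IN1;
close IN2;
close OUT or die "Can't write to $out: $!";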

**Jenzo** · 04-06-2011, 11:53 PM

Shorty provides a very fast Perl script to merge FastQ sequences in the following way:

@read_id1/1
...
+
...
@read_id1/2
...
+
...
and so on.

Code:

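#!/usr/bin/perl
use warnings;
use strict;

# Interleave two FastQ files: one four-line record from file A, then its mate from file B

my ($filenameA, $filenameB, $filenameOut) = @ARGV;

die "Usage is merge.pl file1.fastq file2.fastq out.fastq\n" unless (defined $filenameOut);

open (my $FILEA, '<', $filenameA) or die "Can't open $filenameA: $!";
open (my $FILEB, '<', $filenameB) or die "Can't open $filenameB: $!";
open (my $OUTFILE, '>', $filenameOut) or die "Can't write to $filenameOut: $!";

while (defined(my $headerA = <$FILEA>)) {
	# Copy the header and the remaining three lines of the record from file A
	print $OUTFILE $headerA;
	for (1..3) {
		my $line = <$FILEA>;
		die "Truncated record in $filenameA\n" unless (defined $line);
		print $OUTFILE $line;
	}

	# Copy the matching four-line record from file B
	for (1..4) {
		my $line = <$FILEB>;
		die "$filenameB has fewer records than $filenameA\n" unless (defined $line);
		print $OUTFILE $line;
	}
}

warn "$filenameB has more records than $filenameA\n" unless (eof($FILEB));

close $OUTFILE or die "Can't write to $filenameOut: $!";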

Note: it assumes that both files contain the same number of reads and that the sequences are in the same order.
Usage should be: merge.pl file1.fastq file2.fastq out.fastq
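
For anyone who would rather have the "same number of reads, same order" assumption checked than trusted, here is a minimal sketch of the same interleaving idea. It is written in Python rather than Perl and is not one of the scripts posted in this thread. The function names (`read_records`, `pair_id`, `interleave_fastq`) are made up for illustration, and it assumes four-line records with no blank or wrapped lines and read names ending in `/1` and `/2`.

```python
from itertools import zip_longest


def read_records(handle):
    """Yield FastQ records as lists of four lines (assumes no blank or wrapped lines)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if not record[3]:
            raise ValueError("truncated FastQ record")
        # Make sure every line ends with exactly one newline
        yield [line.rstrip("\r\n") + "\n" for line in record]


def pair_id(header):
    """Return the read name without a trailing /1 or /2 mate suffix."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave_fastq(path1, path2, out_path):
    """Write each record of path1 followed by its mate from path2; return the pair count."""
    pairs = 0
    with open(path1) as in1, open(path2) as in2, open(out_path, "w") as out:
        for rec1, rec2 in zip_longest(read_records(in1), read_records(in2)):
            if rec1 is None or rec2 is None:
                raise ValueError("input files contain different numbers of reads")
            if pair_id(rec1[0]) != pair_id(rec2[0]):
                raise ValueError(
                    f"mate mismatch: {rec1[0].strip()} vs {rec2[0].strip()}"
                )
            out.writelines(rec1 + rec2)
            pairs += 1
    return pairs
```

Calling `interleave_fastq("file1.fastq", "file2.fastq", "out.fastq")` would stop with an error at the first pair whose names disagree, instead of silently writing a mispaired file.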

**simonandrews** · 04-07-2011, 12:01 AM

The two posted scripts do slightly different things. The one I posted concatenates the sequences and qualities together, so if you started with a 2 x 40bp run then you'd end up with a file of 80bp reads.

The second script simply places the reads from the two files one after another in the combined file, so you'd end up with a file of 40bp reads which was twice as long. It's roughly equivalent to doing:

Code:

cat [file1] [file2] > [outfile]

except that it puts the equivalent reads next to each other in the final file.

I guess which one you use depends on how you want to combine the files.
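
To make the difference concrete, here is a small Python sketch (not either of the Perl scripts above) that applies both kinds of merge to a single made-up 2 x 40bp pair held in memory. The function names and the example reads are invented for illustration only.

```python
def concatenate_pair(rec1, rec2):
    """Join mate 2 onto mate 1: one record, with sequence and quality both extended."""
    header1, seq1, plus1, qual1 = rec1
    _, seq2, _, qual2 = rec2
    return [(header1, seq1 + seq2, plus1, qual1 + qual2)]


def interleave_pair(rec1, rec2):
    """Keep both mates as separate records, mate 1 immediately followed by mate 2."""
    return [rec1, rec2]


# A made-up pair of 40bp reads: (header, sequence, separator, qualities)
read1 = ("@pair1/1", "A" * 40, "+", "I" * 40)
read2 = ("@pair1/2", "C" * 40, "+", "I" * 40)

joined = concatenate_pair(read1, read2)
side_by_side = interleave_pair(read1, read2)

print(len(joined), len(joined[0][1]))              # 1 80
print(len(side_by_side), len(side_by_side[0][1]))  # 2 40
```

The first line of output corresponds to one 80bp read per pair, the second to two 40bp reads per pair placed next to each other.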

**Jenzo** · 04-07-2011, 12:08 AM

hehe, that's right ;-) thanks for pointing it out!

**newbietonextgen** · 04-07-2011, 06:29 AM

Thanks for the script, Andrew. I tried it out, and it seems to join the files, but not in the paired-end fashion.

Original file (1)

@HWI-EAS216_0001:1:1:1079:15982#0/1
TATGCTCTGCCTTGGCTGTGTCATCGTGTTGATGCCAACTGACACGAAACTTCTAGGCTGATTCATCCTAAGTAT
+
CCCCCCCCCCCCCCCC@BCCCCCCCCCCCCC@@CCC>2?>>>A?C@CC7@@@@A<@@@A@@@?C@=CC#######
@HWI-EAS216_0001:1:1:1079:9356#0/1
CGCTCAAGAGATGGGCTTTGGGTGCGGAATGGGGATTTGGGTTGTGACCCAATACAGCGGTAGTAGCGTGCAGCA
+
BBB=>B=BCCCCCCCCCACCCCBCBCC@BBCCCABC@CCCB@CCA@C?B9C7?@:<@##################

Original file (2)

@HWI-EAS216_0001:1:1:1079:15982#0/2
GTTTCTGAAGAGGCAGGCAGCAGAATTTGGTTTATTGAGTCTGTGTTGAAAAGAAACCACTTACGCATTATACTT
+
BCCCCBCCCCCCB7CCCC;9*;8:>?BB<CC<C@A?A5C<C@?C=CC4;>A########################
@HWI-EAS216_0001:1:1:1079:9356#0/2
GCAGGATTGCCATTCCCATCAGCTTTCTGCTGCACGCTACTACCGCTGTATTGGGTCACAACCCAAATCCCCATT
+
CCCCCBBCCCCCACCCCCCCCCC?CCCCBCCCCCCCCCCCBCCCC@ABCCCCCC<C;C>CCCBCCCBC>CCBC>>

The script from Andrew does this (appending each /2 read to the end of its /1 read):

@HWI-EAS216_0001:1:1:1079:15982#0/1
TATGCTCTGCCTTGGCTGTGTCATCGTGTTGATGCCAACTGACACGAAACTTCTAGGCTGATTCATCCTAAGTATGTTTCTGAAGAGGCAGGCAGCAGAATTTGGTTTATTGAGTCTGTGTTGAAAAGAAACCACTTACGCATTATACTT
+
CCCCCCCCCCCCCCCC@BCCCCCCCCCCCCC@@CCC>2?>>>A?C@CC7@@@@A<@@@A@@@?C@=CC#######BCCCCBCCCCCCB7CCCC;9*;8:>?BB<CC<C@A?A5C<C@?C=CC4;>A########################
@HWI-EAS216_0001:1:1:1079:9356#0/1
CGCTCAAGAGATGGGCTTTGGGTGCGGAATGGGGATTTGGGTTGTGACCCAATACAGCGGTAGTAGCGTGCAGCAGCAGGATTGCCATTCCCATCAGCTTTCTGCTGCACGCTACTACCGCTGTATTGGGTCACAACCCAAATCCCCATT
+
BBB=>B=BCCCCCCCCCACCCCBCBCC@BBCCCABC@CCCB@CCA@C?B9C7?@:<@##################CCCCCBBCCCCCACCCCCCCCCC?CCCCBCCCCCCCCCCCBCCCC@ABCCCCCC<C;C>CCCBCCCBC>CCBC>>

What I want is:

@HWI-EAS216_0001:1:1:1079:15982#0/1
TATGCTCTGCCTTGGCTGTGTCATCGTGTTGATGCCAACTGACACGAAACTTCTAGGCTGATTCATCCTAAGTAT
+
CCCCCCCCCCCCCCCC@BCCCCCCCCCCCCC@@CCC>2?>>>A?C@CC7@@@@A<@@@A@@@?C@=CC#######
@HWI-EAS216_0001:1:1:1079:15982#0/2
GTTTCTGAAGAGGCAGGCAGCAGAATTTGGTTTATTGAGTCTGTGTTGAAAAGAAACCACTTACGCATTATACTT
+
BCCCCBCCCCCCB7CCCC;9*;8:>?BB<CC<C@A?A5C<C@?C=CC4;>A########################

Hope this helps. I know it's possible.

Thanks for all the help.

**simonandrews** · 04-07-2011, 06:36 AM

Originally posted by newbietonextgen:

What I want is:

@HWI-EAS216_0001:1:1:1079:15982#0/1
TATGCTCTGCCTTGGCTGTGTCATCGTGTTGAT
+
CCCCCCCCCCCCCCCC@BCCCCCCCCCCCCC@
@HWI-EAS216_0001:1:1:1079:15982#0/2
GTTTCTGAAGAGGCAGGCAGCAGAATTTGGTTT
+
BCCCCBCCCCCCB7CCCC;9*;8:>?BB<CC<C@

That's what Jenzo's script would produce, isn't it?

**newbietonextgen** · 04-07-2011, 06:56 AM

I think so, but I did not try it. I used fastq_merge.pl, your script.

**simonandrews** · 04-07-2011, 07:00 AM

I explained in the second note I added that the two scripts posted did different things, and that the choice depended on how you wanted to merge your files. Just out of interest, which pipeline are you using which requires the paired files to be placed one after another?

**newbietonextgen** · 04-07-2011, 07:53 AM

Ha, sorry, my mistake. I figured it out. Thanks. SHRiMP requires that paired reads are put one behind the other.

**syfo** · 01-07-2013, 05:43 AM

Thanks guys. Scarpa too requires a merged fastq with "interleaved" reads (so that reads from the same pair follow each other) and Jenzo's script does that.

**Oyster_lab** · 10-26-2014, 05:25 PM

And what about combining PE reads from multiple runs? I have two runs from the same library and I would like to combine the PE reads into the same file (one file for R1 and one file for R2), keeping the read separation as per Jenzo's script. Would my code look something like this?

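#!/usr/bin/perl
use warnings;
use strict;

# Combine the R1 reads from two runs into one file, alternating records between runs

my ($filename_R1_Run1, $filename_R1_Run2, $filename_R1_Runs1And2) = @ARGV;

open (my $FILE_R1_Run1, '<', $filename_R1_Run1) or die "Can't open $filename_R1_Run1: $!";
open (my $FILE_R1_Run2, '<', $filename_R1_Run2) or die "Can't open $filename_R1_Run2: $!";
open (my $FILE_R1_Runs1And2, '>', $filename_R1_Runs1And2) or die "Can't write to $filename_R1_Runs1And2: $!";

while (defined(my $header = <$FILE_R1_Run1>)) {
	# Copy one four-line record from run 1
	print $FILE_R1_Runs1And2 $header;
	for (1..3) {
		my $line = <$FILE_R1_Run1>;
		print $FILE_R1_Runs1And2 $line if (defined $line);
	}

	# Copy one four-line record from run 2, if any are left
	for (1..4) {
		my $line = <$FILE_R1_Run2>;
		print $FILE_R1_Runs1And2 $line if (defined $line);
	}
}

# Copy any records left over if run 2 has more reads than run 1
while (defined(my $line = <$FILE_R1_Run2>)) {
	print $FILE_R1_Runs1And2 $line;
}

close $FILE_R1_Runs1And2 or die "Can't write to $filename_R1_Runs1And2: $!";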
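
A loop like the one above alternates records from the two runs. If the only requirement is that the combined R1 and R2 files list their reads in the same order as each other, a simpler route may be to append run 2 to the end of run 1, once for the R1 files and once for the R2 files. That is an assumption, not something confirmed in this thread. The Python sketch below (not Perl, with hypothetical function names `append_runs` and `combine_paired_runs`) shows the idea. It also assumes every input file ends with a newline.

```python
import shutil


def append_runs(run_paths, out_path):
    """Concatenate FastQ files, in the order given, into out_path."""
    with open(out_path, "wb") as out:
        for path in run_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def combine_paired_runs(r1_paths, r2_paths, out_r1, out_r2):
    """Combine R1 files and R2 files separately, using the same run order for both."""
    if len(r1_paths) != len(r2_paths):
        raise ValueError("need one R2 file for every R1 file")
    append_runs(r1_paths, out_r1)
    append_runs(r2_paths, out_r2)
```

Because both outputs are built in the same run order, the Nth read of the combined R1 file still lines up with the Nth read of the combined R2 file, provided each run's own R1 and R2 files were already in matching order.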

Illumina Paired End Merge script
