Hello all,
I have some paired-end MiSeq data that I have cleaned and trimmed. However, the software I used for the cleanup requires the paired-end files to be concatenated into a single fastq containing all reads and their respective mates, so I now need to split that file back into two fastq files, each holding one read from every pair (R1 and R2).
My fastq file with all the paired-end reads follows the typical fastq format. Here is an example from "head" of the original fastq file:
@M00763:6:000000000-A1U80:1:1101:12620:1732 1:N:0:1
TTATACTC
+
@A@AA@A@
@M00763:6:000000000-A1U80:1:1101:12620:1732 2:N:0:1
T
+
E
The "1" or "2" immediately after the space in each header line indicates which member of the pair the read is, so I am trying to get all the 1's into one file and all the 2's into another. A lab mate passed along a bit of Perl code that is supposed to split the reads this way. It runs and creates two separate files, but it does not split the data according to the 1 and 2 in each header, and I get a mixture in each file the program creates. Does anyone have an idea how to fix this? The code is below.
Thanks so much for your help,
~Ana
Code:
use strict;
use warnings;

my $readdata = $ARGV[0];
open(FILE, "<$readdata") || die "cannot open $readdata\n";
open(OUT1, ">$readdata\_1") || die "cannot open $readdata\_1\n";
open(OUT2, ">$readdata\_2") || die "cannot open $readdata\_2\n";

while(<FILE>){
    chomp;
    print OUT1 "$_\/1\n";
    print OUT2 "$_\/2\n";
    my $newline = <FILE>;
    chomp($newline);
    print OUT1 substr($newline, 0, length($newline)/2)."\n";
    print OUT2 substr($newline, length($newline)/2, length($newline)/2)."\n";
    $newline = <FILE>;
    chomp($newline);
    print OUT1 "$newline\/1\/n";
    print OUT2 "$newline\/2\/n";
    $newline = <FILE>;
    chomp($newline);
    print OUT1 substr($newline, 0, length($newline)/2)."\n";
    print OUT2 substr($newline, length($newline)/2, length($newline)/2)."\n";
}
close(FILE)
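For what it's worth, the script above never looks at the read number in the header. It writes every record to both files and cuts each sequence and quality line in half, which seems to assume that both mates are joined on a single line. Below is a minimal sketch of the header-based split I am after, not a tested replacement. It assumes every record is exactly four lines and that headers look like the example above ("@name 1:N:0:1"). The sub name split_interleaved and the "_R1.fastq" / "_R2.fastq" output names are made up for illustration. Reading four lines at a time matters because quality lines can also start with "@", as in the example.

```perl
use strict;
use warnings;

# Split an interleaved fastq into two files using the read number in each header.
# Assumptions: 4-line records; headers of the form "@name 1:N:0:1".
# Output file names are illustrative only.
sub split_interleaved {
    my ($in_path) = @_;
    my $r1_path = "${in_path}_R1.fastq";
    my $r2_path = "${in_path}_R2.fastq";

    open(my $in,   '<', $in_path) or die "cannot open $in_path: $!\n";
    open(my $out1, '>', $r1_path) or die "cannot open $r1_path: $!\n";
    open(my $out2, '>', $r2_path) or die "cannot open $r2_path: $!\n";

    while (my $header = <$in>) {
        # Read the remaining three lines of the record in one go, so a
        # quality line that starts with "@" is never mistaken for a header.
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "truncated record at: $header" unless defined $qual;

        # The read number is the first field after the space in the header.
        my ($mate) = $header =~ /^@\S+\s+([12]):/
            or die "cannot find read number in header: $header";

        my $out = $mate == 1 ? $out1 : $out2;
        print {$out} $header, $seq, $plus, $qual;
    }

    close($_) for $in, $out1, $out2;
    return;
}

# Run as: perl split.pl interleaved.fastq
split_interleaved($ARGV[0]) if @ARGV;
```

Because the records are copied through unchanged, the mates should stay in the same order in both output files, provided the input has them in matching order.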