**maubp** · 12-16-2009, 07:06 AM

Just to clarify: you want to interleave the two paired FASTQ files (F1,F2,F3,... and R1,R2,R3,...) into one file where they alternate (F1,R1,F2,R2,F3,R3,...), but also remove the "/1" and "/2" suffixes on the forward and reverse read identifiers to make them the same?

I'd write a script to do this in your language of choice (e.g. Perl, perhaps with BioPerl, or Python with Biopython).
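For anyone without Biopython or BioPerl, here is a rough dependency-free Python sketch of the same idea. It assumes every record occupies exactly four lines (no wrapped sequence or quality lines) and that each title ends in "/1" or "/2" with nothing after it; check that against your own files. `read_fastq` and `interleave_fastq` are just illustrative names, not from any library. Reading four lines at a time (rather than looking for lines starting with "@") matters because a quality string can itself begin with "@". Unlike the Biopython version, this copies the qualities unchanged, so no Illumina-to-Sanger conversion happens.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (title, seq, qual) tuples, assuming simple 4-line FASTQ records."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Not a 4-line FASTQ record: %r" % title)
        yield title[1:], seq, qual


def interleave_fastq(handle_f, handle_r, handle_out):
    """Write F1, R1, F2, R2, ... with the /1 and /2 suffixes removed.

    Returns the number of records written.
    """
    count = 0
    for fwd, rev in zip_longest(read_fastq(handle_f), read_fastq(handle_r)):
        if fwd is None or rev is None:
            raise ValueError("The two files have different numbers of reads")
        f_title, r_title = fwd[0], rev[0]
        if not (f_title.endswith("/1") and r_title.endswith("/2")):
            raise ValueError("Expected /1 and /2 suffixes: %r, %r" % (f_title, r_title))
        if f_title[:-2] != r_title[:-2]:
            raise ValueError("Reads are not paired: %r, %r" % (f_title, r_title))
        for title, seq, qual in (fwd, rev):
            # Qualities are copied as-is (no Illumina -> Sanger conversion)
            handle_out.write("@%s\n%s\n+\n%s\n" % (title[:-2], seq, qual))
            count += 1
    return count


# Tiny in-memory demo; for real files pass open(...) handles instead
out = io.StringIO()
n = interleave_fastq(io.StringIO("@r1/1\nACGT\n+\nIIII\n"),
                     io.StringIO("@r1/2\nTTTT\n+\nIIII\n"), out)
print(n)  # 2
print(out.getvalue(), end="")
```

Using `zip_longest` rather than plain `zip` means a truncated forward or reverse file raises an error instead of silently dropping the unpaired reads.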

**maubp** · 12-16-2009, 07:15 AM

If you like Python, you could try something like this (untested):

# This Python 3 script requires Biopython 1.51 or later
from Bio import SeqIO

# Setup variables (could parse command line args instead)
file_f = "s_1_1_sequence.txt"
file_r = "s_1_2_sequence.txt"
file_out = "interleaved.fastq"


def interleave(iter1, iter2):
    """Yield forward and reverse records alternately: F1, R1, F2, R2, ..."""
    # zip stops at the shorter input (as izip did in the Python 2 version)
    for forward, reverse in zip(iter1, iter2):
        assert forward.id.endswith("/1")
        assert reverse.id.endswith("/2")
        # Remove the /1 and /2 from the identifiers
        forward.id = forward.id[:-2]
        reverse.id = reverse.id[:-2]
        # Clear the descriptions too, otherwise the FASTQ writer would print
        # the old title (still ending in /1 or /2) after the trimmed id
        forward.description = ""
        reverse.description = ""
        assert forward.id == reverse.id
        yield forward
        yield reverse


# Read Illumina 1.3+ FASTQ, write Sanger FASTQ (qualities are converted)
with open(file_f) as handle_f, open(file_r) as handle_r:
    records_f = SeqIO.parse(handle_f, "fastq-illumina")
    records_r = SeqIO.parse(handle_r, "fastq-illumina")
    with open(file_out, "w") as handle_out:
        count = SeqIO.write(interleave(records_f, records_r), handle_out, "fastq-sanger")
print("%i records written to %s" % (count, file_out))

Based on the Biopython example here:

http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/

Note: I'm assuming you have Illumina 1.3+ FASTQ files, not Solexa-style FASTQ files. See http://en.wikipedia.org/wiki/FASTQ_format and http://nar.oxfordjournals.org/cgi/co...stract/gkp1137, or search the forum for details.
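The distinction matters because Illumina 1.3+ files store PHRED scores as ASCII characters with an offset of 64, while Sanger-style FASTQ (what the script writes as "fastq-sanger") uses an offset of 33. A minimal stdlib sketch of that quality conversion, assuming pure Illumina 1.3+ input (`illumina13_to_sanger` is a hypothetical helper, not a Biopython function):

```python
def illumina13_to_sanger(qual):
    """Convert an Illumina 1.3+ quality string (PHRED + 64) to Sanger (PHRED + 33)."""
    converted = []
    for ch in qual:
        phred = ord(ch) - 64  # Illumina 1.3+ offset
        if not 0 <= phred <= 62:
            # Characters below '@' would mean negative scores, which suggests
            # Solexa-style data; this is a sanity check, not a format detector
            raise ValueError("%r is outside the Illumina 1.3+ range" % ch)
        converted.append(chr(phred + 33))  # Sanger offset
    return "".join(converted)


print(illumina13_to_sanger("h@B"))  # I!#
```

Old Solexa-style files use a different (log-odds) score scale with the same offset of 64, so a Solexa file without negative scores would pass this check and be silently misconverted; Biopython's "fastq-solexa" format handles that case properly.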

**lindseyjane** · 12-16-2009, 07:22 AM

OK, thanks. I am learning to use Perl and am not familiar with Python, so I will write a script to do this in Perl.

Just to clarify, do you know whether I need to alter the reverse read to make the sequence on the same strand as the forward read?

e.g. if I have this in the sequence.txt file:
Forward read AAAATTT
Reverse read CCGGGG

I need to interleave them as:
AAAATTT
CCCCGG
Is this correct?
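The example itself checks out: reversing CCGGGG gives GGGGCC, and complementing that gives CCCCGG. Whether BFAST actually wants the reverse read flipped is a separate question that the thread leaves open. A small Python sketch of the reverse-complement step, assuming plain A/C/G/T/N sequences (`reverse_complement` is an illustrative name here):

```python
# Map each base to its complement; N stays N, lower case is handled too
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("CCGGGG"))  # CCCCGG
```

Complementing and then reversing gives the same result as reversing and then complementing, so the order of the two string operations does not matter.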

**maubp** · 12-16-2009, 07:45 AM

Originally posted by lindseyjane:

OK, thanks. I am learning to use Perl and am not familiar with Python, so I will write a script to do this in Perl.

Fair enough. You may find BioPerl helpful; it has built-in FASTQ support.

Originally posted by lindseyjane:

Just to clarify, do you know whether I need to alter the reverse read to make the sequence on the same strand as the forward read?

Maybe. I haven't used BFAST, so I don't know. It shouldn't be too hard to do if required. Again, BioPerl has built-in reverse complement code, but don't forget to reverse the qualities too.
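To illustrate the point about qualities, here is a rough Python sketch (not BioPerl) that reverse-complements a whole FASTQ record held as a hypothetical `(title, seq, qual)` tuple. The quality string is only reversed, never "complemented", so each score stays attached to the base it describes:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(title, seq, qual):
    """Reverse-complement a FASTQ record, reversing the qualities to match."""
    if len(seq) != len(qual):
        raise ValueError("Sequence and quality lengths differ for %r" % title)
    # Sequence: complement then reverse; qualities: reverse only
    return title, seq.translate(_COMPLEMENT)[::-1], qual[::-1]


title, seq, qual = reverse_complement_record("r1/2", "CCGGGA", "IIHHG#")
print(seq, qual)  # TCCCGG #GHHII
```

In this example the low-quality "#" belonged to the final A, and after flipping it sits at the front next to the T that A became.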

**lindseyjane** · 12-16-2009, 08:21 AM

Thanks for all your help and rapid responses to my queries.

BFAST input format for paired end reads
