Originally posted by MattB
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
-
Thanks, I had actually planned to use the Galaxy option but our local installation is undergoing some maintenance at the moment hence I was after a simple script alternative.
The Perl script that Wallysb01 pointed me to seems to work nicely in any case!
Matt
Comment
-
Hi, maubp
Thank you for your python code. Just a question, if the sequences order in the 2 input fastq files are randomized, then the sequences order in the output two paired fastq files are not the same. The two output fastq files are indeed paired files, just the sequence orders are not the same. Can you please update your code to make the output paired fastq files with the same sequence order if the input fastq files are randomized? Thank you.
Originally posted by maubp View PostAlternatively, this second version should need much less memory but produces three files forward/reverse/orphaned, so you'd need to interleave the forward/reverse files afterwards. This makes three passes though the files and just keeps a list of IDs in memory (as a Python set for speed) rather what the first script does which holds a map of IDs to file offsets (as a Python dictionary). The records are output in the order found in the input files, so as long as the input files follow the same sorting, doing the interleaving step is trivial (and easy enough to add on in Python).
Code:from Bio.SeqIO.QualityIO import FastqGeneralIterator #Biopython 1.51 or later ########################################################## # # Change the following settings to suit your needs # input_forward_filename = "SRR001666_1_rnd.fastq" input_reverse_filename = "SRR001666_2_rnd.fastq" output_paired_forward_filename = "out_forward_pairs.fastq" output_paired_reverse_filename = "out_reverse_pairs.fastq" output_orphan_filename = "out_unpaired_orphans.fastq" f_suffix = "/1" r_suffix = "/2" ########################################################## if f_suffix: f_suffix_crop = -len(f_suffix) def f_name(title): """Remove the suffix from a forward read name.""" name = title.split()[0] assert name.endswith(f_suffix), name return name[:f_suffix_crop] else: def f_name(title): return title.split()[0] if r_suffix: r_suffix_crop = -len(r_suffix) def r_name(title): """Remove the suffix from a reverse read name.""" name = title.split()[0] assert name.endswith(r_suffix), name return name[:r_suffix_crop] else: def r_name(title): return title.split()[0] print "Scaning reverse file to build list of names..." reverse_ids = set() paired_ids = set() for title, seq, qual in FastqGeneralIterator(open(input_reverse_filename)): reverse_ids.add(r_name(title)) print "Processing forward file..." forward_handle = open(output_paired_forward_filename, "w") orphan_handle = open(output_orphan_filename, "w") for title, seq, qual in FastqGeneralIterator(open(input_forward_filename)): name = f_name(title) if name in reverse_ids: #Paired paired_ids.add(name) reverse_ids.remove(name) #frees a little memory forward_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual)) else: #Orphan orphan_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual)) forward_handle.close() del reverse_ids #frees memory, although we won't need more now print "Processing reverse file..." reverse_handle = open(output_paired_reverse_filename, "w") for title, seq, qual in FastqGeneralIterator(open(input_reverse_filename)): name = r_name(title) if name in paired_ids: #Paired reverse_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual)) else: #Orphan orphan_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual)) orphan_handle.close() reverse_handle.close() print "Done"
Comment
-
Originally posted by cwzkevin View PostHi, maubp
Thank you for your python code. Just a question, if the sequences order in the 2 input fastq files are randomized, then the sequences order in the output two paired fastq files are not the same. The two output fastq files are indeed paired files, just the sequence orders are not the same. Can you please update your code to make the output paired fastq files with the same sequence order if the input fastq files are randomized? Thank you.
One way to do the sorting is to convert your FASTQ file into a tab-separated file, use the Unix sort command, and then turn the data back into FASTQ. The trick here is the Unix sort command sorts at the line level, so we must transform the data to be one line per record.
There is probably an 'off the shelf' solution to this, perhaps using biopieces?
Or go back and find out why your FASTQ files are out of order, and fix that
Comment
-
Originally posted by safina View PostI ran the script but its giving me the error:
Sequence and quality captions differ.
Can anyone help me with this error for the script posted above.???
It sounds like somehow your FASTQ files have been messed up
Comment
-
Im sending you some of my fastq file reads. pls see.
==> forward_sequences.fastq <==
@SRR1562087.10.1/1
GAGCTAGATCAGCACCATATATTACACGATGATCAGCTGTAACATTTACCTGCATCTGGTTCTTCATTCCTATCCGACCATCCTTGG
+
JJJJJJIIJJJJJJJJIJJJJJJJJJJJJJJJIJJJJJJJGIIJJJJIJJJJJJJJJIJJJJDHIHHHHHHHFDFFDDDDDDDDD>C
@SRR1562087.11.1/1
AGGTTGACTATGGTCCAGGCCATGCCAGGAGAGCAACCGAAAACAGAGAGAACGGTAAGCCAGGAGAAGAACAGTATGAGTATATAG
+
IJJGHIJIIIFIBHHGAFHGGIHJIJGJEGIGGGHGIJJJJHHGFEFEDACEEDDBDBCCCDDDDDDBDDDCDDCADDDCCCDDDDD
@SRR1562087.15.1/1
TAACATCCACAATCTCCTTCTACCCAAGAAGTCTGGAACTTCAGCATCAAAGGCTGGTGATGACGACAACTAATCCATTTACTGAAT
==> reverse_sequences.fastq <==
@SRR1562087.7.2/2
CCTGTAGATATACGTACTGCCAAAGGGTAGATAGTTGCCCATCTCAGAAAACACAACTTCAACAGCCAAGATTAATATCCATGTGAT
+
IJJJGGJBHIJJGHHHIIHJJGJGJIIDFHIJIJJJGHJJJJJJJIJGIGH@FHJIJIHIIIHHH=BDFFAEECCEEFDEDDCDCA>
@SRR1562087.9.2/2
GTAATCCAAATAAGGTATACTCACTCATCGGAGGATTTTGTGCTTCCCCTGTGAATTTCCACGCTAAGGATGGCTCCGGCTATAAAT
+
JIJIIJJJGGIIJIBC@FH@HHJGIJGCHGIEGIFHDFHJIJIJIHHIIIIJGGHHHHHCDDFDDDBDDDDDDDCDBDDBD@CDCEE
@SRR1562087.11.2/2
GAAACACTGATTGGTTCACGTATCCAGGTGTATGGACCACCTATATACTCATACTGTTCTTCTCCTGGCTTACCGTTCTCTCTGTTT
Comment
-
Your FASTQ files look fine from those snippets. Which script exactly are you using, what was the command line you used, and most importantly, the exact error message?
Update: Hang on, the names don't match, e.g. SRR1562087.10.1/1 vs SRR1562087.7.2/2Last edited by maubp; 04-01-2015, 12:20 AM.
Comment
-
Originally posted by MattB View PostThanks for quick reply. Actually, I'm looking for something even simpler, where I just have one interleaved fastq file that I want to split into separate forward and reverse. Probably should have posted in a separate thread since I'm not so worried about trimming in this case.
I'm sure it is some very simple python that I could manage in the end, such a script is probably already out there.
Code:curl -sL git.io/pairfq_lite | perl - splitpairs -i interl.fq.gz -f forward.fq -r reverse.fq
Code:curl -sL git.io/pairfq_lite | perl - joinpairs -f forward.fq -r reverse.fq -o interl.fq
Originally posted by Daniel1977 View PostHi,
could anyone please modify the script so that it works with fasta files instead of fastq files?
many many thanks!
Comment
-
Hi. Im not getting any eroor infact the pairfq program putting all my reads in single.fq where as 1-paired.fq and 2_paired.fq remains empty. the link to the script is below:
the cammand was:
$ pairfq makepairs -f s_1_1_trimmed.fq \
-r s_1_2_trimmed.fq \
-fp s_1_1_trimmed_p.fq \
-rp s_1_2_trimmed_p.fq \
-fs s_1_1_trimmed_s.fq \
-rs s_1_2_trimmed_s.fq \
--index
Comment
-
Hi. Im not getting any eroor infact the pairfq program putting all my reads in single.fq where as 1-paired.fq and 2_paired.fq remains empty. the link to the script is below:
the cammand was:
$ pairfq makepairs -f s_1_1_trimmed.fq \
-r s_1_2_trimmed.fq \
-fp s_1_1_trimmed_p.fq \
-rp s_1_2_trimmed_p.fq \
-fs s_1_1_trimmed_s.fq \
-rs s_1_2_trimmed_s.fq \
--index
Comment
-
Originally posted by maubp View PostYour FASTQ files look fine from those snippets. Which script exactly are you using, what was the command line you used, and most importantly, the exact error message?
the cammand was:
$ pairfq makepairs -f s_1_1_trimmed.fq \
-r s_1_2_trimmed.fq \
-fp s_1_1_trimmed_p.fq \
-rp s_1_2_trimmed_p.fq \
-fs s_1_1_trimmed_s.fq \
-rs s_1_2_trimmed_s.fq \
--index
Comment
Latest Articles
Collapse
-
by seqadmin
The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...-
Channel: Articles
05-06-2024, 07:48 AM -
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 05-14-2024, 07:03 AM
|
0 responses
15 views
0 likes
|
Last Post
by seqadmin
05-14-2024, 07:03 AM
|
||
Started by seqadmin, 05-10-2024, 06:35 AM
|
0 responses
37 views
0 likes
|
Last Post
by seqadmin
05-10-2024, 06:35 AM
|
||
Started by seqadmin, 05-09-2024, 02:46 PM
|
0 responses
46 views
0 likes
|
Last Post
by seqadmin
05-09-2024, 02:46 PM
|
||
Started by seqadmin, 05-07-2024, 06:57 AM
|
0 responses
39 views
0 likes
|
Last Post
by seqadmin
05-07-2024, 06:57 AM
|
Comment