Apologies about the delayed response.
Originally posted by BaCh:
I would expect sff2fastq to work exactly like sff_extract: by using the trim information in the reads within the SFF. But then again I might be totally wrong.
B.
sff2fastq is designed to work like the 454 tools produced by 454/Roche, such as sffinfo. sffinfo outputs trimmed reads by default, and so does sff2fastq.
The '-n' option of sff2fastq (similar to sffinfo) bypasses the trim information encoded within the SFF file and outputs the full raw read data directly.
For more information about the trimming information encoded within the SFF file, please see the Data Analysis Software Manual produced by 454. One version of it is available at the following link:
Some trimming occurs in the signal processing step of the GS Run Processor application, which performs the original base calling from the raw images acquired from the 454 instrument. It trims read ends for low quality and primer sequence (see sections 3.2 and 3.2.2 of the above manual for details about this process).
The format of the trim information encoded within the SFF file is also described in section 13.3.8.2 of the above manual.
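As a rough illustration of how the per-read trim information is typically applied: each read header in an SFF file carries four 1-based clip positions (clip_qual_left, clip_qual_right, clip_adapter_left, clip_adapter_right), where 0 means "not set". The Python sketch below is not taken from the sff2fastq source; the function names are made up for illustration, and it assumes the clipping convention described in the SFF format documentation.

```python
def trim_window(num_bases, clip_qual_left, clip_qual_right,
                clip_adapter_left, clip_adapter_right):
    """Return the kept region as a 0-based, half-open (start, end) pair.

    Clip values are 1-based and inclusive; 0 means "no clip set".
    """
    first = max(1, clip_qual_left, clip_adapter_left)
    # A right clip of 0 falls back to the full read length.
    last = min(clip_qual_right or num_bases, clip_adapter_right or num_bases)
    return first - 1, last


def trim_read(bases, quals, clips):
    """Apply the clip positions to the bases and their quality scores."""
    start, end = trim_window(len(bases), *clips)
    return bases[start:end], quals[start:end]


if __name__ == "__main__":
    # Hypothetical read: 4-base key, 8 good bases, 2 low-quality bases.
    bases = "TCAGACGTACGTNN"
    quals = list(range(14))
    print(trim_read(bases, quals, (5, 12, 0, 0)))
```

Passing all four clip values as 0 returns the read unchanged, which corresponds to the untrimmed output described above.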
Does this clarify your question about sff2fastq?
-
Originally posted by nt2010:
∘ sff_extract took > 270sec, output fasta and qual in separate files, quals in number not ASCII
∘ sff2fastq took 50 sec
sff2fastq is written in C, so a 5-to-1 runtime ratio for sff_extract is not too bad. Also, be careful with paired-end reads if you have them: sff_extract has a pipeline that splits them out the way one would expect, whereas sequences from sff2fastq need to be post-processed (i.e. split at the right place) by you.
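To make "split at the right place" concrete, here is a minimal Python sketch of that post-processing step. It assumes the two halves of a paired-end read are joined by a known linker sequence and looks only for an exact match; real tools align the linker and tolerate mismatches, and the correct linker depends on the library kit. The function name and the linker used in the example are hypothetical.

```python
def split_paired_read(seq, qual, linker):
    """Split one read at the first exact occurrence of `linker`.

    Returns a list of (sequence, quality) tuples: two entries when the
    linker is found, or the original read alone when it is not.
    `qual` must have one quality value (or character) per base.
    """
    pos = seq.find(linker)
    if pos == -1:
        return [(seq, qual)]
    end = pos + len(linker)
    # Drop the linker itself and keep what lies on either side of it.
    return [(seq[:pos], qual[:pos]), (seq[end:], qual[end:])]


if __name__ == "__main__":
    linker = "GGTTCC"  # placeholder, not a real 454 linker
    seq = "AAAA" + linker + "TTT"
    qual = "IIII" + "HHHHHH" + "GGG"
    print(split_paired_read(seq, qual, linker))
```

Reads with no linker are returned whole, so the caller can decide whether to keep them as single-end reads.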
Originally posted by nt2010:
A question to its author: what are the criteria to trim reads? Thanks.
B.
-
I need to convert a bunch of SFF files to FASTQ. I did a quick experiment to compare sff2fastq and sff_extract:
∘ picked a random sff file from my data set: size 2.2G, 662933 reads (after conversion)
∘ sff_extract took > 270sec, output fasta and qual in separate files, quals in number not ASCII
∘ sff2fastq took 50 sec
∘ sff2fastq outputs trimmed reads by default. There is an option to output untrimmed reads. Trimmed reads are about half the length of untrimmed reads.
∘ sff_extract outputs untrimmed reads by default, which exactly match the untrimmed output of sff2fastq.
I think I'm going to use sff2fastq. A question to its author: what are the criteria to trim reads? Thanks.
Question to sff2
-
Originally posted by idas:
I have recently released a program called 'sff2fastq' ... Any feedback about the program would be appreciated. Bug reports are very much welcome, although I can't guarantee when they will be addressed.
-
Originally posted by maubp:
A future version of Biopython should also let you go directly from SFF to FASTQ (or FASTA, or QUAL, or ...) which will be much simpler. This code is already written and can be tested by the adventurous
Code:
from Bio import SeqIO
SeqIO.convert("example.sff", "sff", "untrimmed.fastq", "fastq")
Code:
from Bio import SeqIO
SeqIO.convert("example.sff", "sff-trim", "trimmed.fastq", "fastq")
-
sff2fastq
To whomever may be interested:
I have recently released a program called 'sff2fastq' on GitHub that does a direct SFF to FASTQ format conversion. 'sff2fastq' is implemented in C and should compile on *NIX-type operating systems (Linux, BSD-type, and Mac OS X).
The FASTQ output produced is in the Sanger FASTQ format.
The source code and compilation instructions are available via the following GitHub URL:
extract 454 Genome Sequencer reads from a SFF file and convert them into a FASTQ formatted output - indraniel/sff2fastq
If the git version control software is not available on your system please visit the following link for installation instructions:
Access your support options and sign in to your account for GitHub software support and product assistance. Get the help you need from our dedicated support team.
Any feedback about the program would be appreciated. Bug reports are very much welcome, although I can't guarantee when they will be addressed.
Sincerely,
Indraniel Das
The Genome Center at Washington University
-
Seeing as the thread has shifted from SFF to FASTQ, to the easier task of FASTA+QUAL to FASTQ, here is a Biopython solution which will work on Biopython 1.51 or later:
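Code:
from Bio import SeqIO
from Bio.SeqIO.QualityIO import PairedFastaQualIterator

# Pair each FASTA record with its QUAL record and write them out as Sanger FASTQ.
# Written with print() and "with" blocks so it also runs on Python 3.
with open("example.fasta") as fasta, open("example.qual") as qual, \
        open("temp.fastq", "w") as handle:  # w = write
    records = PairedFastaQualIterator(fasta, qual)
    count = SeqIO.write(records, handle, "fastq")
print("Converted %i records" % count)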
A future version of Biopython should also let you go directly from SFF to FASTQ (or FASTA, or QUAL, or ...) which will be much simpler. This code is already written and can be tested by the adventurous
Peter
-
Nice catch drio, thanks. One of those really subtle things you don't catch until you work with a different set of files.
Eugeni, sorry I didn't get back to you on this; got really crushed at work. I have uploaded a modified version of the script incorporating drio's fix.
Last edited by kmcarr; 10-22-2009, 07:22 PM.
-
Originally posted by Eugeni:
Hi, kmcarr
Thanks for your help. The script worked very well and generated the FASTQ file in Sanger format, although its output includes this message:
Argument "" isn't numeric in addition (+) at fastaQual2fastq.pl line 41, <QUAL> chunk 380185.
Do you know what is happening, and whether it is important?
Thanks a lot
them:
--- fastaQual2fastaq.pl.orig 2009-10-22 22:05:24.000000000 -0500
+++ fastaQual2fastaq.pl 2009-10-22 22:04:54.000000000 -0500
@@ -33,6 +33,7 @@
     chomp $qrecord;
     my ($qdef, @qualLines) = split /\n/, $qrecord;
     my $qualString = join ' ', @qualLines;
+    $qualString =~ s/\s+/ /g;
     my @quals = split / /, $qualString;
     print FASTQ "@","$qdef\n";
     print FASTQ "$seqs{$qdef}\n";
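For readers more comfortable with Python, the effect of the added line can be sketched as follows. This is only an analogue of the Perl patch, not part of the script: splitting on a single space leaves an empty field wherever two spaces occur in a row, and that empty field is what triggers the "isn't numeric" warning. The function name is made up for illustration.

```python
def quals_to_sanger(qual_text):
    """Convert whitespace-separated Phred scores to a Sanger (Phred+33) string.

    str.split() with no argument splits on any run of whitespace and
    ignores leading and trailing whitespace, so no empty fields appear.
    """
    return "".join(chr(int(q) + 33) for q in qual_text.split())


if __name__ == "__main__":
    line = "40  38 0"  # note the double space
    # Splitting on exactly one space leaves an empty field behind.
    print(line.split(" "))
    print(quals_to_sanger(line))
```

Collapsing runs of whitespace before splitting, as the patch does, gives the same result as splitting on whitespace runs directly.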
-
Just a guess, but you could check your line endings (DOS/Windows versus Unix).
-
Originally posted by Eugeni:
Hi, kmcarr
Thanks for your help. The script worked very well and generated the FASTQ file in Sanger format, although its output includes this message:
Argument "" isn't numeric in addition (+) at fastaQual2fastq.pl line 41, <QUAL> chunk 380185.
Do you know what is happening, and whether it is important?
Thanks a lot
-
Originally posted by kmcarr:
Here is a Perl script to convert FASTA + QUAL files to FASTQ. You would need to first generate the FASTA and QUAL files from the SFF file using a tool like sffinfo from Roche or sff_extract.
Code:
#!/usr/bin/perl
use warnings;
use strict;
use File::Basename;

my $inFasta = $ARGV[0];
my $baseName = basename($inFasta, qw/.fasta .fna/);
my $inQual = $baseName . ".qual";
my $outFastq = $baseName . ".fastq";
my %seqs;

# Read records delimited by ">" instead of by newline.
$/ = ">";

open (FASTA, "<$inFasta");
my $junk = (<FASTA>);    # discard the empty chunk before the first ">"
while (my $frecord = <FASTA>) {
    chomp $frecord;
    my ($fdef, @seqLines) = split /\n/, $frecord;
    my $seq = join '', @seqLines;
    $seqs{$fdef} = $seq;    # index sequences by their definition line
}
close FASTA;

open (QUAL, "<$inQual");
$junk = <QUAL>;
open (FASTQ, ">$outFastq");
while (my $qrecord = <QUAL>) {
    chomp $qrecord;
    my ($qdef, @qualLines) = split /\n/, $qrecord;
    my $qualString = join ' ', @qualLines;
    my @quals = split / /, $qualString;
    print FASTQ "@","$qdef\n";
    print FASTQ "$seqs{$qdef}\n";
    print FASTQ "+\n";
    foreach my $qual (@quals) {
        print FASTQ chr($qual + 33);    # Phred score to Sanger (Phred+33) character
    }
    print FASTQ "\n";
}
close QUAL;
close FASTQ;
- To run the program, pass it the name of the FASTA sequence file, e.g.
Code:
%> fastaQual2fastq.pl foo.fasta
- The fasta filename must end in either .fasta or .fna
- The quality filename must have the same basename as the fasta file and end with .qual. For example, if your sequence file is "foo.fna" then the quality file must be named "foo.qual".
Thanks for your help. The script worked very well and generated the FASTQ file in Sanger format, although its output includes this message:
Argument "" isn't numeric in addition (+) at fastaQual2fastq.pl line 41, <QUAL> chunk 380185.
Do you know what is happening, and whether it is important?
Thanks a lot
-
Originally posted by maubp:
Interesting - I wonder why they do that, and if it would be easy to fix their pipeline...
Last edited by kmcarr; 10-07-2009, 09:27 AM. Reason: Removed message text after discovering that cln2qual is Perl, not binary.