Agreed, arguments can be made for both. However, a strong argument can also be made that we should pick ONE and stick with it. The world really doesn't need yet more fastq variants to deal with, yet apparently we already have two in the wild.
Does anyone know if "csfastq" is ABI's own format name, or just a name we have given their data once reformatted? If the former, then I think we can just do whatever they do. If not, then I'd advise following the lead of the main public data banks (frankly, good luck trying to get them to change their output formats now).
Originally posted by jkbonfield
Is there an official document describing the csfastq format? The SOLID run outputs I have do not contain any of these files, but I believe it to be ABI's own format?
The only documentation I can find consists of the ZOOM manual, which states (for example):
Code:
@SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
T32322133300002330031001022230020232002203222030231
+SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
!21(()+%'+%40*.%%**)&%&*&%%%&%%%%%%%%%%%%%%%(+%%%%'
ftp://ftp.ncbi.nlm.nih.gov/sra/stati...15241.fastq.gz
However an old post on the ABYSS users mailing list by Nils gives this example:
Code:
@ucla153_20090610_1102N.41:796_1758_1693
g23111112222312301131331111331023122031222222111120
+
:46<=985889::;<829462*3<464554-6403128+-+&-.'$$%.#
@ucla153_20090610_1102N.62:1159_1411_238
t32200300033221321101031000000332000002013110000000
+
89;669>?6<<;57.:+/#&+%$####$&#&&&#####'#&###$###%#
Arguments can be made for both.
Nils
Is there an official document describing the csfastq format? The SOLID run outputs I have do not contain any of these files, but I believe it to be ABI's own format?
The only documentation I can find consists of the ZOOM manual, which states (for example):
Code:
@SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
T32322133300002330031001022230020232002203222030231
+SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
!21(()+%'+%40*.%%**)&%&*&%%%&%%%%%%%%%%%%%%%(+%%%%'
ftp://ftp.ncbi.nlm.nih.gov/sra/stati...15241.fastq.gz
However an old post on the ABYSS users mailing list by Nils gives this example:
Code:
@ucla153_20090610_1102N.41:796_1758_1693
g23111112222312301131331111331023122031222222111120
+
:46<=985889::;<829462*3<464554-6403128+-+&-.'$$%.#
@ucla153_20090610_1102N.62:1159_1411_238
t32200300033221321101031000000332000002013110000000
+
89;669>?6<<;57.:+/#&+%$####$&#&&&#####'#&###$###%#
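One visible difference between the two examples above is the length of the quality line. The SRA record carries 51 quality characters for a primer base plus 50 colours (the leading "!" lines up with the primer base), while the ABYSS mailing list records carry 50, one per colour. Below is a minimal Python sketch that tells the two layouts apart on that basis alone. The function name and the two labels are made up for illustration, and this is only an observation from these examples, not an official definition of either variant.

```python
def classify_csfastq_record(seq: str, qual: str) -> str:
    """Label a colour-space record by how its quality line is laid out.

    Hypothetical helper: the labels are illustrative, not official names.
    """
    # A colour-space read is assumed to start with a primer base letter.
    if not seq or not seq[0].isalpha():
        return "unknown"
    n_colours = len(seq) - 1
    if len(qual) == len(seq):
        # One quality per character, including a dummy one for the primer.
        return "primer-quality"
    if len(qual) == n_colours:
        # One quality per colour call, nothing for the primer base.
        return "colours-only"
    return "unknown"


sra_seq = "T32322133300002330031001022230020232002203222030231"
sra_qual = "!21(()+%'+%40*.%%**)&%&*&%%%&%%%%%%%%%%%%%%%(+%%%%'"
abyss_seq = "g23111112222312301131331111331023122031222222111120"
abyss_qual = ":46<=985889::;<829462*3<464554-6403128+-+&-.'$$%.#"

print(classify_csfastq_record(sra_seq, sra_qual))
print(classify_csfastq_record(abyss_seq, abyss_qual))
```

A reader that wants to accept both could use a check like this to decide whether to drop the first quality character before passing the record on.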
Originally posted by nilshomer
BFAST does handle "N"s (actually [nN.] for Illumina data). If you could give me a link to the SRA # or the srf file or the fastq file, I would be happy to debug to identify the problem.
I believe it was ABI SOLID that uses dot, though, so in that context I think it's correct. Given that 0123 aren't "normal" sequence characters, we're already defining a new character set, so . for ambiguity seems fine.
One thing I would recommend for a SRF2FASTQ program is to output paired end (mate-pair) reads to the same file. Programs like BFAST and Velvet expect that there is only one FASTQ file, with paired end (mate pair) reads occurring successively with the same name. This allows BFAST at least to support triple-end, quad-end, or higher grouping data, which we have generated (it exists!). Having one file per "end" or "mate" is not scalable to such grouping data.
Nils
More intriguing will be to see what aligner output we can produce for it, though, given that SAM currently supports only two ends.
Originally posted by jkbonfield
I've had a report from a user of srf2fastq that bfast cannot read its output. Specifically, in the fastq I produce I write out N instead of . for the ambiguity code, as N is the original "unknown" symbol, with dot being a recent Illumina invention (along with many other broken changes to fastq that further muddy the waters).
So my questions here are:
1) Is it correct that bfast cannot handle N and requires .? I haven't tested this myself.
2) Should I "fix" srf2fastq to output . instead via a command line option?
My own inclination to question 2 is simply to say no, fix bfast instead - we don't need to try and promote yet another format variant. However if the community feels it's needed then I'll put it in.
Comments anyone?
James
One thing I would recommend for a SRF2FASTQ program is to output paired end (mate-pair) reads to the same file. Programs like BFAST and Velvet expect that there is only one FASTQ file, with paired end (mate pair) reads occurring successively with the same name. This allows BFAST at least to support triple-end, quad-end, or higher grouping data, which we have generated (it exists!). Having one file per "end" or "mate" is not scalable to such grouping data.
Nils
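The single-file layout described above (all ends of a read written one after another) can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_records` and `interleave` are hypothetical names, every record is assumed to be exactly four lines with no wrapped sequences, and the per-end inputs are assumed to list reads in the same order. It is not how srf2fastq or BFAST actually do it.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List


def read_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (assumes no line wrapping)."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ record")


def interleave(*ends: Iterable[str]) -> Iterator[str]:
    """Yield lines so that all ends of each read appear successively.

    Accepts any number of ends, so triple-end or quad-end data is
    handled the same way as ordinary pairs.
    """
    for group in zip_longest(*(read_records(e) for e in ends)):
        if any(rec is None for rec in group):
            raise ValueError("inputs have different numbers of records")
        for rec in group:
            yield from rec


end1 = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "IIII"]
end2 = ["@read1", "TTGA", "+", "IIII", "@read2", "CCAT", "+", "IIII"]
for line in interleave(end1, end2):
    print(line)
```

Taking a variable number of inputs rather than exactly two is the point of the design: nothing in the output layout changes when a third or fourth end is added.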
Bfast, fastq and Ns
I've had a report from a user of srf2fastq that bfast cannot read its output. Specifically, in the fastq I produce I write out N instead of . for the ambiguity code, as N is the original "unknown" symbol, with dot being a recent Illumina invention (along with many other broken changes to fastq that further muddy the waters).
So my questions here are:
1) Is it correct that bfast cannot handle N and requires .? I haven't tested this myself.
2) Should I "fix" srf2fastq to output . instead via a command line option?
My own inclination to question 2 is simply to say no, fix bfast instead - we don't need to try and promote yet another format variant. However if the community feels it's needed then I'll put it in.
Comments anyone?
James
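For anyone who needs the other ambiguity character in the meantime, the conversion can be done outside srf2fastq. The Python sketch below is a rough illustration, not an existing srf2fastq option: `convert_ambiguity` is a hypothetical name, and it assumes strict four-line records. It rewrites only the sequence line of each record, because both N and . can legitimately appear in read names and quality strings, so a blanket search-and-replace over the file would corrupt them.

```python
from typing import Iterable, Iterator


def convert_ambiguity(lines: Iterable[str], old: str = "N",
                      new: str = ".") -> Iterator[str]:
    """Swap the ambiguity character on sequence lines only.

    Assumes strict 4-line FASTQ records, so the sequence is always the
    second line of each record (index 1 modulo 4).
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.replace(old, new)
        else:
            # Header, separator and quality lines pass through untouched.
            yield line


record = ["@runN.1", "ACNNT", "+", "N.#I!"]
print(list(convert_ambiguity(record)))
```

Swapping the `old` and `new` arguments gives the reverse conversion, from . back to N.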