tl;dr: Some large contigs are missing from the output contig file, 454AllContigs.fna.
-----
I believe there is a bug in the current version of gsAssembler, 2.6 (20110517_1502).
I contacted Roche three weeks ago but still have not heard back. Maybe they have everyone making kits.
Has anyone else here come across this bug? Any solutions?
I am assembling one plate plus titrations of an 8kb paired end run.
Here is the exact command line used for the assembly:
runProject -siod -het -urt -cpu 28 -info -m -noace -a 1 -l 2000 -large -scaffold /454/assemblydir
Notice the parameter -a 1, which sets the minimum contig length to 1. I should get ALL contigs, no matter how short (and I do get ones that short).
Here are a few FASTA headers (sequence lines omitted) from 454AllContigs.fna. Notice that some contigs are missing, such as 75:
...
>contig00068 length=2566 numreads=160
>contig00071 length=2081 numreads=130
>contig00072 length=776 numreads=27
>contig00073 length=1145 numreads=310
>contig00074 length=187 numreads=41
>contig00076 length=1834 numreads=456
>contig00077 length=1922 numreads=219
>contig00078 length=432 numreads=45
>contig00080 length=128 numreads=17
>contig00081 length=3488 numreads=454
>contig00082 length=2433 numreads=353
>contig00083 length=4226 numreads=351
...
>contig109403 length=1 numreads=7
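For anyone who wants to check their own assembly, here is a minimal Python sketch that lists the gaps in contig numbering from the FASTA headers. It is not part of the Roche software; the function names are my own, and it assumes headers look like the ones above (">contigNNNNN length=... numreads=...").

```python
import re

# Matches headers such as ">contig00068 length=2566 numreads=160".
HEADER_RE = re.compile(r"^>contig(\d+)")

def contig_numbers(lines):
    """Collect the contig numbers found in FASTA header lines."""
    numbers = set()
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            numbers.add(int(match.group(1)))
    return numbers

def missing_numbers(numbers):
    """Return numbers absent between the smallest and largest one seen."""
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]

if __name__ == "__main__":
    import sys
    # Usage (path is whatever your assembly directory is):
    #   python find_gaps.py 454AllContigs.fna
    with open(sys.argv[1]) as handle:
        for n in missing_numbers(contig_numbers(handle)):
            print(n)
```

On the headers from contig00068 to contig00083 above, this reports 69, 70, 75 and 79. A gap in the numbering alone does not prove a contig was dropped, since the assembler may not use every number, so the result is only a list of candidates to look up in 454ContigGraph.txt.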
Here is the corresponding section of 454ContigGraph.txt. Note that contig00075 IS there, but out of order:
...
68 contig00068 2566 11.2
71 contig00071 2081 11.0
72 contig00072 776 6.0
73 contig00073 1145 43.3
74 contig00074 187 20.3
76 contig00076 1834 45.1
77 contig00077 1922 16.5
78 contig00078 432 12.2
80 contig00080 128 12.4
81 contig00081 3488 23.4
82 contig00082 2433 26.1
75 contig00075 187 22.4
79 contig00079 18 7.3
83 contig00083 4226 15.8
...
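The comparison above can be automated. This is a rough Python sketch, assuming the node lines in 454ContigGraph.txt are whitespace-separated as in the excerpt (number, contig name, length, coverage) and that every other line type in the file starts with something other than a number. The function names are hypothetical, not anything shipped with gsAssembler.

```python
def graph_nodes(lines):
    """Map contig name -> (length, coverage) for node lines in 454ContigGraph.txt."""
    nodes = {}
    for line in lines:
        fields = line.split()
        # Node lines look like: "75 contig00075 187 22.4"
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].startswith("contig"):
            nodes[fields[1]] = (int(fields[2]), float(fields[3]))
    return nodes

def fasta_ids(lines):
    """Return the set of sequence names from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids

def in_graph_not_fasta(graph_lines, fasta_lines):
    """Contigs that are nodes in the graph file but have no FASTA record."""
    nodes = graph_nodes(graph_lines)
    present = fasta_ids(fasta_lines)
    return {name: nodes[name] for name in sorted(nodes) if name not in present}

if __name__ == "__main__":
    import sys
    # Usage: python compare.py 454ContigGraph.txt 454AllContigs.fna
    with open(sys.argv[1]) as graph, open(sys.argv[2]) as fasta:
        for name, (length, coverage) in in_graph_not_fasta(graph, fasta).items():
            print(name, length, coverage)
```

On the two excerpts shown here, this flags contig00075 (187 bp, coverage 22.4) and contig00079 (18 bp, coverage 7.3) as present in the graph but absent from the FASTA file.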
Later on in that same file is the connection information. Here is a summary:
$ bb.454contiginfo --in=../assembly --contig=75 --out=-
>contig75
Length 187
Average Coverage 22.4
Edge 5' Connects to contig 73 3' with 28 reads
Edge 3' Connects to contig 76 5' with 25 reads
28 reads flow from 5' end of contig75 and terminate in contig 73
25 reads flow from 3' end of contig75 and terminate in contig 76
2 paired end reads flow from 5' end of contig75 and terminate in contig 105881 after passing through 7605.0 b.p. in other contig(s)
No paired end reads flow from 3' end of contig75
I want that contig! It goes between 73 and 76. Where is it?
I also tried without the -scaffold parameter. The contig numbers change, but there are still missing contigs.