Hi, a couple of questions about assembling a 454 genome with gsAssembler.
Some background: we are sequencing and assembling a eukaryotic genome using 454 Titanium data. First we ran 7 slides of shotgun sequencing, and later one slide of paired-end data (3 kb inserts). We have made several assemblies:
Assembly A: Here we used only the shotgun data. All parameters were left at their defaults, except that we used "-large". We got 114 Mbp in total: 7600 contigs with L50=52 Kbp.
Assembly B: This time we used both the shotgun and PE data and assembled everything together, with the same parameters as in A. Now we got 110 Mbp in total: 9000 contigs with L50=35 Kbp (and 1600 scaffolds with L50=160 Kbp).
Assembly C: Here we first made an assembly from the shotgun data alone, then added the PE data and updated the assembly. This gave 114 Mbp in total: 7700 contigs with L50=53 Kbp (and 1600 scaffolds with L50=160 Kbp).
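(For anyone wanting to reproduce the contig counts and L50 values above independently of gsAssembler's report, here is a minimal sketch that computes them from a contig FASTA file, e.g. Newbler's 454AllContigs.fna; the filename on the command line is whatever your run produced.)

```python
import sys

def read_lengths(path):
    """Collect the length of each sequence in a FASTA file."""
    lengths, current = [], 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Length L such that contigs of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

if __name__ == "__main__" and len(sys.argv) > 1:
    lengths = read_lengths(sys.argv[1])
    print("%d contigs, %d bp total, N50 = %d bp"
          % (len(lengths), sum(lengths), n50(lengths)))
```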
Now the questions:
1. It seems strange that we get less assembled sequence when we add the PE reads (assembly B vs. assembly A). Does anybody have a possible explanation for this?
2. I would rather use assembly C than assembly B for the subsequent analysis, since the contigs are longer, but I don't know which assembly to trust. Is there any way of knowing which of the two assemblies is more "correct"? (I'm thinking of computational approaches that don't involve any more sequencing.)
Long post, but I hope somebody has some input.
Thanks,
/Jakub