Originally posted by VidJa
SAM output contains a line for every read, whether or not it maps to a contig. Bowtie output only contains the reads that are mapped. Since SSPACE only uses a fraction of each contig for mapping (only the beginning and end of each contig), the number of reads that map is also low. SSPACE goes through all output lines, so the more output lines there are, the slower the program runs.
Say there are 1M reads and only 10,000 of them map. Bowtie will generate 10,000 output lines, while SAM produces 1M output lines, so SSPACE has 100 times as many lines to read in.
Therefore, I'll make it possible to supply tab-delimited files with information about paired reads, in the format:
<read1_tig> <read1_start> <read1_end> <read2_tig> <read2_start> <read2_end>
I will provide a script that can convert SAM output files to a TAB file. This way, all SAM-capable aligners can be used.
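To make the conversion concrete, here is a minimal Python sketch of what such a SAM-to-TAB step could look like. This is not the script announced above, just an illustration under a few assumptions: the function names (ref_span, sam_to_tab) are made up, mates are matched by read name, the end position is derived from the CIGAR string, only pairs with both mates mapped are written, and strand is not encoded in the output.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the contig


def ref_span(cigar):
    """Number of contig bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def sam_to_tab(sam_lines):
    """Yield one tab-delimited line per read pair with both mates mapped.

    Columns (assumed): read1_tig, read1_start, read1_end,
                       read2_tig, read2_start, read2_end
    """
    pending = {}  # read name -> (is_first, tig, start, end) waiting for its mate
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:  # not a complete alignment record
            continue
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        # Skip unpaired, unmapped, secondary and supplementary records.
        if not flag & 0x1 or flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if cigar == "*":
            continue
        current = (bool(flag & 0x40), rname, pos, pos + ref_span(cigar) - 1)
        mate = pending.pop(qname, None)
        if mate is None:
            pending[qname] = current
            continue
        # Put the mate flagged as "first in pair" in the first three columns.
        first, second = (current, mate) if current[0] else (mate, current)
        yield "\t".join(str(x) for x in first[1:] + second[1:])
```

It could be used by passing an open SAM file to sam_to_tab and writing each yielded line to the .tab file. Reads whose mate never shows up as mapped simply stay in the pending table and produce no output, which is what keeps the TAB file small.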
In addition, multiple TAB files from different libraries can be given, as well as a combination of TAB files and normal paired reads. For example, say you have a paired-end library of 200bp and one of 500bp. For each library you map the reads to the contigs, generating two SAM files, which you can convert to .tab files. Both can be given to SSPACE. First, SSPACE scaffolds the contigs using the 200bp library. Next, the positions of the contigs are updated by determining their new positions within the scaffolds. Then the 500bp library is used to scaffold the scaffolds that were generated with the 200bp library.
It is still in the testing phase, but the results so far look OK. I get similar results whether I input a paired-end FASTQ file or a .tab file.
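The position update between libraries can be pictured as a coordinate lift: a read position on a contig is translated to a position on the scaffold that now contains that contig. The Python sketch below only illustrates that idea and is not how SSPACE is implemented internally. The Placement record, the layout table and the lift_to_scaffold function are hypothetical names, and positions are assumed to be 1-based and inclusive.

```python
from typing import Dict, NamedTuple, Tuple


class Placement(NamedTuple):
    scaffold: str   # scaffold the contig ended up in
    offset: int     # 0-based start of the contig within that scaffold
    length: int     # contig length in bases
    reverse: bool   # True if the contig was placed reverse-complemented


def lift_to_scaffold(
    layout: Dict[str, Placement], tig: str, start: int, end: int
) -> Tuple[str, int, int]:
    """Translate a 1-based inclusive contig interval to scaffold coordinates."""
    p = layout[tig]
    if p.reverse:
        # The interval is mirrored within the contig before adding the offset.
        new_start = p.offset + (p.length - end) + 1
        new_end = p.offset + (p.length - start) + 1
    else:
        new_start = p.offset + start
        new_end = p.offset + end
    return p.scaffold, new_start, new_end
```

With every line of the 500bp .tab file lifted this way, the second round can treat the scaffolds from the 200bp round as if they were the input contigs.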
I'll keep you updated!
Kind regards,
Boetsie