
**boetsie** · 11-09-2011, 02:37 AM

Hi Lisa,

No problem, I'm glad I could help and that it's all working now! Good luck, and feel free to contact me if you have any questions.

Regards,
Boetsie

Originally posted by Lisa0508

Hi Boetzer,
Thank you so much for your quick reply and patient explanation! It's working now. I only registered on this forum yesterday, and all settings were still at their defaults. I'm very sorry I had not enabled the private message option; I can now receive private messages. Thank you again!
Regards,
Lisa

**boetsie** · 12-12-2011, 05:33 AM

Hi all,

We have released new versions of both SSPACE Basic and SSPACE Premium. SSPACE Basic is the previous version of SSPACE Premium. The new SSPACE Premium contains the following new features:

- Included the read mappers BWA/BWA-SW.
- Changed the multithreading of Bowtie/BWA. Instead of running the aligner itself in multithreaded mode, SSPACE now calls the aligner in single-threaded mode with multiple instances. This preserves the order of the reads for processing and read tracking, speeds up the process and reduces memory consumption.
- Read files are split into files with portions of 1 million read pairs instead of one single file. This speeds up the alignment (see previous feature).
- During extension, contigs are extended with subsequences (k-mers) of the unmapped reads instead of the full reads. This increases the coverage for extension, since k-mers overlap the contigs better than full reads.
- A file is generated with more detailed information about the extension process.
- Included the option -S, which skips the reading and processing of the paired-read input files.
- It is now possible to use .gz input files (only if gunzip is installed).
- Changed the folder structure.
- Changed the format of the final scaffolds.
- Included some additional statistics in the summary file: GC content, N25/N75, number of gaps and total size of gaps.
- Added a tool for quality-trimming of paired reads.
- Added a tool for estimating the insert size.
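The new summary-file statistics have standard definitions. The sketch below shows how they can be computed from a list of scaffold sequences, assuming gaps are runs of N and GC content is computed over A/C/G/T bases only. Both are assumptions, not taken from SSPACE's source, and the function names are hypothetical.

```python
import re

def nx(lengths, fraction):
    """Nx statistic: the length L such that sequences of length >= L
    together cover at least `fraction` of the total length (0.5 for N50)."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= fraction * total:
            return length
    return 0  # empty input

def summary_stats(scaffolds):
    """Summary statistics for a list of scaffold sequences.
    Assumption: gaps are runs of 'N', and GC content ignores N bases."""
    lengths = [len(seq) for seq in scaffolds]
    upper = [seq.upper() for seq in scaffolds]
    bases = "".join(upper)
    gc = bases.count("G") + bases.count("C")
    acgt = sum(bases.count(b) for b in "ACGT")
    # Every maximal run of N counts as one gap.
    gaps = [m.group() for seq in upper for m in re.finditer(r"N+", seq)]
    return {
        "total_size": sum(lengths),
        "gc_content": gc / acgt if acgt else 0.0,
        "N25": nx(lengths, 0.25),
        "N50": nx(lengths, 0.50),
        "N75": nx(lengths, 0.75),
        "num_gaps": len(gaps),
        "total_gap_size": sum(len(g) for g in gaps),
    }
```

Because N25 is reached with fewer, longer sequences than N75, N25 is always greater than or equal to N75 for the same assembly.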

In addition, we have been working on a tool named GapFiller for closing gaps within scaffolds using paired-read data. GapFiller has been submitted for publication, and the basic source code will be available upon acceptance. At that time we will make sure academic users can apply for a free license. Until the manuscript is accepted, however, a pre-release is available at a cost of 250 euros (for both academic and commercial users).

See our website for more information about SSPACE and GapFiller: http://www.baseclear.com/landingpage...ics-solutions/

Kind regards,
Boetsie

**stevebaeyen** · 12-13-2011, 04:40 AM

k-mers and m parameter

Originally posted by boetsie

During extension, contigs are extended with subsequences (k-mers) of the unmapped reads, instead of the full read. This will increase the coverage for extension, since k-mers have a better overlap with the contigs than full reads.

Dear Boetsie,
Does this mean that we should get better results with the -m parameter optimised for the k-mer size instead of the read length? How can we know the k-mer size used, and how do we best adjust the -m value, for example for a 50 bp read?
regards,
Steve

**boetsie** · 12-14-2011, 05:46 AM

Hi Steve,

The k-mer size used is simply -m + 1. So -m actually means the overlap the k-mer must have with the contig, and the extra nucleotide is the 'overhang' used to extend the contig. The difference between the two methods:

previous method:

ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
read:...GATTGATAGATCGTCGATAGTAGTCGAG

The read above will not be used for extension, since it contains a mismatch and thus does not fully overlap with the contig. The new method cuts the read into k-mers:

Say we use -m 20; the k-mers (of length 21) of the read are then:

READ: GATAGATCGTCGATAGTAGTCGAGAT
kmer: GATAGATCGTCGATAGTAGTC
kmer: .ATAGATCGTCGATAGTAGTCG
kmer: ..TAGATCGTCGATAGTAGTCGA
kmer: ...AGATCGTCGATAGTAGTCGAG
kmer: ....GATCGTCGATAGTAGTCGAGA
kmer: .....ATCGTCGATAGTAGTCGAGAT
etc...

If we now extend the contig, the overlapping k-mer is:

ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
read:..........AGATCGTCGATAGTAGTCGAG

This increases the coverage available for extension, since k-mers that avoid a sequencing error can still be used. This helps especially for longer reads.

Regards,
Boetsie
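To make the k-mer idea concrete, here is a simplified Python sketch using the contig and the mismatching read from the "previous method" example above. It is not SSPACE's code: `kmers`, `extension_votes` and `full_read_extends` are hypothetical names, and the full-read check is reduced to "exact overlap with the contig end, no mismatches allowed".

```python
from collections import Counter

def kmers(read, m):
    """All overlapping k-mers of length k = m + 1 (m bases of overlap + 1 overhang)."""
    k = m + 1
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def extension_votes(contig, reads, m):
    """Count candidate bases for extending the 3' end of `contig`.
    A k-mer votes if its first m bases exactly match the last m bases of
    the contig; its final base (the overhang) is the proposed extension."""
    if len(contig) < m:
        return Counter()
    contig_end = contig[-m:]
    votes = Counter()
    for read in reads:
        for kmer in kmers(read, m):
            if kmer[:m] == contig_end:
                votes[kmer[m]] += 1
    return votes

def full_read_extends(contig, read, min_overlap):
    """Simplified full-read check: a prefix of the read (>= min_overlap bases)
    must exactly match the contig end, and the read must stick out beyond it."""
    longest = min(len(contig), len(read) - 1)
    for overlap in range(longest, min_overlap - 1, -1):
        if contig[-overlap:] == read[:overlap]:
            return True
    return False

contig = "GTCGATAGATAGATCGTCGATAGTAGTCGA"
read = "GATTGATAGATCGTCGATAGTAGTCGAG"   # contains one mismatch vs. the contig
print(extension_votes(contig, [read], 20))  # one k-mer proposes a 'G' overhang
print(full_read_extends(contig, read, 20))  # the full read is rejected
```

With -m 20 the read is cut into 8 k-mers of length 21; the one starting at AGATCG... matches the last 20 bases of the contig and proposes G as the next base, while the full read is rejected because of the mismatch. By the same arithmetic, a 50 bp read with -m 20 yields 50 - 21 + 1 = 30 k-mers.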


**sphil** · 01-09-2012, 06:50 AM

Hey all,

I have a rather strange problem. My contig FASTA file looks like this:

>22617
GTCTACTTCAGACAAGGAAGACGGTCTACTTCAGATGAGGAAGATGGTCTGCTACAAAGGGAAGACGGTCTGCTTCAGGCCAGGAAGACGGTCTGCTACA
>22619
CGTCTTCCAATTTTGAATCAGACCGTCTTGATTTTGAATTGGACCGTCTCCCCTGGGCGCATCTGCTGGGCCGCTGGGGCTGGAACCGTGGCTCAAAATT
>22621
TTCCTCAGCAACAACATTGATGGTGTCTTTTGTGTACATGTATGAGTAGTCAGTCAAGTAAAGTATGCGCACCTGTCTTTTGGTAAGCCTACGCAGCCTG
>22623
AGGCACTCTGCCCGAGTGGTTAAGGGGTAAGTCTCGAATACATTATTCGACCGTCCATCATGACGGGTTAACTTATAGGCTCTGCCTGCGTCGGTTCAAA

But the program tells me:

ERROR: Invalid (-s) contig file /home/dpr..../de_novo_assembly_DNA/SOAPdenovo_39/PseudoAfi_K39.contig.fastasorted.fasta ...Exiting.

So can you tell me why my file is considered invalid?

Any help is greatly appreciated,

best

Phil

**boetsie** · 01-09-2012, 07:02 AM


Hi Phil,

The error has nothing to do with the file format. The line where this error occurs only checks whether the contig file exists. Somehow it cannot find your file. Can you check that the file really is at the specified location and that the user permissions are correct?

Boetsie
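Since this error only means the file could not be found, a quick pre-flight check of the input paths can save a failed run. A minimal Python sketch; `check_input_file` is a hypothetical helper, not part of SSPACE.

```python
import os
import sys

def check_input_file(path):
    """Return a list of problems with `path`; an empty list means it looks usable."""
    problems = []
    if not os.path.exists(path):
        problems.append("does not exist (check the path and spelling)")
    elif not os.path.isfile(path):
        problems.append("is not a regular file")
    elif not os.access(path, os.R_OK):
        problems.append("is not readable by the current user")
    elif os.path.getsize(path) == 0:
        problems.append("is empty")
    return problems

if __name__ == "__main__":
    # Usage: python check_inputs.py contigs.fasta reads_1.fastq reads_2.fastq
    for p in sys.argv[1:]:
        issues = check_input_file(p)
        print(p, "OK" if not issues else "; ".join(issues))
```

Checking existence first and permissions second mirrors the two things suggested above: the file may simply not be at that path, or it may be there but not readable by the user running SSPACE.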

**sphil** · 01-13-2012, 12:25 AM

Hey,

Sorry for the late answer; I was out of the office the last few days. I checked the location and it is the right one, so maybe I got something wrong in the library file.

Here is the line containing my library:

TrueSeqStd /home/dpr/P/PA/SGII_ATCACG_L003_R1.fastq /home/dpr/P/PA/SGII_ATCACG_L003_R2.fastq 50 0.5 FR

Could there be a mistake in it?

Best,

Phil

Got it, thanks for the help.
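For reference, the library line above has six whitespace-separated fields: library name, first read file, second read file, insert size, allowed deviation (as a fraction) and orientation. The sketch below validates a line with that layout. The layout is assumed from this post alone, RF as the second orientation code is an assumption, and the column set may differ between SSPACE versions, so check the README of your version; `Library` and `parse_library_line` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Library:
    """One library line, using the six-column layout assumed from the post."""
    name: str
    reads1: str
    reads2: str
    insert_size: int
    deviation: float   # allowed deviation as a fraction, e.g. 0.25
    orientation: str   # assumed "FR" (--> <--) or "RF" (<-- -->)

def parse_library_line(line: str) -> Library:
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, reads1, reads2, insert_size, deviation, orientation = fields
    orientation = orientation.upper()
    if orientation not in ("FR", "RF"):
        raise ValueError(f"orientation should be FR or RF, got {orientation!r}")
    deviation_value = float(deviation)
    if not 0.0 <= deviation_value < 1.0:
        raise ValueError(f"deviation should be a fraction in [0, 1), got {deviation_value}")
    # int() also raises ValueError if the insert size is not a whole number.
    return Library(name, reads1, reads2, int(insert_size), deviation_value, orientation)
```

A check like this catches column-count and typing mistakes; it does not know whether the chosen insert size or orientation is biologically right for the library.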

**user1313** · 02-23-2012, 05:54 AM

Dear boetsie,

Is it possible to implement a feature in SSPACE that recognizes inward-facing reads in an Illumina MP library? This is a serious problem for some library preparations. The Ray assembler has this feature, for example:

Ray: a NEW MPI-based 100% parallel genome assembler - SEQanswers

http://seqanswers.com/forums/showthread.php?t=4301&page=7


Regards,
Nestor

**boetsie** · 02-24-2012, 03:19 AM

Hi Nestor,

This is already implemented in SSPACE. Ray basically does the same as SSPACE by using a range of allowed insert sizes, for example an insert size of 4000 with a 0.25 deviation (the range is thus 3000-5000). This initially filters out 'paired-end' reads, since these have smaller insert sizes (< 500 bp). In addition, SSPACE requires the orientation of the paired reads for each library. If you specify the orientation <-- -->, pairs in --> <-- orientation will not be taken into account for scaffolding.

Regards,
Boetsie
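The two filters described above (an insert-size window of insert × (1 ± deviation) and the library orientation) can be sketched as follows. This is a hypothetical illustration, not SSPACE's implementation; it assumes each pair has already been reduced to the distance between its reads plus the strands ('+' for -->, '-' for <--) of its leftmost and rightmost read.

```python
def classify_orientation(left_strand: str, right_strand: str) -> str:
    """Orientation of a pair from the strands of its leftmost and rightmost read."""
    if left_strand == "+" and right_strand == "-":
        return "FR"   # --> <--  (short-insert paired-end style)
    if left_strand == "-" and right_strand == "+":
        return "RF"   # <-- -->  (mate-pair style)
    return "same-strand"

def keep_pair(distance: int, left_strand: str, right_strand: str,
              lib_orientation: str, insert_size: int, deviation: float) -> bool:
    """Keep a pair only if its orientation matches the library and its distance
    lies within insert_size * (1 - deviation) .. insert_size * (1 + deviation)."""
    low = insert_size * (1 - deviation)
    high = insert_size * (1 + deviation)
    return (classify_orientation(left_strand, right_strand) == lib_orientation
            and low <= distance <= high)
```

For a 4000 bp mate-pair library with 0.25 deviation, `keep_pair(4200, "-", "+", "RF", 4000, 0.25)` keeps the pair, whereas the same distance in --> <-- orientation, or a 300 bp pair in the right orientation, is discarded.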


**user1313** · 02-24-2012, 08:40 AM

Dear boetsie,

What about libraries where the number of "smaller insert size" read pairs is significantly higher than the number of "long insert size" read pairs? Don't you think that using such libraries with SSPACE could lead to horrible results, such as, in some cases, wrongly re-orienting contigs? Is SSPACE now capable of detecting such libraries by counting the PE/MP ratio of read pairs mapped within each contig?

Regards,
Nestor


**boetsie** · 02-29-2012, 06:43 AM


That is indeed a problem; such pairs might influence the scaffolding process. But since the short-insert pairs are --> <-- oriented (and mate pairs <-- --> oriented), they are filtered out.
I do not see the benefit of including the PE/MP ratio of pairs mapped within a contig, since those pairs do not contribute to the scaffolding process. They can only influence the process when the two reads of a pair align to different contigs, but, as said, they will then be filtered out because of their orientation.

**user1313** · 02-29-2012, 07:19 AM

Dear boetsie,

Thank you for the answer. I still, however, do not agree. Please correct me if I am wrong.

If we have contig 1 and contig 2 with some PE reads (short arrow "->") and some MP reads (longer arrow "-->") like this:

Code:

    contig 1             contig 2
5`------------3`     5`------------3`
    <--    ->          <-    -->
            ->          <-
    ---------- 4000bp ----------

Now, let's assume that we have twice as many PE reads as MP reads.
We told SSPACE that the library is MP with a 4000 bp insert size. Won't SSPACE reverse-complement the contigs in this manner to make the more abundant "PE" reads fit the 4000 bp "<-- -->" pattern?

Code:

  contig 1(RC)         contig 2 (RC)
5`------------3`     5`------------3`
    <-    -->          <--    ->
     <-                        ->
    ---------- 4000bp ----------

I'm not saying it will happen every time, but in cases where the lengths of the reverse-complemented contigs fit the distance given in the library file, it could be a disastrous problem. To tell you the truth, in my limited experience I have seen more problematic MP libraries than good ones.

Regards,
Nestor


**boetsie** · 02-29-2012, 08:30 AM

Yes, you are right, sorry. But this will only happen if both contigs are short. Say the paired-end reads are mapped as follows:

Code:

    contig 1 (1000bp)            contig 2 (8000 bp)
5`------------>3`     5`----------------------------->3`
            <-           <- 
           pos900    pos100

Since mate pairs are <-- --> oriented, contig 2 would have to be reverse-complemented:

Code:

    contig 1 (1000bp)            contig 2 (8000 bp)
5`------------3`     3`<-----------------------------5`
            <-                                  -> 
           pos900                             pos7900

The distance is now (1000-900) + 7900 = 8000 bp. That is 4000 bp more than the insert size of your library (8000 - 4000 = 4000 bp).

I agree, though, that if contig 2 were 4000 bp shorter (i.e. 4000 bp long), the distance would be (1000-900) + (4000-100) = 4000 bp, matching the size of your library! This could be a problem, especially for contig orientation and insert size estimation (the true distance in the example above is not 4000 bp but ~200 bp: (1000-900) on contig 1 plus 100 on contig 2).

Thanks for pointing me in this direction; I'll try to dive deeper into this.

Regards,
Boetsie
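The distance arithmetic in this post can be checked with a small function. It follows the post's simplification (contig 2 placed directly after contig 1 with no gap, each read reduced to a single position); `implied_distance` is a hypothetical name.

```python
def implied_distance(len1, pos1, len2, pos2, reverse_complement_contig2):
    """Distance implied by a pair with one read at `pos1` on contig 1 and the
    other at `pos2` on contig 2, with contig 2 placed right after contig 1.
    It is the rest of contig 1 after the read plus the read's position on
    contig 2; reverse-complementing contig 2 moves pos2 to len2 - pos2."""
    tail_of_contig1 = len1 - pos1
    pos_on_contig2 = len2 - pos2 if reverse_complement_contig2 else pos2
    return tail_of_contig1 + pos_on_contig2

print(implied_distance(1000, 900, 8000, 100, False))  # true placement
print(implied_distance(1000, 900, 8000, 100, True))   # contig 2 reverse-complemented
print(implied_distance(1000, 900, 4000, 100, True))   # 4000 bp contig 2, reverse-complemented
```

The three calls give 200, 8000 and 4000 bp. The last value falls inside the 3000-5000 window of a 4000 bp library with 0.25 deviation, which is exactly the false positive Nestor describes.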

**gaffa** · 03-18-2012, 03:30 PM

Is it possible to run SSPACE on external read mappings, i.e. can I perform the read mappings on my own and then have SSPACE do the scaffolding based on them?

**boetsie** · 03-19-2012, 12:37 AM


Yes, this is possible. The file should be in a TAB-delimited format like:

<contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

E.g.
contig1 100 150 contig1 350 300
contig1 4000 4050 contig2 110 60

There is a script in the 'tools' directory of the package to convert SAM/BAM to a tab format.

Regards,
Boetsie
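A minimal sketch of such a conversion, assuming both SAM records of a pair are available together, that coordinates are 1-based and inclusive, and that reverse-strand reads are written end-before-start as in the `contig1 350 300` example. This is not the converter shipped in the 'tools' directory, whose exact conventions may differ; the function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def ref_length(cigar):
    """Reference bases covered by an alignment (M, D, N, = and X consume the reference)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")

def read_coords(fields):
    """(contig, start, end) for one SAM record; reverse-strand reads get start > end."""
    flag = int(fields[1])
    contig = fields[2]
    pos = int(fields[3])                   # SAM POS: 1-based leftmost mapped base
    end = pos + ref_length(fields[5]) - 1  # inclusive end coordinate (assumption)
    if flag & 0x10:                        # read mapped to the reverse strand
        return contig, end, pos
    return contig, pos, end

def pair_to_tab(sam_line1, sam_line2):
    """Turn the two SAM records of one read pair into one TAB-delimited line,
    or None if either mate is unmapped."""
    f1 = sam_line1.rstrip("\n").split("\t")
    f2 = sam_line2.rstrip("\n").split("\t")
    if (int(f1[1]) & 0x4) or (int(f2[1]) & 0x4):  # 0x4 = segment unmapped
        return None
    c1, s1, e1 = read_coords(f1)
    c2, s2, e2 = read_coords(f2)
    return "\t".join(map(str, (c1, s1, e1, c2, s2, e2)))
```

With a 51 bp forward read at position 100 and a 51 bp reverse read at position 300 on contig1, `pair_to_tab` produces `contig1	100	150	contig1	350	300`, the first example line above (under the inclusive-coordinate assumption).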
