**rpdias** · 08-21-2012, 03:26 AM

Thanks Boetsie!

Cheers,
Ricardo

Originally posted by boetsie:

Hi Ricardo,

Look at this post, where colindaven suggests how to fix the problem:

SSPACE: a new stand-alone scaffolding tool for small and large genomes - SEQanswers

http://seqanswers.com/forums/showpost.php?p=50912&postcount=114

Simply chmod a+x all directories of SSPACE.
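As a minimal sketch of that fix, assuming the SSPACE installation sits under a single root folder, the snippet below adds execute (search) permission for user, group and others on every directory in the tree. The function name `make_dirs_executable` and the example path are hypothetical, not part of SSPACE.

```python
import os
import stat


def make_dirs_executable(root):
    """Add a+x to `root` and every directory below it; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        if new_mode != mode:
            os.chmod(dirpath, new_mode)
            changed.append(dirpath)
    return changed


# Hypothetical usage; replace the path with your own SSPACE folder:
# make_dirs_executable("/path/to/SSPACE")
```

Only directories are touched here, which is what the advice above describes; files keep their existing permissions.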

Regards,
Boetsie

**mht** · 12-06-2012, 07:28 AM

Hi Boetsie

I'm trying to scaffold a set of contigs from a bacterial genome assembly. Before scaffolding, there were no non-ACGTN bases in my assembly, but after scaffolding with SSPACE, there were. Can you please let me know what is causing this, and if there is an option to turn it off?

I have already set the -x parameter to 0 to turn off extension.

Thanks.

**boetsie** · 12-10-2012, 07:54 AM

Hi mht,

SSPACE only adds 'n' or 'N' characters to the assembly, so it would be strange if other characters were present after scaffolding. Could you please show me an example of which non-ACGTN characters are included?
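One way to answer that question is to tally every character in the sequence lines of the scaffold FASTA. The sketch below is only an illustration; the helper names `base_counts` and `unexpected_bases` are made up here, and the comparison is deliberately case-sensitive so that lower-case bases show up separately.

```python
from collections import Counter


def base_counts(fasta_lines):
    """Count each character in the sequence lines of a FASTA file (headers skipped)."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        counts.update(line)
    return counts


def unexpected_bases(counts, allowed="ACGTN"):
    """Return the characters (with counts) that are not in `allowed`."""
    return {base: n for base, n in counts.items() if base not in allowed}


# Hypothetical usage; the file name is a placeholder:
# with open("scaffolds.fasta") as handle:
#     print(unexpected_bases(base_counts(handle)))
```

Passing `allowed="ACGTNacgtn"` treats lower-case bases as expected, which separates genuine ambiguity codes from case differences.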

Regards,
Boetsie

**mht** · 12-10-2012, 07:52 PM

Oops boetsie, my bad. They were lower-case acgtn characters. I used Velvet as my assembler, so the lower-case acgt characters came from there. What is the difference between 'n' and 'N' characters in SSPACE?

**boetsie** · 12-11-2012, 04:56 AM

It generates an ‘n’ if a negative gap is found, meaning that the contigs potentially overlap but SSPACE could not find a full overlap.

It generates lower-case ‘acgt’ if an overlap is actually found, e.g.:

Ctg1: AGTAGATAGATGATCGCGCTGA
Ctg2:.............ATCGCGCTGAAGTAGATAGATGAGATCGAC

becomes:
AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC
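
The merge in this example can be imitated with a few lines of Python. This is only a sketch of the idea (exact suffix/prefix match, longest overlap wins), not SSPACE's actual implementation; `merge_with_overlap` and `min_overlap` are hypothetical names, and the inputs are upper-cased first, which is an assumption made here so that only the overlap ends up in lower case.

```python
def merge_with_overlap(ctg1, ctg2, min_overlap=1):
    """Join two contigs, writing the longest exact suffix/prefix overlap in lower case.

    Returns None when no overlap of at least `min_overlap` bases exists.
    """
    ctg1, ctg2 = ctg1.upper(), ctg2.upper()
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(ctg1), len(ctg2)), min_overlap - 1, -1):
        if ctg1.endswith(ctg2[:k]):
            return ctg1[:-k] + ctg2[:k].lower() + ctg2[k:]
    return None


merged = merge_with_overlap("AGTAGATAGATGATCGCGCTGA",
                            "ATCGCGCTGAAGTAGATAGATGAGATCGAC")
print(merged)  # AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC
```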

Regards,
Boetsie

**biocomfun** · 12-18-2012, 12:15 AM

For the TAB-delimited format, like:

<contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

E.g.
contig1 100 150 contig1 350 300
contig1 4000 4050 contig2 110 60

A startpos greater than the endpos means the read mapped to the - strand.

I mapped my BAC-end reads onto the contigs with BLAT. How do I include the strand information in my TAB files?

**mht** · 12-27-2012, 07:07 AM

Hi,

Using SSPACE, will it always be better to do contig extension prior to scaffolding? And do I do extension with both paired end and single end reads, or just paired end?

Thanks.

**boetsie** · 12-27-2012, 01:26 PM

I'm not really sure what you mean. You could just add your region of alignment to the tab file, e.g. if the BAC aligns to contig1 at positions 1000-3000 and to contig2 at positions 4000-2000 (so reverse), you can just add this info:

contig1 1000 3000 contig2 4000 2000

SSPACE can only handle links between two contigs, so if a BAC aligns to multiple contigs you have to split it so that you have only contig-contig links, instead of contig-contig-contig.
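
The strand convention described above (start greater than end for the - strand) can be written out as a small helper. This is a sketch under assumptions: each hit is given as a hypothetical `(contig, start, end, strand)` tuple that you have already extracted from your aligner output, and `link_line` is an illustrative name, not an SSPACE function.

```python
def link_line(hit1, hit2):
    """Build one TAB-delimited link line from two (contig, start, end, strand) hits.

    For a '-' strand hit the start and end are swapped, so start > end.
    """
    def fields(hit):
        contig, start, end, strand = hit
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
        low, high = min(start, end), max(start, end)
        return (contig, low, high) if strand == "+" else (contig, high, low)

    return "\t".join(str(value) for value in fields(hit1) + fields(hit2))


# The BAC example above: forward on contig1, reverse on contig2.
line = link_line(("contig1", 1000, 3000, "+"), ("contig2", 2000, 4000, "-"))
```

A BAC that hits three contigs would need to be written as separate two-contig lines, as explained above.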

Regards,
Boetsie

Originally posted by biocomfun:

For the TAB-delimited format, like:

<contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

E.g.
contig1 100 150 contig1 350 300
contig1 4000 4050 contig2 110 60

A startpos greater than the endpos means the read mapped to the - strand.

I mapped my BAC-end reads onto the contigs with BLAT. How do I include the strand information in my TAB files?

**boetsie** · 12-27-2012, 01:31 PM

Hi,

I can't really judge that, since it depends on what you consider 'better'. Anyway, if you have a nice draft assembly, I would not use the contig extension option; the main reason is that it is a time- and memory-consuming process. Our current strategy is to use SSPACE for generating the scaffolds, followed by our tool GapFiller to close the gaps (N's) produced by SSPACE. GapFiller uses local information from the paired-read data for the extension, instead of all the unaligned reads. This extension is much faster and more reliable.

Regards,
Boetsie

Originally posted by mht:

Hi,

Using SSPACE, will it always be better to do contig extension prior to scaffolding? And do I do extension with both paired end and single end reads, or just paired end?

Thanks.

**sheepyuan** · 12-28-2012, 06:06 PM

Hi,
I have a question. I have some single-end 454 data. How would SSPACE run if I artificially made it paired-end data in which the sequence of the other end is all "NNNNNNNNNN"?

**boetsie** · 01-09-2013, 12:26 PM

No, this won't work, since both reads of a pair need to be mapped to the contigs. You would be better off making paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a paired-end read from the first 100 bp and the last 100 bp. Specify your insert size as 200 bp. I've never done this, but I think it could work.
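
A rough sketch of that splitting idea is shown below. It is untested against SSPACE itself; `split_read`, `part_len` and `revcomp_second` are hypothetical names, and whether the second half needs to be reverse-complemented depends on the orientation you declare for the library, so that is left as an option rather than decided here.

```python
# Complement table for the optional reverse-complement step.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_read(seq, part_len=None, revcomp_second=False):
    """Split one long read into a pseudo pair: its first and last `part_len` bases.

    Returns (first, second, insert_size), where insert_size is the full read length.
    """
    if part_len is None:
        part_len = len(seq) // 2
    if part_len <= 0 or 2 * part_len > len(seq):
        raise ValueError("part_len must be positive and at most half the read length")
    first, second = seq[:part_len], seq[-part_len:]
    if revcomp_second:
        second = second.translate(COMPLEMENT)[::-1]
    return first, second, len(seq)


# A 200 bp read gives two 100 bp halves and an insert size of 200.
first, second, insert_size = split_read("A" * 100 + "C" * 100)
```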

Regards,
Boetsie

**sheepyuan** · 01-10-2013, 01:18 AM

Thank you very much, I'll try your method of splitting the read!

**aharkess** · 01-15-2013, 10:34 AM

SSPACE combining cDNA and PE/MP

Hi all,

I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene space sequence, the better.

I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

Thanks,
Alex

**yzzhang** · 02-21-2013, 06:12 AM

Hello, have you used SSPACE for scaffolding your genome with RNA-seq data? How did you determine your insert size? Thanks.

**CPCantalapiedra** · 02-27-2013, 07:40 AM

Hi!

I am getting very good results with SSPACE, Boetsie, and I plan to follow it up with GapFiller.
I have a bunch of questions, but the most important one right now is about the foundlinks files.

I am sure I am missing the true naming convention of the foundlinks file (I mean, do r1 and f1 refer to contig1 in the formattedcontigs file, and so on?). Could you shed any light on this, please?

If the question is not clear, read below

(if it is, skip it)

I have done several SSPACE runs over Velvet-generated contigs, arranged in different fasta inputs:
- 1: contigs 1,3,4,6,7
- 2: contigs 2,3,4,6
- 3: contigs 1,2,4,6

I use SSPACE with two read libraries, in two runs. The first one with both libraries, the second one with the bigger insert size library. Both runs are free of scaffolds correct ones, and then I inspect the links. However, in the run1.big_insert_lib.foundlinks I have the same links as in run2.big_insert_lib.foundlinks, but I am not able to associate them with the same contigs, using the formattedcontigs file for name translation. (the question above
