
**goldenflaw** · 02-13-2011, 02:39 PM

Thanks for your prompt reply.

I get the following error:

=>Sun Feb 13 22:37:39 2011: Building Bowtie index for contigs (tmp.alboxf_scaffolds_no_extension/subset_contigs.fasta)

Bowtie-build error; -1 at /scratch/yang/tools/SSPACE-1.1_linux-x86_64/bin/mapWithBowtie.pl line 38.
WARNING: No scaffolding, because no reads found on contigs

I believe it might have something to do with bowtie, but I am unsure.

Thanks again!

**e-summer-3** · 02-13-2011, 08:37 PM

What should I call?

SSPACE is a very nice tool for us. Thank you for your good work.

By the way, what should I call SSPACE?

es es pace?
es pace?
es space?

Regards.

**boetsie** · 02-14-2011, 12:07 AM

Yes, that's a common problem. Which version of SSPACE do you have?

This is usually solved by going, on the command line, to the directory where the main SSPACE script (SSPACE_v1-x.pl) and its folders are stored, and then running one of the following:

chmod a+x bowtie/*

or

chmod 777 *

If this doesn't work, you can try downloading the newest Bowtie version at http://sourceforge.net/projects/bowt...bowtie/0.12.7/

Replace the files in the bowtie folder with the ones you've downloaded.
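For anyone who would rather check the permissions before rerunning SSPACE, here is a minimal Python sketch of the same idea as `chmod a+x bowtie/*`. The function names are made up for illustration, and the folder you pass in is assumed to be the bowtie folder that ships next to the SSPACE script.

```python
import os
import stat

# Execute bits for user, group and others (what "chmod a+x" sets).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def non_executable_files(folder):
    """Return the names of regular files in `folder` that have no execute bit at all."""
    missing = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and not os.stat(path).st_mode & EXEC_BITS:
            missing.append(name)
    return missing


def make_executable(folder):
    """Add execute permission for everyone to each regular file in `folder`."""
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            os.chmod(path, os.stat(path).st_mode | EXEC_BITS)
```

Calling `non_executable_files("bowtie")` from the SSPACE directory lists the files that would trigger the permission problem; `make_executable("bowtie")` fixes them without opening up everything the way `chmod 777 *` does.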

Kind regards,
Boetsie

**goldenflaw** · 02-14-2011, 10:47 AM

Downloading the newest version of bowtie worked (I am using SSPACE-1.1_linux-x86_64). Also, I had extra annotation in my reference (assembled file) and that screwed up bowtie as well (if anyone else runs into the same problem).
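In case it helps someone with the same problem, a rough sketch for cleaning the contig headers is below. It assumes that "extra annotation" means text after the first whitespace on the `>` header lines, which is only a guess at what tripped up Bowtie here; the function name is hypothetical.

```python
def strip_header_annotation(lines):
    """Yield FASTA lines with each header reduced to its first whitespace-delimited token.

    Assumption: the annotation to drop is everything after the first
    whitespace on a '>' line. Sequence lines are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


if __name__ == "__main__":
    example = [">contig1 length=500 cov=12.3\n", "ACGTACGT\n"]
    for cleaned in strip_header_annotation(example):
        print(cleaned)
```

Write the cleaned lines to a new file and pass that file to SSPACE with -s, keeping the original assembly untouched.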

Thanks again!

**rsw3284** · 02-15-2011, 02:19 PM

Error with '-a' and insert stdev values

I'm getting the following error when running the SSPACE perl script using: -a = 0.70 (default) and insert stdev of 0.50:

Code:

ERROR: -a must be a number between 0.00 and 1.00. Your inserted -a is .70 ...Exiting.
ERROR: Insert stdev must be a number between 0.00 and 1.00. Your library lib1 has insert size of 0.50. Exiting.

Here are the contents of libraries.txt:

Code:

lib1 s_6_1_sequence.txt s_6_2_sequence.txt 250 0.50 0

and the command that was run:

Code:

perl SSPACE_v1-1.pl -l libraries.txt -s sk2_originalreads_contigs.fa -x 0 -m 32 -o 20 -t 0 -k 5 -n 15 -p 1 -v 0 -b sk2_origreads_no_extension

This was run on a 64-bit OSX server with 32 GB RAM.

Edit: I believe this issue was fixed by correcting the permissions on the files involved. However, I'm now having the same issue as the user above: WARNING: No scaffolding, because no reads found on contigs
Edit #2: Never mind, changing the permissions to 777 in the directories took care of this issue.

Thanks,

Rsw3284

**boetsie** · 02-16-2011, 10:26 AM

Hi rsw3284,

Is it fixed now? To be honest, we did not test SSPACE on a 64-bit Mac OS X server, only on a 32-bit server. However, the problems above look more like a Perl problem than an SSPACE problem.

Boetsie

**rsw3284** · 02-16-2011, 03:12 PM

Yes, it's working just fine now. Thanks!

- Rsw3284

**hliang** · 02-17-2011, 04:29 PM

Hi boetsie,
Thank you for SSPACE. I have a question that came up while reading the MANUAL file that comes with SSPACE:

The libraries.txt file contains information about each library. For each library, columns 2 and 3 are fasta or fastq files for the two ends. Should these fasta/fastq files be different files? I ask because I found this example in the MANUAL file:

Lib1 file1.fasta file2.fasta 400 0.5 1
Lib1 file2.fasta file2.fasta 400 0.5 1
Lib2 file3.fastq file3.fastq 4000 0.75 0

I'm a bit confused. In what kind of cases can file2.fasta or file3.fastq be placed in both column 2 and column 3?

**boetsie** · 02-18-2011, 12:37 AM

Hi Hliang,

Thank you for your question. I see some mistakes there in the MANUAL.

About your question;

Columns 2 and 3 should always be in the same format within one line. For example, if the file with the first reads is fastA, then the file with the second reads should also be fastA.

However, if you have multiple library files, some of them may hold paired reads in fastQ format, which can also be used.

So these libraries are OK:

lib1 file1.1.fastA file1.2.fastA 400 0.5 0
lib1 file2.1.fastQ file2.2.fastQ 400 0.5 0

While these are not correct:
lib1 file1.1.fastA file1.2.fastQ 400 0.5 0
lib1 file2.1.fastQ file2.2.fastA 400 0.5 0
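The rules above are easy to check before starting a run. Below is a minimal sketch, assuming the six-column layout used in the examples in this thread and assuming the format can be guessed from the file extension (files such as s_6_1_sequence.txt are simply skipped by that check). The 0.00 to 1.00 range for the fifth column comes from the error message quoted earlier; the function names are hypothetical and the sixth column is not validated.

```python
import os

FASTA_EXT = {".fasta", ".fa", ".fna"}
FASTQ_EXT = {".fastq", ".fq"}


def guess_format(filename):
    """Guess 'fasta' or 'fastq' from the extension; return None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in FASTA_EXT:
        return "fasta"
    if ext in FASTQ_EXT:
        return "fastq"
    return None


def check_library_line(line):
    """Return a list of problems found in one libraries.txt line (empty if none)."""
    fields = line.split()
    if len(fields) != 6:
        return ["expected 6 columns, found %d" % len(fields)]
    _name, file1, file2, insert, stdev, _flag = fields
    problems = []
    if file1 == file2:
        problems.append("columns 2 and 3 name the same file")
    fmt1, fmt2 = guess_format(file1), guess_format(file2)
    if fmt1 and fmt2 and fmt1 != fmt2:
        problems.append("columns 2 and 3 mix fastA and fastQ")
    if not insert.isdigit() or int(insert) <= 0:
        problems.append("insert size is not a positive integer")
    try:
        stdev_ok = 0.0 <= float(stdev) <= 1.0
    except ValueError:
        stdev_ok = False
    if not stdev_ok:
        problems.append("insert stdev is not a number between 0.00 and 1.00")
    return problems
```

Running `check_library_line` over every line of libraries.txt and printing any non-empty result catches the mixed-format and same-file cases discussed here.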

Is this what you mean?

Kind regards,
Boetsie

**hliang** · 02-18-2011, 08:31 AM

Thanks for the info.

So columns 2 and 3 should be PAIRED and have the same file format?

Can I concatenate file1.1.fastA and file1.2.fastA into one single file, file_combo.fastA (separating the paired-end sequences with ":"), and use the following line?
lib1 file_combo.fastA file_combo.fastA 400 0.5 0

One more question: is SSPACE suitable for scaffolding using 454 paired-end data? 454 paired-end reads are longer than Illumina/Solexa reads and come in a mix of lengths (200-500 bp).

**boetsie** · 02-18-2011, 10:56 AM

Hi Hliang,

No, I'm sorry, this is not possible. The pairs should be split across two files.
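If the reads already sit in a combined file, they can be split back into two files first. The sketch below is only an illustration: it assumes each record is a `>` header followed by a single sequence line of the form READ1:READ2, which is a guess at the combined layout described in the question, and it reuses the same header for both mates. The function name is hypothetical.

```python
def split_combined_pairs(lines):
    """Split FASTA lines whose sequence is 'READ1:READ2' into two lists of FASTA lines.

    Assumption: one header line and one sequence line per pair, with the two
    mates separated by a single ':' on the sequence line.
    """
    first, second = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The same header goes into both output files.
            first.append(line)
            second.append(line)
        else:
            left, sep, right = line.partition(":")
            if not sep:
                raise ValueError("no ':' separator in sequence line: %r" % line)
            first.append(left)
            second.append(right)
    return first, second
```

Writing the two returned lists to, say, file1.1.fastA and file1.2.fastA gives the two-file layout that libraries.txt expects.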

We use Bowtie for mapping, and for scaffolding we only use reads that map entirely. If the whole read can be mapped to the contig (thus without gaps), it should be possible. Whether it really works, I don't know. You can give it a try.

The differences in read length do not matter; Illumina reads with different read lengths are also possible. In the future it would be a good idea to have a mapper for longer sequences. Do you know of any?

Boetsie

**hliang** · 02-18-2011, 11:22 AM

gotcha.

I'm not doing a lot of mapping at the moment, but there are a bunch of programs you can take a look at here: http://en.wikipedia.org/wiki/List_of...nment_software
MUMmer and MAQ can handle long reads.

There is another one called LAST not mentioned above: http://last.cbrc.jp/

**themwg** · 02-22-2011, 03:30 PM

I have a question or two about the mapping stage.

I'm working with datasets that consist of a contig file assembled using both paired-end and mate-pair data. I'm running SSPACE with that contig file against the mate-pair reads for scaffolding. In my best case I have 80 million inserted pairs, 10 million single reads and 7 million pairs with pairing contigs. In other cases I have 25 million inserted pairs, 600k single reads and 400k pairs with pairing contigs.

In the first case I do end up with extensive scaffolding despite only ~6% of the reads mapping. In the other cases, with less than 1% of the reads used for mapping, I get very little scaffolding. I'm a little concerned about the low fraction of reads mapping to my contigs. Without getting into the details of my datasets (they are different species, which could be the source of the difference), I'm curious whether you have any thoughts on this from the program's point of view.

Perhaps I just need some clarification of some of the terms.
#number of single reads found on contigs =
(I use an insert size of 3000bp with a std dev of .5)
Regarding the mapping step, does this mean you take the 4500 bp from the left and right edge of each contig to use for the mapping step, or do you delete 4500 bp off each edge and just use the middle of the contigs for mapping? I assume it's the first option, but the readme file uses the word "subtracted", which is somewhat misleading.

#number of pairs found with pairing contigs =
For "pairing contigs" I get numbers that are greater than half the single reads. If SSPACE uses 10 million single reads for mapping, I would imagine that I could get at most 5 million pairs.

#total pairs =
I'm unclear about what this number means. Is it the total number of read pairs used in mapping? If so, I'm unclear how this relates to the single reads. My understanding is that SSPACE/Bowtie takes all the read pairs that don't have Ns, then maps each single read to the contigs. It then determines which of the reads are paired, which contigs those lie on, and so on.

Any light you could shed would be greatly appreciated. I'm fully ready to realize I'm just being dense.

**boetsie** · 02-23-2011, 01:13 AM

Hi themwg,

Thank you for the good points you raise. I do indeed see some vague descriptions and mistakes in the summary file.

About your questions;

- I indeed take 4500bp from the left and right edge for scaffolding, which is of course the obvious method.

- You are absolutely right, the number of pairs should be at most half the number of single reads. I see that I displayed the wrong variable in my script. I will fix this in the next release.

- As said above, wrong calculation for the total number of single reads. Total pairs is a sort of filtering step for the pairs. The actual pairs used for scaffolding is the value given at "Assembled pairs".
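To make the edge step concrete, here is a small sketch of the arithmetic behind the 4500 bp figure. It assumes the edge length is the insert size plus the allowed deviation, i.e. 3000 × (1 + 0.5) = 4500, which matches the numbers in this thread but is not taken from the SSPACE source. What happens to contigs shorter than the edge length is also an assumption (here the whole contig is kept), and the function names are hypothetical.

```python
def edge_length(insert_size, stdev):
    """Edge length in bp, assuming insert size plus the allowed fractional deviation."""
    return int(round(insert_size * (1 + stdev)))


def contig_edges(sequence, insert_size, stdev):
    """Return (left_edge, right_edge) of a contig.

    The edges are kept for mapping; the middle of the contig is what gets
    left out. Contigs shorter than the edge length are returned whole
    (an assumption made for this sketch).
    """
    n = edge_length(insert_size, stdev)
    if n <= 0:
        raise ValueError("edge length must be positive")
    return sequence[:n], sequence[-n:]


if __name__ == "__main__":
    contig = "A" * 5000 + "C" * 5000
    left, right = contig_edges(contig, 3000, 0.5)
    print(edge_length(3000, 0.5), len(left), len(right))
```

With a 10,000 bp contig and these settings, the middle 1,000 bp is the only part not used for mapping.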

I'm sorry for the mistakes. As said, I will fix this in the next release, which will probably come next week.

Kind regards,
Boetsie

**jstjohn** · 02-23-2011, 04:10 PM

file in ./reads/ folder really small

Hey,
I am a little worried because my input files were each about 3G of gzipped fastq, and the .fasta files in the ./reads/ folder are only about 100M each. I am pretty sure that there are more perfect reads without N's than that... I trimmed bases from the beginning and end of reads prior to running the program, and the file should only have reads that are over 30nt.

One possible bug is that I noticed a few of the fastq reads have 0 length in my input file, but they are still paired properly, and have the right new lines and everything so the two files are the right relative length. Do you think that is causing issues for the program?

UPDATE: fixed the above issue with the few 0 length reads, and the output files still have this issue of being very small compared with the size of the input files. Maybe I just don't understand what the files in the ./reads folder are?
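Comparing file sizes is not very telling here, since the inputs are gzipped fastq with quality lines and the files in ./reads/ are uncompressed fasta, so counting reads is a more direct check. A rough sketch follows; it assumes plain four-line fastq records and only looks at each read on its own, so it says nothing about how SSPACE treats a pair when one mate is filtered. The function names are hypothetical.

```python
import gzip


def summarize_fastq(lines):
    """Count reads in four-line FASTQ input: total, zero-length, containing N, and the rest."""
    total = empty = with_n = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        seq = line.strip()
        total += 1
        if not seq:
            empty += 1
        elif "N" in seq.upper():
            with_n += 1
    return {
        "total": total,
        "empty": empty,
        "with_n": with_n,
        "usable": total - empty - with_n,
    }


def summarize_fastq_file(path):
    """Run summarize_fastq on a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_fastq(handle)
```

If the "usable" count is far above the number of records in the corresponding ./reads/ file, something other than N filtering is dropping reads.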

Thanks!
-John
