-
Hi Boetsie
I'm trying to scaffold a set of contigs from a bacterial genome assembly. Before scaffolding, there were no non-ACGTN bases in my assembly, but after scaffolding with SSPACE, there were. Can you please let me know what is causing this, and if there is an option to turn it off?
I have already set the -x parameter to 0 to turn off extension.
Thanks.
Comment
-
Hi mht,
SSPACE only adds 'n' or 'N' characters to the assembly, so it would be strange if other characters appeared after scaffolding. Could you please show me an example of the non-ACGTN characters that are included?
Regards,
Boetsie
Comment
-
Oops Boetsie, my bad. They were lower-case acgtn characters. I used Velvet as my assembler, so the lower-case acgt bases came from there. What is the difference between 'n' and 'N' characters in SSPACE?
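For anyone who hits the same confusion: a case-sensitive check for non-ACGTN characters will flag lower-case bases. Below is a minimal sketch, not part of SSPACE, that counts characters by class so lower-case bases and gap characters are reported separately from truly unexpected symbols. The function name and the category labels are made up for illustration.

```python
from collections import Counter

def classify_bases(sequence):
    """Count characters by class so lower-case bases are not mistaken for invalid ones."""
    counts = Counter()
    for ch in sequence:
        if ch in "ACGT":
            counts["upper_acgt"] += 1
        elif ch in "acgt":
            counts["lower_acgt"] += 1
        elif ch == "N":
            counts["upper_N"] += 1
        elif ch == "n":
            counts["lower_n"] += 1
        else:
            # Anything else (IUPAC codes, stray symbols) ends up here.
            counts["other"] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical scaffold fragment, for illustration only.
    example = "AGTAGATAGATGatcgcgctgaNNNNnACGT"
    print(dict(classify_bases(example)))
```

Only a non-zero "other" count would point to characters outside the ACGTN alphabet in either case.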
Comment
-
SSPACE inserts an 'n' if a negative gap was found, meaning that the contigs potentially overlap but SSPACE could not find a full overlap.
It writes lower-case 'acgt' if an overlap was actually found, e.g.:
Ctg1: AGTAGATAGATGATCGCGCTGA
Ctg2:.............ATCGCGCTGAAGTAGATAGATGAGATCGAC
will become:
AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC
Regards,
Boetsie
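The overlap example above can be reproduced with a few lines of Python. This is only a sketch of the idea (join on the longest exact suffix/prefix overlap and write the shared bases in lower case); it is not SSPACE's actual code, and the function name and the `min_overlap` parameter are assumptions made for illustration.

```python
def merge_with_overlap(left, right, min_overlap=1):
    """Join two contigs on the longest exact suffix/prefix overlap.

    The overlapping bases are written in lower case, mirroring the output
    described in the post. Returns None when no exact overlap of at least
    min_overlap bases exists (min_overlap is expected to be >= 1).
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left[:-size] + right[:size].lower() + right[size:]
    return None

if __name__ == "__main__":
    ctg1 = "AGTAGATAGATGATCGCGCTGA"
    ctg2 = "ATCGCGCTGAAGTAGATAGATGAGATCGAC"
    print(merge_with_overlap(ctg1, ctg2))
    # prints AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC
```

When the function returns None, there is no exact overlap to merge on. That roughly corresponds to the case where a negative gap is estimated but no full overlap is found, and an 'n' is placed between the contigs instead.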
Comment
-
For the TAB-delimited format, like:
<contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>
E.g.
contig1 100 150 contig1 350 300
contig1 4000 4050 contig2 110 60
If startpos is greater than endpos, it means the read mapped to the - strand.
I mapped my BAC end reads onto the contigs with BLAT. How do I encode the strand information in my TAB file?
Comment
-
I'm not really sure what you mean. You could just add your region of alignment to the tab file, e.g. if the BAC aligns to contig 1 at positions 1000-3000 and to contig 2 at positions 4000-2000 (so reversed), you can just add this info:
contig1 1000 3000 contig2 4000 2000
SSPACE can only handle links between two contigs, so if a BAC aligns to multiple contigs you have to split it so that each entry is a single contig-contig link, instead of contig-contig-contig.
Regards,
Boetsie
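To make the strand convention concrete, here is a small sketch that writes a TAB line from two alignment hits, swapping start and end for hits on the - strand. The hit tuples `(contig, start, end, strand)` and all function names are hypothetical, and coordinates are assumed to be 1-based and inclusive. BLAT's PSL output stores 0-based starts, so it would need converting first. Splitting a chain of hits into consecutive pairs is one possible reading of the "split it" advice, not a documented SSPACE procedure.

```python
def oriented_span(start, end, strand):
    """Return (start, end), with start > end for minus-strand hits.

    Assumes 1-based inclusive coordinates; PSL starts are 0-based and
    would need +1 before being passed in.
    """
    low, high = min(start, end), max(start, end)
    return (high, low) if strand == "-" else (low, high)

def tab_link(hit1, hit2):
    """Format one contig-contig link as a TAB-delimited line.

    Each hit is a hypothetical (contig, start, end, strand) tuple.
    """
    fields = []
    for contig, start, end, strand in (hit1, hit2):
        first, second = oriented_span(start, end, strand)
        fields += [contig, str(first), str(second)]
    return "\t".join(fields)

def pairwise_links(hits):
    """Split an ordered chain of hits into consecutive contig-contig links."""
    return [tab_link(a, b) for a, b in zip(hits, hits[1:])]

if __name__ == "__main__":
    # The example from the post: forward on contig1, reverse on contig2.
    print(tab_link(("contig1", 1000, 3000, "+"), ("contig2", 2000, 4000, "-")))
```

With the hits from the post, this gives the line `contig1 1000 3000 contig2 4000 2000`, with the fields separated by tabs.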
Comment
-
Hi,
I can't really judge that, since it depends on what you think is 'better'. Anyway, if you have a nice draft assembly, I would not use the contig extension option, mainly because it is a time- and memory-consuming process. Our current strategy is to use SSPACE for generating the scaffolds, followed by our tool GapFiller to close the gaps (N's) produced by SSPACE. GapFiller uses local information from the paired-read data for the extension, instead of all the unaligned reads. This extension is much faster and more reliable.
Regards,
Boetsie
Originally posted by mht:
Hi,
Using SSPACE, will it always be better to do contig extension prior to scaffolding? And do I do extension with both paired end and single end reads, or just paired end?
Thanks.
Comment
-
No, this won't work, since both reads of a pair have to map to the contigs. You are better off making paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a paired-end read from the first 100 bp and the last 100 bp. Specify your insert size as 200 bp. I've never done this, but I think it could work.
Regards,
Boetsie
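A rough sketch of the read-splitting idea, for sequences only (FASTQ quality strings would need the same slicing). Whether the second half has to be reverse-complemented depends on the orientation the library settings expect, so the `flip_second` switch is an assumption to adjust, not something confirmed in this thread. The function names and the `/1`, `/2` name suffixes are likewise illustrative.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving case and N/n."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]

def split_read(name, seq, half=None, flip_second=True):
    """Turn one long read into an artificial pair.

    The first mate is the first `half` bases, the second mate is the last
    `half` bases. By default `half` is len(seq) // 2, so a 200 bp read
    gives two 100 bp mates spanning an insert of 200 bp.
    """
    if half is None:
        half = len(seq) // 2
    if half <= 0 or half > len(seq):
        raise ValueError("half must be between 1 and the read length")
    first = seq[:half]
    second = seq[-half:]
    if flip_second:
        # Assumption: mates are expected to face each other.
        second = reverse_complement(second)
    return (f"{name}/1", first), (f"{name}/2", second)

if __name__ == "__main__":
    read = "ACGT" * 50  # hypothetical 200 bp read
    (n1, s1), (n2, s2) = split_read("read1", read)
    print(n1, len(s1), n2, len(s2))
```

For reads of varying length, as with 454 data, the insert size differs per read, so the insert size error allowed in the library file would have to be wide enough to cover that spread.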
Originally posted by sheepyuan:
Hi,
I have a question. I have some single-end 454 data. How would SSPACE run if I artificially turned it into paired-end data in which the sequence of the other mate is all "NNNNNNNNNN"?
Comment
-
SSPACE combining cDNA and PE/MP
Hi all,
I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene-space sequence, the better.
I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?
Thanks,
Alex
Last edited by aharkess; 01-15-2013, 10:50 AM.
Alex Harkess
Leebens-Mack Lab
Plant Biology Department
University of Georgia, Athens GA
Comment
-
Hello, have you used SSPACE for scaffolding your genome using RNA-seq data? How did you determine your insert size? Thanks.
Comment
-
Hi!
I am getting very good results with SSPACE, Boetsie, and I plan to follow up with GapFiller.
I have a bunch of questions, but the most important one right now is about the foundlinks files.
I am sure I am missing the true naming convention of the foundlinks file (I mean, do r1 and f1 refer to contig1 in the formattedcontigs file, and so on?). Can you shed any light on this, please?
If the question is not clear, read below (if it is, skip it).
I have done several SSPACE runs over Velvet generated contigs, arranged in different fasta inputs:
- 1: contigs 1,3,4,6,7
- 2: contigs 2,3,4,6
- 3: contigs 1,2,4,6
I use SSPACE with two read libraries, in two runs: the first with both libraries, the second with only the bigger insert size library. Both runs give correct scaffolds, and then I inspect the links. However, run1.big_insert_lib.foundlinks contains the same links as run2.big_insert_lib.foundlinks, but I am not able to associate them with the same contigs when I use the formattedcontigs file for name translation (the question above).
Comment