  • christinawu2008
    replied
    bowtie-build error

    Originally posted by boetsie
    No problem, good luck with your further analysis.
    Hi boetsie,

    SSPACE must be a very useful tool for scaffolding, but when I tried to use it the process failed at the bowtie-build step. My contig file contains only the names and super_contig sequences, with no other information, and there are lots of 'N' gaps in the sequences. Do I need to modify anything to get bowtie-build to work? If not, what is the problem?

    The reads I have are 100 bp paired-end (100PE),
    so the library looks like
    lib1 ***1.fastq ***2.fastq 200 0.7 0
    or should I replace 200 with 400?


  • boetsie
    replied
    No problem, good luck with your further analysis.


  • Autotroph
    replied
    Hi boetsie,

    Thanks a lot for the patient explanation.


  • boetsie
    replied
    Hi Autotroph,

    Sorry, but I think it is simply not possible to merge them with SSPACE the way you are trying to. SSPACE only looks for an overlap at the ends of the contigs, while you are trying to replace the "N" characters with DNA bases by merging.

    SSPACE does this;
    CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATC
    .............................
    GCTACGATCGATCAGTAGTAGATAGATAGATGATAG

    While you are trying to find a certain overlap and then determine the rest of the sequence;

    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG


    TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG
    .......

    As said, I think what you want to do is not possible with SSPACE. Maybe you can first run a gap closure on the scaffolds (e.g. with SOAP's GapCloser) so the N's are removed from your data.

    Boetsie
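
To illustrate the end-overlap case boetsie describes, here is a minimal Python sketch that finds the longest suffix of one contig matching a prefix of the next and joins the two. It is not SSPACE's actual implementation; the function names and the `min_overlap` parameter are assumptions made for illustration, and only exact matches are considered.

```python
from typing import Optional


def end_overlap(left: str, right: str, min_overlap: int = 1) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    shortest_allowed = max(min_overlap, 1)
    # Try the longest candidate first so the first hit is the best one.
    for size in range(longest_possible, shortest_allowed - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_on_overlap(left: str, right: str, min_overlap: int = 1) -> Optional[str]:
    """Join two contigs on their exact end overlap, or return None if there is none."""
    size = end_overlap(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# The two sequences from the "SSPACE does this" example above.
CONTIG_A = "CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATC"
CONTIG_B = "GCTACGATCGATCAGTAGTAGATAGATAGATGATAG"

if __name__ == "__main__":
    print("overlap length:", end_overlap(CONTIG_A, CONTIG_B))
    print("merged contig: ", merge_on_overlap(CONTIG_A, CONTIG_B))
```

A contig whose relevant end consists of N's gives this kind of exact end comparison nothing to match against, which is in line with the explanation above.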


  • Autotroph
    replied
    The point of giving an insert size of 100 (50+50) is to not have any gaps in the final scaffold. My understanding was that the two reads could even overlap if an insert size of less than 100 is given for 2*50 bp reads.

    The actual sequence expected (without any gaps) would be:

    "AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG"

    I even tried with 200 as the insert size, but it still fails to merge the contigs "correctly".

    Output given below:

    >scaffold1.1|size269
    AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCGnCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

    Does this mean that the two reads of a PE pair must have a gap between them?

    Why does "TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG" not replace the N's, when it overlaps and there is also a PE read pair connecting the two 'contigs'?


  • boetsie
    replied
    Originally posted by Autotroph
    Could you please look at below example and let me know why SSPACE does not merge the 2 "contigs"?
    Hi Autotroph,

    I've had a look at it, and I think I know why it did not merge. You should increase the insert size in your library file. SSPACE includes the read lengths when determining the gap/overlap. With a 100 bp insert size, the pair did not satisfy the minimum allowed distance.

    Both of your reads are 50 bp long. So increasing the insert size in your library by 100 (2*50 bp, the combined length of your reads) should do it, thus;

    lib1 read1.fa read2.fa 200 0.7 0

    If you need a more detailed description, please let me know

    Kind regards,
    Boetsie


  • Ashu
    replied
    Hi Boetsie,
    Thank you for the information.
    I have a mate-pair library, with the distance estimated by Bioanalyzer.
    My library file looks as follows:

    MP1 /G1/2_5kb/s_a_sequence_1.fastq /G1/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
    MP1 /G1/2_5kb/s_b_sequence_1.fastq /G1/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
    MP1 /G2/2_5kb/s_a_sequence_1.fastq /G2/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
    MP1 /G2/2_5kb/s_b_sequence_1.fastq /G2/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
    MP1 /G2/2_5kb/s_c_sequence_1.fastq /G2/2_5kb/s_c_sequence_2.fastq 2500 0.7 1
    MP1 /G2/2_5kb/s_d_sequence_1.fastq /G2/2_5kb/s_d_sequence_2.fastq 2500 0.7 1

    I will try it in paired-end form (0), but I can't see why it would turn out to be paired-end rather than mate pair. In the pairing issue file, I also see a lot of distance problems; is there a way to plot these in a graph?
    Thank you again for your kind reply,
    regards,
    Ashu


  • Autotroph
    replied
    Unfortunately, Minimus can be used to merge contigs only, not scaffolds. Bambus is able to merge scaffolds but does not allow N's in the input.

    It might be possible for me to use Minimus and SSPACE in some combination to merge the scaffolds.

    Could you please look at below example and let me know why SSPACE does not merge the 2 "contigs"?

    --------------------_________________--------------------------
    read1 read2(rev-comped) (common anchor sequence)

    Contigs.fa:

    >contig1
    AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG
    >contig2
    TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

    read1.fa

    >read1
    AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGC

    read2.fa (first 50 bases of contig2, reverse complemented)

    >read2
    CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA

    lib file:

    lib1 read1.fa read2.fa 100 0.7 0

    command:

    perl SSPACE_v1-1.pl -l lib -s contigs.fa -k 1 -a 0.7 -x 1 -o 1 -b merger

    This gives me 2 scaffolds instead of the 1 scaffold that I am expecting. When the length of the anchor sequence is reduced, it gives a single scaffold with an "n" placed between the 2 scaffolds.

    Surprisingly, if the same information is given in the form of a set of 2 mate pairs, the 2 scaffolds are merged. My guess is that SSPACE does not treat the initial set of N's in the same way as the N's it adds in the intermediate steps.
    Last edited by Autotroph; 03-08-2011, 03:03 AM. Reason: additional information
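
The read2 record in this test case is described as the reverse complement of the first 50 bases of contig2. A small Python sketch for building or checking such a read is shown here; the helper name `reverse_complement` is an assumption made for illustration and is not part of SSPACE.

```python
# Translation table for DNA bases; N is kept as N, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# First 50 bases of contig2 and the read2 sequence, copied from the post above.
CONTIG2_START = "TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG"
READ2 = "CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA"

if __name__ == "__main__":
    # Prints True when read2 matches the description in the post.
    print(reverse_complement(CONTIG2_START) == READ2)
```

Checking hand-made test reads this way helps rule out a typing mistake in the input before looking for the cause in the scaffolder settings.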


  • boetsie
    replied
    Originally posted by Autotroph
    Thanks for the clarification Boetsie,

    Bowtie can handle only reads that are a maximum of 1024 BP long. What does SSPACE do for reads that are longer than that?
    Unfortunately, SSPACE cannot handle sequences longer than 1024 bp. They are simply not used for mapping.

    I am interested in merging scaffolds, that is merging 2 sequences that look like below(SSPACE does not use reads with N's in the paired end files, am i correct?)
    Indeed SSPACE does not allow reads with N's in the paired-end files.

    I think you should consider another program for this, since you mention that you want to merge scaffolds rather than extend them. You could try something like an alignment program if you want to merge 2 scaffolds. Maybe you can do something like Ken Kraaijeveld does (http://www.kenkraaijeveld.nl/genomics/bioinformatics/). See the "combining contigs" section.

    Boetsie


  • boetsie
    replied
    Originally posted by Ashu
    HI Boetsie,
    I can't find any improvement before and after scaffolding ... Am I doing something wrong ??? Thanks
    Hi Ashu,

    I'm pretty sure you mixed up the orientation in the library file. Are you using paired-end (--> <-- direction) or mate pair (<-- --> direction) reads? If you use paired-end, your library should look something like this;

    libname file1.fasta file2.fasta 700 0.25 0

    With the last column containing a 0. For mate pairs, the last column should contain a 1;

    libname file1.fasta file2.fasta 700 0.25 1

    I think this should do it.

    Boetsie
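
The library lines shown above have six whitespace-separated columns: a name, two read files, the insert size, an error ratio, and the orientation flag (0 for paired-end, 1 for mate pair). Below is a minimal Python sketch for parsing and sanity-checking such a line before a run. The `Library` class and `parse_library_line` are hypothetical helpers, not part of SSPACE, and the distance window (insert size plus or minus insert size times error) is an assumption that is consistent with the "2500 +/-1750" figure reported elsewhere in this thread for an error of 0.7.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Library:
    name: str
    reads1: str
    reads2: str
    insert_size: int
    error: float
    orientation: int  # 0 = paired-end (--> <--), 1 = mate pair (<-- -->)

    def distance_window(self) -> Tuple[float, float]:
        """Assumed accepted distance range: insert_size +/- insert_size * error."""
        delta = self.insert_size * self.error
        return self.insert_size - delta, self.insert_size + delta


def parse_library_line(line: str) -> Library:
    """Parse one library line of the form used in this thread."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    name, reads1, reads2, insert_size, error, orientation = fields
    flag = int(orientation)
    if flag not in (0, 1):
        raise ValueError(f"orientation must be 0 or 1, got {orientation!r}")
    return Library(name, reads1, reads2, int(insert_size), float(error), flag)


if __name__ == "__main__":
    lib = parse_library_line("libname file1.fasta file2.fasta 700 0.25 1")
    kind = "mate pair" if lib.orientation == 1 else "paired-end"
    print(lib.name, kind, lib.distance_window())
```

Printing the orientation and the window for every library line makes a swapped last column easy to spot before a long scaffolding run.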


  • Autotroph
    replied
    longer reads

    Thanks for the clarification Boetsie,

    Bowtie can handle only reads that are a maximum of 1024 BP long. What does SSPACE do for reads that are longer than that?

    I am interested in merging scaffolds, that is merging 2 sequences that look like below(SSPACE does not use reads with N's in the paired end files, am i correct?):

    AGCTAGCTAGCTNNNNNNNNNCGATCGATGCNNNNNNNCGATCGATCGATCGNNNNCAGCTAGT


    ANNNNNTAGCTACGATCGATCGNNNNNNNNNGATGCACGTACGATNNCGATNNNNNNNNNNNCAGCTAGT


  • Ashu
    replied
    SSPACE: no improvement in N50 or contig size

    HI Boetsie,
    I can't find any improvement before and after scaffolding ... Am I doing something wrong ??? Thanks

    -x = 0
    -k = 5
    -a = 0.7
    -n = 15
    -p = 0

    ==================================

    Number of single reads found on contigs = 84724494
    Number of pairs found with pairing contigs / total pairs = 47882393 / 48019708
    ------------------------------------------------------------

    READ PAIRS STATS:
    ------------------------------------------------------------
    At least one sequence/pair missing from contigs: 137314
    Assembled pairs: 47882393 (95764786 sequences)
    Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2500 +/-1750): 22
    Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 11
    Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 81
    ---
    Satisfied in distance/logic within a given contig pair (pre-scaffold): 26534237
    Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 21348042
    ---
    Total satisfied: 26534259 unsatisfied: 21348134

    ------------------------------------------------------------

    ################################################################################

    SUMMARY:
    ------------------------------------------------------------
    Inserted contig file;
    Total number of contigs = 1060008
    Sum (bp) = 2114313317
    Max contig size = 56175
    Min contig size = 200
    Average contig size = 1988
    N50 = 3918

    After scaffolding MP1:
    Total number of scaffolds = 1060008
    Sum (bp) = 2114313317
    Max scaffold size = 56175
    Min scaffold size = 200
    Average scaffold size = 1988
    N50 = 3918
    Regards
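
The summary above compares N50 before and after scaffolding. For reference, a minimal Python sketch of the usual N50 definition follows: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the summed length. The function name `n50` is an assumption, and SSPACE's own calculation may differ in edge cases such as ties.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list of positive lengths")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Identical contig counts, sums and N50 values before and after scaffolding, as in the summary above, indicate that no contigs were joined at all.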


  • boetsie
    replied
    Hi,

    You say;
    The problem with using SSPACE is that it does not allow N's in the input contig file.
    while the SSPACE manual says;
    Contigs having a non-ACGT character like “.” or “N” are not discarded. They are used for extension, mapping and building scaffolds. However, contigs having such character at either end of the sequence, could fail for proper contig extension.
    So contigs containing N's can be used for extension; only when the N's are at the end of a sequence is SSPACE unable to map reads there.

    I don't know about Velvet... I know SSAKE (which has basically the same procedure as SSPACE) also can use contigs as 'seeds' and extends them with additional reads. Difference is that SSPACE first maps the reads to the pre-assembled contigs and only uses the unmapped reads for contig/scaffold extension. SSAKE does not include mapping.

    Kind regards,
    Boetsie
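
Since the manual excerpt quoted above singles out non-ACGT characters at either end of a contig, a quick pre-check of the input can be useful. This is a minimal Python sketch; the function name `ambiguous_ends` and the `flank` parameter (how many bases at each end to inspect) are assumptions for illustration, not SSPACE options.

```python
from typing import Tuple

_VALID_BASES = set("ACGTacgt")


def ambiguous_ends(seq: str, flank: int = 1) -> Tuple[bool, bool]:
    """Report whether the first and last `flank` bases contain a non-ACGT character."""
    if flank < 1:
        raise ValueError("flank must be at least 1")
    head = seq[:flank]
    tail = seq[-flank:]
    head_bad = any(base not in _VALID_BASES for base in head)
    tail_bad = any(base not in _VALID_BASES for base in tail)
    return head_bad, tail_bad


if __name__ == "__main__":
    # Internal N's only: neither end is flagged.
    print(ambiguous_ends("ACGTNNNNACGT"))
    # Leading N: the start of the contig is flagged.
    print(ambiguous_ends("NACGTACGT"))
```

Contigs flagged at either end are the ones that, according to the quoted manual text, could fail proper contig extension.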


  • Autotroph
    replied
    Thanks.

    Yes, I guess I will be extending the previous scaffolds.

    The problem with using SSPACE is that it does not allow N's in the input contig file.

    My scaffolds have varying insert sizes. Should I break each of them into paired-end reads and use these as separate libraries in SSPACE?

    Is it true that Velvet cannot handle long reads of more than 20 kb?


  • boetsie
    replied
    Hi,

    Do you want to scaffold the previous scaffolds, or do you want to extend them?

    Anyway, maybe you can try out SSPACE for this purpose, see this thread;

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Kind regards,
    Boetsie
