  • catfisher
    replied
    grommit script failed

    Marten, thanks for your quick reply. I edited my configuration file as you suggested and ran goBambus again, but it still failed.
    This is the .conf I used:
    # Priorities
    priority ALL 1
    # The following lines can be un-commented to specify certain
    # per-library settings

    # Redundancies
    # redundancy lib_some 1

    # allowed error
    # error MUMmer 0.5

    # overlaps allowed
    # overlaps MUMmer Y

    # Global redundancy
    redundancy 2

    # min group size
    mingroupsize 0

    The goBambus log output is:
    Parsing links out of input file
    Step 100: running detective
    Combining XML files
    Step 200: making the xmls
    starting
    Done
    Step 300: Preparing contig links
    starting
    Done
    Step 400: Running scaffolder
    Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed

    The error information from the goBambus.error file is:
    20100712|123807| 10451| Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed

    The first several lines of my mates file are:
    library libname 200 500
    HWUSI-EAS1665_0002:2:1:1022:18088#0/1 HWUSI-EAS1665_0002:2:1:1022:18088#0/2 libname
    HWUSI-EAS1665_0002:2:1:1029:11872#0/1 HWUSI-EAS1665_0002:2:1:1029:11872#0/2 libname
    HWUSI-EAS1665_0002:2:1:1029:11034#0/1 HWUSI-EAS1665_0002:2:1:1029:11034#0/2 libname
    HWUSI-EAS1665_0002:2:1:1030:19457#0/1 HWUSI-EAS1665_0002:2:1:1030:19457#0/2 libname
    HWUSI-EAS1665_0002:2:1:1031:12133#0/1 HWUSI-EAS1665_0002:2:1:1031:12133#0/2 libname

    Marten, could you look at this information and point out what's wrong? I have no idea. Thanks a lot,

    Kevin


  • boetsie
    replied
    Hi catfisher,

    I've had this error too. To solve it, you should set a priority in the .conf file. A file named default.conf, containing the default parameters, is generated once you have run Bambus. Add the line

    priority ALL 1

    to that file (or edit the existing priority line so that it reads this way).
    If you have not run Bambus yet, you will need to create the .conf file from scratch; see the references below for more information. Once you have the .conf file, pass it on the command line, for example:
    goBambus -c test.contig -m test.mates -C default.conf -o test-bambus
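
Before launching goBambus, it can help to confirm that the .conf file really contains an active priority line. The following is a minimal sketch, assuming the .conf layout shown in this thread (`#` starts a comment, and an active line has the form `priority ALL 1`). The function name `has_priority` is made up for illustration and is not part of Bambus.

```python
def has_priority(conf_text):
    """Return True if conf_text contains at least one active 'priority' line."""
    for line in conf_text.splitlines():
        # Assumption: '#' starts a comment, as in the .conf files in this thread.
        fields = line.split("#", 1)[0].split()
        # Assumed shape of an active line: priority <library or ALL> <number>
        if len(fields) == 3 and fields[0] == "priority":
            return True
    return False


commented_out = "# Priorities\n# priority ALL 1\nredundancy 2\n"
fixed = "# Priorities\npriority ALL 1\nredundancy 2\n"
print(has_priority(commented_out))  # False
print(has_priority(fixed))          # True
```

This only rules out the "Priority not specified" cause; a grommit failure can have other reasons that this check does not detect.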

    For more information about the .conf file, and for an example, see the AMOS project documentation.


    Marten


  • catfisher
    replied
    Bambus error: library priority

    Boetsie and danix, I noticed that you both seem to do a lot of work with Bambus. I also have contigs generated with CLCbio. I know how to get the .contig file for Bambus, and I also made a mates file following your instructions, but when I run goBambus, I get an error:
    20100710|193857| 16658| Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed
    20100710|204158|24277|grommit|FATAL|9: Priority not specified: at least one library must be assigned a priority

    I don't know what the 'priority' is. What can I do to solve this problem? Could you give me any help? Thanks in advance.


  • danix
    replied
    Originally posted by boetsie:
    I have no idea... I've never used a .sff file. What does it look like? Why do you want to use it? Does it contain additional data?

    If the mates present in the .contig file are all present in the two .fasta files, you can just use the two fasta files to create the .mates file.

    How do I create the .mates? I tried the script you sent me and the output doesn't look right. Besides, I don't understand why FZ92HC101CZUHH.1 and FZ92HC102IDBLW.2 are on the same line. How can I tell that they are mates? I'm really lost and confused now...

    FZ92HC101CZUHH.1 FZ92HC102IDBLW.2 libname
    FZ92HC101DJEHD.1 FZ92HC102JYG94.2 libname
    FZ92HC101DUWKQ.1 FZ92HC102HS1LU.2 libname
    FZ92HC101CUUV5.1 FZ92HC102G8H4Z.2 libname
    FZ92HC101EMKQX.1 FZ92HC102HOD38.2 libname
    FZ92HC101CE653.1 FZ92HC102HO0J7.2 libname
    FZ92HC101ECTBB.1 FZ92HC102IBNJJ.2 libname
    FZ92HC101DXMSC.1 TGATCCGGCGCAGGCGTATCTGGGCTCGGATCGTGCCTGGTGCCGACGGCGATGAACGAC
    libname
    FZ92HC101C587C.1 FZ92HC102F3E16.2 libname
    FZ92HC101BZ63S.1 CGGTCGGCCGCGGCCGATCTCGGGATTGCGCGGCGTGTGCAT
    libname
    FZ92HC101DEODE.1 CCGCGTGGACATGCCGTTCGAGGAACCGTGGACGCAACC
    libname
    FZ92HC101DP9HX.1 ATCGGCTATGCACAGGTCATCGAGTATCTCGACGGCG
    libname
    FZ92HC101EE90B.1 ACGTCCGACGTGATCAGGAGCGAGTCGGTGACGGCGCTTCGCACTCCGAGGG
    libname
    TTTGATGATCGACATCAAT GCGTTCGACTACCAGTTCGTCGGACCATCCGGGTAGCGTGTCGCAAGGGTCGGTTCCGAA
    libname
    CGTTCGCTGAGCACCGCCGAATCGAGCAGTTCGCGGATCTCGTCGAACGTCCNCGA FZ92HC102GE3MB.2 libname
    CGTACGGATGTAGCTGGTGAAGAGGTCCCTTGCGGGCGGAGAAGTCGAGTCGTTCCGTCG TCGAGAGGCCGCGGAAGCGGCCGGAAAGGACGGCAACGATGTTTGACCGTTTCAACTCAG
    libname
    FZ92HC101DBOTK.1 FZ92HC102GVOHT.2 libname
    FZ92HC101BEEQB.1 TCTGCGTGGAGACCGTGACGGCTGATCTACGGCCNCCTCGGCCGATGATCGCCGCCT
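
Output like the listing above can be screened automatically before it is given to Bambus. The following is a rough sketch, assuming a mates file consists of `library <name> <min> <max>` lines and three-column pair lines, as in the examples in this thread. Treating a long field made only of A/C/G/T/N as leaked sequence is a heuristic invented here, not a Bambus rule, and the function name is illustrative.

```python
import re

# Heuristic (an assumption, not a Bambus rule): a long field made only of
# nucleotide letters is probably sequence that leaked into a name column.
SEQUENCE_LIKE = re.compile(r"^[ACGTNacgtn]{15,}$")


def bad_mates_lines(lines):
    """Return (line_number, reason) for lines that do not look like mate pairs."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] == "library":
            continue  # blank lines and library definitions are skipped
        if len(fields) != 3:
            problems.append((number, "expected 3 columns, found %d" % len(fields)))
        elif any(SEQUENCE_LIKE.match(field) for field in fields[:2]):
            problems.append((number, "column looks like sequence, not a read name"))
    return problems


sample = [
    "FZ92HC101CZUHH.1 FZ92HC102IDBLW.2 libname",
    "FZ92HC101DXMSC.1 TGATCCGGCGCAGGCGTATCTGGGCTCGGATCGTGCCTGGTGCCGACGGCGATGAACGAC",
    "libname",
    "CGTTCGCTGAGCACCGCCGAATCGAGCAGTTCGCGGATCTCGTCGAACGTCCNCGA FZ92HC102GE3MB.2 libname",
]
for number, reason in bad_mates_lines(sample):
    print(number, reason)
```

The check says nothing about whether two well-formed names are truly mates; it only catches rows where sequence text has ended up in place of a read name or the row is split across lines.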


  • danix
    replied
    Hi, the 454 output is .sff (it looks like a binary file), but we use a script called sff_extract to convert this data into fasta, xml and quality files. I just read that "The 454 paired-end protocol will generate reads which contain the forward and reverse direction in one read, separated by a linker."
    So I think the key to generating the .mates file is the .sff, but I don't know how.
    I think it shouldn't be so complicated...
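
If each paired-end read really holds both directions separated by a linker, as the quoted sentence says, the splitting step could look roughly like the sketch below. It is an illustration only: the linker string is a placeholder and not the real 454 linker, the function name is hypothetical, and real data would also need tolerant matching and orientation handling, which are not attempted here.

```python
def split_on_linker(read, linker):
    """Split one read into (left, right) around the first exact linker match.

    Returns None when the linker is absent or either side would be empty.
    """
    position = read.find(linker)
    if position == -1:
        return None
    left = read[:position]
    right = read[position + len(linker):]
    if not left or not right:
        return None
    return left, right


# Placeholder linker for illustration only; substitute the one used in your protocol.
EXAMPLE_LINKER = "ACACACACAC"

read = "GGGTTTGG" + EXAMPLE_LINKER + "CCCAAATT"
print(split_on_linker(read, EXAMPLE_LINKER))        # ('GGGTTTGG', 'CCCAAATT')
print(split_on_linker("GGGTTTGG", EXAMPLE_LINKER))  # None
```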


  • boetsie
    replied
    Originally posted by danix:
    Hi, I forgot to mention that I also have the .sff files. If I can use them to create the .mates file, that would be great.
    Can I? If so, how?

    I have no idea... I've never used a .sff file. What does it look like? Why do you want to use it? Does it contain additional data?

    If the mates present in the .contig file are all present in the two .fasta files, you can just use the two fasta files to create the .mates file.


  • danix
    replied
    Hi, I forgot to mention that I also have the .sff files. If I can use them to create the .mates file, that would be great.
    Can I? If so, how?


  • danix
    replied
    Hi boetsie, thanks again for your quick reply.
    Here is part of my .contig file. It was created by ace2contig (from the AMOS package); the input was the .ace file that phrap generated after the assembly.
    I'll try the script you attached.
    Thank you so much again!

    ##Contig1 1 458 bases, 00000000 checksum.
    agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
    tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
    gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
    acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
    gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
    aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
    ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
    accgaactgtctcacgacgttctgaacccagctcgcgt
    #FZ92HC101BPK62(0) [] 458 bases, 00000000 checksum. {1 458} <1 459>
    agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
    tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
    gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
    acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
    gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
    aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
    ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
    accgaactgtctcacgacgttctgaacccagctcgcgt
    ##Contig2 1 379 bases, 00000000 checksum.
    ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
    gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
    gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
    tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
    ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
    gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
    taaatcgacatgttaggtg
    #FZ92HC101BFQDN(0) [] 379 bases, 00000000 checksum. {1 379} <1 380>
    ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
    gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
    gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
    tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
    ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
    gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
    taaatcgacatgttaggtg


  • boetsie
    replied
    Originally posted by danix:
    Complementing the information I gave before:
    454Reads.01.MID4.fna is like this:
    >FZ92HC101CZUHH length=41 xy=1111_1155 region=1 run=R_2009_08_04_12_33_02_
    CGCGCGTTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
    >FZ92HC101DJEHD length=46 xy=1334_0127 region=1 run=R_2009_08_04_12_33_02_
    GTCTCGCGTCGTGTCTTCGCGTCGTATGCGGTACTGGTCAGGCGTT

    454Reads.02.MID4.fna is like this:
    >FZ92HC102IDBLW length=40 xy=3315_0370 region=2 run=R_2009_08_04_12_33_02_
    CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
    >FZ92HC102JYG94 length=40 xy=3966_0618 region=2 run=R_2009_08_04_12_33_02_
    CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC

    Can I extract any information from these fastas to create a .mates?
    Thanx

    Hmmm, I see: it's 454 data, which doesn't have a suffix like .x or /1. (Sorry, I have never worked with 454 data before.)

    Can you tell me what your .contig file looks like?

    The read names in the mates file should be the same as the first string on the "#" lines in the .contig file. Each "#" line names a read that was placed in the contig (the contig lines themselves start with "##").

    So if the "#" line starts with e.g. FZ92HC102IDBLW, followed by the offset in parentheses, like:

    #FZ92HC102IDBLW(0)

    you should extract the names from both fasta files and put them in the same file.

    If this is indeed the case, you can use the script I attached.
    Use it with:

    perl testmates.pl file1 file2

    It will generate a txt file with the mates. The only thing left to do is put the library sizes at the top of the file.
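
The attached testmates.pl is not reproduced in this thread. As a stand-in, here is a rough Python sketch of one way to build such a file from two FASTA files: take the read name from every header line (and only from header lines) and pair the two files record by record. Pairing by position is an assumption; nothing in the files shown proves that the n-th read of one file is the mate of the n-th read of the other. The function names, the library name and the sizes 200 and 500 are placeholders, and tab-separated output is assumed to be acceptable.

```python
def read_names(fasta_lines):
    """Return the first token after '>' for each FASTA header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def mates_lines(names_a, names_b, library, min_size, max_size):
    """Build mates-file lines: one library line, then one line per pair."""
    lines = ["library\t%s\t%d\t%d" % (library, min_size, max_size)]
    # Assumption: records are paired by their order in the two files.
    for first, second in zip(names_a, names_b):
        lines.append("%s\t%s\t%s" % (first, second, library))
    return lines


file1 = [
    ">FZ92HC101CZUHH length=41 xy=1111_1155 region=1",
    "CGCGCGTTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC",
    ">FZ92HC101DJEHD length=46 xy=1334_0127 region=1",
    "GTCTCGCGTCGTGTCTTCGCGTCGTATGCGGTACTGGTCAGGCGTT",
]
file2 = [
    ">FZ92HC102IDBLW length=40 xy=3315_0370 region=2",
    "CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC",
    ">FZ92HC102JYG94 length=40 xy=3966_0618 region=2",
    "CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC",
]
for line in mates_lines(read_names(file1), read_names(file2), "libname", 200, 500):
    print(line)
```

Reading names only from lines that start with ">" keeps sequence lines out of the name columns, including sequences that wrap over several lines.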

    more info about .contig file at http://www.cbcb.umd.edu/research/con...entation.shtml

    Hope this helps.
    Last edited by boetsie; 04-15-2010, 05:25 AM.


  • danix
    replied
    Complementing the information I gave before:
    454Reads.01.MID4.fna is like this:
    >FZ92HC101CZUHH length=41 xy=1111_1155 region=1 run=R_2009_08_04_12_33_02_
    CGCGCGTTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
    >FZ92HC101DJEHD length=46 xy=1334_0127 region=1 run=R_2009_08_04_12_33_02_
    GTCTCGCGTCGTGTCTTCGCGTCGTATGCGGTACTGGTCAGGCGTT

    454Reads.02.MID4.fna is like this:
    >FZ92HC102IDBLW length=40 xy=3315_0370 region=2 run=R_2009_08_04_12_33_02_
    CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
    >FZ92HC102JYG94 length=40 xy=3966_0618 region=2 run=R_2009_08_04_12_33_02_
    CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC

    Can I extract any information from these fastas to create a .mates?
    Thanx


  • danix
    replied
    building scaffold using bambus - .mates problem

    Thanx boetsie for your quick answer.
    But I can't use your script in this project because the sequences in the 454 outputs I have, 454Reads.01.MID4.fna and 454Reads.02.MID4.fna, have different names, so every ID is unique and the script creates an empty mates.txt.
    Besides, the other bacterium I'm working with has only one fasta file from 454.

    Both fasta are like this:
    >F35ERS102DJ7GS rank=0000002 x=1343.0 y=826.0 length=56
    ATCAGACACGGAGGCGTACGCGCCGCTGTTCCAGGTGATGCTGGCATTCCAGAACA
    >F35ERS102DBYUE rank=0000006 x=1249.0 y=1428.0 length=69
    ATCAGACACGCCGCCGGCACCTTCGCCGCTGCCGCGCTCGCCACCGGTGGCACCCGTCGT
    GCTGTGGTC
    >F35ERS102C47FN rank=0000036 x=1172.0 y=1361.0 length=68
    ATCAGACACGAGGTGAAGACCGGTTTCCGTCGCGGCGGAGAATAGCCGAACATCAGCGCG
    CGATCGGG

    I'm wondering if there is a way to create the .mates from the data I have. Any other idea?

    Thanx


  • boetsie
    replied
    Originally posted by danix:
    Hi, I'm trying to run bambus but I don't have any .mates. Does anyone know how I can create this file?
    I have a 454 output (fasta + sff) from a bacterial genome and I assembled it with phrap. I have already converted the .ace to .contig, using ace2contig from AMOS.
    Thanx!

    I got this script from Sergey Koren of AMOS (and adapted it a bit):

    grep "^>" my.fasta | sed 's/^>//' | sed 's/\/1$/./;s/\/2$/./' | awk -F "." '{print $1}' | sort | uniq -c | awk '{if ($1 == 2) print $2"/1\t"$2"/2\tsmall"}' > mates.txt

    Replace 'my.fasta' with the fasta file that contains your read names.

    The read names in 'my.fasta' must end with /1 and /2.
    If your names end differently, e.g. with .x and .y, you should replace

    sed 's/\/1$/./;s/\/2$/./'

    with, for example,

    sed 's/\.x$/./;s/\.y$/./'

    in the code above (and adjust the "/1" and "/2" in the final awk print to match your suffixes).

    If you have two fasta files, just feed in one of them and change
    if ($1 == 2) to if ($1 == 1)
    in the code; this way you only have to run it on one file.

    This will print the names to 'mates.txt'. The only thing left to do is to put your library name and insert sizes at the top of this file.

    Bambus will probably generate a lot of errors because some names are not found in the .contig file, but this shouldn't be a problem.

    Hope this works; otherwise, ask me.
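
For anyone who finds the one-liner hard to follow, the same idea can be written out in Python. This is a sketch under the same assumptions as the pipeline: read names end in /1 and /2, and a stem seen exactly twice is treated as a pair. Unlike the pipeline, it ignores names without either suffix. The function names are illustrative, and `small` is just the library name used in the pipeline.

```python
from collections import Counter


def paired_stems(fasta_lines, suffixes=("/1", "/2")):
    """Return sorted read-name stems that occur exactly twice among the headers."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no names
        tokens = line[1:].split()
        if not tokens:
            continue
        name = tokens[0]
        for suffix in suffixes:
            if name.endswith(suffix):
                counts[name[: -len(suffix)]] += 1
                break
    return sorted(stem for stem, count in counts.items() if count == 2)


def pair_lines(fasta_lines, library="small", suffixes=("/1", "/2")):
    """Return one tab-separated 'mate1 mate2 library' line per complete pair."""
    return [
        "%s%s\t%s%s\t%s" % (stem, suffixes[0], stem, suffixes[1], library)
        for stem in paired_stems(fasta_lines, suffixes)
    ]


headers = [
    ">HWUSI-EAS1665_0002:2:1:1022:18088#0/1",
    "ACGT",
    ">HWUSI-EAS1665_0002:2:1:1022:18088#0/2",
    "ACGT",
    ">HWUSI-EAS1665_0002:2:1:1029:11872#0/1",  # mate missing, so no pair
    "ACGT",
]
for line in pair_lines(headers):
    print(line)
```

The library definition line still has to be added at the top of the resulting file by hand, as described above.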


  • danix
    replied
    building scaffold using bambus - .mates problem

    Hi, I'm trying to run bambus but I don't have any .mates. Does anyone know how I can create this file?
    I have a 454 output (fasta + sff) from a bacterial genome and I assembled it with phrap. I have already converted the .ace to .contig, using ace2contig from AMOS.
    Thanx!


  • boetsie
    replied
    Originally posted by mack:
    How big is your dataset? I was able to export my dataset as ace with 17k contigs + 250k singletons.

    More than 1 million contigs.


  • mack
    replied
    Originally posted by boetsie:
    For large datasets, somehow no .ace files are produced.

    How big is your dataset? I was able to export my dataset as ace with 17k contigs + 250k singletons.
