
  • #16
    Hi boetsie, thanks again for your quick reply.
    Here is part of my .contig file. It was created by ace2contig (AMOS package) from the .ace file that phrap generated after the assembly.
    I'll try the script you attached.
    Thank you so much again!

    ##Contig1 1 458 bases, 00000000 checksum.
    agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
    tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
    gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
    acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
    gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
    aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
    ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
    accgaactgtctcacgacgttctgaacccagctcgcgt
    #FZ92HC101BPK62(0) [] 458 bases, 00000000 checksum. {1 458} <1 459>
    agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
    tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
    gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
    acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
    gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
    aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
    ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
    accgaactgtctcacgacgttctgaacccagctcgcgt
    ##Contig2 1 379 bases, 00000000 checksum.
    ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
    gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
    gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
    tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
    ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
    gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
    taaatcgacatgttaggtg
    #FZ92HC101BFQDN(0) [] 379 bases, 00000000 checksum. {1 379} <1 380>
    ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
    gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
    gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
    tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
    ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
    gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
    taaatcgacatgttaggtg



    • #17
      Hi, I forgot to mention that I also have the .sff files. If I can use them to create the .mates file, that would be great.
      Can I? If so, how?



      • #18
        Originally posted by danix
        Hi, I forgot to mention that I also have the .sff files. If I can use them to create the .mates file, that would be great.
        Can I? If so, how?
        I have no idea... I've never used a .sff file. What does it look like? Why do you want to use it? Does it contain additional data?

        If all the mates present in the .contig file are also present in the two .fasta files, you can just use the two .fasta files to create the .mates file.



        • #19
          Hi, the 454 output is .sff (it looks like a binary file), but we use a script called sff_extract to convert this data into FASTA, XML and quality files. I was just reading that "The 454 paired-end protocol will generate reads which contain the forward and reverse direction in one read, separated by a linker."
          So I think the key to generating the .mates file is the .sff, but I don't know how.
          I think it shouldn't be so complicated...



          • #20
            Originally posted by boetsie
            I have no idea... I've never used a .sff file. What does it look like? Why do you want to use it? Does it contain additional data?

            If all the mates present in the .contig file are also present in the two .fasta files, you can just use the two .fasta files to create the .mates file.
            How do I create the .mates file? I tried the script you sent me and the output isn't right. Besides, I don't understand why FZ92HC101CZUHH.1 and FZ92HC102IDBLW.2 are on the same line. How can I tell that they are mates? I'm really lost and confused now...

            FZ92HC101CZUHH.1 FZ92HC102IDBLW.2 libname
            FZ92HC101DJEHD.1 FZ92HC102JYG94.2 libname
            FZ92HC101DUWKQ.1 FZ92HC102HS1LU.2 libname
            FZ92HC101CUUV5.1 FZ92HC102G8H4Z.2 libname
            FZ92HC101EMKQX.1 FZ92HC102HOD38.2 libname
            FZ92HC101CE653.1 FZ92HC102HO0J7.2 libname
            FZ92HC101ECTBB.1 FZ92HC102IBNJJ.2 libname
            FZ92HC101DXMSC.1 TGATCCGGCGCAGGCGTATCTGGGCTCGGATCGTGCCTGGTGCCGACGGCGATGAACGAC
            libname
            FZ92HC101C587C.1 FZ92HC102F3E16.2 libname
            FZ92HC101BZ63S.1 CGGTCGGCCGCGGCCGATCTCGGGATTGCGCGGCGTGTGCAT
            libname
            FZ92HC101DEODE.1 CCGCGTGGACATGCCGTTCGAGGAACCGTGGACGCAACC
            libname
            FZ92HC101DP9HX.1 ATCGGCTATGCACAGGTCATCGAGTATCTCGACGGCG
            libname
            FZ92HC101EE90B.1 ACGTCCGACGTGATCAGGAGCGAGTCGGTGACGGCGCTTCGCACTCCGAGGG
            libname
            TTTGATGATCGACATCAAT GCGTTCGACTACCAGTTCGTCGGACCATCCGGGTAGCGTGTCGCAAGGGTCGGTTCCGAA
            libname
            CGTTCGCTGAGCACCGCCGAATCGAGCAGTTCGCGGATCTCGTCGAACGTCCNCGA FZ92HC102GE3MB.2 libname
            CGTACGGATGTAGCTGGTGAAGAGGTCCCTTGCGGGCGGAGAAGTCGAGTCGTTCCGTCG TCGAGAGGCCGCGGAAGCGGCCGGAAAGGACGGCAACGATGTTTGACCGTTTCAACTCAG
            libname
            FZ92HC101DBOTK.1 FZ92HC102GVOHT.2 libname
            FZ92HC101BEEQB.1 TCTGCGTGGAGACCGTGACGGCTGATCTACGGCCNCCTCGGCCGATGATCGCCGCCT
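
The lines above in which a sequence appears where a read name should be are what one would expect if the script treated every FASTA record as exactly two lines (header, then sequence) while the input wrapped its sequences over several lines. That is only a guess, since the script itself is not shown in the thread. A minimal Python sketch that takes names from header lines only and pairs the two files by position is given here; the function names, the pairing-by-order rule, and the `200 500` insert-size range (copied from the Illumina example later in this thread) are assumptions to adapt to your own library.

```python
def fasta_names(lines):
    """Yield the read name from each FASTA header line, ignoring sequence lines."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # The name is the first whitespace-delimited token after '>'.
                yield fields[0]


def build_mates(forward_lines, reverse_lines, libname="libname",
                min_size=200, max_size=500):
    """Return the lines of a .mates file pairing the i-th forward and reverse read.

    Assumption: both inputs list mates in the same order. The size range is a
    placeholder, not a recommendation for any particular library.
    """
    forward = list(fasta_names(forward_lines))
    reverse = list(fasta_names(reverse_lines))
    if len(forward) != len(reverse):
        raise ValueError(
            f"read counts differ: {len(forward)} forward vs {len(reverse)} reverse"
        )
    mates = [f"library {libname} {min_size} {max_size}"]
    mates.extend(f"{f} {r} {libname}" for f, r in zip(forward, reverse))
    return mates
```

Pairing by position only makes sense if both files list the mates in the same order. If the two reads of a pair share a common name stem, matching on that stem would be more robust.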



            • #21
              Bambus error: library priority

              Boetsie and danix, I noticed that you have done a lot of work with Bambus. I also have contigs generated by CLCbio. I know how to get the .contig file for Bambus, and I made a mates file following your instructions, but when I run goBambus, I get an error:
              20100710|193857| 16658| Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed
              20100710|204158|24277|grommit|FATAL|9: Priority not specified: at least one library must be assigned a priority

              I don't know what the 'priority' is. What can I do to solve this problem? Could you give me any help? Thanks in advance.



              • #22
                Hi catfisher,

                I've had this error too. To solve it, you should set a priority in the .conf file. A file named default.conf is generated once you have run Bambus; it contains the default parameters. Add the following line to that file, or edit the existing priority line so that it reads:

                priority ALL 1

                If you have not run Bambus yet, you should create a .conf file from scratch. See the links below for more information. Once you have the .conf file, pass it on the command line with -C, for example:
                goBambus -c test.contig -m test.mates -C default.conf -o test-bambus

                For more information about the .conf file, see:
                Download AMOS for free. AMOS is a collection of tools for genome assembly. AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.

                For an example see;
                Download AMOS for free. AMOS is a collection of tools for genome assembly. AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.


                Marten
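
For anyone scripting this step, the check for a priority line can be sketched in Python as follows. The helper name `ensure_priority` is hypothetical, and the sketch assumes that `#` starts a comment in the .conf file, as in the sample configuration posted later in this thread.

```python
def ensure_priority(conf_text, line="priority ALL 1"):
    """Return conf_text with a priority line appended if none is active.

    A line counts as an active priority setting when its first token is
    'priority'; commented-out lines (starting with '#') do not count.
    """
    lines = conf_text.splitlines()
    for existing in lines:
        tokens = existing.split()
        if tokens and tokens[0] == "priority":
            # An active priority setting is already present; leave the file alone.
            return conf_text
    return "\n".join(lines + [line]) + "\n"
```

The returned text can then be written back to the .conf file that is passed to goBambus with `-C`.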



                • #23
                  grommit script failed

                  Marten, thanks for your quick reply. I edited my configuration file as you suggested and ran goBambus again, but it still failed.
                  I used the following .conf:
                  # Priorities
                  priority ALL 1
                  # The following lines can be un-commented to specify certain
                  # per-library settings

                  # Redundancies
                  # redundancy lib_some 1

                  # allowed error
                  # error MUMmer 0.5

                  # overlaps allowed
                  # overlaps MUMmer Y

                  # Global redundancy
                  redundancy 2

                  # min group size
                  mingroupsize 0

                  The log output from goBambus is:
                  Parsing links out of input file
                  Step 100: running detective
                  Combining XML files
                  Step 200: making the xmls
                  starting
                  Done
                  Step 300: Preparing contig links
                  starting
                  Done
                  Step 400: Running scaffolder
                  Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed

                  The error information from goBambus.error file is:
                  20100712|123807| 10451| Grommit(/home/aubsxl/bin/bambus/bin/grommit -i ctg2660_BES_mapping_704.inp -o ctg2660_BES_mapping_704.out.xml -C ctg2660_BES_mapping_704.grommit.conf --append --logfile goBambus.log --debug 1) script failed

                  The first several lines of my mates file are:
                  library libname 200 500
                  HWUSI-EAS1665_0002:2:1:1022:18088#0/1 HWUSI-EAS1665_0002:2:1:1022:18088#0/2 libname
                  HWUSI-EAS1665_0002:2:1:1029:11872#0/1 HWUSI-EAS1665_0002:2:1:1029:11872#0/2 libname
                  HWUSI-EAS1665_0002:2:1:1029:11034#0/1 HWUSI-EAS1665_0002:2:1:1029:11034#0/2 libname
                  HWUSI-EAS1665_0002:2:1:1030:19457#0/1 HWUSI-EAS1665_0002:2:1:1030:19457#0/2 libname
                  HWUSI-EAS1665_0002:2:1:1031:12133#0/1 HWUSI-EAS1665_0002:2:1:1031:12133#0/2 libname

                  Marten, could you look at this information and point out what's wrong? I have no idea. Thanks a lot,

                  Kevin



                  • #24
                    Hi catfisher,

                    Hmm, weird error, since it doesn't point out where it goes wrong. Is that the only error?

                    Something that might help:

                    Replace ":" and "#" in the read names with underscores ("_"). E.g.:

                    HWUSI-EAS1665_0002:2:1:1022:18088#0/1
                    will be;
                    HWUSI-EAS1665_0002_2_1_1022_18088_0/1

                    Do this in both the .mates file and the .contig file.

                    The code to do this is:

                    sed -e 's/#/_/g' -e 's/:/_/g' input.mates > output.mates

                    where input.mates is the input file and output.mates is the converted output file.

                    I don't know if this really works...

                    Otherwise it might be a good idea to contact the Bambus developers, since I'm not too familiar with Bambus.

                    Good luck.

                    Cheers,
                    Marten
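
One caveat when applying the substitution to a .contig file: in the format shown in post #16, contig headers start with `##` and read headers with `#`, so a blanket replacement of `#` would also rewrite those markers. A more conservative sketch in Python, which only touches the read names themselves, might look like this. The function names are hypothetical, and the assumption that a read name runs from the leading `#` to the first `(` is inferred from the sample in this thread rather than from a format specification.

```python
import re

_UNSAFE = re.compile(r"[:#]")


def sanitize_name(name):
    """Replace ':' and '#' in a read name with underscores."""
    return _UNSAFE.sub("_", name)


def sanitize_mates_line(line):
    """Sanitize the two read names on a 'read1 read2 library' line.

    Lines with any other number of fields (for example the
    'library <name> <min> <max>' definition) are returned unchanged.
    Pass lines without their trailing newline.
    """
    fields = line.split()
    if len(fields) != 3:
        return line
    first, second, library = fields
    return f"{sanitize_name(first)} {sanitize_name(second)} {library}"


def sanitize_contig_line(line):
    """Sanitize the read name on a '#name(offset) ...' header of a .contig file.

    Contig headers ('##...') and sequence lines are returned unchanged, and
    the leading '#' marker of a read header is preserved.
    """
    if not line.startswith("#") or line.startswith("##"):
        return line
    name, paren, rest = line[1:].partition("(")
    return "#" + sanitize_name(name) + paren + rest
```

Applying the same `sanitize_name` to both files keeps the names in the .mates file consistent with those in the .contig file.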



                    • #25
                      Catfisher,

                      I had the same error months ago. I ended up filtering my contigs so I only kept longer contigs (>500nts) with high coverage (depends on your dataset). I didn't change my mates file and then it suddenly worked. I'm not quite sure why, but it might be worth a shot.

                      Jason
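
A length filter like the one described above can be applied directly to a .contig file. The following Python sketch assumes the header layout seen in post #16 (`##name reads length bases, ...`); reading the second field as a read count is an assumption, and coverage filtering is left out because it depends on the dataset. The function name is hypothetical.

```python
import re

# '##<contig name> <read count?> <length> bases, ...' as in the sample in post #16.
_CONTIG_HEADER = re.compile(r"^##(\S+)\s+(\d+)\s+(\d+)\s+bases")


def filter_contig_records(lines, min_length=500):
    """Keep only the records of contigs strictly longer than min_length.

    A record is a '##' header plus every following line (consensus, read
    headers, read sequences) up to the next '##' header.
    """
    kept = []
    keep = False
    for line in lines:
        if line.startswith("##"):
            match = _CONTIG_HEADER.match(line)
            # The third captured field is the contig length in bases.
            keep = bool(match) and int(match.group(3)) > min_length
        if keep:
            kept.append(line)
    return kept
```

Because whole records are kept or dropped, the output stays a well-formed .contig file; the .mates file can be left as it is, as in the post above.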



                      • #26
                        Originally posted by themerlin
                        Catfisher,

                        I had the same error months ago. I ended up filtering my contigs so I only kept longer contigs (>500nts) with high coverage (depends on your dataset). I didn't change my mates file and then it suddenly worked. I'm not quite sure why, but it might be worth a shot.

                        Jason
                        I took the first 100k lines of the contig and mates files with head and reran the program on that subset, and this time it worked.
                        Does anyone know how much data Bambus can handle? I am afraid that it has a built-in limit on the size of the input. I have 704 contigs in the .contig file and 3434936 x2 paired ends, and the program didn't work when I loaded all of them. I also tested a set with contigs shorter than 500 bp (some are about 200 bp), and that worked too. How big was your input data when you used Bambus? Thanks,

                        Kevin
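
Note that `head` cuts a .contig file at an arbitrary line and so may truncate the last record. An alternative way to shrink the input, sketched here in Python, is to keep only the mate pairs whose two reads are both placed in some contig, on the reasoning that other pairs cannot link contigs. Whether this avoids the grommit failure is untested; the function names are hypothetical and the parsing relies on the formats as they appear in this thread.

```python
def placed_reads(contig_lines):
    """Collect the names of all reads placed in a .contig file.

    Assumption: read headers look like '#name(offset) ...', while contig
    headers start with '##'.
    """
    names = set()
    for line in contig_lines:
        if line.startswith("#") and not line.startswith("##"):
            names.add(line[1:].partition("(")[0])
    return names


def filter_mates(mates_lines, placed):
    """Drop 'read1 read2 library' lines unless both reads are in `placed`.

    Library definitions and any other lines are passed through unchanged.
    """
    kept = []
    for line in mates_lines:
        fields = line.split()
        is_pair = len(fields) == 3 and fields[0] != "library"
        if not is_pair or (fields[0] in placed and fields[1] in placed):
            kept.append(line)
    return kept
```

With millions of pairs and only a few hundred contigs, this can reduce the .mates file considerably without touching the .contig file.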



                        • #27
                          The problem in this thread is solved by a program I developed, which can scaffold assembled contigs in .fasta format using paired-end and/or mate-pair sequences. No file format conversion is required. See this thread:



                          • #28
                            You can try SSPACE too. It is a scaffolder for next-gen data.

                            Bioinformatics paper:


                            SEQanswers thread:


                            Download:


                            -seb



                            • #29
                              Originally posted by seb567
                              You can try SSPACE too. It is a scaffolder for next-gen data.

                              -seb
                              Hmm, that is the program I meant, since I'm the developer, haha. I referred to the wrong link in my previous reply.

                              Boetsie



                              • #30
                                Originally posted by boetsie
                                Hmm, that is the program I meant, since I'm the developer, haha. I referred to the wrong link in my previous reply.

                                Boetsie
                                That's funny!
