Software packages for next gen sequence analysis


  • #76
    Originally posted by spirit
    Thank you for your interest. I will answer these questions as best I can.

    1. What are the longest and shortest reads it can handle effectively?

    Currently, ZOOM can handle reads ranging in length from 15bp to 64bp. In fact, the core idea of ZOOM is easy to extend to longer reads; it is the implementation that limits the length to at most 64bp. We will get to 454 data later, after the version for Illumina/Solexa and ABI SOLiD is stable.

    2. How does it compare to ELAND or MAQ in reads aligned per minute?

    Since ELAND is the fastest software we know of for Illumina/Solexa data, we compared speed against ELAND in our benchmark. Mapping reads of length 15bp to 32bp at the same sensitivity, ZOOM took half the time of ELAND, and as little as 1/3 for the shorter reads. Furthermore, ELAND can handle no more than about 16 million reads. ZOOM has no limit on the number of reads as long as they fit in your RAM. Both ELAND and ZOOM hash the reads and scan the reference sequence, so if you process more reads in one scan pass, you save even more time. Since the speed of ZOOM depends closely on the length of the reference sequence and the read length, it’s hard to give a number of reads aligned per minute. To give you an impression, here is some data from our benchmark, all at full sensitivity for two mismatches:
    It aligns 3.4 million 36bp BAC reads to the 162k region (where the BAC comes from) in 37 seconds with 1.1G RAM.
    It aligns 24 million 36bp reads (5X coverage of human chromosome 6) to chromosome 6 in 17 minutes 17 seconds with 6.5G RAM.
    It aligns 22 million 17bp ChIP-Seq reads to the whole human genome in 4 hours and 22 minutes with 4.2G RAM.
    For ABI/SOLiD data, the speed is slower than for Illumina/Solexa data. ZOOM aligns 28 million 25bp reads to the E. coli genome (4M) with automatic sequencing error correction in 5 minutes.
    We tried to compare speed and sensitivity with MAQ as well, since it is well known. However, I was totally puzzled by its input and output formats, so lazy me gave up, since its website says it is slower than ELAND.
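
A minimal sketch of the "hash the reads, then scan the reference once" idea described above. This is not ZOOM's or ELAND's actual implementation: it uses a plain pigeonhole filter (a read with at most k mismatches must contain at least one of k+1 contiguous segments unchanged), it assumes all reads have the same length, and it only looks at the forward strand. The function names are made up for illustration.

```python
from collections import defaultdict


def segment_bounds(read_len, pieces):
    """Split [0, read_len) into `pieces` nearly equal, contiguous segments."""
    base, extra = divmod(read_len, pieces)
    bounds, start = [], 0
    for i in range(pieces):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_reads(reference, reads, max_mismatches=2):
    """Return sorted (read_index, position, mismatches) hits on the forward strand."""
    if not reads:
        return []
    read_len = len(reads[0])
    assert all(len(r) == read_len for r in reads), "sketch assumes equal-length reads"
    bounds = segment_bounds(read_len, max_mismatches + 1)

    # Hash every segment of every read: one table per segment slot.
    tables = [defaultdict(list) for _ in bounds]
    for idx, read in enumerate(reads):
        for table, (s, e) in zip(tables, bounds):
            table[read[s:e]].append(idx)

    hits = set()
    # Single pass over the reference; all reads are checked in the same scan.
    for pos in range(len(reference) - read_len + 1):
        window = reference[pos:pos + read_len]
        for table, (s, e) in zip(tables, bounds):
            for idx in table.get(window[s:e], ()):
                d = hamming(window, reads[idx])  # verify the candidate
                if d <= max_mismatches:
                    hits.add((idx, pos, d))
    return sorted(hits)


if __name__ == "__main__":
    ref = "TTGACCAGTACGGATCCATG"
    print(map_reads(ref, ["GACCAGTAC", "AACGGATCG", "CCCCCCCCC"]))
```

Because the tables are built from the reads and the reference is only scanned, adding more reads to one pass grows memory use but not the number of scans, which matches the point above about RAM being the limit.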

    3. How many mismatches does it handle?

    In principle, you can choose any number of mismatches you like, as long as it is less than the read length. ZOOM guarantees 100% sensitivity for a large range of <read length, mismatch number> cases.
    When you ask for more mismatches than the <read length, mismatch number> case that ZOOM uses allows, sensitivity decreases slightly. For example, 50bp reads can be mapped with 100% sensitivity at 4 mismatches. If you require 5 mismatches, the sensitivity decreases slightly. However, if you do need 100% sensitivity in these cases, feel free to contact us and we will accommodate you.

    4. Does it have a gapped mode?
    Yes, ZOOM can handle insertions/deletions between the reads and the reference sequence. For Illumina/Solexa data, one gap of any length you wish is allowed in addition to the requested mismatches. However, ZOOM can’t guarantee 100% sensitivity when finding gapped alignments. I don’t think anybody using a filtering strategy could.

    5. What format is required for the reference genome?

    The reference genome should be a FASTA file or multiple FASTA files.
    Illumina/Solexa reads files can be in FASTA, *_seq.txt or *_prb.txt format. The ABI SOLiD *.csfasta format is supported too.

    6. What format are the alignments reported in?

    For Illumina/Solexa data, this release of ZOOM reports alignments in the format “read_name reference_seq_name: position_of_mapped +/- mismatch_number”. If assembly is requested, ZOOM also outputs the assembly consensus, the coverage, and the frequency of {A,C,T,G} at each position of the consensus.
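
To make the consensus, coverage and {A,C,T,G} frequency output concrete, here is a rough sketch of how such per-position numbers can be computed from ungapped alignments. It is only an illustration under simple assumptions (0-based positions, forward-strand reads, majority-vote consensus), not a description of how ZOOM builds its assembly; `pileup` and `summarize` are hypothetical names.

```python
from collections import Counter


def pileup(ref_len, alignments):
    """Count bases per reference position.

    alignments: iterable of (position, read_sequence), 0-based, forward strand.
    """
    counts = [Counter() for _ in range(ref_len)]
    for pos, seq in alignments:
        for offset, base in enumerate(seq):
            if 0 <= pos + offset < ref_len:
                counts[pos + offset][base] += 1
    return counts


def summarize(counts):
    """Return (consensus_base, coverage, {A,C,G,T frequencies}) per position."""
    rows = []
    for c in counts:
        coverage = sum(c.values())
        # Majority vote; ties go to the first base in "ACGT"; "N" if uncovered.
        consensus = max("ACGT", key=lambda b: c[b]) if coverage else "N"
        rows.append((consensus, coverage, {b: c[b] for b in "ACGT"}))
    return rows


if __name__ == "__main__":
    alns = [(0, "ACGT"), (2, "GTAC"), (1, "CGTA"), (3, "CAC")]
    for i, row in enumerate(summarize(pileup(6, alns))):
        print(i, *row)
```

A position where the frequencies are split between two bases (position 3 in the toy data) is the kind of site one would inspect as a possible polymorphism or sequencing error.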
    For ABI/SOLiD data, besides the alignment information, ZOOM can output the reads decoded into base space, with polymorphisms in base space and sequencing errors in color space highlighted.
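
For readers new to color space, here is a small sketch of the standard SOLiD two-base encoding and how a read is decoded into base space. It assumes a csfasta-style read string made of a leading primer base followed by colors 0-3 only (no "." for missing calls), and it says nothing about how ZOOM itself does the decoding or error correction.

```python
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def decode_colors(cs_read):
    """Decode e.g. 'T0123' into base space (the primer base is not returned)."""
    prev = CODE[cs_read[0]]
    out = []
    for ch in cs_read[1:]:
        prev ^= int(ch)          # each color is the XOR of two adjacent base codes
        out.append(BASES[prev])
    return "".join(out)


def encode_colors(primer, bases):
    """Inverse of decode_colors: base-space sequence back to a color-space read."""
    prev = CODE[primer]
    colors = []
    for b in bases:
        cur = CODE[b]
        colors.append(str(prev ^ cur))
        prev = cur
    return primer + "".join(colors)


if __name__ == "__main__":
    print(decode_colors("T0123"))   # decoded read
    print(decode_colors("T0023"))   # same read with one color changed
```

Comparing the two decoded strings shows why errors are best handled in color space: a single wrong color changes every base after it when decoded naively, whereas a real single-base polymorphism changes two adjacent colors.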
    In our next release, we will show the alignments in a GUI view that displays the multiple alignment of mapped reads on the reference sequence, along with the heterozygous sites.

    7. Can you comment on the cost/licenses it will be provided under?

    As for the cost of the full version of ZOOM, it is probably better to ask the sales staff once the website is ready next week. I think a free academic version for Illumina/Solexa data, with limited functionality, will be provided too.

    8. Can you give us the link to the download when it's ready?

    Sure. I will post the latest news when it’s ready.
    Wow. You are well documented.



    • #77
      Added two more ChIP-SEQ tools.



      • #78
        Hi everyone,

        I'm new to this forum. Thanks to Sci_guy for posting this one-stop-shop article of NextGen sw packages. It is very useful.

        We just started to use the Illumina GAII for our genome project (mostly microbial genome and transcriptome sequencing) and would like to hear people's experience with assembly tools, especially with hybrid approaches such as 454/Illumina, Sanger/Illumina, etc. With so many tools out there, can someone suggest their favorite de novo assembler / aligner and the reasons for their choice?

        Recently I tried a new short read aligner called Bowtie (http://bowtie-bio.sourceforge.net/) designed for fast mapping of Illumina reads. It was developed by Steven Salzberg's group at University of Maryland and is claimed to be 10 times faster than MAQ. Bowtie is open source and comes with a script to convert its output to use MAQ's downstream SNP tools.

        I tried it and it was pretty easy to use, and it WAS fast. I am wondering if anybody else has tried this tool and can share its pros and cons compared with other existing tools.

        Thanks!
        Jerry Liu



        • #79
          Bowtie

          Hi Jerry.
          How well (if at all) are indels handled by Bowtie? I have found Novoalign to handle gaps better than Maq and SOAP. By the way, I am looking forward to seeing the release of TopHat.
          Regards,

          Ryan



          • #80
            Hi Ryan,

            According to its manual (http://bowtie-bio.sourceforge.net/manual.html), it currently does not support indels, PE reads, or ABI color space.

            Jerry



            • #81
              Just saw this paper about SOCS (short oligonucleotides in color space), looking forward to trying it against corona-lite and maq.

              Documentation says it's multithreaded and RAM used can be set by user. It's great to see tools for dealing with colorspace directly...



              • #82
                I like Bowtie. It has great performance using the Burrows-Wheeler transform search routine, but it is still not as mature in features (indels, PE) as Novoalign.
                I think both packages will progress quite nicely, with enhanced features, as this field moves on.
                Good job by the Bowtie developers.



                • #83
                  Just added Slider to the first post.



                  • #84
                    Thanks for doing this, ECO! This is a huge help to those of us just getting started on NG sequencing.

                    I see that most of the discussion is focused around genomic alignment and variant discovery, but I'm interested in methods for analyzing transcriptome sequence data, e.g. for quantitation and/or ID of alternative transcripts. There's a tool called QPALMA that's designed specifically for alignment of spliced sequences from short reads, and includes sequence quality information in generating the alignments. Anyone here have any experience with QPALMA?



                    • #85
                      BFAST?
                      https://secure.genome.ucla.edu/index.php/BFAST

                      Does anyone have experience with it?
                      --
                      bioinfosm



                      • #86
                        Originally posted by bioinfosm
                        BFAST?
                        https://secure.genome.ucla.edu/index.php/BFAST

                        Does anyone have experience with it?
                        I am the author of BFAST. Let me know if you have any questions. Please see the site about obtaining the source code (available for academic use). Also, I will be giving a talk about BFAST on the morning of Friday, November 14 at the Annual Meeting of the American Society of Human Genetics. I will be in Philadelphia that week, so if you are interested in meeting to discuss sequence alignment, let me know.

                        Nils Homer



                        • #87
                          I'm not sure if we want to add base calling algorithms/software to the list, but I just came across an interesting one: Rolexa

                          How many people have experimented with alternative base-calling software? Or are people generally content with the quality of the sequences from the manufacturer-supplied software (we are using Illumina in particular)?



                          • #88
                            Originally posted by lparsons
                            I'm not sure if we want to add base calling algorithms/software to the list, but I just came across an interesting one: Rolexa
                            Sure, why not. Rolexa, Alta-cyclic...any others out there?



                            • #89
                              Originally posted by nilshomer
                              I am the author of BFAST. Let me know if you have any questions. Please see the site about obtaining the source code (available for academic use). Also, I will be giving a talk about BFAST on the morning of Friday, November 14 at the Annual Meeting of the American Society of Human Genetics. I will be in Philadelphia that week, so if you are interested in meeting to discuss sequence alignment, let me know.

                              Nils Homer
                              Thanks Nils. Could you possibly post your presentation somewhere?
                              I probably won't get a chance to look at the tool for a while...

                              thanks.
                              --
                              bioinfosm



                              • #90
                                DNASTAR has changed the name of its integrated tool from SeqMan Genome Assembler to SeqMan NGen. It also works on Vista, since that is what we are using it with.

