  • myrna
    Member
    • Feb 2008
    • 44

    #16
    Novoalign update?

    Hi Colin.
    I have been working with Novoalign a bit and am finding it useful in picking up indels and SNPs missed by other aligners. I am wondering if it can also pick up structural aberrations that I have missed using other approaches. Is there an update on the timelines for the following features, mentioned in the documentation:

    "novostruct Uses paired end alignments to identify locations where the individual being sequenced is structurally different to the
    reference sequences. This could be inter sequence variations such as large insertions, deletions and inversions or inter sequence variations.

    Jul'08

    novoasm Using results from novoalign and novopair calls SNPs and short indels.
    ACE format output is provided for viewing of alignments.

    Aug '08

    novodensity Read density analysis for copy number, expression level and, peak detection.

    Aug '08"

    ?

    Thanks,

    Ryan
    Last edited by myrna; 08-11-2008, 12:39 PM.

    Comment

    • zee
      NGS specialist
      • Apr 2008
      • 249

      #17
      Hey Myrna,

      If you're interested in knowing more about what we're doing with SNP/Assembly, see http://www.novocraft.com/wiki/tiki-v...desc&forumId=1

      Comment

      • myrna
        Member
        • Feb 2008
        • 44

        #18
        Novocraft and Maq

        Thanks for the link, this was just what I needed. I will give the Novoalign->Eland->Maq conversion a try. What do you see as the largest problem/concern caused by the loss of mapping scores in doing this conversion? Do you think there would be some way to scale the Novoalign scores to Maq's mapping quality scale such that you could include them?

        Comment

        • zee
          NGS specialist
          • Apr 2008
          • 249

          #19
          This is an area we're trying to perfect at the moment. Basically, novoalign mapping quality scores are meant to be as close to maq mapping qualities as we can get them. Scaling may therefore not be necessary if we can show that reads given a low mapping quality by novoalign also get a low mapping quality from maq, and vice versa.
          The .map file is the key here because it contains this information, and we're neglecting it by using eland format. It's therefore crucial for us to go from the text format in novoalign to the maq format whilst keeping all that useful information.
          The good news is that because we're mapping more with novoalign you have more SNPs being called. We hope to have this format conversion with quality scores ready by next week.
          Perhaps you can send me a private msg and I can provide you with some charts showing how these mapping qualities compare between novoalign and maq??
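
For readers following the scaling question: Maq reports mapping quality on the Phred scale, Q = -10 * log10(P(alignment is wrong)). If novoalign's scores sit on the same scale, which the post says is the intent, they can be compared through the error probability they imply and no rescaling is needed. A minimal sketch of that conversion, where the function names and the cap of 99 are illustrative assumptions and not part of either tool:

```python
import math


def mapq_to_error_prob(mapq: float) -> float:
    """Phred-scaled mapping quality -> probability the alignment is wrong."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float, cap: int = 99) -> int:
    """Probability the alignment is wrong -> Phred-scaled mapping quality.

    `cap` is a hypothetical upper bound so that p_wrong == 0 does not
    produce an infinite score.
    """
    if p_wrong <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))


if __name__ == "__main__":
    for q in (0, 10, 20, 30):
        print(f"Q{q}: P(wrong) = {mapq_to_error_prob(q):.4f}")
```

On this scale Q30 means roughly a 1 in 1000 chance that the read is placed incorrectly, so a filter such as "keep Q >= 30" means the same thing for both aligners only if both scores are calibrated this way.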

          Comment

          • myrna
            Member
            • Feb 2008
            • 44

            #20
            novoalign2maq

            I would think that using the export file format as an intermediate (instead of the eland format) would allow you to get around the base (and mapping) quality issue. Heng Li, have you (or anyone else) attempted to convert novo* outputs into native Maq alignment files?

            Comment

            • zee
              NGS specialist
              • Apr 2008
              • 249

              #21
              Hey Myrna,
              It's ready to try out. Pls see

              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

              Comment

              • sparks
                Senior Member
                • Mar 2008
                • 126

                #22
                Hi Myrna,
                We've added a function to maq that converts native novo... report formats into maq map format. The source code is available in our forum. This conversion maintains the quality values and also converts gapped alignments, which is not possible if conversion is done from the Eland report format.
                With this conversion you can use maq to do the assemblies and call SNPs and Indels. You can even use maq indelpe on single end reads aligned by novoalign and then converted to maq.
                Our plans for our own assembly, SNP caller etc are running a bit behind.
                Cheers, Colin

                Comment

                • zee
                  NGS specialist
                  • Apr 2008
                  • 249

                  #23
                  Multithreading now supported in NovoCraft Aligners

                  Multithreading has been added to novoalign and novopaired. The results look really good.

                  We ran some tests on our new multithreaded version to evaluate alignment performance on a small set of 200K Illumina reads versus the Human Genome NCBI36. The 200x36x37-071207_EAS51_0064-s_2_1.fastq and 200x36x37-071207_EAS51_0064-s_2_2.fastq FASTQ-formatted files were downloaded from the ftp://ftp.ncbi.nih.gov/pub/TraceDB/S...A000271/fastq/ FTP site. The first 200,000 reads in these files were used.
                  A Linux server with eight 2.33 GHz CPU cores and 32 GB RAM was used. Time was taken from the elapsed time figure in the novopaired/novoalign output reports, read with UNIX tail.


                  CPU usage was monitored, and using 8/8 cores didn't improve performance much over using 7/8 cores.

                  There appears to be a significant gain in performance with the multithreaded versions of novopaired and novoalign (figure 1).



                  Table 1: Performance of multithreaded novoalign and novopaired on 200,000 Illumina reads searched against the NCBI36 Human Genome





                  Columns 4 and 5 give the time taken as a percentage of the time with 1 CPU: 4 cores take 1/4 of the 1-CPU time, and 7 cores take 14.8% (table 1). Each alignment process consumed at most 16.1Gb (52% RAM).
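
The percentages quoted above can be turned into speedup and parallel efficiency. A small sketch using only the two figures stated in the post (25% of single-core time on 4 cores, 14.8% on 7 cores); the function name is made up for illustration:

```python
def speedup_and_efficiency(percent_of_single_core_time: float, cores: int):
    """Return (speedup, efficiency) from run time expressed as a
    percentage of the single-core run time."""
    speedup = 100.0 / percent_of_single_core_time
    efficiency = speedup / cores  # 1.0 would be perfect linear scaling
    return speedup, efficiency


if __name__ == "__main__":
    # Figures quoted in the post: 4 cores -> 1/4 of the time, 7 cores -> 14.8%
    for cores, percent in ((4, 25.0), (7, 14.8)):
        s, e = speedup_and_efficiency(percent, cores)
        print(f"{cores} cores: {s:.2f}x speedup, {e:.1%} efficiency")
```

By this measure 7 cores give about a 6.76x speedup, i.e. close to linear scaling up to 7 threads.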

                  Comment

                  • bioinfosm
                    Senior Member
                    • Jan 2008
                    • 483

                    #24
                    I finally got to use novoalign and use novo2maq to make SNP calls. It seems the depth of coverage I see on SNP calls from novo aligned data is much lower than that from MAQ, almost 1/3.

                    Why would that be?

                    Of the 4 million reads in the lane, novoalign mapped only 1.6 million (all default params)
                    --
                    bioinfosm

                    Comment

                    • zee
                      NGS specialist
                      • Apr 2008
                      • 249

                      #25
                      Bioinfosm, that's interesting. I'd expect you to find more high mapping quality reads with novoalign in the first place, and that would improve the depth. If it's doing the opposite, then it is something we'll need to look at.

                      If you've run the same data with MAQ then I assume you're using fastq-formatted reads.
                      I'm interested to see what the `maq mapstat' output is for the novoalign and maq .map files.
                      Something else to look at is whether novo2maq converted the headers correctly. This is easily checked with maq mapview.

                      Could you perhaps send me a tail of the novoalign output and version as well?


                      Comment

                      • sparks
                        Senior Member
                        • Mar 2008
                        • 126

                        #26
                        Hi Bioinfosm,
                        Further to Zee's request, could you include a head of the novoalign output as well as the tail?

                        Can you email directly to support at novocraft dot com

                        Thanks, Colin

                        Comment

                        • bioinfosm
                          Senior Member
                          • Jan 2008
                          • 483

                          #27
                          Thanks for the response. There was something with the headers which I noticed and correcting that gave me a lot more reads mapped by novoalign compared to maq. However, the qualities of some of them are pretty low, along with lots of flags when looking at the mapstat output.

                          I will email that data to support for further analysis...
                          btw what's your homopolymer filter?
                          --
                          bioinfosm

                          Comment

                          • sparks
                            Senior Member
                            • Mar 2008
                            • 126

                            #28
                            The homopolymer filter picks up reads that are all A's or all C's etc., i.e. the same base called in every position in the read. Some Illumina read files have a significant percentage of these. They can be caused by dust on the slide or by the camera picking up the edge of a lane.
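
As a rough illustration of the idea (not novoalign's actual implementation, which isn't shown in this thread), a read counts as a homopolymer if every position carries the same base. The function names below are hypothetical, and treating an all-N read as a homopolymer is an assumption of this sketch:

```python
def is_homopolymer(read: str) -> bool:
    """True if the read is non-empty and every position has the same base."""
    return len(read) > 0 and len(set(read.upper())) == 1


def drop_homopolymers(reads):
    """Return the reads that are not homopolymers, preserving order."""
    return [r for r in reads if not is_homopolymer(r)]


if __name__ == "__main__":
    reads = ["AAAAAAAAAAAA", "ACGTACGTACGT", "CCCCCCCCCCCC", "GGGGGGGGGGGA"]
    kept = drop_homopolymers(reads)
    print(f"kept {len(kept)} of {len(reads)} reads")
```

A real filter might also tolerate one or two mismatching positions; this version is strict and only removes reads with a single base throughout.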

                            Comment

                            • sparks
                              Senior Member
                              • Mar 2008
                              • 126

                              #29
                              With regard to the flag values, the novo2maq module was incorrectly setting paired end flags on single end reads. I've posted an updated source file in our support forum at www.novocraft.com

                              Comment

                              • myrna
                                Member
                                • Feb 2008
                                • 44

                                #30
                                Flags

                                Oh no! I was just reveling in the fact that novo2maq did set flags as paired in single end data. This has allowed me to run indelpe and find some very convincing indels. Not sure how many of them are real, but looking at the coverage a lot are convincing by eye. Without the ability to run indelpe, many of these sites are mistakenly called SNPs. Is there still an option to pull the indels from a novoalign output? I suppose as long as flag 130 is still set it should work fine. I understand the rationale that Maq only trusts indels from paired data (and only does gapped alignment when reads are anchored by a mate), but I would like to get Colin's opinion about whether we can trust indels from single end reads (and if so, what mapping quality thresholds?)

                                Thanks,

                                Ryan
                                Last edited by myrna; 09-13-2008, 07:02 AM.

                                Comment
