  • baohua100
    Senior Member
    • Jun 2008
    • 103

    MapView: a viewer for short reads alignment

    MapView 3.4.1 (developed for Windows, but it can also run on Linux using Mono: http://www.mono-project.com/)


    MapView: visualization of short reads alignment on a desktop computer. Bioinformatics, 2009, 25 (12) : 1554-1555



    Download: http://evolution.sysu.edu.cn/mapview/

    Visualizing the huge amounts of alignment data produced by next-generation sequencing on a desktop computer presents many informatics challenges. The great majority of alignment viewers were designed to load and process big assembly files in the ACE format. This memory-based design requires a huge amount of memory (>10 GB), which is not typically available to desktop computer users.

    We introduce a new visual analytics tool, MapView, to facilitate visualization of large-scale short read alignment data and genetic variation analysis. MapView can handle hundreds of millions of short reads on a desktop computer with limited memory. We developed a novel binary file format and a fast loading algorithm for superfast (<2 s) and memory-efficient (<60 MB) visualization of huge amounts of short read alignments. MapView also supports multitasking and multithreading: it can process multiple tasks (e.g. whole-genome SNP detection, coverage computation and alignment visualization) in parallel.

    For Linux, e.g. Ubuntu:

    sudo apt-get install libmono-winforms2.0-cil

    mono MapView.exe




    Computational efficiency comparison



    INPUT
    1. Single-end reads
    Prepare the reference sequence in FASTA format and a text-based alignment results file (output by Eland, Maq, SOAP, MapNext, SeqMap, …).

    (1) If the reference file and the alignment results file each contain only one reference sequence ID, click MVFMaker, input these two files and make an MVF format file. Then select the MVF file to view alignments and SNPs.
    (2) If the reference file or the alignment results file contains multiple reference sequence IDs, click Splitter to split it into multiple files, each containing only one reference ID. Then follow (1).
    (3) If the alignment results file is not in Eland, Maq, SOAP or MapNext format, define the format in the file MVFmaker_NewFormat.txt. Then, when you input the reference and alignment results to make the MVF file, select the format <User-defined> (the one you defined in MVFmaker_NewFormat.txt).
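    The Splitter step in (2) amounts to grouping records by reference ID. The following is a minimal Python sketch, not MapView's own Splitter: it assumes the reference ID is the first word of each FASTA header, and that the alignment file is whitespace-separated with the reference ID in a fixed column (column 1 in the MAQ example, 7 in SOAP and 6 in ELAND, counting from 0, as read off the example lines below). The function names `split_fasta` and `split_alignments` are hypothetical.

```python
from collections import defaultdict


def split_fasta(text: str) -> dict:
    """Group a multi-FASTA text into {reference_id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # Assumption: the reference ID is the first word of the header.
            current = words[0] if words else ""
            records.setdefault(current, [])
        elif current is not None:
            records[current].append(line)
    # Join wrapped sequence lines into one string per reference.
    return {ref_id: "".join(parts) for ref_id, parts in records.items()}


def split_alignments(lines, ref_column: int) -> dict:
    """Group whitespace-separated alignment lines by the ID in ref_column."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > ref_column:  # skip short or blank lines
            groups[fields[ref_column]].append(line)
    return dict(groups)
```

    Each group can then be written to its own file (one reference ID per file) and passed to MVFMaker as in (1).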

    2. Paired-end reads
    MapView has preliminary support for paired-end reads. Prepare the reference sequence in FASTA format and the paired-end alignment results file (one or two files) from SOAP. (Note: the current version of MapView supports only SOAP’s paired-end output files.)

    1. Click MVFmaker and tick the Paired-end data checkbox.
    2. Upload the paired alignment results file and the reference file. You can also upload the unpaired alignment data (output by SOAP) at the same time (optional).
    3. Click "Save as" button, and specify the .MVF file name.
    4. Go back to the main window and click "Open MVF file" to load the MVF file you just generated.

    3. About text-based alignment results files

    (1) MAQ
    8:18:1354:1553 chr1 597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA GGBFPOLQOPLHYRDOCLYM`OO^VSP``YR`_]T
    8:22:173:821 chr1 2597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA NHELOUPVTZUT_WU^```````````````````
    8:29:309:1409 chr1 5397 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA HLHOBQCOOEGTP]LNJXVX`````J`````````

    (2) SOAP
    8:1:3:1697 GTCTAGATATCGCACAATCTTNAATCTTTAAAATG hhhhhhhhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 1266 0
    8:1:3:1804 CCTAGGGTTGATTTAGAAACGNGAGCATTTTGTTG hhhhb^hhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 208 0
    8:1:3:1247 AGGTTAATCCTGCNTATACATGCAGCTCTAATTCA hhhhhhhhhhhhh;hhhhhhhhhhhhhhhhhhh^h 1 a 35 + chr1 122 0

    (3) ELAND
    I326_2_FC306FCAAXX:8:1:211:212 GATTTATCATGTAAGAGGTTGTCATTCAGAATGGT U0 1 0 0 chr1 4 R
    I326_2_FC306FCAAXX:8:1:913:158 GAAAAAGGAACCTCTGCAGACATATCATGCAACTG U0 1 0 0 chr1 164 R
    I326_2_FC306FCAAXX:8:1:183:435 GGGAGATTCCCATATCTTTTCCACTTTCCTCTTCC U0 1 0 0 chr1 530 F

    (4) User-defined
    R A Q C P S x <User-defined> 1 0
    Columns are separated by TABs. The symbols mean:
    R: read ID
    A: read sequence (ATGC...)
    Q: quality score (may be x)
    C: chromosome
    P: position
    S: strand (F/R or +/-)
    x: column ignored (skipped)
    The last three entries are the format name (<FormatName>), Sort (0/1) and Reverse (0/1).
    Sort = 1 means the alignment positions are not sorted, so MapView will sort them.
    Reverse = 1 means reads on the minus (-) strand must be reverse-complemented for display.
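    To make the format definition concrete, here is a minimal Python sketch that reads a definition line such as `R A Q C P S x <User-defined> 1 0` and applies it to TAB-separated alignment lines. It illustrates the rules above under assumptions and is not MapView's parser: it treats both `-` and `R` as the minus strand, and when Reverse is 1 it also reverses the quality string so it stays aligned with the bases. All names (`parse_spec`, `parse_line`, `load`) are hypothetical.

```python
from typing import NamedTuple

# Complement table used for reverse-complementing reads.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class FormatSpec(NamedTuple):
    columns: list   # one symbol per alignment column: R, A, Q, C, P, S or x
    name: str       # e.g. "<User-defined>"
    sort: bool      # Sort = 1: positions are unsorted, so sort them
    reverse: bool   # Reverse = 1: reverse-complement minus-strand reads


def parse_spec(spec_line: str) -> FormatSpec:
    # split() with no argument splits on TABs as well as spaces.
    *columns, name, sort_flag, reverse_flag = spec_line.split()
    return FormatSpec(columns, name, sort_flag == "1", reverse_flag == "1")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_line(line: str, spec: FormatSpec) -> dict:
    record = {}
    for symbol, value in zip(spec.columns, line.rstrip("\n").split("\t")):
        if symbol != "x":  # x marks a column to skip
            record[symbol] = value
    if spec.reverse and record.get("S") in ("-", "R"):
        if "A" in record:
            record["A"] = reverse_complement(record["A"])
        if "Q" in record:
            # Assumption: keep qualities in step with the reversed bases.
            record["Q"] = record["Q"][::-1]
    return record


def load(lines, spec: FormatSpec) -> list:
    records = [parse_line(line, spec) for line in lines if line.strip()]
    if spec.sort:
        # Sort = 1: order the reads by alignment position.
        records.sort(key=lambda r: int(r["P"]))
    return records
```

    Usage might look like `load(open("my_alignments.txt"), parse_spec(definition_line))`, where the file name and `definition_line` are placeholders.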

    NOTE: MapView 3.1.2 only supports ungapped alignments.
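    For the built-in formats, the fields MapView needs can be picked out by fixed column positions. The sketch below is hypothetical Python, not MapView code, and the column positions are assumptions inferred from the MAQ, SOAP and ELAND example lines above. MAQ: read name, chromosome, position and strand come first, with sequence and quality as the last two columns. SOAP: ID, sequence and quality come first, then strand, chromosome and position in columns 6 to 8 (counting from 0). ELAND: ID and sequence, then chromosome, position and F/R strand in columns 6 to 8, with no quality string.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    read_id: str
    sequence: str
    quality: Optional[str]  # None when the format has no quality string (ELAND)
    chrom: str
    position: int
    strand: str             # normalised to "+" or "-"


def parse_maq(line: str) -> Read:
    f = line.split()
    # Assumed layout: name, chrom, position, strand, ..., sequence, quality.
    # Sequence and quality are taken from the end of the line.
    return Read(f[0], f[-2], f[-1], f[1], int(f[2]), f[3])


def parse_soap(line: str) -> Read:
    f = line.split()
    # Assumed layout: id, sequence, quality, hits, a/b, length, strand, chrom, position, ...
    return Read(f[0], f[1], f[2], f[7], int(f[8]), f[6])


def parse_eland(line: str) -> Read:
    f = line.split()
    # Assumed layout: id, sequence, match code, three hit counts, chrom, position, F/R.
    # Lines without a unique match may lack the last columns; they are not handled here.
    return Read(f[0], f[1], None, f[6], int(f[7]), "+" if f[8] == "F" else "-")
```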

    HELP
    1. Main window
    Click a nucleotide on a short read to show:
    P: position on the reference sequence
    Q: quality score of the nucleotide you clicked
    PP: alignment position of the paired read
    PD: pair distance

    Click a nucleotide on the reference sequence to show:
    Counts of A, C, G, T and N
    Coverage information
    Variant frequency

    2. Quality score
    Solexa quality score = ASCII code - 64
    For example, 'h' means quality score 40.
    Phred quality score = ASCII code - 33
    For example, 'I' means quality score 40.
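    The two encodings differ only in the offset subtracted from each character code. A minimal Python sketch (the function and constant names are hypothetical):

```python
SOLEXA_OFFSET = 64  # Solexa: score = ASCII code - 64
PHRED_OFFSET = 33   # Phred:  score = ASCII code - 33


def decode_qualities(qual: str, offset: int) -> list:
    """Turn an ASCII-encoded quality string into integer scores."""
    return [ord(ch) - offset for ch in qual]


print(decode_qualities("h", SOLEXA_OFFSET))  # [40]
print(decode_qualities("I", PHRED_OFFSET))   # [40]
```

    Under the Solexa offset, the `;` characters in the SOAP example above decode to -5 (ASCII 59 - 64).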

    3. MVR file
    The file containing the list of SNPs.

    4. SNP detection

    SNP detection examines each position in the contig to determine whether there is a SNP at that position. To make a qualified and significant assessment, it uses three thresholds:

    (1) Minimum quality of the central base. Bases with a quality score below this value are not considered in the SNP calculation at this position.

    (2) Minimum coverage. If SNPs were called in areas of low coverage, you would get more false positives, so you can set the minimum coverage required for a SNP to be called. Note that coverage is counted as the number of valid reads at the current position (i.e. the reads remaining after the quality filter has removed the bad ones).

    (3) Minimum variant frequency. If only one read has a variant base, you probably do not want this to count as a SNP. This threshold sets the minimum frequency for a variant to be called a SNP. By default, the value is 0.4, which means that a variant base must make up at least 40% of the bases in the valid reads before a SNP is called. Note that if there are two different variants with, e.g., 20% frequency each, no SNP will be called. If you sequence diploid genomes, you may have to lower this value to detect all SNPs.
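    The three thresholds can be combined as follows. This is a minimal Python sketch of the rules as described above, not MapView's implementation: it takes the reads covering one position as (base, quality) pairs, tests each non-reference base against the frequency threshold separately (which is why two 20% variants are not called at the default of 0.4), and, as an added assumption, never calls N as a variant. The function name and signature are hypothetical; only the 0.4 default comes from the text.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def call_snp(ref_base: str,
             observations: Sequence[Tuple[str, int]],
             min_quality: int,
             min_coverage: int,
             min_variant_freq: float = 0.4) -> Optional[str]:
    """Return the variant base called at one position, or None."""
    # (1) Drop bases whose quality is below the threshold.
    valid = [base.upper() for base, qual in observations if qual >= min_quality]
    # (2) Coverage counts only the reads that survived the quality filter.
    coverage = len(valid)
    if coverage == 0 or coverage < min_coverage:
        return None
    # (3) Each non-reference base must reach the frequency threshold on its own.
    #     Assumption: N is never reported as a variant.
    variants = Counter(b for b in valid if b != ref_base.upper() and b != "N")
    if not variants:
        return None
    base, count = variants.most_common(1)[0]
    return base if count / coverage >= min_variant_freq else None
```

    For a diploid sample, passing a lower `min_variant_freq` mirrors the advice above.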
    Last edited by baohua100; 06-05-2009, 05:54 PM. Reason: 3.4.0 version
  • baohua100
    Senior Member
    • Jun 2008
    • 103

    #2
    If you have any problems or suggestions, please reply here!


    • jfshao1984
      Junior Member
      • Mar 2008
      • 5

      #3
      MapView sounds useful.
      I found a bug: if the reference file is in FASTA format with 60 nt per line, MapView adds one extra nt after each line, and the alignment view becomes scrambled. When I changed the reference sequence to a single line (all nucleotides on one line), the view was correct.
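      The workaround described here, putting each sequence on a single line, can be scripted. A minimal Python sketch (a hypothetical helper, not part of MapView) that joins wrapped sequence lines while keeping header lines:

```python
def linearize_fasta(text: str) -> str:
    """Rewrite a wrapped FASTA text so each sequence sits on a single line."""
    out, seq = [], []
    for line in text.splitlines():  # handles both \n and \r\n line endings
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq:  # flush the previous record's sequence
                out.append("".join(seq))
                seq = []
            out.append(line)
        else:
            seq.append(line)
    if seq:
        out.append("".join(seq))
    return "\n".join(out) + "\n"
```

      For example, `open("ref_oneline.fa", "w").write(linearize_fasta(open("ref.fa").read()))`, where the file names are placeholders.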


      • bioinfosm
        Senior Member
        • Jan 2008
        • 483

        #4
        Is it still useful for human data? Should I load the entire human refseq, or would you suggest that I upload just a single chromosome and view it?

        I have mate-pair data that I wish to visualize; is anything different when using mate-pair reads? Can I upload the two read files?

        Thanks
        --
        bioinfosm


        • dvh
          Member
          • Jul 2008
          • 35

          #5
          As well as the paired-end read question above: does MapView cope with gapped alignments within reads? (maqview, developed alongside the maq package, does not.)
          I would also be interested to hear from Heng Li about any developments in the SAMtools viewer.


          • baohua100
            Senior Member
            • Jun 2008
            • 103

            #6
            The reference sequence and alignment results should contain only one reference sequence ID, so you'd better load only one chromosome. If your reference sequence or alignment file contains multiple reference sequence IDs, you can use Splitter to split the file and then view each part separately.

            We are now developing the function for viewing paired-end data.


            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              To dvh:

              The viewer that comes with samtools displays gapped alignments. It shows whether a read is an orphan but does not show pairing information directly. For now, I do not have any plan to implement a fancy alignment viewer. The current viewer is just ~300 lines of C code based on the samtools library, and I want to keep it as it is. I would appreciate it if someone else would implement a nice viewer. A good viewer will benefit the community and the samtools project as well. In addition, you may also keep an eye on the GAP5 development.

              To baohua100:

              I think it would be nice if an alignment viewer may have the following features:

              * scalability. It should work with huge alignments using limited memory (e.g. 10~100GB compressed alignment).

              * portability. At least where I am working, Linux and Mac dominate. A Windows-only application would push away many potential users.

              * efficiency over network. We prefer to put huge alignments on a supercomputer or a large cluster while viewing them on a small personal desktop or laptop. This requires transferring alignment data/graphics over the network. It would be nice to have a built-in server-client mode or, alternatively, X11 support.

              * usability. You can learn from mainstream assembly viewers such as consed, hawkeye, eagleview and staden/gap4.


              • yasutake
                Member
                • Sep 2008
                • 11

                #8
                To baohua100,

                This must be useful!

                I tried to use MapView but I couldn't understand how to use it.

                I would be happy if you made a tutorial or a manual.

                Besides, let me ask some questions.

                1. Can MAQ map files be imported into MapView directly?
                2. Should I prepare the reference sequence in FASTA format?


                • baohua100
                  Senior Member
                  • Jun 2008
                  • 103

                  #9
                  To yasutake,

                  You should prepare ref.fasta and the map file, click MVFmaker to make an MVF file, and then select this MVF file to view the alignment results.

                  We are writing the manual now and will put it on the website. I have also updated the software and added a guide on how to input files.

                  You can download from http://evolution.sysu.edu.cn/software/mapview.rar
                  Last edited by baohua100; 01-16-2009, 01:12 AM.


                  • baohua100
                    Senior Member
                    • Jun 2008
                    • 103

                    #10
                    To lh3

                    Thanks for your suggestions!


                    • Aengus
                      Junior Member
                      • Sep 2008
                      • 6

                      #11
                      Just to clarify regarding maq format files.

                      You need to run "maq mapview" and then use the resulting mapview files, along with the reference sequence in FASTA format, to build the MVF format file in MVFMaker. The rest of the interaction is pretty intuitive. The arrow keys move around, and the "Fast Positioning" box lets you enter a base position to jump to.

                      I had to look at MapView because I have a scientist who uses only a PC, and I couldn't get maqview-0.2.4 to build under Cygwin on the PC.

                      A cross-platform viewer, as Heng Li has suggested, would be great, and would go some way to alleviating my issues with supporting scientists and their data analysis.


                      • Aengus
                        Junior Member
                        • Sep 2008
                        • 6

                        #12
                        Sorry - I hit send too soon.

                        With my maq mapview data, MapView is not showing the quality scores associated with each base. This is quite important to see, especially when evaluating SNPs.


                        • bioinfosm
                          Senior Member
                          • Jan 2008
                          • 483

                          #13
                          Feature request - Zoom-out

                          Is there no zoom feature in the MapView tool? (Apart from the quality information, as in the previous post.)
                          --
                          bioinfosm


                          • Vince_Funari
                            Junior Member
                            • Jan 2009
                            • 2

                            #14
                            Paired end data

                            I am also interested. This looks very useful, as many people have had issues lately with the recent maqview build for maq, and it looks like a very user-friendly tool. However, displaying paired-end data is critical for these short reads, as it really helps the confidence levels of the alignments.

                            My question is: does this support paired-end alignments generated by MAQ yet?

                            Thank you,
                            Vince
                            Originally posted by bioinfosm:
                            Is it still useful for human data? Loading the entire human refseq, or you suggest that I rather upload just a single chromosome and view it..

                            I have mate-pair data that I wish to visualize; any changes when using mate-pair reads? Can I upload the 2 reads files?

                            Thanks


                            • unionicola
                              Junior Member
                              • Feb 2009
                              • 4

                              #15
                              MapView looks great, but I'm having trouble loading data into it. I used the MVFMaker tool and loaded the reference genome I aligned my data to, together with the .map file I obtained from MAQ (using the easyrun command).

                              However, when I load the MVF file into MapView, an error message appears: "No Reads in MVF!". Is there a specific setting I should use in the "Single Line Format" of the MVFMaker tool? I have been using the "maq" format, since I used MAQ to build the .map file.

                              Any help anyone can provide would be very appreciated.

                              Thanks!
