  • MapView: a viewer for short reads alignment

    MapView 3.4.1 (developed for Windows, but it can also run on Linux using Mono: http://www.mono-project.com/)


    MapView: visualization of short reads alignment on a desktop computer. Bioinformatics, 2009, 25 (12) : 1554-1555



    Download: http://evolution.sysu.edu.cn/mapview/

    Visualizing the huge volume of next-generation alignment data on a desktop computer presents many informatics challenges. The great majority of alignment viewers were designed to load and process a large assembly file in the ACE format. This memory-based design requires a huge amount of memory (>10G) that is not typically available to desktop computer users.

    We introduce a new visual analytics tool, MapView, to facilitate visualization of large-scale short-read alignment data and analysis of genetic variation. MapView can handle hundreds of millions of short reads on a desktop computer with limited memory. We developed a novel binary file format and a fast loading algorithm for superfast (<2s) and memory-efficient (<60M) visualization of huge amounts of short-read alignments. Moreover, MapView supports multitasking and multithreading: it can process multiple tasks (e.g. whole-genome SNP detection, coverage computation and alignment visualization) in parallel.

    Windows:


    Linux:


    For Linux, e.g. Ubuntu:

    sudo apt-get install libmono-winforms2.0-cil

    mono MapView.exe




    Computational efficiency comparison



    INPUT
    1. Single-end reads
    Prepare the reference sequence in FASTA format and a text-based alignment results file (output by Eland, Maq, SOAP, MapNext, SeqMap, ...).

    (1) If the reference file and the alignment results file each contain only one reference sequence ID, click MVFMaker, input the two files and make an MVF format file. Then select the MVF file to view alignments and SNPs.
    (2) If the reference file or the alignment results file contains multiple reference sequence IDs, click Splitter to split it into multiple files, each containing only one reference ID. Then follow (1).
    (3) If the alignment results file is not in Eland, Maq, SOAP or MapNext format, define the format in the file MVFmaker_NewFormat.txt. Then, when you input the reference and alignment results to make the MVF file, select the format <User-defined> (the one you defined in MVFmaker_NewFormat.txt).
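
To illustrate what step (2) amounts to, here is a minimal Python sketch that groups alignment lines by reference ID. It is not the Splitter tool itself; the function name `split_by_reference` and the `ref_column` parameter are hypothetical, and the column index has to be chosen to match your aligner's output (index 6 corresponds to the ELAND lines shown in this post).

```python
from collections import defaultdict


def split_by_reference(lines, ref_column):
    """Group whitespace-separated alignment lines by the reference ID column.

    ref_column is the 0-based index of the column holding the reference ID
    (an assumption the caller must supply for the aligner in use).
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > ref_column:  # skip blank or truncated lines
            groups[fields[ref_column]].append(line)
    return dict(groups)


# Hypothetical ELAND-like lines; the reference ID is in column index 6.
example = [
    "read1 GATTTATCATG U0 1 0 0 chr1 4 R",
    "read2 GAAAAAGGAAC U0 1 0 0 chr2 164 R",
    "read3 GGGAGATTCCC U0 1 0 0 chr1 530 F",
]
per_reference = split_by_reference(example, ref_column=6)
for ref_id, ref_lines in sorted(per_reference.items()):
    print(ref_id, len(ref_lines))
```

Each group could then be written to its own file and passed to MVFMaker together with the matching single-sequence reference.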

    2. Paired-end reads
    MapView has preliminary support for paired-end reads. Prepare the reference sequence in FASTA format and the paired-end alignment results file (one or two files) from SOAP. (Note: the current version of MapView only supports SOAP's paired-end output files.)

    1. Click MVFmaker and tick the Paired-end data checkbox.
    2. Upload the paired alignment results file and the reference file. Optionally, you can also upload the unpaired alignment data (output by SOAP) at the same time.
    3. Click the "Save as" button and specify the .MVF file name.
    4. Go back to the main manual and click "Open MVF file" to upload the MVF file you just generated.

    3. About the text-based alignment results file

    (1) MAQ
    8:18:1354:1553 chr1 597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA GGBFPOLQOPLHYRDOCLYM`OO^VSP``YR`_]T
    8:22:173:821 chr1 2597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA NHELOUPVTZUT_WU^```````````````````
    8:29:309:1409 chr1 5397 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA HLHOBQCOOEGTP]LNJXVX`````J`````````

    (2) SOAP
    8:1:3:1697 GTCTAGATATCGCACAATCTTNAATCTTTAAAATG hhhhhhhhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 1266 0
    8:1:3:1804 CCTAGGGTTGATTTAGAAACGNGAGCATTTTGTTG hhhhb^hhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 208 0
    8:1:3:1247 AGGTTAATCCTGCNTATACATGCAGCTCTAATTCA hhhhhhhhhhhhh;hhhhhhhhhhhhhhhhhhh^h 1 a 35 + chr1 122 0

    (3) ELAND
    I326_2_FC306FCAAXX:8:1:211:212 GATTTATCATGTAAGAGGTTGTCATTCAGAATGGT U0 1 0 0 chr1 4 R
    I326_2_FC306FCAAXX:8:1:913:158 GAAAAAGGAACCTCTGCAGACATATCATGCAACTG U0 1 0 0 chr1 164 R
    I326_2_FC306FCAAXX:8:1:183:435 GGGAGATTCCCATATCTTTTCCACTTTCCTCTTCC U0 1 0 0 chr1 530 F

    (4) User-defined
    R A Q C P S x <User-defined> 1 0
    [TAB is used as the separator. The symbols mean the following:]
    x: ignored (skip this column)
    R: Read ID
    A: Read sequence (ATGC...)
    P: Position
    Q: Quality score (may be x)
    S: Strand (F/R or +/-)
    C: Chromosome
    <FormatName>: name of the format
    Sort (0/1): 1 means the alignment positions are not sorted, so MapView will sort them.
    Reverse (0/1): 1 means that reads on the reverse (-) strand must be reverse-complemented for display.
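
As a rough illustration of how such a definition could be interpreted, the Python sketch below parses the line `R A Q C P S x <User-defined> 1 0` and applies it to one alignment record. This is a guess at the semantics from the description above, not MapView's actual parser: the function names, the sample record, and the assumption that the format name contains no spaces are all hypothetical.

```python
# Translation table for complementing a read sequence.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_format_definition(line):
    """Split a definition into column codes, format name, Sort and Reverse flags."""
    *columns, name, sort_flag, reverse_flag = line.split()
    return {
        "columns": columns,              # e.g. ['R', 'A', 'Q', 'C', 'P', 'S', 'x']
        "name": name,                    # e.g. '<User-defined>'
        "sort": sort_flag == "1",        # positions unsorted, sorting needed
        "reverse": reverse_flag == "1",  # reverse-complement (-) strand reads
    }


def parse_alignment(line, fmt):
    """Map one TAB-separated alignment line onto the column codes, skipping 'x'."""
    fields = line.rstrip("\n").split("\t")
    record = {code: value for code, value in zip(fmt["columns"], fields) if code != "x"}
    record["P"] = int(record["P"])
    if fmt["reverse"] and record.get("S") in ("-", "R"):
        record["A"] = record["A"].translate(COMPLEMENT)[::-1]
    return record


fmt = parse_format_definition("R A Q C P S x <User-defined> 1 0")
# Hypothetical record laid out in the order R A Q C P S x.
record = parse_alignment("read1\tACGTN\thhhhh\tchr1\t100\t-\t0", fmt)
print(record)
```

With the Reverse flag set to 1 instead of 0, the same record would have its sequence reverse-complemented before display.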

    NOTE: MapView 3.1.2 only supports ungapped alignments.

    HELP
    1. Main window
    Click a nucleotide on a short read:
    P: Position on the reference sequence
    Q: Quality score of the nucleotide you clicked
    PP: Paired read alignment position
    PD: Pair distance

    Click a nucleotide on the reference sequence:
    Count of A, C, G, T and N
    Coverage information
    Variant frequency

    2. Quality score
    Solexa quality score: ASCII code - 64
    For example, 'h' means a quality score of 40.
    Phred quality score: ASCII code - 33
    For example, 'I' means a quality score of 40.
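
The two offsets above can be checked with a few lines of Python. This is only an illustration of the arithmetic; the function and constant names are made up for the example.

```python
SOLEXA_OFFSET = 64  # Solexa quality score = ASCII code - 64
PHRED_OFFSET = 33   # Phred quality score = ASCII code - 33


def decode_quality(qual_string, offset):
    """Convert an ASCII-encoded quality string into a list of integer scores."""
    return [ord(ch) - offset for ch in qual_string]


solexa_scores = decode_quality("h", SOLEXA_OFFSET)
phred_scores = decode_quality("I", PHRED_OFFSET)
print(solexa_scores)  # [40]
print(phred_scores)   # [40]
```

Applied to a whole quality string from an alignment file, the function returns one score per base, in read order.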

    3. MVR file
    The file containing the list of SNPs.

    4. SNP detection

    SNP detection looks at each position in the contig to determine whether there is a SNP at that position. To make a qualified and significant assessment, it needs three thresholds:

    (1) Minimum quality of central base. Bases with a quality score below this value are not considered in the SNP calculation at this position.

    (2) Minimum coverage. If SNPs were called in areas of low coverage, you would get a higher number of false positives. Therefore you can set the minimum coverage required for a SNP to be called. Note that the coverage is counted as the number of valid reads at the current position (i.e. the reads remaining after the quality assessment has filtered out the bad ones).

    (3) Minimum variant frequency. If only one read has a variant base, you probably do not want this to count as a SNP. This threshold sets the minimum frequency for a variant to be called a SNP. By default, the value is 0.4, which means that a variant base must be present in at least 40% of the bases in the valid reads before a SNP is called. Note that if you have two different variants, each with e.g. 20% frequency, the position will not be counted as a SNP. If you sequence diploid genomes, you may have to lower this value to detect all SNPs.
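
The three thresholds can be sketched for a single position as follows. This is a simplified reading of the description above, not MapView's implementation: the function name `call_snp`, the pileup representation (a list of base/quality pairs), and the default values for minimum quality and minimum coverage are assumptions made for the example. Only the 0.4 default comes from the text.

```python
from collections import Counter


def call_snp(ref_base, pileup, min_quality=20, min_coverage=4, min_variant_freq=0.4):
    """Return (variant_base, frequency) if a SNP is called at this position, else None.

    pileup is a list of (base, quality) pairs for the reads covering the position.
    min_quality and min_coverage defaults are illustrative, not MapView's values.
    """
    # (1) Drop bases whose quality is below the minimum.
    valid = [base for base, qual in pileup if qual >= min_quality]
    # (2) Coverage counts only the valid reads.
    coverage = len(valid)
    if coverage < min_coverage:
        return None
    # (3) Each variant base is judged on its own frequency.
    counts = Counter(base for base in valid if base != ref_base)
    if not counts:
        return None
    variant, count = counts.most_common(1)[0]
    freq = count / coverage
    return (variant, freq) if freq >= min_variant_freq else None


one_variant = [("A", 30)] * 6 + [("G", 30)] * 4
two_variants = [("A", 30)] * 6 + [("G", 30)] * 2 + [("C", 30)] * 2
print(call_snp("A", one_variant))   # G reaches 40% of valid reads
print(call_snp("A", two_variants))  # two variants at 20% each: no SNP
```

Because the frequency is computed per variant base, the second pileup yields no call, which matches the note about two variants at 20% each.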
    Last edited by baohua100; 06-05-2009, 05:54 PM. Reason: 3.4.0 version

  • #2
    If you have any problems or suggestions, please reply here!



    • #3
      MapView sounds useful.
      I found a bug: if the reference file is in FASTA format with 60 nt per line, MapView adds one extra nt after each line, and the alignment view is thrown out of order. When I put the whole reference sequence on a single line, the view is correct.
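
The single-line workaround described in this post can be scripted. The Python sketch below joins the wrapped sequence lines of each FASTA record into one line; the function name `unwrap_fasta` is made up for the example. It also strips carriage returns, on the unconfirmed guess that line-ending characters might be what gets counted as the extra nucleotide.

```python
def unwrap_fasta(text):
    """Return FASTA text with each record's sequence joined onto a single line."""
    out = []
    seq_parts = []
    for line in text.splitlines():
        line = line.strip()  # also removes any stray '\r'
        if not line:
            continue
        if line.startswith(">"):
            if seq_parts:  # flush the previous record's sequence
                out.append("".join(seq_parts))
                seq_parts = []
            out.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:
        out.append("".join(seq_parts))
    return "\n".join(out) + "\n"


wrapped = ">chr1\r\nACGT\r\nTTAA\r\n"
unwrapped = unwrap_fasta(wrapped)
print(unwrapped)
```

To convert a file, read it with `open(path).read()`, pass the text through `unwrap_fasta`, and write the result to a new file before loading it in MVFMaker.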



      • #4
        Is it still useful for human data? Should I load the entire human refseq, or do you suggest I load just a single chromosome and view that?

        I have mate-pair data that I wish to visualize; any changes when using mate-pair reads? Can I upload the 2 reads files?

        Thanks
        --
        bioinfosm



        • #5
          As well as the paired-end read Q above, does mapview cope with gapped alignments in each read? (e.g. where maqview, developed alongside maq package, does not).
          I would also be interested to hear from Heng Li re SAMtools viewer if any developments.



          • #6
            The reference sequence and the alignment result should each contain only one reference sequence ID, so you had better load only one chromosome. If your reference sequence or alignment file contains multiple reference sequence IDs, you can use Splitter to split the file and then view the parts separately!

            We are now developing the function for viewing paired-end data.



            • #7
              To dvh:

              The viewer that comes with samtools displays gapped alignments. It shows whether a read is an orphan but does not show pairing information directly. For now, I do not have any plan to implement a fancy alignment viewer. The current viewer is just ~300 lines of C code based on the samtools library and I want to keep it as it is. I would appreciate it if someone else would like to implement a nice viewer. A good viewer will benefit the community and the samtools project as well. In addition, you may also keep an eye on the GAP5 development.

              To baohua100:

              I think it would be nice if an alignment viewer may have the following features:

              * scalability. It should work with huge alignment with limited memory (e.g. 10~100GB compressed alignment)

              * portability. At least here at where I am working, Linux and Mac dominate. A Windows-only application would push away many potential users.

              * efficiency over network. We prefer to put huge alignments on a supercomputer or a large cluster while viewing the alignment on a small personal desktop or laptop. This requires transferring alignment data/graphics over the network. It would be nice to have a built-in server-client mode or alternatively support X11.

              * usability. you can learn this from those main-stream assembly viewers such as consed, hawkeye, eagleview and staden/gap4.



              • #8
                To baohua100,

                This must be useful !

                I tried to use Mapview but I couldn't understand how to use it.

                I'll be happy if you make the tutorial or the manual.

                Besides, let me ask some questions.

                1. Can map files of MAQ be imported to Mapview directly ?
                2. Should I prepare the ref. sequence as a fasta format ?



                • #9
                  To yasutake:

                  You should prepare ref.fasta and a map file, click MVFmaker to make an MVF file, and then select this MVF file to view the alignment results.

                  We are writing the manual now and will put it on the website. I have also updated the software and added a guide on how to input files.

                  You can download from http://evolution.sysu.edu.cn/software/mapview.rar
                  Last edited by baohua100; 01-16-2009, 01:12 AM.



                  • #10
                    To lh3

                    thanks for your suggestions!



                    • #11
                      Just to clarify regarding maq format files.

                      You need to run "maq mapview" and then use the resultant mapview files along with the reference sequence in fasta format to build the mapview MVF format file in the MVFMaker. The rest of the interaction is pretty intuitive. Arrow keys move around and the "Fast Positioning" box lets you enter a base position to jump to.

                      I have had to look at MapView because I have a scientist who uses only a PC, and I couldn't get maqview-0.2.4 to build under cygwin on the PC.

                      If there was a cross-platform viewer as Heng Li has said that would be great, and would go some way to alleviating my issues with supporting scientists and their data analysis.



                      • #12
                        Sorry - I hit send too soon.

                        With my maq mapview data, MapView is not showing the quality scores associated with each base. This is quite important to see, especially when evaluating SNPs.



                        • #13
                          Feature request - Zoom-out

                          Is there no zoom feature in the MapView tool? (This is in addition to the quality information mentioned in the previous post.)
                          --
                          bioinfosm



                          • #14
                            Paired end data

                            I am also interested; this looks very useful, as many people have recently had issues with the latest build of maqview for maq. It also looks like a very user-friendly tool, although displaying paired-end data is critical for these short reads, as it really helps confidence in the alignments.

                            My question is: does this support paired-end alignments generated by MAQ yet?

                            thank you
                            vince
                            Originally posted by bioinfosm
                            Is it still useful for human data? Loading the entire human refseq, or you suggest that I rather upload just a single chromosome and view it..

                            I have mate-pair data that I wish to visualize; any changes when using mate-pair reads? Can I upload the 2 reads files?

                            Thanks



                            • #15
                              MapView looks great, but I'm having trouble loading data into it. I used MVFMaker tool and loaded in the reference genome I used to align my data to and the .map file that I obtained from MAQ output (using the easyrun command).

                              However, when I load the MVF file into MapView, an error message appears: "No Reads in MVF!". Is there a specific setting I should have in the "Single Line Format" of the MVFMaker tool? I have been using the format with "maq" in the form since I used MAQ to build the .map file.

                              Any help anyone can provide would be very appreciated.

                              Thanks!
