Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi,
    I'm wondering if there's a way to convert from SAM to Maq's map format. I'd like to do this because I want to use Maq's simulate command to generate a set of simulated reads, align them (using an aligner other than Maq), output to SAM, then convert the SAM file to .map for use with Maq's simustat command. This way, I can get statistics for how well the other aligner aligned Maq's simulated reads. Does such a tool exist to make this possible?
    Thanks!

    Comment


    • You can use wgsim from SAM tools which as far as I understand is the "son" of mac simulate.

      Comment


      • I prefer Maq's simulate over wgsim because, as far as I understand, wgsim generates errors in the reads uniformly, whereas Maq uses the quality scores of a set of actual reads that must be provided to Maq as a training set. Therefore, in addition to being less realistic, wgsim produces output that actually creates an inherent disadvantage for certain aligners (e.g. bowtie) that assume (usually correctly) that errors are not uniformly distributed, but that their incidence rises from the first base in the read to the last. Consequently, using wgsim as a means of generating simulated reads for alignment software comparison will not produce results that are as accurate as would be obtained by using Maq's simulate.

        This brings me back to my original question: Is there a way to convert from SAM to Maq's .map?

        Comment


        • You may simulate reads with "maq simulate" and evaluate the accuracy with "wgsim_eval.pl" given the SAM format. I think wgsim_eval.pl works with reads simulated with maq. Even if I am wrong, you may modify wgsim_eval.pl. There should not be much changes and it is a simple Perl script.

          Comment


          • Ah, I didn't even think to use wgsim_eval.pl; it hadn't occurred to me that I could use it without using wgsim. I just ran: maq2sam-long simu_align.map | perl wgsim_eval.pl. It worked perfectly, and gave me the same results as maq simustat simu_align.map.

            Thank you very much!

            Comment


            • Originally posted by jmj1091 View Post
              I prefer Maq's simulate over wgsim because, as far as I understand, wgsim generates errors in the reads uniformly, whereas Maq uses the quality scores of a set of actual reads that must be provided to Maq as a training set. Therefore, in addition to being less realistic, wgsim produces output that actually creates an inherent disadvantage for certain aligners (e.g. bowtie) that assume (usually correctly) that errors are not uniformly distributed, but that their incidence rises from the first base in the read to the last. Consequently, using wgsim as a means of generating simulated reads for alignment software comparison will not produce results that are as accurate as would be obtained by using Maq's simulate.

              This brings me back to my original question: Is there a way to convert from SAM to Maq's .map?
              FYI, the modified version of wgsim in DNAA uses per base error rates that can be input to the program.

              In wgsim in samtools does not fairly represent color space data, since it is made for BWA. In short, BWA ignores the adapter and first base in color space. When wgsim simulates a color space read, it does not give the read with an adapter and then "n" color calls, but instead it creates an "n+1" length read, trims the adapter and first color, and gives a "n" length color read, so that BWA does not need to trim the adapter and first color. Therefore when user specifies a 50bp color space read, of which BWA should only be able to use 49bp, it outputs it so that BWA can use all 50.

              I adapted wgsim into one of my own projects (http://dnaa.sourceforge.net) to correct this problem, as well as to make some of my own customizations, including the ability to give per base error rates instead of a uniform error rate.

              Nils

              Comment


              • SAM to MAQ converter

                Dear All,
                it might seem a strange question, but since many tools for detecting structural variants, such as BreakDancer and GASV, rely on a MAQ alignment file, I was wondering if anyone of you has already written a SAM to MAQ file converter.

                Thank you in advance.

                Comment


                • Hi,

                  samtools-0.1.6_x86_64-linux; precompiled version downloaded today

                  $ bowtie --version
                  bowtie version 0.11.3
                  64-bit
                  Built on myserver
                  Fri Oct 23 13:27:05 MDT 2009
                  Compiler: gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)
                  Options: -O3
                  Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

                  $ samtools faidx hs_ref_chr10.fa
                  $ cat hs_ref_chr10.fa.fai
                  gi|89161187|ref|NC_000010.9|NC_000010 135374737 105 70 71

                  Created sam format with bowtie in two ways (behavior described below is the same for both methods):
                  1. using -S in bowtie command
                  2. using samtools bowtie2sam.pl on a bowtie --refout map file

                  sam file from method 2 (chromosome 10 only):

                  $ cat ref00000.map.sam
                  @0-0-3-9833 0 gi|89161187|ref|NC_000010.9|NC_000010 62380535 0 15M * 0 0 GCAAAGGNNATCATT IIIIIIIIIIIIIII NM:i:1 X1:i:5 MD:Z:3A3N0N6
                  @0-0-3-9833 16 gi|89161187|ref|NC_000010.9|NC_000010 62382480 0 15M * 0 0 GGGCTANNGCTCATC IIIIIIIIIIIIIII NM:i:1 X1:i:5 MD:Z:7N0N6
                  @0-0-6-12817 0 gi|89161187|ref|NC_000010.9|NC_000010 6095909 0 15M * 0 0 TACCACCNNGCCCTT IIIIIIIIIIIIIII NM:i:1 X1:i:265 MD:Z:1A5N0N6
                  @0-0-6-12817 16 gi|89161187|ref|NC_000010.9|NC_000010 6097174 0 15M * 0 0 GCATCANNCTCCCGA IIIIIIIIIIIIIII NM:i:1 X1:i:265 MD:Z:7N0N6


                  $ samtools view -bt ~/work/hs_ref_chr/hs_ref_chr10.fa.fai -o out.bam ref00000.map.sam
                  [sam_header_read2] 1 sequences loaded.
                  [sam_read1] reference '16 gi|89161187|ref|NC_000010.9|NC_000010 6097174 0 15M * 0 0 GCATCANNCTCCCGA IIIIIIIIIIIIIII NM:i:1 X1:i:265MD:Z:7N0N6

                  ' is recognized as '*'.
                  [main_samview] truncated file.

                  out.bam is created, but I cannot do anything further with it.

                  Thanks.
                  Susan

                  Comment


                  • File with the Pileup format was generated by the command below:
                    samtools pileup -f ref.fas reads.aln.sorted.bam > reads.pileup

                    Here are partial results:
                    chr1 7 C 1 ^~. B
                    chr1 8 A 1 . C
                    chr1 9 A 1 . C
                    chr1 10 A 1 . C
                    chr1 11 G 1 . A
                    chr1 12 C 1 . B
                    chr1 13 C 1 . B
                    chr1 14 A 1 . C
                    chr1 15 A 1 . C
                    chr1 16 A 1 . B

                    But when I ran:
                    samtools.pl varFilter test -d 3

                    error was shown as:
                    Use of uninitialized value in numeric lt (<) at /mnt/01/liu3zhen/analysisTool/SAMtools/samtools-0.1.6_x86_64-linux/samtools.pl line 78.


                    pileup file only contain 6 columns. In the samtools.pl, at least 8 columns are required for the command of line 78.


                    My questions are:

                    1. Is the pileup output correct?
                    2. Am I understanding corretly about code of line 78?

                    Thank you very much.

                    Comment


                    • Originally posted by liu3zhen View Post
                      File with the Pileup format was generated by the command below:
                      samtools pileup -f ref.fas reads.aln.sorted.bam > reads.pileup

                      Here are partial results:
                      chr1 7 C 1 ^~. B
                      chr1 8 A 1 . C
                      chr1 9 A 1 . C
                      chr1 10 A 1 . C
                      chr1 11 G 1 . A
                      chr1 12 C 1 . B
                      chr1 13 C 1 . B
                      chr1 14 A 1 . C
                      chr1 15 A 1 . C
                      chr1 16 A 1 . B

                      But when I ran:
                      samtools.pl varFilter test -d 3

                      error was shown as:
                      Use of uninitialized value in numeric lt (<) at /mnt/01/liu3zhen/analysisTool/SAMtools/samtools-0.1.6_x86_64-linux/samtools.pl line 78.


                      pileup file only contain 6 columns. In the samtools.pl, at least 8 columns are required for the command of line 78.


                      My questions are:

                      1. Is the pileup output correct?
                      2. Am I understanding corretly about code of line 78?

                      Thank you very much.
                      I know what's going on here. I have use -cv for the pileup step to report variations. Thanks.

                      Comment


                      • Hi,
                        I have a question about how samtools pileup generates the consensus sequence. I recently ran an alignment that output mappings only when there were no mismatches between the read and the reference. I proceeded to generate the pileup, and noticed that, although there are no mismatches throughout, there are multiple positions where the consensus sequence differs from the reference. tview confirmed that, at these positions, all of the reads mapped agree with the reference, yet the consensus base is different! I did notice that the reads' bases were of low Phred quality at those locations, and I was wondering if perhaps pileup occasionally mutates the consensus base if its quality score is sufficiently low. I tried looking through the source code, but I couldn't figure it out. Is this a bug?
                        Thanks!

                        Comment


                        • SAM format versions?

                          Can some one help me understand if there are any versions in SAM format. Cufflinks program works well with SAM format output by TOPHAT program but fails when the SAM output comes from Bowtie or bwa. Did anyone had encountered this before, if so please suggest.

                          Comment


                          • @jmj1091

                            This is a known issue and only affects repetitive regions. It is annoying, I admit. In the SVN version of samtools, I implemented a simplified version of SOAPsnp model which does not have this issue.

                            @bekkari

                            It is likely that cufflinks is using some optional tags specific to tophat but not available in all SAM files.

                            Comment


                            • Heng,
                              Thanks for your help! I checked out the latest revision from the svn repository, and, indeed, the new pileup (as generated with the -a flag) does not have the disagreements.

                              Comment


                              • Hi
                                I amnot able to use picard.
                                I get the following error.

                                Code:
                                java picard-1.07.jar -h
                                Exception in thread "main" java.lang.NoClassDefFoundError: picard-1/07/jar
                                Thanks.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-25-2024, 11:49 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-24-2024, 08:47 AM
                                0 responses
                                18 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                62 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X