
  • #61
    I'm not sure if it's the case here, but I've noticed the CIGAR string has major issues if you attempt to include gaps in the clipped sequence.

    Or rather CIGAR works fine I assume, but samtools does not. (It's not really a big issue as the only time I've seen this happen is someone manually trimming an alignment back.)



    • #62
      Originally posted by zee
      Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

      I have some maq-based pipelines I would like to use on my BWA results.
      Maybe this is related.

      Is it possible to get the consensus sequence in simple FASTA format with SAMtools?



      • #63
        I tried using the -c option, but the pileup output is the same even without this option! I gave the command something like this:


        samtools pileup -f ref.fasta aln_sorted.bam -s -c -v >test.pileup

        Let me know where I am going wrong!



        • #64
          OK! So I know where I was going wrong:
          the alignment file should be put last, after all the options.



          • #65
            samtools.pl is now updated in SVN:

            Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.


            pileup2fq is implemented, similar to maq's cns2fq. Please note that samtools.pl filters based on the RMS mapping quality (-Q) while maq's cns2fq filters on the maximum mapping quality. Also, pileup2fq masks a small region around a potential indel, but maq's cns2fq does not. The overall accuracy looks similar to maq, though.
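
To illustrate why the two filters can disagree, here is a small sketch that computes the RMS and the maximum of a set of mapping qualities. The read qualities and the function name `rms_mapq` are made up for illustration; this is not code from samtools.pl or maq.

```python
import math

def rms_mapq(mapqs):
    """Root-mean-square of the mapping qualities of the reads covering a site."""
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))

# Hypothetical mapping qualities of three reads covering one position:
# one confidently placed read and two ambiguous ones.
mapqs = [60, 0, 0]

print(f"max mapQ = {max(mapqs)}")
print(f"RMS mapQ = {rms_mapq(mapqs):.1f}")
```

A single high-quality read keeps the maximum at 60, while the RMS is pulled down by the two ambiguous reads, so a filter on the RMS value can reject a site that a filter on the maximum would keep.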



            • #66
              Thanks, Heng. I will try it and let you know if I get stuck on something.



              • #67
                Thank you for your speedy response.

                I have one more question. I got the following results using bwa (0.4.9), my favorite.
                seq-name#0 69 * 0 0 * * 0 0 (sequence) (quality)
                seq-name#0 133 * 0 0 * * 0 0 (sequence) (quality)

                Neither read is mapped, but the "mate is unmapped" flag bit is 0 for both.
                How should I interpret it?



                • #68
                  This is a flaw in bwa when generating SAM. I will fix it.

                  It is not so easy to generate absolutely correct SAM due to the dependency between fields and between mates. We tried to minimize the dependency in design, but reducing dependency causes inconvenience in other cases. There is always a balance.
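
For reference, the two flags from the question (69 and 133) can be decoded bit by bit. The sketch below uses the bit values from the SAM specification; the short labels and the function name `decode_flag` are my own, chosen for illustration.

```python
# SAM FLAG bits (values from the SAM specification; labels are illustrative).
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
]

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]

for flag in (69, 133):
    print(flag, decode_flag(flag))
```

Both records have the "unmapped" bit (0x4) set but not the "mate_unmapped" bit (0x8), which is the inconsistency described above: if both reads of a pair are unmapped, each record would be expected to carry 0x8 as well (flags 77 and 141).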



                  • #69
                    I appreciate that you immediately replied to my question.
                    I would like to work with SAM format files.



                    • #70
                      genome likelihood format

                      Hi,
                      Where can I find further documentation on the genome likelihood format 3.0?
                      thanks,
                      peter



                      • #71
                        Hi,
                        Could anybody please explain the output format of the wgsim_eval.pl script?
                        I used this script to evaluate the aln.sam file after aligning with BWA.
                        06x 1654169 / 3308330 3308330 5.000e-01
                        05x 31765 / 63530 3371860 5.000e-01
                        04x 4938 / 9872 3381732 5.000e-01
                        03x 163891 / 327252 3708984 5.001e-01
                        02x 65120 / 129918 3838902 5.001e-01
                        01x 2669 / 5090 3843992 5.001e-01
                        00x 113748 / 141416 3985408 5.109e-01
                        BTW, the BWA manual says: "These reads are mapped with bowtie, bwa, maq and soap... The resultant alignments were then evaluated with wgsim_eval.pl script."
                        How could I use this script for alignments from other programs such as bowtie or soap?
                        thanks,
                        Mike.
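
One reading of the columns can be checked against the numbers themselves. The sketch below is an inference from the quoted figures, not from the script's documentation: it assumes the third column is the running total of the count after the slash, and the last column is the running total of the count before the slash divided by that. Interpreting the first count as wrongly placed reads and the second as all reads mapped in that quality bin is a further assumption.

```python
# (bin label, count before the slash, count after the slash),
# copied from the output quoted above.
rows = [
    ("06x", 1654169, 3308330),
    ("05x", 31765, 63530),
    ("04x", 4938, 9872),
    ("03x", 163891, 327252),
    ("02x", 65120, 129918),
    ("01x", 2669, 5090),
    ("00x", 113748, 141416),
]

def cumulate(rows):
    """Recompute the last two columns as running totals and their ratio."""
    out = []
    cum_first = cum_second = 0
    for label, first, second in rows:
        cum_first += first
        cum_second += second
        out.append((label, cum_second, "%.3e" % (cum_first / cum_second)))
    return out

for label, total, ratio in cumulate(rows):
    print(label, total, ratio)
```

The recomputed totals and ratios match the third and fourth columns of every quoted row, which supports this reading of the layout.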



                        • #72
                          Hi, I have trouble converting SAM to BAM. I tried both:

                          samtools import ref .fai in.sam out.bam
                          got error:
                          [sam_header_read2] 22 sequences loaded.
                          [sam_read1] reference '-143963499' is recognized as '*'.
                          Parse error at line 1: invalid CIGAR operation
                          Aborted

                          samtools view -bt ref .fai -o in.sam out.bam
                          and got similar error:
                          [sam_header_read2] 22 sequences loaded.
                          [sam_read1] reference '' is recognized as '*'.
                          [main_samview] truncated file.

                          thanks,
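
A "Parse error at line 1" suggests looking at the first alignment line of the SAM file itself. The sketch below is one rough way to do that, assuming the file should follow the SAM layout of at least 11 tab-separated fields with the CIGAR in column 6; the function name `check_sam_line` and the example records are hypothetical, and header lines starting with "@" are not handled. Separately, as far as I can tell, `-o` in `samtools view` names the output file, so the second command as typed would treat in.sam as the output and out.bam as the input.

```python
import re

# CIGAR is either "*" or one or more <length><operation> pairs.
CIGAR_RE = re.compile(r"^(\*|(\d+[MIDNSHP=X])+)$")

def check_sam_line(line):
    """Return a list of problems found in one SAM alignment line (not a header line)."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        problems.append(
            f"expected at least 11 tab-separated fields, found {len(fields)}"
        )
        return problems
    if not fields[1].isdigit():
        problems.append(f"FLAG is not a non-negative integer: {fields[1]!r}")
    if not CIGAR_RE.match(fields[5]):
        problems.append(f"invalid CIGAR: {fields[5]!r}")
    return problems

# Hypothetical records: a well-formed one, and the same record with
# spaces instead of tabs between the fields.
good = "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII"
bad = good.replace("\t", " ")

print(check_sam_line(good))
print(check_sam_line(bad))
```

Running a check like this on the first few non-header lines shows whether the fields are where the parser expects them before retrying the conversion.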



                          • #73
                            Lincoln released SAM/BAM Perl APIs a few days (weeks?) ago. They are here:



                            Compiling this module requires the samtools C source code. Bio::DB::Sam is known to work with samtools-0.1.4 and 0.1.5 (released today).

                            BTW, the latest samtools supports opening BAM files over FTP. For example:

                            samtools tview ftp://ftp.ncbi.nih.gov/1000genomes/f...32.2009_06.bam



                            • #74
                              The Bio::DB::Sam Perl APIs need to start from BAM files (-bam), not SAM files (there is no "-sam" at all). I only have SAM files from bwa; all I need is to convert SAM to BAM.
                              I am stuck with SAM files.....
                              samtools import ref .fai in.sam out.bam
                              got error:
                              [sam_header_read2] 22 sequences loaded.
                              [sam_read1] reference '-143963499' is recognized as '*'.
                              Parse error at line 1: invalid CIGAR operation
                              Aborted

                              thanks,



                              • #75
                                Bit of a newbie question. I've been trying to use the pileup analysis on a BWA dataset. Is there any way to switch off the read bases, read quality and alignment quality information in the output file and get a summarized format instead?

                                I'm looking at a small number of sequences that have a coverage of 50.000X upwards, and as a result the pileup output becomes almost unmanageable.

                                Thanks!
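
One possible workaround is to post-process the pileup and keep only the summary columns. The sketch below assumes the simple tab-separated pileup layout (chromosome, position, reference base, depth, read bases, base qualities), i.e. output produced without the -c option; the function name `summarize_pileup_line` and the example line are made up for illustration.

```python
def summarize_pileup_line(line):
    """Keep chromosome, position, reference base and depth; drop per-read columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, depth = fields[:4]
    return chrom, int(pos), ref, int(depth)

# Hypothetical pileup line with five reads covering the position.
example = "chr1\t1000\tA\t5\t..,,.\tIIIII"
print(summarize_pileup_line(example))
```

Under the same assumption about the column layout, `cut -f1-4` on the pileup output gives the same four columns without any scripting.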
