Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • is there a package available to directly analyze BAM files in R?
    --
    Jeremy Leipzig
    Bioinformatics Programmer
    --
    My blog
    Twitter

    Comment


    • SAM/SAMtools FAQ page.

      Comment


      • BWA/SAMTOOLS SAM-> BAM - invalid CIGAR operation.

        I am encountering exactly the same problem as the post from around #72 in this thread,
        Originally posted by gcrdb View Post
        ... APIs need to start from BAM files (-bam) , not SAM files(no "-sam" at all). I only have SAM files which from bwa, all I need is to convert SAM to BAM.

        got error:
        [sam_header_read2] 22 sequences loaded.
        [sam_read1] reference '-143963499' is recognized as '*'.
        Parse error at line 1: invalid CIGAR operation
        Aborted

        thanks,
        Here is a sample of my .SAM file, produced by bwa
        Code:
        >SRR003966.14 1 1
        chr14   -37395245       0
        >SRR003966.49 1 1
        chr12   -25942795       0
        >SRR003966.85 5 14
        chr14   +102382726      1
        chr3    -24756032       2
        chr2    -47795311       2
        chr5    -176639213      2
        chr3    -149933203      2
        This SAM file seems a little thin (according to the SAM file format specification I think there should be more fields).

        Here is the subsequent command line I used to try to convert it to a BAM file, and its accompanying error
        Code:
        -sh-3.1$ ~/bin/samtools import hg18.fa.gz.fai SRR003966-05.sam SRR003966-05.bam
        [sam_header_read2] 49 sequences loaded.
        [sam_read1] reference '-37395245' is recognized as '*'.
        Parse error at line 1: invalid CIGAR operation
        Aborted
        I am running BWA 0.5.5 and SAMTOOLS v 0.1.7.

        I apologize in advance if this is a newbie question, but I am generally following Heng Li's process as outlined in Short-read Alignment with MAQ and BWA

        Any feedback or pointer would be most welcome.

        -- bg

        Comment


        • Am I doing something wrong ??



          I used MAQGene to align sequence fragments of a mutant against the reference sequence. I want to use SAMTools to see the alignment of all the fragments against the ref. But I have a problem with the sam file ...

          Code:
          [root@localhost MaqToSam]# /home/.../SAMtools_svn/samtools/misc/maq2sam-long
          Version: r439
          Usage: maq2sam <in.map> [<readGroup>]
          [root@localhost MaqToSam]# /home/.../SAMtools_svn/samtools/misc/maq2sam-long fr6_34.map > fr6_34.sam
          [root@localhost MaqToSam]# ../samtools faidx c_elegans.WS200.dna.fa
          [root@localhost MaqToSam]# ../samtools view -bS -T c_elegans.WS200.dna.fa -o fr6_34.bam fr6_34.sam
          [COLOR="Red"][main_samview] fail to open file for reading.[/COLOR]
          [root@localhost MaqToSam]# ../samtools view fr6_34.sam
          [COLOR="Red"][bam_header_read] EOF marker is absent.
          [main_samview] fail to read the header.[/COLOR]
          [root@localhost MaqToSam]# ../samtools import c_elegans.WS200.dna.fa fr6_34.sam fr6_34.bam
          [COLOR="Red"][main_samview] fail to open file for reading.[/COLOR]
          The problem is that I do it once time and I could see the alignment with tview (I would like to see it in an other format : http://seqanswers.com/forums/showthread.php?t=3249) and now I cannot reiterate this ... what could be the problem ??
          Last edited by Fred13; 12-02-2009, 06:33 AM.

          Comment


          • Parse error at line 1: invalid CIGAR operation

            RE: Posts 72 & 74 .
            Originally posted by gcrdb View Post
            hi, I have trouble conveting sam to bam.. I tried both:

            samtools import ref .fai in.sam out.bam
            got error:
            [sam_header_read2] 22 sequences loaded.
            [sam_read1] reference '-143963499' is recognized as '*'.
            Parse error at line 1: invalid CIGAR operation
            Aborted

            samtools view -bt ref .fai -o in.sam out.bam
            and got similar error:
            [sam_header_read2] 22 sequences loaded.
            [sam_read1] reference '' is recognized as '*'.
            [main_samview] truncated file.

            thanks,
            While this may have been answered in the interim, I didn't see one, so I thought I'd post my own.

            This happens when you try to use the
            Code:
            bwa samse
            command with the -n argument to generate a .sam file. With the -n argument, this command lists multiple matches per read. The resulting output file is a list of matches and NOT a .sam file. If that output file is treated as a .SAM file and fed into
            Code:
            samtools import
            to create a BAM file, then samtools will generate the sort of error output you described. If you remove the "-n" argument to "bwa samse" then the output will be a proper SAM file, but with only one alignment per read.

            I ran into this myself, and was only able to figure it out when I ran bwa in a debugger and traced the behavior.

            --bg

            Comment


            • eference '163285701' is recognized as '*', sequence and quality are inconsistent

              >
              Originally posted by lh3 View Post
              @luisczul

              >Samtools indicates that the error happens to line 164507. What does >that line look like?As for the second problem, it seems like a bug. You >are using very short reference. Could you send me an example? >Thanks.

              >@henry
              >Are you generating results with "samse -n"? With -n, the output is NOT >sam. You can tell this from the bwa manual page.

              >These three questions are less relevant to samtools. They are mostly related to bwa.

              I am having similar problem with converting sam to bam:

              [root@master maq_bwa_run]# samtools view -bt hg19.fa.fai -o aln.bam aln.sam
              [samopen] SAM header is present: 93 sequences.
              [sam_read1] reference '163285701' is recognized as '*'.
              Parse warning at line 29354: mapped sequence without CIGAR
              Parse error at line 29354: sequence and quality are inconsistent
              Aborted

              What could happen?

              the line 29354 in aln.sam looks like this

              0 chr5 163285701 0 0M * 0 0 * XT:A:R NM:i:0 X0:i:100307042 XM:i:0 XO:i:0 XG:i:0 MD:Z:0
              ~


              the lines 29352, 29353, 29354 and 29355 look like this:


              ran_melanocytes:853_1092_1366 4 * 0 0 * * 0 0 TTCGCGGGGGATAGACTGTACAATCTAGGCCGGTGGTGTATGGCCCTGT AA>>AA@=>A>A;><>>>;>?<:<:7?<='6<(,))<6/&38&&(531)
              ran_melanocytes:853_1093_11 4 * 0 0 * * 0 0 NANTNGNAGGCGNNANNNTNGNCNGACNNNNNNGNGGANANTNTNNNGA -"1-"=-"/-"):8,:-"-"5-"-"-"<-"1-"1-";,,-"-"-"-"-"
              0 chr5 163285701 0 0M * 0 0 * XT:A:R NM:i:0 X0:i:100307042 XM:i:0 XO:i:0 XG:i:0 MD:Z:0
              ran_melanocytes:853_1093_36 4 * 0 0 * * 0 0 TATAAATTTCGAGGAAATTAGACATATCTCCGTCGTAGGGATCCCCTGG =3:=;=<<<=@/=?7:7;8?><?15A57:?=;;6:=95?:8,,,,,/
              ~



              Thanks
              Alexey

              Comment


              • I am working with the Blast aligner and therefore I have to convert my blast alignment files to sam format. I think I have found a bug in the third party perl script "blast2sam.pl" which is available with samtools on Sourceforge.

                The problem comes from the "aln2cm" subroutine used to build the CIGAR for a query. Apparently, the script switches 'D' with 'I'. For instance, if the CIGAR must be "103M1I2D10M", the script outputs the following "103M1D2I10M".

                If you look at the blast2sam.pl script, line 88:

                "push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));"

                and replace this line with:

                "push(@$cigar, $cmaux->[1] . substr("MID", $cmaux->[0], 1));" (to swith 'I' and 'D' in the "MDI" string)

                the problem seems to be fixed.

                Note that this problem is only visible when you are working with gapped queries, which leads to a "CIGAR length and query length are inconsistent" error message, and that you have to use the -s option with blast2sam.pl so that the query string can be included in the resulting sam file.

                I think a user named corthay made a post some time ago about a similar problem. If other people have faced the same problem, please let me know.

                I hope this helps

                Best regards

                David

                Comment


                • Originally posted by torrey View Post
                  I was curious if anyone has found a workaround to this error? It is coming up for me when viewing novoalign alignments. I am using samtools 0.1.6 (r453), and get the following error when loading a samtools converted bam file from novoalign's sam file

                  samtools: bam_lpileup.c:116: tview_func: Assertion `l == tv->n_pre' failed.


                  thanks
                  I am also having this problem on linux. Did you find a solution?

                  Comment


                  • question about scores of INDEL in pileup output file

                    I have a question about output file of pileup command.

                    According to samtools FAQ pagehttp://sourceforge.net/apps/mediawik...pileup_output., column 5 is phred-scaled consensus score and column 6 is phred-scaled SNP score.

                    For SNP line, it seems to be true that it is phred-scaled.
                    But for INDEL line, values in column 5,6 seem too large to be phred-scaled. In my pileup output file, these values (column 5, 6 of INDEL line) are larger than 1000 very often. (sometimes larger than 4000.)

                    Please let me know how these values (column 5,6 of INDEL line) are calculated.

                    thanks in advance

                    Comment


                    • hg18 --> hg19 bam conversion program ???

                      Anybody written the great hg18 --> hg19 bam conversion program, yet?

                      I mean in "C", not perl or ruby (just kidding, as long as it finishes by 8:00 AM)

                      Comment


                      • coordinates off in samtools tview

                        First, I just want to thank the author of samtools - it is a great package for the community. I especially appreciate the streaming abilities.

                        I have a odd problem when I use samtools tview to visualize my alignments.

                        I have found that the coordinates are completely off ! For example if I use this command:

                        samtools tview my.bowtie.sorted.bam hg18.fa

                        and I go to chr22, I find that the position of the visualized read alignment is more than 100kb off than the reported read position in the sam output ! However, when I look at chrM, I find the coordinates to work just fine. All chromosomes except chrM have this problem. Is this an fasta indexing issue? I have also separated out the individual chromosome reference sequence and indexed that by itself, but I still get the same problem! If indexing were a problem, then that should have solved it, no?

                        I know I am using the same reference fa that I used to build the search index in bowtie. I converted my sam to bam, then sorted, and indexed. I don't know what is going on here. Has anyone experienced this? Is there something I am overlooking?

                        Thank you for reading this and thank you for any help/suggestions.
                        Last edited by NGSfan; 01-11-2010, 01:53 AM.

                        Comment


                        • Another issue I found is after doing a sort:

                          samtools sort aln.bam aln.sorted

                          samtools view -H aln.sorted.bam

                          The header SO: field remains unsorted

                          @HD VN:1.0 SO:unsorted


                          I have to use picard's sorter to get a header with SO:sorted.

                          programs such as GATK complain when this header is not changed.

                          Comment


                          • Hi,

                            I am trying to find a script able to convert from a output file from the solexa pipeline called 'realign' ( page 9 ) into SAM format. I will keep my search, but in the meantime I thought someone may know about it in this thread.

                            Or maybe there is a different (better) way of doing this before I write a parser.

                            The format I goes like this :
                            TACCATATTTATATTACATTAAATTCCTATATTTAC 14859 1 hs_ref_chr14:21265482 F TACCATATTTATAGTACATTAAATTCCTAGATATAC 14859

                            where each column means :

                            sequence

                            best score

                            numbers of hits at that score

                            target: pos

                            strand

                            target sequence

                            next best score

                            Thank you !
                            Last edited by polivares; 01-14-2010, 08:33 AM.

                            Comment


                            • I wished to convert a recent eland export alignment file to use for QuEST analysis. Does the export2sam.pl script work for the 1.3+ versions? Is there any format converter that can convert this type of file around?

                              Comment


                              • Originally posted by NGSfan View Post
                                Another issue I found is after doing a sort:

                                samtools sort aln.bam aln.sorted

                                samtools view -H aln.sorted.bam

                                The header SO: field remains unsorted

                                @HD VN:1.0 SO:unsorted


                                I have to use picard's sorter to get a header with SO:sorted.

                                programs such as GATK complain when this header is not changed.

                                Ok, there is a solution to this problem:



                                "BAM files created by samtools which are sorted often don't have the sort order set to 'coordinate' in the BAM header (instead it's marked as 'unsorted'. Because BAMs are binary files, there's no way to easily change the tag. This walker rewrites a BAM file so that the output is identical to the input except that the sort order tag is set to 'coordinate'. "

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-25-2024, 11:49 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-24-2024, 08:47 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                62 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X