  • pmiguel
    Senior Member
    • Aug 2008
    • 2328

    454 .ace to .bam conversion issue

    I have heard that the "next" version of the Roche 454 software will include a .bam output format.

    Until then (and presuming this is actually the case) I am stuck with the brutal amos2bnk methodology outlined here:

    Download AMOS for free. AMOS is a collection of tools for genome assembly. AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.


    There are numerous non-documented gotchas ready to pounce on the misguided novice who attempts this protocol. But I have managed to traverse the procedure a couple of times and emerge bloodied but (largely) unbroken.

    One final issue involves actually viewing the resulting .bam file in IGV. (BTW, you definitely want to turn off "show soft-clipped bases" in the preferences.) I think the issue derives from the crazy long CIGAR strings produced. Some of this is to be expected because of the 454's well-known homopolymer issues. However, it looks to me like the CIGAR strings are being produced from the padded reads in the .ace file, so that legions of "deletions" versus the consensus are shown in the viewer.

    Anyone seen this? Anyone have a solution to suggest?

    --
    Phillip
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Do you have to have a SAM/BAM file? Why not stick with the ACE file and use a viewer that supports that?

    Comment

    • pmiguel
      Senior Member
      • Aug 2008
      • 2328

      #3
      cigar string hacking

      Here is an example cigar string produced:
      Code:
      69S10M1P4M1P2M1P12M1P11M1P2M1P10M1P18M1P2M1P3M1P30M1P6M1P3M1P1M1P3M1P9M1P9M1P5M1P7M1P1M1P1M2P7M1P1M1P2M1P2M2P3M1P6M1P3M1P3M1P6M1P3M1P5M1P3M1P1M3P1M3P1M1P1M1P1M4P1M1P1M2P4M1P1M1P2M1P3M1P1M1P5M1P2M1P7M1P2M1P7M2P5M1P1M1P2M1P1M1P2M1P3M1P2M1P1M1P1M1P1M2P1M1P3M1I1D9M1P1M1P1M1P3M2P1M1P1M1P3M1P1M1I1D6M1P2M1P3M1I2D2M1I2M1I1D2M1P7M1P12M2P6M1P2M1D1I15M1P9M1P3M1P4M1P1M1P3M1P3M1P22M1P7M1P26M1P15M1P1M1P4M1P6M2P9M1P6M1P2M2P2M1P9M1P2M1P3M4S
      According to the SAM specification, "P" denotes "padding (silent deletion from padded reference)". So maybe I could parse through, deleting the \d+P fields and collapsing adjacent "M" fields no longer separated by the pads?

      --
      Phillip
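The CIGAR clean-up proposed in post #3 can be sketched in Python as below. This is only an illustration of the idea, not something from the thread: the function name `strip_padding` is made up, and the sketch assumes that dropping the pads is all that is needed. Whether the result lines up correctly in a viewer also depends on POS and the reference sequence being unpadded, which this code does not check.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
WHOLE_CIGAR = re.compile(r"(?:\d+[MIDNSHP=X])+")


def strip_padding(cigar):
    """Drop P (padding) operations and merge neighbouring runs of the same operation.

    Hypothetical helper: it rewrites the CIGAR string only and leaves POS alone.
    """
    if not WHOLE_CIGAR.fullmatch(cigar):
        raise ValueError(f"not a CIGAR string: {cigar!r}")
    merged = []  # list of [length, op] pairs
    for length, op in CIGAR_OP.findall(cigar):
        if op == "P":
            continue  # remove the pad entirely
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # e.g. 10M + 4M -> 14M once the pad is gone
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


# Shortened version of the CIGAR string quoted in post #3.
print(strip_padding("69S10M1P4M1P2M1P12M1I1D9M4S"))  # 69S28M1I1D9M4S
```

Only adjacent runs of the same operation are merged, so genuine insertions and deletions such as `1I1D` are left as they are.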

      Comment

      • pmiguel
        Senior Member
        • Aug 2008
        • 2328

        #4
        Originally posted by maubp View Post
        Do you have to have a SAM/BAM file? Why not stick with the ACE file and use a viewer that supports that?
        Newbler does not currently produce BAM files.

        I really like IGV.
        The only ACE file viewer I use is consed. Great for BAC sized assemblies. Not good for full bacterial genomes. Do you have an ACE viewer you would recommend?

        --
        Phillip

        Comment

        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

          If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.

          Comment

          • sklages
            Senior Member
            • May 2008
            • 628

            #6
            Originally posted by maubp View Post
            I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

            If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.
            AFAIK these won't work with ACE ... ACE is bad ;-)

            Comment

            • pmiguel
              Senior Member
              • Aug 2008
              • 2328

              #7
              Originally posted by maubp View Post
              I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

              If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.
              Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

              It did not work for the BAM file I tried. I get:
              SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.

              Errors encountered by Tablet when processing BAM files are often related to using files that have not been sorted,
              or where the index file is out of date. Please resort and/or reindex this file using samtools 0.1.8 or higher.

              But we are running samtools 0.1.5, so possibly that is the issue.

              Anyway, thanks for the advice.

              --
              Phillip

              Comment

              • pmiguel
                Senior Member
                • Aug 2008
                • 2328

                #8
                Tablet wants future version of samtools?

                Originally posted by pmiguel View Post

                It did not work for the BAM file I tried. I get:
                SAM validation error: [...]
                Please resort and/or reindex this file using samtools 0.1.8 or higher.

                But we are running samtools 0.1.5, so possibly that is the issue.
                Now that I check:

                Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.


                the current version of samtools is 0.1.6

                *****Note added after posting:
                Arrgg! The most recent version of samtools is 0.1.16! I seem to be unable to read decimal numbers today.
                Well, I don't want to let the facts get in the way of my (lame) joke. So, please continue reading...
                *****

                Possibly Tablet works so well because it was sent back in time by coders from the future using advanced methodologies? But they failed to account for the lack of certain key dependencies?

                Ah well, it works for ACE files...

                --
                Phillip
                Last edited by pmiguel; 06-23-2011, 05:06 AM. Reason: Erroneous info on current samtools version

                Comment

                • ulz_peter
                  Senior Member
                  • Feb 2010
                  • 219

                  #9
                  We've got the same issue. As we only do resequencing, the ACE file format is definitely not the best solution for outputting alignments (especially against the human reference genome, as it is really large even for small projects).

                  Tablet can visualize ACE files, however it is a pain to get some additional data to visualize easily (in our case gene annotations).

                  I'm really looking forward to the next software version with BAM output. I was already thinking of coding something myself...

                  Does anyone know when the new version will arrive?

                  Comment

                  • pmiguel
                    Senior Member
                    • Aug 2008
                    • 2328

                    #10
                    Well, this is little more than speculation...
                    But, generally major chemistry/hardware releases are accompanied by a new software version. The new longer read chemistry upgrades are either happening now or rolling out over the summer.
                    So that would suggest that the answer is "soon"?

                    --
                    Phillip

                    Comment

                    • imilne
                      Member
                      • Jan 2010
                      • 68

                      #11
                      Originally posted by pmiguel View Post
                      Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

                      It did not work for the BAM file I tried. I get:
                      SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.
                      Use Tablet's Options to change the BAM validation setting from stringent to lenient.
                      Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                      Comment

                      • imilne
                        Member
                        • Jan 2010
                        • 68

                        #12
                        Originally posted by pmiguel View Post
                        Now that I check:

                        Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.


                        the current version of samtools is 0.1.6
                        No, it's 0.1.16, which is quite a few versions on from 0.1.6.

                        Tablet tries to read the statistics (specifically read counts per contig) for a BAM's index file (.bai) and will warn if it can't do this, which usually happens if the index file was created using a version of samtools earlier than 0.1.8, which is the first version (we're aware of) that added these stats. It's been available since last summer.
                        Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi
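The 0.1.6 versus 0.1.16 mix-up in posts #8 and #12 comes from reading a dotted version number as a decimal. A minimal Python sketch of a component-wise comparison follows. The helper name `version_tuple` is made up for illustration, and it assumes purely numeric, dot-separated version strings.

```python
def version_tuple(version):
    """Split a dotted version string into a tuple of integers, e.g. '0.1.16' -> (0, 1, 16)."""
    return tuple(int(part) for part in version.split("."))


minimum = version_tuple("0.1.8")  # first version reported to write index statistics

for candidate in ("0.1.5", "0.1.6", "0.1.15", "0.1.16"):
    new_enough = version_tuple(candidate) >= minimum
    print(candidate, "ok" if new_enough else "too old")
```

Tuples compare element by element, so (0, 1, 16) sorts after (0, 1, 8) even though "0.1.16" looks smaller when read as a decimal.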

                        Comment

                        • pmiguel
                          Senior Member
                          • Aug 2008
                          • 2328

                          #13
                          Tablet SAM validation error workaround.

                          Originally posted by pmiguel View Post
                          Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

                          It did not work for the BAM file I tried. I get:
                          SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.

                          Errors encountered by Tablet when processing BAM files are often related to using files that have not been sorted,
                          or where the index file is out of date. Please resort and/or reindex this file using samtools 0.1.8 or higher.

                          But we are running samtools 0.1.5, so possibly that is the issue.

                          Anyway, thanks for the advice.

                          --
                          Phillip
                          We were running samtools 0.1.15 -- not the problem.

                          The Tablet guys responded to an email I sent them about this issue with:

                          The SAM validation error is an error message given out by the PICARD API, which we use for loading BAM files. It’s indicating that the file you’re loading doesn’t fully conform to the BAM spec. You can tweak Tablet so that it will ignore these errors, but we have it set to flag them up by default. Go to the Tablet application menu, open Tablet Options, and select the Importing tab. Make sure the “Set BAM validation stringency to lenient rather than strict (BAM only)” option is selected and click OK. You should now be able to load the data, as the underlying PICARD API will ignore these errors rather than report them.
                          This works for me!

                          --
                          Phillip
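To see which records trigger the validation error quoted above, one option is to scan a text SAM dump of the file for reads whose mate reference (RNEXT, column 7) is set while the mate position (PNEXT, column 8) is 0. The sketch below is an assumption about what the message is checking, based on its wording and the SAM column layout. It is not Picard's actual validation code, and the function name is hypothetical.

```python
def find_mate_start_problems(sam_lines):
    """Yield (record_number, read_name) for records with RNEXT set but PNEXT == 0.

    Record numbers count alignment lines from 1 and skip @ header lines;
    this numbering is a convention of this sketch.
    """
    record = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        record += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record, ignore it here
        rnext, pnext = fields[6], fields[7]
        if rnext != "*" and pnext == "0":
            yield record, fields[0]


# Tiny made-up example; the read names and coordinates are illustrative only.
example = [
    "@HD\tVN:1.0\n",
    "r1\t99\tctg1\t10\t60\t5M\t=\t0\t0\tACGTA\tIIIII\n",
    "r2\t0\tctg1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n",
]
for number, name in find_mate_start_problems(example):
    print(number, name)  # 1 r1
```

Knowing which reads are affected makes it easier to decide whether the lenient setting is good enough or whether the converter output needs fixing.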

                          Comment

                          • RCJK
                            Senior Member
                            • May 2009
                            • 156

                            #14
                            v2.6 is a part of the upgrade package for the XL+ upgrade. Last I've heard they (my local Roche reps) seem somewhat confident that the upgrade will launch at the end of June. A recent article on GenomeWeb also mentioned end of June, but we'll see. That's what they were saying last year.

                            Comment

                            • flxlex
                              Moderator
                              • Nov 2008
                              • 412

                              #15
                              Originally posted by RCJK View Post
                              v2.6 is a part of the upgrade package for the XL+ upgrade. Last I've heard they (my local Roche reps) seem somewhat confident that the upgrade will launch at the end of June. A recent article on GenomeWeb also mentioned end of June, but we'll see. That's what they were saying last year.
                              We are getting the upgrade 'most likely' in mid July...

                              Comment
