
  • 454 .ace to .bam conversion issue

    I have heard that the "next" version of the Roche 454 software will include a .bam output format.

    Until then (and presuming this is actually the case) I am stuck with the brutal amos2bnk methodology outlined here:

    Download AMOS for free. AMOS is a collection of tools for genome assembly. AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.


    There are numerous non-documented gotchas ready to pounce on the misguided novice who attempts this protocol. But I have managed to traverse the procedure a couple of times and emerge bloodied but (largely) unbroken.

    One final issue involves actually viewing the resulting .bam file in IGV. (BTW, you definitely want to turn off "show soft-clipped bases" in the preferences.) I think the issue derives from the crazy long cigar strings produced. Some of this is to be expected because of the 454's well-known homopolymer issues. However, it looks to me like the cigar strings are being produced from the padded reads in the .ace file. The result is that legions of "deletions" versus the consensus are shown in the viewer.

    Anyone seen this? Anyone have a solution to suggest?

    --
    Phillip

  • #2
    Do you have to have a SAM/BAM file? Why not stick with the ACE file and use a viewer that supports that?

    Comment


    • #3
      cigar string hacking

      Here is an example cigar string produced:
      Code:
      69S10M1P4M1P2M1P12M1P11M1P2M1P10M1P18M1P2M1P3M1P30M1P6M1P3M1P1M1P3M1P9M1P9M1P5M1P7M1P1M1P1M2P7M1P1M1P2M1P2M2P3M1P6M1P3M1P3M1P6M1P3M1P5M1P3M1P1M3P1M3P1M1P1M1P1M4P1M1P1M2P4M1P1M1P2M1P3M1P1M1P5M1P2M1P7M1P2M1P7M2P5M1P1M1P2M1P1M1P2M1P3M1P2M1P1M1P1M1P1M2P1M1P3M1I1D9M1P1M1P1M1P3M2P1M1P1M1P3M1P1M1I1D6M1P2M1P3M1I2D2M1I2M1I1D2M1P7M1P12M2P6M1P2M1D1I15M1P9M1P3M1P4M1P1M1P3M1P3M1P22M1P7M1P26M1P15M1P1M1P4M1P6M2P9M1P6M1P2M2P2M1P9M1P2M1P3M4S
      According to the SAM specification, "P" denotes "padding (silent deletion from padded reference)". So maybe I could parse through the string, deleting the \d+P fields and collapsing adjacent "M" fields that are no longer separated by pads?
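
The idea above (drop the pad fields, then merge neighbouring operations of the same type) can be sketched in a few lines of Python. This is only a minimal illustration, not something taken from AMOS or samtools; the function name `strip_padding` is made up for the example. It relies on the SAM specification's statement that "P" consumes neither read nor reference bases, so removing it should leave the alignment coordinates unchanged.

```python
import re

# One CIGAR field: a length followed by a single operation letter.
CIGAR_FIELD = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_padding(cigar):
    """Remove 'P' fields and merge adjacent fields with the same operation.

    Hypothetical helper for illustration only.
    """
    if cigar == "*":  # SAM uses '*' when no CIGAR is available
        return cigar
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")

    merged = []  # list of [length, op] pairs
    for length, op in CIGAR_FIELD.findall(cigar):
        if op == "P":
            continue  # pads consume neither read nor reference bases
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # collapse e.g. 10M + 4M into 14M
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


# Start of the CIGAR string shown above:
print(strip_padding("69S10M1P4M1P2M1P12M"))  # 69S28M
```

Insertions and deletions are left alone, so a stretch such as 3M1I1D9M1P1M becomes 3M1I1D10M. Whether the remaining I/D fields are themselves artifacts of the padded .ace reads is a separate question that this sketch does not address.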

      --
      Phillip

      Comment


      • #4
        Originally posted by maubp View Post
        Do you have to have a SAM/BAM file? Why not stick with the ACE file and use a viewer that supports that?
        Newbler does not currently produce BAM files.

        I really like IGV.
        The only ACE file viewer I use is consed. Great for BAC sized assemblies. Not good for full bacterial genomes. Do you have an ACE viewer you would recommend?

        --
        Phillip

        Comment


        • #5
          I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

          If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.

          Comment


          • #6
            Originally posted by maubp View Post
            I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

            If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.
            AFAIK these won't work with ACE ... ACE is bad ;-)

            Comment


            • #7
              Originally posted by maubp View Post
              I personally use Tablet for viewing ACE files (and SAM/BAM files too). It supports some other formats as well: http://bioinf.hutton.ac.uk/tablet/

              If you are interested in editing the ACE file, GAP4 or GAP5 would be worth a look too.
              Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

              It did not work for the BAM file I tried. I get:
              SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.

              Errors encountered by Tablet when processing BAM files are often related to using files that have not been sorted,
              or where the index file is out of date. Please resort and/or reindex this file using samtools 0.1.8 or higher.

              But we are running samtools 0.1.5, so possibly that is the issue.

              Anyway, thanks for the advice.

              --
              Phillip

              Comment


              • #8
                Tablet wants future version of samtools?

                Originally posted by pmiguel View Post

                It did not work for the BAM file I tried. I get:
                SAM validation error: [...]
                Please resort and/or reindex this file using samtools 0.1.8 or higher.

                But we are running samtools 0.1.5, so possibly that is the issue.
                Now that I check:

                Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.


                the current version of samtools is 0.1.6

                *****Note added after posting:
                Arrgg! The most recent version of samtools is 0.1.16! I seem to be unable to read decimal numbers today.
                Well, I don't want to let the facts get in the way of my (lame) joke. So, please continue reading...
                *****

                Possibly Tablet works so well because it was sent back in time by coders from the future using advanced methodologies? But they failed to account for the lack of certain key dependencies?

                Ah well, it works for ACE files...

                --
                Phillip
                Last edited by pmiguel; 06-23-2011, 05:06 AM. Reason: Erroneous info on current samtools version

                Comment


                • #9
                  We've got the same issue. As we only do resequencing, the ACE file format is definitely not the best solution for outputting alignments (especially against the human reference genome, as it is really large even for small projects).

                  Tablet can visualize ACE files, however it is a pain to get some additional data to visualize easily (in our case gene annotations).

                  I'm really looking forward to the next software version with BAM output. I was already thinking of coding something myself...

                  Does anyone know when the new version will arrive?

                  Comment


                  • #10
                    Well, this is little more than speculation...
                    But, generally major chemistry/hardware releases are accompanied by a new software version. The new longer read chemistry upgrades are either happening now or rolling out over the summer.
                    So that would suggest that the answer is "soon"?

                    --
                    Phillip

                    Comment


                    • #11
                      Originally posted by pmiguel View Post
                      Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

                      It did not work for the BAM file I tried. I get:
                      SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.
                      Use Tablet's Options to change the BAM validation setting from stringent to lenient.
                      Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi

                      Comment


                      • #12
                        Originally posted by pmiguel View Post
                        Now that I check:

                        Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.


                        the current version of samtools is 0.1.6
                        No, it's 0.1.16, which is quite a few versions on from 0.1.6.

                        Tablet tries to read the statistics (specifically read counts per contig) for a BAM's index file (.bai) and will warn if it can't do this, which usually happens if the index file was created using a version of samtools earlier than 0.1.8, which is the first version (we're aware of) that added these stats. It's been available since last summer.

                        Comment


                        • #13
                          Tablet SAM validation error workaround.

                          Originally posted by pmiguel View Post
                          Okay, Tablet is a good viewer for large .ace files. For editing .ace files, even giant ones, consed is fine.

                          It did not work for the BAM file I tried. I get:
                          SAM validation error: ERROR: Record 44, Read name H-148_49:1:1208:17598:140372, Mate Alignment start should != 0 because reference name != *.

                          Errors encountered by Tablet when processing BAM files are often related to using files that have not been sorted,
                          or where the index file is out of date. Please resort and/or reindex this file using samtools 0.1.8 or higher.

                          But we are running samtools 0.1.5, so possibly that is the issue.

                          Anyway, thanks for the advice.

                          --
                          Phillip
                          We were running samtools 0.1.15 -- not the problem.

                          The Tablet guys responded to an email I sent them about this issue with:

                          The SAM validation error is an error message which is given out by the PICARD API which we use for loading BAM files. It’s indicating that the file you’re loading doesn’t fully conform to the BAM spec. You can tweak Tablet so that it will ignore these errors, but we have it set to flag them up by default. If you go to the Tablet application menu, then access Tablet Options and select the Importing tab. Make sure the “Set BAM validation stringency to lenient rather than strict (BAM only)” option is selected and click OK. You should now be able to load the data as the underlying PICARD API will ignore the error messages rather than show error messages.
                          This works for me!

                          --
                          Phillip

                          Comment


                          • #14
                            v2.6 is a part of the upgrade package for the XL+ upgrade. Last I've heard they (my local Roche reps) seem somewhat confident that the upgrade will launch at the end of June. A recent article on GenomeWeb also mentioned end of June, but we'll see. That's what they were saying last year.

                            Comment


                            • #15
                              Originally posted by RCJK View Post
                              v2.6 is a part of the upgrade package for the XL+ upgrade. Last I've heard they (my local Roche reps) seem somewhat confident that the upgrade will launch at the end of June. A recent article on GenomeWeb also mentioned end of June, but we'll see. That's what they were saying last year.
                              We are getting the upgrade 'most likely' in mid July...

                              Comment
