Announcement

Collapse
No announcement yet.

.ace consensus sequence

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • .ace consensus sequence

    Hello all,
    Does anyone know a method to create a consensus sequence from an .ace file? I am assembling solexa data to a reference using MOSAIK. I can convert the .ace file to .fasta using a script, however this does not make a consensus, instead it creates an individual sequence for each read:

    >.MosaikReference
    AAGGACTTATGAGGACAGAGAGCGGGATGACTCTTCATATGCCACATGCAGCCCACATCCCAACACCTCT
    CCCATGTCTCAAGTTGAAGGATGCGAGACCATTTCGCAGAGAAGCCGTACCAAACGCAAACACGACCATA
    TCACCTGTTAGTGCCGAAGTTGGAAGCTTCCCCACCCTTGACTGCAGCACCTGGGAAGAAAGCACTTAAG
    AAACCCAGGCATCCTACTGTGGATGACGATGGGGTTCATAGAGGCCATCGACTGACTAGGGCAGTGACGC
    AAAGACAAGCCACTGGCGCTCGAAGTGGCCACATTTCAAGACCCAAGCGGCTCTACCCCAGACTGGACTG
    ATGTGGATATTCCATTGCTGCCTAGCCCTTGGAGACGATGCGATTCGCTCTCCCTGGAGGGTAATTTGTC
    AATAGATCCCATGAGACATCGAGATTGATTGAGGGCCGGTAAGGCTGAAACGGACAATGATTCAGTCATA
    AAGTGTAGTGGTCACTTGATGTTGAGATGCGACTTGCCTGCAAATCGCTGTGTGCTTCGTGTACAGTCAC
    TATTTGGATAACATCGAGTCTGGAGTTTTCTTTCAAGCGCCAGTTATCACAGTACTTTGAGTTTTTCTGT
    TTATTGGTTGACCACCCAAAGCATCCGATTTCACGGAAGGGACACGATGGCGGTCTGCACTTATTTCCGT
    GATTGAACCGGTTCAAATTAGAAAACGGGCGACAAAGTGCCACGTGCTATGCCATCGAGTATCACTTACG
    AGGTTCTACCATGCTGGATGTAGGGAGCGGGGCAAAAGCCATAATTGCTGCTTTTCGGTGCCAGGTCGTA
    CAAGATACGGAAAGGCATGTGTACCATACATGCACCAAGATTTCAACCTGCGGTGTAGTTGTGGCCCCCT
    TCTTTGACAACACATCCCTAGATTACCCCTAAACGCTTTCCCTCAGCTTACCAGACATAATCTGCCTCTT
    ATCTTATTCAGTCAGCCACG
    >HWI-EAS240_0001:2:70:591:1131#0/1
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    -------------------------------------------------GCTTTTCGGTGCCAGGTCGTA
    CAAGATACGGAAAGG
    >HWI-EAS240_0001:2:22:830:70#0/1
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    -----------------------------------------------------TTCGGTGCCAGGTCGTA
    CAAGATACGGAAAGGCATG
    >HWI-EAS240_0001:2:59:1080:696#0/1
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    -------CGGAAAGGCATGTGCACCATACATGCACCAAGATTT
    >HWI-EAS240_0001:2:16:926:1755#0/1
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    ---------GAAAGGCATGTGCACCATACATGCACCAAGATTTCA

    Ideally I would like an aligned consensus sequence of the reads to the reference. Any ideas?

    Thanks!

    John

  • #2
    I don't know about Mosaik but I worked a lot with ace files produced by assemblers like CAP3, Phrap etc. and if Mosaik is a real assembler and follows the same format then in the ace file the sequence block following the "CO " line should be the consensus (assembly) sequence that you are after, e.g.:
    Code:
    CO CL10003_1 836 2 0 U
    ACAACTCAGCTAAGATACAGAAAAAGTCAAGAAACAAACAATAGTGAATGCTAGGAAGGC
    AAAAAGAGAGATTCAATCAACATACCAAGCTCCATGTAAGTATATCAAGTATATTAACGA
    CATTGAAAGCCACAAGATAGCCATTCCTTTAGCTCTTAAATGGAAAAACGACCTAACCCC
    CATTGAAAGCCACAAGATA [i]... etc. (until an empty line)[/i]
    So if you want just to pull these consensus sequences you can use a simple script taking all these sequence blocks found between a line starting with "CO " and the next empty line, something like this (using perl and assuming Linux/Unix as your platform):
    Code:
    {
    local $/="\n\n";
    while (<>) {
      s/^(\s+)//s;
      if (length>20 && s/^CO\s+(\S+)[^\n]+//s) { 
        chomp;
        print ">$1\n";
        tr/-*\n//d; # removing gaps and line endings
        # reformat and print sequence:
        print join("\n", (unpack('(A72)*',$_)))."\n";
        }
     }# while text blocks
    }

    Comment


    • #3
      .ace consensus sequence

      Hello,

      I am also assembling Solexa data to a reference using MOSAIK and I am looking for a program to create a consensus sequence from an .ace file.

      But I don't think the sequence after the "CO" is the consensus of the assembled reads. I think it is the reference sequence? Correct me if I am wrong.

      Doe's anyone have a good solution to this?

      Thank you!

      Kajsa

      Comment


      • #4
        For traditional de-novo assembly, the contig consensus should be stored in the sequence lines just after the CO line (as discussed above). Getting this sequence (and removing any gaps which are stored as * characters) to convert into FASTA format is quite easy.

        A Python alternative using Biopython 1.52 or later would be just two lines:
        Code:
        from Bio import SeqIO
        SeqIO.convert("input.ace", "ace", "output.fasta", "fasta")
        However, if you are mapping reads onto an existing reference, it sounds like the CO section contains the original reference (plus probably some extra gap characters where an insertion was required to map a read). The ACE file doesn't explicitly contain a consensus of the reference and reads - you would have to look at the aligned reads, and calculate one. This is possible, but much more complicated. For example, how would you proceed if the middle of the reference had no reads mapped to it?

        I may have misunderstood, but I think you want to look at reference guided assembly instead of mapping reads onto a reference.

        Comment


        • #5
          Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?

          Kajsa

          Comment


          • #6
            Originally posted by kajsa View Post
            Yeas that's correct, I am mapping reads onto an existing reference. I guess I have to calculate the consensus myself in some way. If no one else already have done a program for that.......?
            Do you want a consensus if the reference AND the reads, or just the reads?

            Either way, if you know Python this cookbook entry on treating an ACE file as an alignment might be a useful start. Once you have a Biopython alignment object it would be easier to calculate the consensus: http://www.biopython.org/wiki/ACE_contig_to_alignment

            Comment


            • #7
              I want to have a consensus of the assembly of only the reads I recon. Thank you for your help, I will look at that coockbook entry.

              Kajsa

              Comment


              • #8
                I find the AMOS toolset convenient for converting between many differnt assembly outputs/inputs.

                So, in this case you would use

                toAmos which yields a amos Bank file (useful in using Hawkeye for viewing if you like) and the bonus is that you can input information on scaffolding, library sizes, etc and this will be maintained through conversions.

                from this bank file, you could convert to several differnt formats, dump the consensus fasta, reads, etc.
                Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio

                Comment


                • #9
                  That also looks very interesting. I will look at that too.

                  Thanks!

                  Kajsa

                  Comment


                  • #10
                    really powerful code, thanks and study the coding ...

                    Comment


                    • #11
                      But the problem remains: ACE format does not consider and thus does not store the consensus sequence ...
                      You should probably use a different alignment format?

                      cheers,
                      Sven

                      Comment


                      • #12
                        Sven: Do you have any suggestions on what alignment format to use?

                        Kajsa

                        Comment


                        • #13
                          It looks like you are trying to generate a consensus fasta file from a mapping/alignment, and you are using ace only because it is what is produced by mosaik (mosaik also produces sam/bam, but not useful for producing a consensus from a mapping as far as I can tell). One option is to use the Amos tools to produce a fasta consensus from the ace (after converting to the native amos format .bnk) . Most alignment formats do not include the reference and do not generate or recall a consensus from a multiple sequence alignment as far as I know, since most people only want to do variant detection downstream of the mapping. Mosaik will create a consensus of the mapping assembly, but only stores it in ace format. It will also produce SAM/BAM if you want to use the SAMtools downstream for further analysis.
                          Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio

                          Comment


                          • #14
                            We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.

                            Comment


                            • #15
                              Originally posted by kajsa View Post
                              We didn't get the Amos tools to work on our mac computers. But we found a solution: We can export a BAM file and then use SAM tools to extract the consensus sequence.
                              I think you are talking about using 'pileup' command with -c (consensus calling parameter). Am i right?

                              I also have .ace file from mosaik but i dont know whats the next step to do with it. Can you guide me?
                              ~Adnan~

                              Comment

                              Working...
                              X