Using solexa to correct 454 homopolymer errors

  • Using solexa to correct 454 homopolymer errors

    Hello All,

    I am currently attempting to resolve homopolymer and other errors in my 454 assembly using Solexa data.
    If anyone has suggestions for the best tools and/or pipelines for this, I would greatly appreciate any input.

    regards

    Brian

  • #2
    One tip is to simply use a short-read aligner (MAQ, Bowtie, BWA, Novoalign etc.) to align the Solexa reads against your 454 assembly. How helpful this is depends on the settings, but you should be able to find SNPs or indels around incorrectly called homopolymers. Alternatively, just view the alignment in regions where you have homopolymers of a certain length. This relies on paired-end Solexa data or Solexa fragments of perhaps 50 or 75 bases; I found it didn't work well with fragment libraries of 36 bases.
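
To pick out the regions worth viewing, the homopolymer runs in the 454 contigs can be listed first. The following is a minimal sketch, not part of any of the tools named above; the function name `homopolymer_runs` and the default minimum length of 5 are assumptions chosen for illustration.

```python
import re
from typing import Iterator, Tuple


def homopolymer_runs(seq: str, min_len: int = 5) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, base) for every run of a single base >= min_len.

    Coordinates are 1-based and inclusive. The min_len default is an
    arbitrary illustrative threshold, not a recommended value.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start() + 1, match.end(), match.group(1)


if __name__ == "__main__":
    contig = "ACGTTTTTTACAAAAAG"
    for start, end, base in homopolymer_runs(contig):
        print(f"{base} x{end - start + 1} at {start}-{end}")
```

The reported coordinates can then be used to jump to those positions in whatever alignment viewer you use.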

    Alternatively you could try a hybrid assembly in MIRA.

    Let us know how you get on.


    • #3
      Originally posted by coldturkey View Post
      I am currently attempting to resolve homopolymer and other errors in my 454 assembly using Solexa data.
      If anyone has suggestions for the best tools and/or pipelines for this, I would greatly appreciate any input.
      Nesoni can do this for you. Feed it your 454 contigs as the "reference" and the Illumina reads as the "reads". Run "nesoni shrimp" then "nesoni consensus". The working folder will contain "reference_consensus.fa" or similar which is effectively the "corrected" 454 contigs.

      Download site: http://www.vicbioinformatics.com/nesoni.shtml


      • #4
        removed...
        Last edited by glacerda; 08-12-2010, 01:42 PM. Reason: forgot to quote


        • #5
          nesoni consensus error

          Originally posted by Torst View Post
          Nesoni can do this for you. Feed it your 454 contigs as the "reference" and the Illumina reads as the "reads". Run "nesoni shrimp" then "nesoni consensus". The working folder will contain "reference_consensus.fa" or similar which is effectively the "corrected" 454 contigs.

          Download site: http://www.vicbioinformatics.com/nesoni.shtml

          Hi Torst, this tool looks very interesting.
          I have tried to run "nesoni consensus" and it prints the following error messages. Do you know what is happening? It looks like there is a missing module, called statistics, which is not included in the nesoni package. Do you know how to solve that? Thank you very much.

          Traceback (most recent call last):
            File "/usr/bin/nesoni", line 15, in <module>
              sys.exit(nesoni.main(sys.argv[1:]))
            File "/usr/lib/python2.6/site-packages/nesoni/__init__.py", line 155, in main
              plot
            File "/usr/lib/python2.6/site-packages/nesoni/grace.py", line 137, in execute
              commands[args[start]](args[start+1:end])
            File "/usr/lib/python2.6/site-packages/nesoni/__init__.py", line 80, in consensus
              grace.load('consensus').main(args)
            File "/usr/lib/python2.6/site-packages/nesoni/grace.py", line 22, in load
              m = __import__(module_name, globals())
            File "/usr/lib64/python2.6/site-packages/pyximport/pyximport.py", line 328, in load_module
              self.pyxbuild_dir)
            File "/usr/lib64/python2.6/site-packages/pyximport/pyximport.py", line 181, in load_module
              mod = imp.load_dynamic(name, so_path)
            File "consensus.pyx", line 11, in init nesoni.consensus (/root/.pyxbld/temp.linux-x86_64-2.6/pyrex/consensus.c:24117)
              from nesoni import io, grace, shrimp, statistics
          ImportError: Building module failed: ["AttributeError: 'module' object has no attribute 'statistics'\n"]


          • #6
            Ah. Oops. distutils didn't automatically update the MANIFEST for some reason.

            Try this: http://bioinformatics.net.au/nesoni-0.35.tar.gz


            Note that "nesoni shrimp" and "nesoni consensus" use SHRiMP 1.0 (rmapper-ls). To use SHRiMP 2.0 (gmapper-ls), use "nesoni samshrimp" and "nesoni samconsensus" -- but this is very new code, not thoroughly tested.

            Your corrected contigs will be in a file called consensus_masked.fa -- this file defaults back to the reference sequence (in lowercase) if a consensus can't be called from the aligned reads.
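
Since positions without a callable consensus fall back to lowercase reference bases, the lowercase fraction per contig gives a rough idea of how much of each contig was actually checked by the reads. A minimal sketch, assuming a plain FASTA file; the helper names `parse_fasta` and `uncalled_fraction` are made up for this example and are not part of nesoni.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {name: sequence}, keeping the original case."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def uncalled_fraction(seq: str) -> float:
    """Fraction of bases in lowercase, i.e. taken from the reference."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


if __name__ == "__main__":
    # The file name comes from the post above; the path is an assumption.
    with open("consensus_masked.fa") as handle:
        for contig, seq in parse_fasta(handle).items():
            print(f"{contig}\t{len(seq)}\t{uncalled_fraction(seq):.3f}")
```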
            Last edited by pfh; 08-12-2010, 09:57 PM.


            • #7
              nesoni samconsensus

              Originally posted by pfh View Post
              Ah. Oops. distutils didn't automatically update the MANIFEST for some reason.

              Try this: http://bioinformatics.net.au/nesoni-0.35.tar.gz


              Note that "nesoni shrimp" and "nesoni consensus" use SHRiMP 1.0 (rmapper-ls). To use SHRiMP 2.0 (gmapper-ls), use "nesoni samshrimp" and "nesoni samconsensus" -- but this is very new code, not thoroughly tested.

              Your corrected contigs will be in a file called consensus_masked.fa -- this file defaults back to the reference sequence (in lowercase) if a consensus can't be called from the aligned reads.
              Hi PFH, thank you very much for your help and for making this software package available to the community. Version 0.35 solved the error I pointed out, but samconsensus hits another error, which occurs after it has printed all the *.userplot files. The error message follows.

              Traceback (most recent call last):
                File "/usr/bin/nesoni", line 15, in <module>
                  sys.exit(nesoni.main(sys.argv[1:]))
                File "/usr/lib/python2.6/site-packages/nesoni/__init__.py", line 177, in main
                  recombination
                File "/usr/lib/python2.6/site-packages/nesoni/grace.py", line 160, in execute
                  commands[args[start]](args[start+1:end])
                File "/usr/lib/python2.6/site-packages/nesoni/__init__.py", line 127, in samconsensus
                  grace.load('sam').consensus_main(args, True)
                File "/usr/lib/python2.6/site-packages/nesoni/sam.py", line 1122, in consensus_main
                  references[name] = Ref_seq( seq.upper() )
                File "/usr/lib/python2.6/site-packages/nesoni/sam.py", line 955, in __init__
                  self.base_counts = [ consensus.EMPTY_EVIDENCE ] * len(seq)
              AttributeError: 'module' object has no attribute 'EMPTY_EVIDENCE'
              Waiting for data... (interrupt to abort)


              • #8
                This might be a minor difference between Cython versions. Which version are you using? I have 0.12.1.

                Anyway, this might fix it:

                http://bioinformatics.net.au/nesoni-0.36.tar.gz


                • #9
                  Dear nesoni developers,

                  I ran nesoni with
                  nesoni samshrimp
                  and
                  nesoni samconsensus
                  on a reference set of 454 contigs with 36 bp Illumina reads mapped to them.

                  Results are something like this -

                  contig00006 shrimp_consensus variation 7779 7780 . + . product=Insertion: .C. ("C"x32 "-"x10)
                  contig00006 shrimp_consensus variation 27322 27322 . + . product=Base deleted: C ("-"x22 "C"x5)
                  contig00006 shrimp_consensus variation 64220 64220 . + . product=Base deleted: T ("-"x29 "T"x7)
                  contig00011 shrimp_consensus variation 85 85 . + . product=Substitution: C became A ("A"x10)
                  contig00011 shrimp_consensus variation 100 100 . + . product=Substitution: T became C ("C"x9)
                  contig00011 shrimp_consensus variation 19231 19231 . + . product=Base deleted: C ("-"x23 "C"x8)
                  contig00012 shrimp_consensus variation 34618 34618 . + . product=Base deleted: A ("-"x26 "A"x10 "C"x1)
                  contig00015 shrimp_consensus variation 78530 78530 . + . product=Base deleted: A ("-"x30 "A"x11)
                  contig00019 shrimp_consensus variation 122482 122482 . + . product=Base deleted: A ("-"x25 "A"x9)
                  contig00020 shrimp_consensus variation 62624 62624 . + . product=Substitution: T became V ("G"x3 "C"x1 "A"x1)
                  contig00024 shrimp_consensus variation 72892 72892 . + . product=Base deleted: T ("-"x22 "C"x4 "T"x1)
                  contig00024 shrimp_consensus variation 143531 143531 . + . product=Base deleted: A ("-"x27 "A"x8)
                  contig00030 shrimp_consensus variation 13275 13275 . + . product=Substitution: T became A ("A"x21 "T"x2)
                  contig00030 shrimp_consensus variation 13279 13279 . + . product=Substitution: C became T ("T"x21 "C"x2)
                  contig00030 shrimp_consensus variation 13283 13283 . + . product=Substitution: A became C ("C"x21 "A"x2)
                  contig00042 shrimp_consensus variation 57483 57483 . + . product=Substitution: A became Y ("C"x4 "T"x1)
                  contig00045 shrimp_consensus variation 20062 20062 . + . product=Base deleted: T ("-"x31 "T"x14)
                  contig00046 shrimp_consensus variation 43525 43525 . + . product=Base deleted: T ("-"x35 "T"x16)
                  contig00048 shrimp_consensus variation 69141 69141 . + . product=Base deleted: T ("-"x41 "T"x14)
                  contig00051 shrimp_consensus variation 85164 85165 . + . product=Insertion: .T. ("T"x21 "-"x5)
                  contig00057 shrimp_consensus variation 11841 11841 . + . product=Substitution: T became V ("C"x3 "G"x1 "A"x1)
                  contig00057 shrimp_consensus variation 86311 86311 . + . product=Substitution: C became G ("G"x26)
                  contig00071 shrimp_consensus variation 14588 14588 . + . product=Substitution: G became T ("T"x29 "G"x10)
                  contig00071 shrimp_consensus variation 14591 14591 . + . product=Substitution: G became A ("A"x33 "G"x7)
                  contig00073 shrimp_consensus variation 13829 13829 . + . product=Base deleted: A ("-"x40 "A"x18)
                  contig00088 shrimp_consensus variation 214369 214370 . + . product=Insertion: .G. ("G"x26 "-"x4)
                  contig00091 shrimp_consensus variation 42090 42090 . + . product=Base deleted: T ("-"x12)
                  contig00099 shrimp_consensus variation 99157 99157 . + . product=Base deleted: T ("-"x29 "T"x5)
                  contig00100 shrimp_consensus variation 39818 39819 . + . product=Insertion: .T. ("T"x39 "-"x12)

                  I was actually expecting more corrections for a 6 Mbp genome. I haven't systematically surveyed homopolymer errors in the 454 contigs but would expect more than the ca. 46 corrections here.
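
To see how the corrections split between insertions, deletions and substitutions, report lines like the ones above can be tallied by the label in their `product=` attribute. This is a rough sketch that assumes the nine-column, whitespace-separated layout shown in this post; `variant_kind` is a hypothetical helper, not a nesoni function.

```python
from collections import Counter
from typing import Iterable


def variant_kind(line: str) -> str:
    """Return the label before the first colon in the product= attribute."""
    # Split into 9 columns; the last column keeps its internal spaces.
    fields = line.split(None, 8)
    if len(fields) < 9 or not fields[8].startswith("product="):
        raise ValueError("unexpected line layout: %r" % line)
    return fields[8][len("product="):].split(":", 1)[0].strip()


def tally_variants(lines: Iterable[str]) -> Counter:
    """Count variant kinds, skipping blank lines and '#' comment lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            counts[variant_kind(line)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'contig00006 shrimp_consensus variation 7779 7780 . + . product=Insertion: .C. ("C"x32 "-"x10)',
        'contig00006 shrimp_consensus variation 27322 27322 . + . product=Base deleted: C ("-"x22 "C"x5)',
        'contig00011 shrimp_consensus variation 85 85 . + . product=Substitution: C became A ("A"x10)',
    ]
    for kind, count in tally_variants(sample).most_common():
        print(kind, count)
```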

                  What experience do you have with
                  -coverage of contigs
                  -coverage of reads
                  -number of corrections per megabase

                  cheers
                  colin


                  • #10
                    Hello everybody,

                    I installed nesoni 0.4 but get the following error when starting it with a test dataset: can anybody spot the mistake?

                    Best,

                    Yvan

                    >nesoni samshrimp nesoni_output crebs.fa reads ./GEX-1.fq
                    Running gmapper-ls -E -T -w 200% -n 2 -N 8 -X
                    Traceback (most recent call last):
                      File "/usr/local/bin/nesoni", line 15, in <module>
                        sys.exit(nesoni.main(sys.argv[1:]))
                      File "/usr/local/lib/python2.6/dist-packages/nesoni/__init__.py", line 192, in main
                        recombination
                      File "/usr/local/lib/python2.6/dist-packages/nesoni/grace.py", line 160, in execute
                        commands[args[start]](args[start+1:end])
                      File "/usr/local/lib/python2.6/dist-packages/nesoni/__init__.py", line 126, in samshrimp
                        grace.load('sam').shrimp2_main(args)
                      File "/usr/local/lib/python2.6/dist-packages/nesoni/sam.py", line 602, in shrimp2_main
                        stderr=log_file)
                      File "/usr/local/lib/python2.6/dist-packages/nesoni/sam.py", line 40, in run
                        close_fds=True,
                      File "/usr/lib/python2.6/subprocess.py", line 633, in __init__
                        errread, errwrite)
                      File "/usr/lib/python2.6/subprocess.py", line 1139, in _execute_child
                        raise child_exception
                    OSError: [Errno 2] No such file or directory
                    [samopen] no @SQ lines in the header.
                    [sam_read1] missing header? Abort !


                    • #11
                      I think I had a similar error.
                      Check that SHRiMP 2 is installed and can be run from the command line in the directory where you run nesoni, i.e. that it is in your system PATH.


                      • #12
                        Originally posted by colindaven View Post
                        I was actually expecting more corrections for a 6 Mbp genome. I haven't systematically surveyed homopolymer errors in the 454 contigs but would expect more than the ca. 46 corrections here.
                        I have found that with old 454 GS20 and GS FLX data, we were correcting about 20+ errors per megabase. However, with newer GS FLX and Titanium data (which includes higher yield/coverage), combined with newer versions of Newbler, this has been decreasing to ~10 errors per megabase.

                        So your 46/6 ~ 7.7 errors / Mbp is expected if your results come from recent data.
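
For comparing assemblies of different sizes, the rate is simply the number of corrections divided by the genome size in megabases. A small sketch with a made-up function name, using the 46 corrections and 6 Mbp genome quoted above:

```python
def corrections_per_mbp(n_corrections: int, genome_size_bp: int) -> float:
    """Corrections per megabase (1 Mbp = 1,000,000 bp)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_corrections / (genome_size_bp / 1_000_000)


if __name__ == "__main__":
    # 46 corrections over a 6 Mbp genome, as in the quoted post.
    print(round(corrections_per_mbp(46, 6_000_000), 1))  # 7.7
```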


                        • #13
                          The relevant program from SHRiMP 2 will be "gmapper-ls".

                          It looks like this is failing to run, and then samtools is discombobulated by receiving empty input.
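
A quick way to confirm this before rerunning is to check whether the executables resolve on your PATH. This is a minimal sketch using only the Python 3 standard library; the tool names are the ones mentioned in this thread, and `find_tools` is a hypothetical helper.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools(["gmapper-ls", "samtools"]).items():
        print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If `gmapper-ls` shows as not found, add the SHRiMP 2 `bin` directory to PATH in the same shell that runs nesoni.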


                          • #14
                            Hello Colin and pfh,

                            Yes, thanks, that was the problem: I had not correctly added shrimp2 to my path...

                            Best,

                            Yvan


                            • #15
                              Hello everybody~
                              I installed nesoni-0.58 and SHRiMP_2_2_1.
                              And ran "nesoni samshrimp"

                              >nesoni samshrimp test_out J_mapper.fasta J.fq
                              Error: No read files given

                              Can anyone tell me what the problem is?

                              thanks a lot
