
  • Simulated Dataset of Solexa

    Dear all,

    Is there any resource from which we can download synthetically
    generated Solexa datasets, e.g. with "known" tags?

    The aim is to test our algorithm for mapping tags to the genome.
    We also want to evaluate our error-correction model for tag counts
    with this simulated dataset.

  • #2
    Hi

    You can try this program



    cheers
    Colin


    • #3
      Thanks so much. I owe you one, Colin.


      • #4
        Hi,
        I downloaded MetaSim, but I discovered that by default it can only simulate unpaired Illumina reads of length 36. Do you know which parameters to use to generate acceptable coverage with paired-end Illumina reads longer than 50 bases?

        Thanks
        Francesco
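Whatever simulator is used, the number of read pairs needed for a target coverage is plain arithmetic. A minimal sketch, assuming coverage is defined as total sequenced bases divided by genome length; the function name and the example numbers are hypothetical and not taken from MetaSim or any other tool:

```python
import math


def read_pairs_for_coverage(genome_length: int, coverage: float, read_length: int) -> int:
    """Return the number of read pairs whose total bases give roughly the requested coverage."""
    if genome_length <= 0 or read_length <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    bases_per_pair = 2 * read_length  # two mates per pair, each read_length long
    return math.ceil(coverage * genome_length / bases_per_pair)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb genome at 30x with 2 x 75 bp reads.
    print(read_pairs_for_coverage(5_000_000, 30, 75))
```

The result can be passed to whichever option the chosen simulator uses to set the number of reads or pairs.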


        • #5
          Hi Francesco

          In short, no. This may stretch the capabilities of the program. Why not write to the authors - I am sure this kind of feature would be useful for a lot of people with all sorts of next gen sequencing read lengths coming out now.
          Maybe they would be prepared to add a specific Illumina option.

          cheers
          Colin


          • #6
            Many other groups have written short-read simulators. For example, maq and samtools both include read simulators that share the same code base. Maq's can learn an error profile from a known FASTQ file, but the read length may then be limited by the training data. The wgsim in samtools only generates uniform errors, but it is not limited by training data.

            I know people from BGI and Gabor's group have also implemented good short-read simulators.
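To make "uniform errors" concrete, here is a minimal sketch of a single-end simulator in which every base is substituted with the same probability, regardless of its position in the read. It is an illustration only, not the wgsim or maq code; the function name and parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_read(reference: str, read_length: int, error_rate: float, rng: random.Random):
    """Return (start, read): a slice of the reference with uniform substitution errors."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    # Pick a 0-based start position uniformly along the reference.
    start = rng.randrange(len(reference) - read_length + 1)
    read = []
    for base in reference[start:start + read_length]:
        if rng.random() < error_rate:
            # Uniform error model: same probability at every cycle,
            # replacement chosen uniformly among the other bases.
            base = rng.choice([b for b in BASES if b != base])
        read.append(base)
    return start, "".join(read)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    ref = "".join(rng.choice(BASES) for _ in range(200))
    print(simulate_read(ref, 36, 0.02, rng))
```

A profile-based simulator would instead replace the single `error_rate` with a per-cycle table learned from real reads, which is why its read length is tied to the training data.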


            • #7
              Dear all,

              I just downloaded wgsim from the SAMtools package, but I didn't find any manual. I managed to generate FASTQ read files, but I don't understand how to control the read length, or why there are 2 output files. Are the reads paired-end or single-end?

              Thanks

              Maria


              • #8
                The output files are meant for BWA or MAQ, where each end of a pair goes in a separate file. To get a list of options, including the options that control read length (-1 and -2), run ./wgsim -h. Let me know if you need a version of this simulator for other aligners (I forked this simulator into a similar package: DNAA).

                Nils


                • #9
                  Can someone please help me understand the parameters of wgsim? I am struggling to understand how changing the standard deviation [-s parameter; default value = 50] and the "outer distance between the two ends" [-d parameter; default value = 500] affects the output of the simulation.

                  I am generating a synthetic library of genomic loci that vary in size. For instance, I get coordinates a, b, c ... z, and for each position in the genome I generate a set of subsequences centred on the respective coordinate but with varying length.

                  The problem is that some of the subsequences retrieved by my script are ignored when I use them as input for wgsim.

                  I have played around a bit and found that the minimum length of an input sequence for wgsim must be (s x 3) + d. I can make my script generate sequences longer than that value, but I want to understand better what the simulator is doing.

                  Thanks in advance
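One way to picture the two parameters is sketched below. It assumes, without confirmation from the thread, that the simulator draws each fragment's outer length from a normal distribution with mean d and standard deviation s and then places the fragment uniformly on the sequence. The (s x 3) + d threshold is the one reported in the post above; everything else, including the function names, is hypothetical.

```python
import random


def min_reference_length(outer_distance: int = 500, std_dev: int = 50) -> int:
    """Shortest input sequence that is not skipped, per the empirical rule (s x 3) + d."""
    return outer_distance + 3 * std_dev


def sample_fragment(ref_length, read_length, outer_distance=500, std_dev=50,
                    rng=None, max_tries=1000):
    """Return (start, end) of one simulated fragment, or None if the sequence is too short."""
    rng = rng or random.Random()
    if ref_length < min_reference_length(outer_distance, std_dev):
        return None  # mirrors the reported behaviour: short sequences are ignored
    for _ in range(max_tries):
        # Assumed model: fragment length ~ Normal(d, s), truncated to an integer.
        size = int(rng.gauss(outer_distance, std_dev))
        if read_length <= size <= ref_length:
            start = rng.randrange(ref_length - size + 1)
            return start, start + size
    raise RuntimeError("could not draw a fragment that fits; check the parameters")


if __name__ == "__main__":
    print(min_reference_length())          # threshold for the default -d and -s
    print(sample_fragment(649, 70))        # below the threshold
    print(sample_fragment(1000, 70, rng=random.Random(1)))
```

Under this model, a larger -s widens the spread of fragment lengths around -d, and the threshold leaves room for fragments up to three standard deviations longer than the mean.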


                  • #10
                    I tried the DNAA download page at SourceForge, http://sourceforge.net/projects/dnaa/files/,
                    but there is no file available.


                    • #11
                      Ah, get the source code via git, as there is no release yet:
                      Code:
                      git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
                      Nils


                      • #12
                        I've got it. Thanks


                        • #13
                          Hi,

                          Can someone share their experience using DNAA? How does it work?

                          We are working to generate a set of reads from a given reference sequence (the mitochondrial genome), map those reads with an aligner, and verify whether the artificial SNPs are identified. Essentially, it is important to have, in the read header, the position in the genome the read was generated from, and to know the positions of the artificially inserted SNPs.
                          --
                          bioinfosm


                          • #14
                            As the main developer, I can say it works great (ha)! Seriously though, there are many tools that I use frequently (some gems) that are released as is. I would be happy to add anyone as a developer. The tools include simulation code, SAM/BAM manipulation, SV detection tools, and pre- and post-alignment QC tools.

                            For generating reads from a simulated genome, it works quite well. The code is taken from Heng Li's "wgsim" found in samtools. I modified it to handle SOLiD data faithfully as well as to model error rates by cycle/ligation (non-uniform error rates). Once the reads have been generated, you can run your favorite aligner. I then have a fast C program that evaluates your SAM/BAM file. Furthermore, if you call SNPs with samtools, there is a Perl script to evaluate your pileup calls given the simulated variants.

                            Nils
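The evaluation step can be sketched independently of any particular tool, as long as each read name records where the read came from. The example below assumes a wgsim-style name of the form `contig_start_end_errors1_errors2_counter/mate`. That layout is an assumption for illustration, not a documented DNAA format, so check the names your simulator actually writes. All function names are hypothetical.

```python
from typing import NamedTuple


class ReadOrigin(NamedTuple):
    contig: str
    start: int  # leftmost coordinate of the simulated fragment
    end: int    # rightmost coordinate of the simulated fragment


def parse_read_name(name: str) -> ReadOrigin:
    """Extract the simulated origin from an assumed wgsim-style read name."""
    name = name.lstrip("@")
    if "/" in name:
        name = name.rsplit("/", 1)[0]  # drop the /1 or /2 mate suffix
    # Split from the right so contig names containing "_" stay intact.
    contig, start, end, _err1, _err2, _counter = name.rsplit("_", 5)
    return ReadOrigin(contig, int(start), int(end))


def is_correctly_mapped(name: str, aligned_contig: str, aligned_pos: int,
                        tolerance: int = 5) -> bool:
    """True if the alignment lies on the right contig and inside the simulated fragment."""
    origin = parse_read_name(name)
    return (aligned_contig == origin.contig
            and origin.start - tolerance <= aligned_pos <= origin.end + tolerance)


if __name__ == "__main__":
    name = "@chrM_100_600_1:0:0_2:0:0_1a/1"  # made-up read name
    print(parse_read_name(name))
    print(is_correctly_mapped(name, "chrM", 101))
```

In practice the contig and position would come from the RNAME and POS columns of each SAM record, and the same idea extends to comparing called SNP positions with the list of inserted ones.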

                            Please leave your feedback on its usefulness

                            Nils


                            • #15
                              Hello,

                              Does the git command above still work? I've tried it a few times today with no luck:

                              Code:
                              $ git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
                              Initialized empty Git repository in /[my local path]/dnaa/.git/
                              dnaa.git.sourceforge.net[0: 216.34.181.91]: errno=Connection timed out
                              fatal: unable to connect a socket (Connection timed out)
                              Thanks,
                              Leonardo

                              PS I'll try later from home as I guess that it could be a local network issue.

                              Edit: It worked perfectly at home, so I guess that the port git uses is blocked at my workplace.
                              Last edited by lcollado; 03-01-2010, 08:24 PM.
                              L. Collado Torres, Ph.D. student in Biostatistics.
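A quick way to test the blocked-port guess is to try opening a TCP connection yourself. The git:// protocol uses TCP port 9418 by default; the helper below is a hypothetical sketch, and the host in the usage example is simply the one from the post.

```python
import socket

GIT_PROTOCOL_PORT = 9418  # default TCP port of the git:// protocol


def can_connect(host: str, port: int = GIT_PROTOCOL_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False


if __name__ == "__main__":
    host = "dnaa.git.sourceforge.net"  # host taken from the clone command above
    print(host, "reachable on port", GIT_PROTOCOL_PORT, ":", can_connect(host))
```

If this reports False at work and True at home, an outbound firewall rule on that port is the likely cause.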
