  • How to align reads to other reads (not to reference genome)

    Hello,

    I have 5 yeast genomes (Illumina MiSeq) -- I will call them A, B, C, D, and E.

    I grew a colony of yeast over a period of a couple days under four different conditions in order to see how each particular condition would affect the genome of the yeast. These conditions should induce mutations in the DNA of the yeast.

    Thus, A is the genome of the yeast I started with, and B-E (one for each condition) are the "altered" genomes of the yeast that I ended up with at the end of the experiment.

    I have aligned A-E to the S288C S. cerevisiae reference genome using BWA and called SNPs through two methods (with the mpileup function in SAMtools; also, via GATK and VCFtools), but these methods haven't quite given me the results I wanted.

    To be brief, when mapped to the reference genome, genomes B-E show fewer mutations than genome A does. I would like to align B-E to A to call SNPs/INDELs. I hope that this will give me a better fit of B-E onto A, since B-E are more closely related to A than they are to the reference genome.

    How do I go about mapping B-E onto A? Do I need to process A to serve as a reference genome, and, if so, how would I do that? I have A-E as .fastq files, as well as all the .SAM and .BAM files after aligning to the reference genome.

    I will readily admit I am not particularly good at working with computers, but if you request any other information, please let me know.

  • #2
    That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
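
As a rough illustration of the assemble-then-map route described above, here is a minimal Python sketch that only builds the command lines for one evolved sample and does not run anything. It assumes paired-end reads, SPAdes as the assembler (any assembler that produces a FASTA of contigs would do), and bcftools for calling; the function name, file names, and working directory are made up for the example.

```python
from pathlib import Path


def build_pipeline(a_r1, a_r2, sample_r1, sample_r2, sample_name, workdir="work"):
    """Return the commands (as argument lists) for one evolved sample.

    Nothing is executed here. Each list could be passed to
    subprocess.run(cmd, check=True) once paths and tools are checked locally.
    """
    work = Path(workdir)
    asm_dir = work / "assembly_A"
    contigs = asm_dir / "contigs.fasta"  # SPAdes writes its contigs here
    sam = work / f"{sample_name}.sam"
    bam = work / f"{sample_name}.sorted.bam"
    bcf = work / f"{sample_name}.pileup.bcf"
    vcf = work / f"{sample_name}.vs_A.vcf"

    return [
        # 1. Assemble the ancestor (A) reads into contigs.
        ["spades.py", "-1", a_r1, "-2", a_r2, "-o", str(asm_dir)],
        # 2. Index the contigs so BWA can use them as a reference.
        ["bwa", "index", str(contigs)],
        # 3. Map the evolved sample's reads to the contigs.
        ["bwa", "mem", "-o", str(sam), str(contigs), sample_r1, sample_r2],
        # 4. Sort and index the alignments.
        ["samtools", "sort", "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # 5. Pile up the reads and call SNPs/indels against the contigs.
        ["bcftools", "mpileup", "-f", str(contigs), "-Ou", "-o", str(bcf), str(bam)],
        ["bcftools", "call", "-mv", "-o", str(vcf), str(bcf)],
    ]


if __name__ == "__main__":
    # Hypothetical file names for ancestor A and evolved sample B.
    for cmd in build_pipeline("A_R1.fastq", "A_R2.fastq", "B_R1.fastq", "B_R2.fastq", "B"):
        print(" ".join(cmd))
```

The same function would be called once each for B, C, D, and E; the assembly and indexing steps only need to run the first time. Counting the non-header lines of each resulting VCF gives a first estimate of the number of differences from A, though assembly errors in the contigs can also show up as calls.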

    It depends on your goal, of course (could you clarify it?) but the best solution may be to modify the original genome by applying the called variations from A to it, then mapping everything else to the modified genome. Due to indels, the coordinates would still change (though perhaps only slightly in this case) so it would be more difficult to analyze, but that's probably the best way to determine the difference between A and the other samples while retaining a structure similar to the reference.
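
To make the "apply A's variants to the reference" idea concrete, here is a toy Python sketch on a plain string. It assumes VCF-style 1-based positions and non-overlapping variants; the function name and the example sequence are invented. On real data a dedicated tool such as `bcftools consensus` would normally do this step.

```python
def apply_variants(reference, variants):
    """Return a copy of `reference` with the given variants applied.

    `variants` is an iterable of (pos, ref, alt) tuples with 1-based
    positions, as in a VCF record. Variants are assumed not to overlap.
    """
    seq = reference
    # Work from the right-hand end so that earlier edits never shift
    # the positions of variants still waiting to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        if seq[start:end] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:start] + alt + seq[end:]
    return seq


if __name__ == "__main__":
    # Invented example: one SNP, one 1-bp deletion, one 2-bp insertion.
    calls = [(2, "C", "T"), (4, "TA", "T"), (7, "G", "GAA")]
    print(apply_variants("ACGTACGT", calls))
```

The deletion and insertion change the length of the sequence, which is the coordinate shift mentioned above: positions downstream of an indel no longer line up with the original reference.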

    If you want to do an analysis with respect to the reference coordinates (which is the most straightforward method), it's best to simply map everything to the reference and compare the variations, as you are already doing. How exactly is it not giving the results you expect?
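
Comparing the call sets in reference coordinates can be sketched as a set difference. The Python below is a minimal version that assumes plain-text, tab-separated VCF lines and treats two calls as the same variant only when chromosome, position, REF, and ALT all match; the function names and the toy records are invented.

```python
def read_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the lines of a VCF file."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            variants.add((chrom, int(pos), ref, allele))
    return variants


def new_in_sample(ancestor_lines, sample_lines):
    """Variants present in the evolved sample but absent from the ancestor."""
    return read_variants(sample_lines) - read_variants(ancestor_lines)


if __name__ == "__main__":
    # Invented records standing in for the A and B call sets.
    a_calls = ["#CHROM\tPOS\tID\tREF\tALT", "chrI\t100\t.\tA\tG"]
    b_calls = ["#CHROM\tPOS\tID\tREF\tALT", "chrI\t100\t.\tA\tG", "chrII\t250\t.\tC\tT"]
    print(sorted(new_in_sample(a_calls, b_calls)))
```

With real files, the lists would be replaced by open file handles. One caveat: a variant missing from a sample's call set can reflect low coverage or filtering at that site rather than a true absence, so it is worth checking read depth before interpreting differences in the counts.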



    • #3
      If you are looking for reference-free SNP calling -forgive me if I've read too quickly- you might try KisSnp and/or take a look at this review.
      Last edited by syfo; 08-18-2014, 07:50 AM. Reason: link fix



      • #4
        Originally posted by Brian Bushnell
        That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
        Can you go over how to do this in a bit more detail? I'm not so concerned as to where (within each gene) the mutation is; I'm more concerned about the overall number of mutations.

        Originally posted by Brian Bushnell
        It depends on your goal, of course (could you clarify it?)
        Hehe, I am not sure how much I can say, as this is not my intellectual property.

        I do apologize.



        • #5
          Originally posted by Brian Bushnell
          That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.

          Sorry about the slow reply -- can you go over how to do this?



          • #6
            Originally posted by Brian Bushnell
            That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
            Sorry about the slow reply...I thought I had posted this a couple days ago but apparently I had not.

            How would I do this?
