  • username111
    Junior Member
    • Aug 2014
    • 4

    How to align reads to other reads (not to reference genome)

    Hello,

    I have 5 yeast genomes (Illumina MiSeq), which I will call A, B, C, D, and E.

    I grew a colony of yeast over a period of a couple days under four different conditions in order to see how each particular condition would affect the genome of the yeast. These conditions should induce mutations in the DNA of the yeast.

    Thus, A is the genome of the yeast I started with, and B-E (one for each condition) are the "altered" genomes of the yeast that I ended up with at the end of the experiment.

    I have aligned A-E to the S288C S. cerevisiae reference genome using BWA and called SNPs through two methods (with the mpileup function in SAMtools; also, via GATK and VCFtools), but these methods haven't quite given me the results I wanted.

    To be brief, when mapped to the reference genome, genomes B-E show fewer mutations than genome A does. I would like to align B-E to A to call SNPs/indels. I hope this gives a better fit, since B-E are more closely related to A than they are to the reference genome.

    How do I go about mapping B-E onto A? Do I need to process A to serve as a reference genome, and, if so, how would I do that? I have A-E as .fastq files, as well as all the .SAM and .BAM files after aligning to the reference genome.

    I will readily admit I am not particularly good at working with computers, but if you request any other information, please let me know.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
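A rough sketch of the assemble-then-map route described above. It only builds the shell commands as strings and does not run them. The choice of SPAdes as the assembler, paired-end input, and all file names are assumptions for illustration; nothing in the thread specifies them.

```python
def build_pipeline(sample_a_reads, other_samples, workdir="asm_A"):
    """Return shell commands (as strings, not executed) that assemble A,
    map each other sample to A's contigs, and call variants.

    sample_a_reads: (R1, R2) FASTQ paths for dataset A (hypothetical names).
    other_samples:  dict of sample name -> (R1, R2) FASTQ paths.
    """
    r1, r2 = sample_a_reads
    # SPAdes writes its contigs to <output dir>/contigs.fasta
    contigs = f"{workdir}/contigs.fasta"
    cmds = [
        f"spades.py -1 {r1} -2 {r2} -o {workdir}",  # assemble A into contigs
        f"bwa index {contigs}",                     # index the contigs as a reference
    ]
    for name, (s1, s2) in other_samples.items():
        bam = f"{name}.vs_A.bam"
        vcf = f"{name}.vs_A.vcf"
        # map the sample to A's contigs and sort the alignments
        cmds.append(f"bwa mem {contigs} {s1} {s2} | samtools sort -o {bam} -")
        cmds.append(f"samtools index {bam}")
        # call variants against A's contigs
        cmds.append(f"bcftools mpileup -f {contigs} {bam} | bcftools call -mv -o {vcf}")
    return cmds


if __name__ == "__main__":
    samples = {s: (f"{s}_R1.fastq", f"{s}_R2.fastq") for s in "BCDE"}
    for cmd in build_pipeline(("A_R1.fastq", "A_R2.fastq"), samples):
        print(cmd)
```

The variant positions in the resulting VCF files refer to A's contigs, not to S288C chromosomes, which is the coordinate caveat mentioned above.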

    It depends on your goal, of course (could you clarify it?) but the best solution may be to modify the original genome by applying the called variations from A to it, then mapping everything else to the modified genome. Due to indels, the coordinates would still change (though perhaps only slightly in this case) so it would be more difficult to analyze, but that's probably the best way to determine the difference between A and the other samples while retaining a structure similar to the reference.
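To illustrate what "applying the called variations from A" means, here is a minimal sketch on a plain sequence string. The function name and the (pos, ref, alt) tuple layout are hypothetical, and it assumes the variants do not overlap. For real data, an existing tool such as `bcftools consensus` is probably the better choice; this only shows the idea and why indels shift coordinates.

```python
def apply_variants(reference, variants):
    """Apply VCF-style variants to a sequence string.

    variants: iterable of (pos, ref, alt) with 1-based pos.
    Assumes non-overlapping variants (an assumption of this sketch).
    """
    seq = reference
    # Go from the rightmost variant to the leftmost so that an indel
    # never shifts the coordinates of a variant still to be applied.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match sequence at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


if __name__ == "__main__":
    original = "ACGTACGT"
    # one SNP (C>T at position 2) and one 1-bp deletion (AC>A at position 5)
    modified = apply_variants(original, [(2, "C", "T"), (5, "AC", "A")])
    print(modified, len(original), len(modified))
```

After the deletion the modified sequence is one base shorter, so every position downstream of it is offset by one relative to the original.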

    If you want to do an analysis with respect to the reference coordinates (which is the most straightforward method), it's best to simply map everything to the reference and compare the variations, as you are already doing. How exactly is it not giving the results you expect?
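One way to "compare the variations" when everything is mapped to the same reference is to subtract A's calls from each of B-E. The sketch below assumes plain-text, tab-separated VCF lines; the function names are hypothetical. It treats a variant as the tuple (chromosome, position, REF, ALT) and ignores genotype, quality, and coverage, so a variant missing from one sample could simply be an uncalled site there.

```python
def read_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, allele))
    return keys


def new_variants(parent_vcf_lines, derived_vcf_lines):
    """Variants called in the derived sample but not in the parent."""
    return read_variant_keys(derived_vcf_lines) - read_variant_keys(parent_vcf_lines)


if __name__ == "__main__":
    a = ["##fileformat=VCFv4.2\n", "chrI\t100\t.\tA\tG\t50\tPASS\t.\n"]
    b = ["chrI\t100\t.\tA\tG\t60\tPASS\t.\n", "chrII\t2500\t.\tC\tT\t45\tPASS\t.\n"]
    for variant in sorted(new_variants(a, b)):
        print(variant)
```

The number of entries returned for each of B-E is then a first estimate of the mutations gained relative to A, in reference coordinates.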


    • syfo
      Just a member
      • Nov 2012
      • 103

      #3
      If you are looking for reference-free SNP calling (forgive me if I've read too quickly), you might try KisSnp and/or take a look at this review.
      Last edited by syfo; 08-18-2014, 07:50 AM. Reason: link fix


      • username111
        Junior Member
        • Aug 2014
        • 4

        #4
        Originally posted by Brian Bushnell
        That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
        Can you go over how to do this in a bit more detail? I'm not so concerned as to where (within each gene) the mutation is; I'm more concerned about the overall number of mutations.

        Originally posted by Brian Bushnell
        It depends on your goal, of course (could you clarify it?)
        Hehe, I am not sure how much I can say, as this is not my intellectual property.

        I do apologize.


        • username111
          Junior Member
          • Aug 2014
          • 4

          #5
          Originally posted by Brian Bushnell
          That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.

          Sorry about the slow reply -- can you go over how to do this?


          • username111
            Junior Member
            • Aug 2014
            • 4

            #6
            Originally posted by Brian Bushnell
            That's kind of difficult. You could assemble dataset A into contigs, map the other datasets to it, and call variations, which is easy - but the coordinates of the variations would differ from your original reference so it might not be very informative. You'd have to perform some additional steps to determine which mutation goes with which gene, though it is doable.
            Sorry about the slow reply...I thought I had posted this a couple days ago but apparently I had not.

            How would I do this?
