
  • Advice on analysis pipeline

    I am new to SEQanswers. I have already searched for answers to some of my questions, so forgive me if there are already posts that address them.

    I am using Illumina sequencing for a resequencing project to explore the genetic diversity of an RNA virus population (RNA viruses have on average one mutation per genome per replication cycle, so A LOT of SNPs). I am trying to get up to speed on analysis programs and am learning basic Python as well. I would like to minimize the number of programs that I use, but I realize that I may need several to achieve my analytic goals, which are:

    1. Align reads to my reference genome (~7kb) and generate data/histograms of per base coverage

    2. Identify unique reads

    3. Identify single nucleotide polymorphisms and their approximate frequency

    4. Determine whether polymorphisms are synonymous or nonsynonymous for amino acid change based on a known viral reference sequence (many tools I have found are for humans, mice, yeast etc.). If I need to write a script myself, fine, but do people have suggestions on how to incorporate coding frame into short read analysis?

    5. Get a frequency count on the types of polymorphism (NS vs. S, charge changes, stop codons) on a per codon or base level

    6. Map polymorphisms to reference genome and possibly localize to a given protein or sub-protein domain if possible

    Any help on any of these is much appreciated. I anticipate my biggest block will be with #4 and #5.

    Thanks!

  • #2
    If I were you, I'd start with Bowtie (link). I found it to be pretty fast and straightforward to use.

    You'll first need to turn your reference genome into a Bowtie index using the bowtie-build program. After that, you can align using the bowtie program. I'd recommend using the -S option to output the results in SAM format; I like the samtools package (link), and it's what I'd use next.
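The two Bowtie steps described above can be scripted. The following is a minimal Python sketch, assuming Bowtie 1 is installed and on the PATH; the file names (`virus_ref.fa`, `reads.fastq`, `aligned.sam`) and the helper names `bowtie_commands` and `run_commands` are hypothetical, and the `-v` mismatch setting is an illustrative choice rather than something recommended in this thread.

```python
import subprocess


def bowtie_commands(reference_fasta, index_base, reads_fastq, sam_out,
                    max_mismatches=3):
    """Build argv lists for bowtie-build and bowtie (Bowtie 1).

    -S asks Bowtie to write SAM output; -v sets the number of mismatches
    allowed in end-to-end mode (Bowtie 1 accepts 0 to 3).
    """
    if not 0 <= max_mismatches <= 3:
        raise ValueError("Bowtie 1 accepts -v values from 0 to 3")
    build = ["bowtie-build", reference_fasta, index_base]
    align = ["bowtie", "-S", "-v", str(max_mismatches),
             index_base, reads_fastq, sam_out]
    return build, align


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    for argv in bowtie_commands("virus_ref.fa", "virus_ref",
                                "reads.fastq", "aligned.sam"):
        print(" ".join(argv))
```

Printing the commands first makes it easy to check them by eye before passing the same lists to `run_commands`.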

    samtools will let you take the SAM file that gets output by Bowtie and turn it into a BAM, or binary SAM, file. First you'll use samtools import to turn the SAM into a BAM, then you'll use samtools sort to sort the BAM, and finally samtools index to make it more useful in future applications. The samtools pileup command will help you calculate coverage.
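If you want per-base coverage without depending on a particular samtools version, it can also be computed directly from the SAM text. This is a minimal sketch, assuming a single reference sequence of known length and counting only aligned bases (CIGAR operations M, = and X) as coverage; the function names are hypothetical and this is not how samtools itself computes depth.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def per_base_coverage(sam_lines, ref_length):
    """Return a list with the read depth at each 0-based reference position."""
    coverage = [0] * ref_length
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":      # skip unmapped reads
            continue
        ref_pos = pos - 1                 # SAM positions are 1-based
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":               # aligned bases add to coverage
                for i in range(ref_pos, min(ref_pos + length, ref_length)):
                    coverage[i] += 1
                ref_pos += length
            elif op in "DN":              # deletions/skips consume reference only
                ref_pos += length
            # I, S, H and P do not consume reference positions
    return coverage


def coverage_histogram(coverage):
    """Map each depth value to the number of positions with that depth."""
    return Counter(coverage)
```

The histogram returned by `coverage_histogram` can be passed to any plotting library; for a ~7 kb genome, plain lists are fast enough.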

    You can identify unique reads with simple command line tools like grep, or a simple Python program.
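As an illustration of the "simple Python program" route, here is a minimal sketch that counts how often each read sequence occurs. It assumes a plain four-line-per-record FASTQ file (no wrapped sequence lines), and the function names are hypothetical. "Unique" can mean either distinct sequences or sequences seen exactly once, so both are shown.

```python
from collections import Counter


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_unique_reads(fastq_lines):
    """Return a Counter mapping each distinct read sequence to its count."""
    return Counter(read_sequences(fastq_lines))


def singletons(counts):
    """Return the sequences that occur exactly once."""
    return [seq for seq, n in counts.items() if n == 1]
```

With a real file, `count_unique_reads(open("reads.fastq"))` works because file objects iterate line by line; `len(counts)` is then the number of distinct sequences.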

    For SNPs and some of the other stuff, I'd suggest using IGV (link) - you've got a pretty small genome, and looking at it by hand is (in my opinion) a good way to get started. IGV is a viewer rather than a variant caller, but it can shade mismatched bases by base quality, which helps you avoid chasing SNPs that are the result of sequencing errors.
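To get approximate SNP frequencies as numbers rather than by eye, the same quality-aware idea can be sketched in Python. This is a minimal example, assuming you have already extracted per-base observations as `(position, base, phred_quality)` tuples with 0-based positions (for instance from a pileup); the input format, the function name, and the quality cutoff of 20 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def allele_frequencies(observations, reference, min_quality=20):
    """Return {position: {alt_base: frequency}} for non-reference bases.

    Observations below min_quality are ignored, so low-quality base calls
    contribute neither to the variant count nor to the depth.
    """
    counts = defaultdict(Counter)
    for pos, base, qual in observations:
        if qual >= min_quality:
            counts[pos][base.upper()] += 1

    result = {}
    for pos, base_counts in counts.items():
        depth = sum(base_counts.values())
        ref_base = reference[pos].upper()
        variants = {b: n / depth for b, n in base_counts.items()
                    if b != ref_base}
        if variants:                      # report only polymorphic positions
            result[pos] = variants
    return result
```

A frequency threshold on the output (for example, ignoring variants below the expected sequencing error rate) would be a sensible next step, but the right cutoff depends on the data.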

    Once you get to some of the more difficult stuff, I'd suggest checking out Biopython (link). It's a powerful set of tools, and quite useful in general. I don't know if it will address all your needs, but it's a decent place to start.
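For the coding-frame question (#4 and #5 in the original post), the core logic is small enough to sketch in plain Python without any library. This minimal example assumes a single forward-strand, unspliced open reading frame, 0-based coordinates, and the standard genetic code; `classify_snp` and its arguments are hypothetical names, and overlapping or frameshifted viral ORFs would need extra handling.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(reference, cds_start, position, alt_base):
    """Classify a SNP at a 0-based reference position within one ORF.

    cds_start is the 0-based index of the first base of the start codon.
    """
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    frame = offset % 3                    # 0, 1 or 2 within the codon
    codon_start = position - frame
    ref_codon = reference[codon_start:codon_start + 3].upper().replace("U", "T")
    if len(ref_codon) < 3:
        raise ValueError("position lies in an incomplete codon")
    alt = alt_base.upper().replace("U", "T")
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"                 # new stop codon
    else:
        kind = "nonsynonymous"
    return {"codon_index": offset // 3, "ref_codon": ref_codon,
            "alt_codon": alt_codon, "ref_aa": ref_aa, "alt_aa": alt_aa,
            "type": kind}
```

Tallying the `type` and `codon_index` fields over all called SNPs (for example with `collections.Counter`) gives per-codon counts of synonymous, nonsynonymous, and stop changes; charge changes would need an additional amino acid property table.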

    Hope that helps!

    P.S. Just my opinion, I'm sure everyone here has their own favorite pipeline. Your mileage may vary.
    Last edited by martian_bob; 05-04-2010, 07:04 AM. Reason: Hedging


    • #3
      Thanks for your tips. I have already started playing around with Bowtie and have made my indexes. One concern is that there is a slight possibility that some of my reads could have >2 mutations, which I believe is the limit for Bowtie. I guess I will see these as unmappable reads?

      Will try the SAMtools to IGV workflow. Do you (or anyone else) have familiarity with Maq and whether it would be a one stop solution to some of my post-alignment analysis?


      • #4
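        Bowtie's limit is 3 mismatches (for example with -v 3; the default setting allows 2 in the seed), so there's that. In -v mode, reads with more mismatches than the limit are reported as unaligned. I have no familiarity with MAQ at all, but I know that a lot of people on these boards use it.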


        • #5
          You might wanna check out usegalaxy.org for an analysis pipeline. If your data is small, it should be a breeze to use.
          http://kevin-gattaca.blogspot.com/
