
  • Advice on analysis pipeline

    I am new to seqanswers. I have already searched for answers to some of my questions. Forgive me if there are already posts that address them, though.

    I am using Illumina sequencing for a resequencing project to explore the genetic diversity of an RNA virus population (RNA viruses have on average one mutation per genome per replication cycle - so A LOT of SNPs). I am trying to get up to speed on analysis programs and am learning basic Python as well. I would like to minimize the number of programs that I use, but I realize that I may need several to achieve my analytic goals, which are:

    1. Align reads to my reference genome (~7kb) and generate data/histograms of per base coverage

    2. Identify unique reads

    3. Identify single nucleotide polymorphisms and their approximate frequency

    4. Determine whether polymorphisms are synonymous or nonsynonymous for amino acid change based on known viral reference sequence (many tools I have found are for humans, mice, yeast etc.). If I need to write a script myself fine, but do people have suggestions on how to incorporate coding frame into short read analysis?

    5. Get a frequency count on the types of polymorphism (NS vs. S, charge changes, stop codons) on a per codon or base level

    6. Map polymorphisms to reference genome and possibly localize to a given protein or sub-protein domain if possible

    Any help on any of these is much appreciated. I anticipate my biggest block will be with #4 and #5.

    Thanks!

  • #2
    If I were you, I'd start with Bowtie (link). I found it to be pretty fast and straightforward to use.

    You'll first need to turn your reference genome into a Bowtie index using the bowtie-build program. After that, you can align using the bowtie program. I'd recommend using the -S option to output the results in SAM format; I like the samtools package (link), and it's what I'd use next.
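
    As a rough sketch of the two steps above, the Python helper below only assembles the command lines so they can be inspected before running. The file names (virus_ref.fa, reads.fastq, aligned.sam) and the function name are placeholders, not anything from this thread, and -v 3 (allow up to 3 mismatches) is just one possible Bowtie 1 setting.

```python
import shlex


def bowtie_commands(reference_fasta, index_prefix, reads_fastq, sam_out,
                    max_mismatches=3):
    """Return the bowtie-build and bowtie command lines as argument lists.

    All file names are placeholders supplied by the caller.
    """
    # Step 1: build the index from the reference FASTA.
    build = ["bowtie-build", reference_fasta, index_prefix]
    # Step 2: align, writing SAM (-S) and allowing up to N mismatches (-v N).
    align = ["bowtie", "-S", "-v", str(max_mismatches),
             index_prefix, reads_fastq, sam_out]
    return build, align


if __name__ == "__main__":
    for cmd in bowtie_commands("virus_ref.fa", "virus_ref",
                               "reads.fastq", "aligned.sam"):
        # Print the command; to execute it, pass `cmd` to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

    Keeping the commands as lists makes it easy to hand them to subprocess.run later without worrying about shell quoting.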

    samtools will let you take the SAM file that gets output by Bowtie and turn it into a BAM, or binary SAM, file. First you'll use samtools import to turn the SAM into a BAM, then you'll use samtools sort to sort the BAM, and finally samtools index to make it more useful in future applications. The samtools pileup command will help you calculate coverage.
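
    For the coverage part, here is a minimal Python sketch. It assumes you have per-base depth as three tab-separated columns (reference name, 1-based position, depth), which is what samtools depth prints; the function names and the example data are made up for illustration. Note that newer samtools releases replaced pileup with mpileup and convert SAM to BAM with samtools view -b, so the exact subcommands may differ from the ones named above depending on your version.

```python
from collections import Counter


def parse_depth(lines):
    """Map 1-based position -> depth from `samtools depth`-style lines."""
    coverage = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, pos, depth = line.split("\t")[:3]
        coverage[int(pos)] = int(depth)
    return coverage


def coverage_histogram(coverage, bin_size=10):
    """Count positions per depth bin; each key is the lower edge of a bin."""
    return Counter((depth // bin_size) * bin_size
                   for depth in coverage.values())


if __name__ == "__main__":
    # Made-up example data, not real output.
    example = ["virus\t1\t5", "virus\t2\t12", "virus\t3\t18", "virus\t4\t31"]
    bin_size = 10
    hist = coverage_histogram(parse_depth(example), bin_size)
    for lower in sorted(hist):
        print(f"{lower}-{lower + bin_size - 1}x: {hist[lower]} position(s)")
```

    With a ~7kb genome the whole table fits in memory easily, so a plain dictionary is enough; the binned counts can be passed to any plotting tool to draw the histogram.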

    You can identify unique reads with simple command line tools like grep, or a simple Python program.
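
    A simple Python program for this could look like the sketch below. It assumes a plain four-line-per-record FASTQ file (no wrapped sequence lines) and treats two reads as identical only if their sequences match exactly; the function names are illustrative.

```python
from collections import Counter


def read_sequences(fastq_lines):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def count_unique_reads(fastq_lines):
    """Return a Counter mapping each distinct read sequence to its count."""
    return Counter(read_sequences(fastq_lines))


if __name__ == "__main__":
    # Tiny made-up FASTQ with three reads, two of them identical.
    example = [
        "@read1", "ACGTACGT", "+", "IIIIIIII",
        "@read2", "ACGTACGT", "+", "IIIIIIII",
        "@read3", "TTGACCAA", "+", "IIIIIIII",
    ]
    counts = count_unique_reads(example)
    print(f"{len(counts)} unique sequences out of {sum(counts.values())} reads")
    for seq, n in counts.most_common():
        print(n, seq)
```

    For a real file, pass an open file handle instead of the list, e.g. `with open("reads.fastq") as handle: counts = count_unique_reads(handle)`, where reads.fastq is a placeholder name.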

    For SNPs and some of the other stuff, I'd suggest using IGV (link) - you've got a pretty small genome, and looking at it by hand is (in my opinion) a good way to get started. IGV takes base qualities into account when it highlights mismatches, so you don't end up chasing SNPs that are the result of sequencing errors.
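
    If you want approximate SNP frequencies as numbers rather than by eye, one possible sketch in Python is below. It assumes you have already tallied base counts at each position (for example from pileup output); the function name and the 1% cutoff are hypothetical, and the cutoff should sit above whatever sequencing error rate you expect.

```python
def variant_frequencies(base_counts, ref_base, min_freq=0.01):
    """Return {alt_base: frequency} for non-reference bases at one position.

    base_counts: dict such as {"A": 950, "G": 45, "T": 5}
    min_freq: hypothetical cutoff meant to screen out likely sequencing errors
    """
    total = sum(base_counts.values())
    if total == 0:
        return {}
    return {
        base: count / total
        for base, count in base_counts.items()
        if base != ref_base and count / total >= min_freq
    }


if __name__ == "__main__":
    # Made-up counts at a single position with reference base A.
    counts = {"A": 950, "G": 45, "T": 5}
    for base, freq in variant_frequencies(counts, "A").items():
        print(f"A>{base}: {freq:.1%}")
```

    Running this over every position gives a per-site table of candidate polymorphisms and their frequencies.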

    Once you get to some of the more difficult stuff, I'd suggest checking out Biopython (link). It's a powerful set of tools, and quite useful in general. I don't know if it will address all your needs, but it's a decent place to start.
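
    For the synonymous vs. nonsynonymous question (#4 and #5 in the original post), a minimal pure-Python sketch is below. It assumes the standard genetic code, a coding sequence that starts at the first base of a codon, and one substitution considered at a time (two SNPs in the same codon are not combined). The function name and the example sequence are made up; Biopython's Seq.translate could stand in for the hand-built codon table.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, pos, alt_base):
    """Classify a substitution at 0-based position `pos` of a coding sequence.

    Assumes `cds` begins in frame (its first base is codon position 1).
    Returns (codon_number, ref_aa, alt_aa, label) with a 1-based codon number.
    """
    cds = cds.upper().replace("U", "T")
    alt_base = alt_base.upper().replace("U", "T")
    start = (pos // 3) * 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        label = "synonymous"
    elif alt_aa == "*":
        label = "stop gained"
    else:
        label = "nonsynonymous"
    return pos // 3 + 1, ref_aa, alt_aa, label


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # made-up ORF: Met-Ala-Trp-stop
    for pos, alt in [(5, "C"), (3, "A"), (8, "A")]:
        print(pos, alt, classify_snp(cds, pos, alt))
```

    To handle a genome position rather than a CDS position, subtract the ORF start coordinate first; the codon number returned can then be compared against known protein or domain boundaries.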

    Hope that helps!

    P.S. Just my opinion, I'm sure everyone here has their own favorite pipeline. Your mileage may vary.
    Last edited by martian_bob; 05-04-2010, 07:04 AM. Reason: Hedging



    • #3
      Thanks for your tips. I have already started playing around with Bowtie and have made my indexes. One concern is that there is a slight possibility that some of my reads could have >2 mutations, which is the limit for Bowtie? I guess I will see these as unmappable reads?

      Will try the SAMtools to IGV workflow. Do you (or anyone else) have familiarity with Maq and whether it would be a one stop solution to some of my post-alignment analysis?



      • #4
        Bowtie's limit is 3 mutations, so there's that. I have no familiarity with MAQ at all, but I know that a lot of people on these boards use it.
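
        To get a feel for how often reads would exceed a mismatch limit, a small Python sketch like the one below can count mismatches for a read at a known position. It assumes an ungapped placement (no indels) and a known 0-based start; the function names and sequences are invented for illustration, and this is not how Bowtie itself scores alignments.

```python
def count_mismatches(read, reference, start):
    """Count mismatching bases for an ungapped placement at 0-based `start`."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for r, w in zip(read.upper(), window.upper()) if r != w)


def within_limit(read, reference, start, max_mismatches=3):
    """True if the read has at most `max_mismatches` mismatches at `start`."""
    return count_mismatches(read, reference, start) <= max_mismatches


if __name__ == "__main__":
    reference = "ACGTACGTACGT"  # made-up reference
    read = "ACGAACCT"           # made-up read
    print(count_mismatches(read, reference, 0))
    print(within_limit(read, reference, 0))
```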



        • #5
          You might want to check out usegalaxy.org for an analysis pipeline. If your data is small, it should be a breeze to use.
          http://kevin-gattaca.blogspot.com/
