
  • Excited to get started on Viral sequencing

    Hi All,

    I am starting a possible thesis project involving sequencing some virus passaged through different mouse genotypes. It should be a great learning experience regardless of the outcome (which I can only assume will be success). Anywho, I am having a couple of problems. While I am trying to develop/optimize a wetlab protocol, I am also trying to get familiar with Linux and the algorithms that will be applicable.

    So, the project will be to call variants in passaged retroviral samples from mice. The wetlab work will entail separating viral RNA from host RNA, as there are putative expressed endogenous retroviruses in the mouse strains we have used. The viral blood titers are very low with this specific infection, so I will be using a tissue homogenate. Any RNA prep will bring along a large amount of host message too. My workflow right now will basically try to separate the sequences with specific priming in the RT reaction and the subsequent PCR. Right now I am at the RT => PCR step and getting wacky results. Using my +RT reaction as a template for the PCR step gives no amplicon, while the -RT control gives a strong band where I expect one. This is reproducible over 4 replicates with 2 different forms of -RT control (no primer and no reverse transcriptase). This is confusing the bejesus out of me. Any contamination that could be primed should also be in the +RT. I am at a loss for the moment. On to the other half...

    While the wetlab protocol is being worked out, I am spending a bunch of time trying to set up a pipeline for the data that will hopefully be generated soon. This will be a daunting task for me. The only bioinformatics work I have done was delving into some metagenomic data to find a very conserved biosynthetic pathway. All the infrastructure was set up for me. The Cygwin I was using already had all the appropriate modules, and the scripts were more or less plug and play. My situation now is a bit different. The infrastructure and pipelines are going to have to be set up by me. This is way harder than just looking at the velvet manual to find out how to change a couple of parameters on an assembly. That said, I am very excited for the chance to learn a new skill set.

    On to my bioinformatics problem...

    When I get some server access, hopefully this afternoon, I want to start compiling the programs I will need to analyze my data and give them some test runs on small datasets. Umm...this is where I am going to sound very silly. I don't really know what the pipeline will look like. As I understand it, the pipeline will entail:
    alignment -> sort -> dedup -> clean -> indel realignment -> variant call

    I am trying to figure out which packages work best for small templates (9 kb) and are optimized for pooled data.

    If anyone has suggestions, advice or hints I would love to hear them.

    BTW this is a sweet forum. I would hate to have to pool all this information from Google searches. Thanks everyone for being a part of it. I look forward to being able to contribute some day.


    Thanks a bunch,
    Earl

    Wow, this post is entirely too long.
    --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--

  • #2
    I'd start with the assumption that you'll change everything in the bioinformatics pipeline between the initial and final versions, and that you'll do lots and lots of testing and tweaking along the way. Make sure the whole thing is automated/scripted so that you can run the script with options to specify (1) the input file, (2) which programs to use (e.g., which aligner), and (3) which options to pass them. You don't need to write all that on the first pass; just start with a simple initial version and work your way up. Scripting it all like this makes it easy to start 10 variations on the cluster over the weekend so you can come in Monday morning and compare results.

    I like shell scripts for this, since they make it easy to cut and paste commands when you're debugging. If you save intermediate results to disk at every point (rather than piping | them from one command to the next) then you can run just part of your pipeline by hand when necessary.
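A minimal sketch of that kind of parameterized pipeline, written in Python rather than shell so it is easy to test. Everything named here is an assumption for illustration: `aligner_a`, `aligner_b`, and `sort_tool` are placeholder commands, not real tools, and the file naming scheme is made up. Each step writes its result to disk, and a step is skipped when its output file is already there.

```python
import subprocess
from pathlib import Path

# Hypothetical command templates, one per aligner. The executables and
# arguments are placeholders, not real tool invocations; fill in the
# actual command lines from each tool's manual.
ALIGNER_TEMPLATES = {
    "aligner_a": "aligner_a {options} --ref {ref} --reads {reads} --out {out}",
    "aligner_b": "aligner_b {options} {ref} {reads} > {out}",
}


def build_plan(reads, ref, aligner, options="", workdir="run1"):
    """Return a list of (output_path, shell_command) steps."""
    work = Path(workdir)
    stem = Path(reads).stem
    aligned = work / f"{stem}.{aligner}.aligned"
    sorted_out = work / f"{stem}.{aligner}.sorted"
    return [
        (aligned, ALIGNER_TEMPLATES[aligner].format(
            options=options, ref=ref, reads=reads, out=aligned)),
        # Placeholder sort step; later stages would follow the same pattern.
        (sorted_out, f"sort_tool {aligned} > {sorted_out}"),
    ]


def run_plan(steps, dry_run=True):
    """Run each step unless its output file already exists on disk."""
    executed = []
    for out_path, command in steps:
        if out_path.exists():
            continue  # intermediate result already saved, so skip this step
        executed.append(command)
        if not dry_run:
            out_path.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(command, shell=True, check=True)
    return executed
```

Looping over `ALIGNER_TEMPLATES` and a few option strings, with a different `workdir` for each combination, gives the "10 variations over the weekend" setup. With `dry_run=True` the function only returns the commands it would run, which is handy for pasting one of them into a terminal while debugging.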

    If you weren't already planning on it, I'd generate a reference sequence input for your aligner that's the mouse + viral genomes. After you align, you can just take reads that map to the viral chromosome. This avoids some of the difficulty of deciding what's viral and what's mouse because the two genomes are competing for reads in the alignment.
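A rough sketch of the two halves of that idea, assuming the aligner writes SAM text output (in SAM, header lines start with `@` and the third tab-separated column is the reference sequence name). The function names and the reference name `virus` are made up for the example.

```python
def combine_references(*fasta_texts):
    """Concatenate FASTA texts, making sure each one ends with a newline."""
    return "".join(t if t.endswith("\n") else t + "\n" for t in fasta_texts)


def viral_reads(sam_lines, viral_name):
    """Yield SAM alignment lines whose reference name (column 3) is viral_name."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] == viral_name:
            yield line


# Made-up alignment records, only to show the filtering behavior.
example_sam = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t0\tchr1\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    "r2\t0\tvirus\t3\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n",
]
kept = list(viral_reads(example_sam, "virus"))
```

Here `kept` holds only the `r2` record. Reads that aligned better to a mouse sequence (`r1`) and unmapped reads (`r3`) drop out, which is the competition effect described above.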

    I've had good luck with Bowtie and Bowtie2 for alignment and FreeBayes for SNP calling, though there are lots of options. One thing to watch out for in SNP calling is what assumptions the program makes: does it assume you're working on a diploid genome?
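A toy illustration of why the diploid assumption matters for a pooled viral sample. This is not how any real caller works (real callers use likelihood models, not rounding), and the read counts are made up. It only shows that a variant present in 5% of the pool sits much closer to "absent" than to any allele fraction a diploid genotype would predict.

```python
def alt_fraction(ref_count, alt_count):
    """Fraction of reads at a site that carry the alternate allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def nearest_diploid_state(fraction):
    """Closest allele fraction expected for a diploid genotype:
    0.0 (hom-ref), 0.5 (het), or 1.0 (hom-alt)."""
    return min((0.0, 0.5, 1.0), key=lambda state: abs(state - fraction))


# Made-up counts: 50 of 1000 reads carry the variant.
observed = alt_fraction(950, 50)
diploid_guess = nearest_diploid_state(observed)
```

With these counts `observed` is 0.05 and `diploid_guess` is 0.0, so a low-frequency variant in the viral population would look like sequencing noise under a strict diploid model. That is the reason to check whether a caller has a pooled or arbitrary-ploidy mode before trusting its output on this kind of data.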

    good luck!

    Alex



    • #3
      Thanks Alex.

      I didn't really think to just use the endogenous sequences as a reference to compete them away from the viral genome. As for SNP calling for pooled sequences, I was told to check out SNVer. They even have a GUI for numbskulls like me! Hopefully I can manage without. I just got my server space up and running, so I have a whole new set of stuff to play with.

      Thanks again,
      Earl



