  • BurlEarl
    Member
    • Jun 2012
    • 19

    Excited to get started on Viral sequencing

    Hi All,

    I am starting a possible thesis project that involves sequencing virus passaged through different mouse genotypes. It should be a great learning experience regardless of the outcome (which I can only assume will be success). Anywho, I am having a couple of problems. While I develop and optimize a wetlab protocol, I am also trying to get familiar with Linux and the algorithms that will be applicable.

    So, the project will be to call variants in passaged retroviral samples from mice. The wetlab work will entail separating viral RNA from host RNA, as there are putative expressed endogenous retroviruses in the mouse strains we have used. The viral blood titers are very low with this specific infection, so I will be using a tissue homogenate. Any RNA prep will bring along a large amount of host message too. My workflow right now will basically try to separate the sequences with specific priming in the RT reaction and the subsequent PCR. Right now I am at the RT => PCR step and getting wacky results. Using my +RT product as the template for the PCR step gives no amplicon, while the -RT control gives a strong band where I expect it. This is reproducible over 4 replicates with 2 different forms of -RT control (no primer and no reverse transcriptase). This is confusing the bejesus out of me. Any contamination that could be primed should also be in the +RT. I am at a loss for the moment. On to the other half...

    While the wetlab protocol is being worked out, I am spending a bunch of time trying to set up a pipeline for the data that will hopefully be generated soon. This will be a daunting task for me. The only bioinformatics work I have done was delving into some metagenomic data to find a very conserved biosynthetic pathway. All the infrastructure was set up for me. The Cygwin I was using already had all the appropriate modules, and the scripts were more or less plug and play. My situation now is a bit different. The infrastructure and pipelines are going to have to be set up by me. This is way harder than just looking at the velvet manual to find out how to change a couple of parameters on an assembly. That said, I am very excited for the chance to learn a new skill set.

    On to my bioinformatics problem...

    When I get some server access, hopefully this afternoon, I want to start compiling the programs I will need to analyze my data and give them some test runs on small datasets. Umm...this is where I am going to sound very silly. I don't really know what the pipeline will look like. As I understand it, the pipeline will entail:
    alignment -> sort -> dedup -> clean -> indel realignment -> variant call

    I am trying to figure out the best packages for small templates (9kb) that are optimized for pooled data.

    If anyone has suggestions, advice or hints I would love to hear them.

    BTW this is a sweet forum. I would hate to have to pool all this information from google searches. Thanks everyone for being a part of it. I look forward to being able to contribute some day.


    Thanks a bunch,
    Earl

    Wow, this post is entirely too long.
    --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--
  • arolfe
    Member
    • Jul 2011
    • 29

    #2
    I'd start with the assumption that you'll change everything in the bioinformatics pipeline between the initial and final versions and that you'll do lots and lots of testing and tweaking along the way. Make sure the whole thing is automated/scripted so that you can run the script with options to specify (1) the input file, (2) which programs to use (e.g., which aligner), and (3) which options to pass them. You don't need to write all that on the first pass; just start with a simple initial version and work your way up. Scripting it all like this then makes it easy to start 10 variations on the cluster over the weekend so you can come in Monday morning to compare results.
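
As a rough illustration of the "one script, three kinds of options" idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the option names (`--reads`, `--index`, `--aligner`, `--aligner-opts`), the helper names, and the command layouts for Bowtie and Bowtie2, which should be checked against each tool's manual. The sketch only builds the command; it does not run anything.

```python
import argparse
import shlex

# Command layouts per aligner, written as argument lists with placeholders.
# These layouts are assumptions; verify them against each tool's manual.
ALIGNERS = {
    "bowtie2": ["bowtie2", "-x", "{index}", "-U", "{reads}", "-S", "{sam}"],
    "bowtie": ["bowtie", "-S", "{index}", "{reads}", "{sam}"],
}


def build_align_command(aligner, index, reads, sam, extra=""):
    """Return the aligner command as an argument list (nothing is executed)."""
    template = ALIGNERS[aligner]
    cmd = [part.format(index=index, reads=reads, sam=sam) for part in template]
    # Extra options go right after the program name.
    return cmd[:1] + shlex.split(extra) + cmd[1:]


def parse_args(argv=None):
    """Parse the three kinds of options: input file, program choice, program options."""
    parser = argparse.ArgumentParser(description="Toy pipeline driver")
    parser.add_argument("--reads", required=True, help="input FASTQ file")
    parser.add_argument("--index", required=True, help="aligner index prefix")
    parser.add_argument("--aligner", choices=sorted(ALIGNERS), default="bowtie2")
    parser.add_argument("--aligner-opts", default="", help="extra aligner options")
    return parser.parse_args(argv)


def main(argv=None):
    """Print the command that would be run for the given options."""
    args = parse_args(argv)
    sam = f"{args.reads}.{args.aligner}.sam"
    cmd = build_align_command(args.aligner, args.index, args.reads, sam, args.aligner_opts)
    print(" ".join(shlex.quote(part) for part in cmd))
```

Calling `main(["--reads", "sample.fq", "--index", "ref", "--aligner", "bowtie"])` prints one command line; launching ten variations is then a loop over different argument lists. Extra options that start with a dash need the `--aligner-opts=...` form so that argparse does not mistake them for its own flags.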

    I like shell scripts for this, since they make it easy to cut and paste commands when you're debugging. If you save intermediate results to disk at every point (rather than piping | them from one command to the next) then you can run just part of your pipeline by hand when necessary.
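
The "save every intermediate to disk" advice can be sketched in Python as well. The helper below, `run_step`, is a hypothetical name introduced here: it runs a step only if its output file is missing, so the pipeline can be resumed or partially re-run by deleting the files you want regenerated.

```python
import os
import subprocess


def run_step(name, cmd, output, runner=subprocess.check_call):
    """Run cmd unless its output file already exists.

    Returns True if the step ran and False if it was skipped.
    `runner` is injectable so the logic can be tested without real tools.
    """
    if os.path.exists(output):
        print(f"[skip] {name}: {output} already exists")
        return False
    print(f"[run]  {name}: {' '.join(cmd)}")
    runner(cmd)
    return True
```

One caveat with this simple check: a step that crashes halfway can leave a partial output file behind, which would then be skipped on the next run. Writing to a temporary name and renaming on success is a common way around that.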

    If you weren't already planning on it, I'd generate a reference sequence input for your aligner that's the mouse + viral genomes. After you align, you can just take reads that map to the viral chromosome. This avoids some of the difficulty of deciding what's viral and what's mouse because the two genomes are competing for reads in the alignment.
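
A minimal sketch of the combined-reference idea, assuming plain-text FASTA and SAM files. The function names and the reference name passed as `viral_name` are hypothetical. The filter relies only on the SAM convention that header lines start with `@` and that the third tab-separated column holds the reference sequence name.

```python
def concat_fasta(paths, out_path):
    """Concatenate FASTA files (e.g. mouse + virus) into one reference file."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    # Guard against a missing trailing newline between files.
                    out.write(line if line.endswith("\n") else line + "\n")


def viral_reads(sam_lines, viral_name):
    """Yield SAM alignment lines whose reference name matches viral_name."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] == viral_name:
            yield line
```

For real data, a sorted and indexed BAM queried with samtools would be the more usual route; the pure-Python filter is only meant to show what "take the reads that map to the viral chromosome" amounts to.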

    I've had good luck with Bowtie, Bowtie2, and Freebayes for SNP calling, though there are lots of options. One thing to watch out for in SNP calling is what assumptions the program makes: does it assume you're working on a diploid genome?
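
To see why the diploid assumption matters for a pooled viral sample, here is a toy Python illustration. It is not how Freebayes or any other caller works internally; the function names are made up, and the "nearest genotype" rule is a deliberate oversimplification. A diploid model expects a variant allele at roughly 0%, 50%, or 100% of reads, whereas a variant in a pooled viral population can sit at any frequency.

```python
# Alternate-allele fractions a diploid genotype can produce:
# homozygous reference, heterozygous, homozygous alternate.
DIPLOID_FRACTIONS = (0.0, 0.5, 1.0)


def alt_fraction(ref_count, alt_count):
    """Observed fraction of reads carrying the alternate allele at a site."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def nearest_diploid_fraction(fraction):
    """Diploid genotype fraction closest to the observed fraction (toy rule)."""
    return min(DIPLOID_FRACTIONS, key=lambda g: abs(g - fraction))
```

With 970 reference reads and 30 variant reads, `alt_fraction` gives 3%, and the nearest diploid explanation is "homozygous reference", so a real minority variant would look like noise under that model. This is the kind of mismatch to check for when picking a caller for pooled data.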

    good luck!

    Alex


    • BurlEarl
      Member
      • Jun 2012
      • 19

      #3
      Thanks Alex.

      I didn't really think to just use the endogenous sequences as a reference to compete them away from the viral genome. As for SNP calling for pooled sequences, I was told to check out SNVer. They even have a GUI for numbskulls like me! Hopefully I can manage without. I just got my server space up and running, so I have a whole new set of stuff to play with.

      Thanks again,
      Earl
      --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--

