
  • New to NGS, which tools to learn?

    Hi all. I'm just about to graduate with an MSc in biotechnology, and although I haven't had any training in it specifically, I want to get into NGS/HTS. However, there seem to be a great many options for performing mapping and assembly, and no "gold standard." Which software tools should I learn first?

    Best regards,
    Jarle Pahr

  • #2
    There are threads that cover this to some degree, but here is a super short list of things to do/look into:

    Background:
    1. Get comfortable in a Unix/Linux environment and learn to adequately use sed/awk/bash scripting.
    2. Pick up a scripting language like Python/Perl.
    3. If you want to do more serious coding/software development then learning R and a more "heavy duty language" like C++ or Java will be useful, but this certainly is not required if you do not plan to do real development.

    Programs:
    4. When you get raw data from the sequencer, people often analyze the quality with "FastQC", so look into that. People also typically trim adapters; "cutadapt" is one way to do it, and "Trim Galore!" and "Trimmomatic" are also good and handle paired-end reads well (older versions of "cutadapt" could not, though newer releases support them).
    4. As for tools, read up on "samtools" and the SAM file format specification (google it, there's like an 11-page PDF). It will be somewhat hard to really grasp this without data to play with, though.
    5. You will also want to become familiar with "Picard" (tools) and the "Genome Analysis Toolkit" ("GATK").
    6. A common aligner is "Bowtie2"; there are tons of aligners that can be used for different goals but if you have to pick one to start off with, "Bowtie2" is pretty popular.

    More specialized:
    7. To call SNPs/indels people frequently use "samtools mpileup" or the GATK pipeline.
    8. To call structural variants there are many different programs, one of which is "Pindel" (if you just want to get an idea of what's out there).
    9. To call copy number variants there are lots of programs; one program designed specifically for targeted resequencing is "CONTRA".
    10. If you want to call peaks from ChIP-seq data or similar enrichment data, a common tool is "MACS".
    11. There are other tools for RNA-seq analysis that I won't mention here as I haven't done it myself.

    Annotation of variant calls:
    12. I always loved the program "ANNOVAR".

    Other:
    13. You'll want to become familiar with the UCSC Genome Browser.
    14. A lot of people use the "Integrative Genomics Viewer" ("IGV") to view alignment files locally.
    15. "Galaxy" may be of interest, especially if individuals who are less computationally inclined will be doing any analysis.
    16. "BEDtools" is extremely easy to use and important to know as it's very useful.

    So there are a bunch of tools; only look at the more specialized ones if you think you'll work with those types of datasets. I'm sure there are many more I'm forgetting; for example I almost forgot BEDtools. If you don't have any background in how NGS works, start by finding reviews to teach yourself that.
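
    To make the quality-check step above a bit more concrete, here is a minimal Python sketch of how per-base quality scores can be read out of a FASTQ file. It is not how FastQC works internally; it only illustrates the idea. It assumes the common Phred+33 encoding, four lines per record, and non-empty reads, and the function names and the two example reads are made up for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop reading
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' from the name


def phred_scores(quality_string, offset=33):
    """Convert an ASCII quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Average Phred score of one read; assumes the read is not empty."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores)


# Two invented reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
for name, seq, qual in parse_fastq(example):
    print(name, len(seq), mean_quality(qual))
```

    A read whose mean quality is low, or whose quality drops sharply toward the 3' end, is the kind of thing the trimming tools listed above are meant to deal with.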



    • #3
      Thank you very much. I already had a rough idea of what packages seem to be popular, but I didn't want to "anchor" the discussion by mentioning any.
      From what you're saying I gather that learning SAMtools, Bowtie and GATK/Picard would be a good start for gaining a core skillset.

      I've already read some reviews and also bought the book on NGS Bioinformatics by Stuart Brown (http://www.amazon.com/Next-Generatio...ref=pd_sim_b_2), but haven't had time to study the theory in detail yet. Looking forward to delving more into that. De Bruijn graphs especially seem worth understanding.



      • #4
        I had no idea a book existed. You definitely do not need to know anything about de Bruijn graphs for the vast majority of the analysis people typically do (you will want to know about them if you're developing algorithms of your own, where they will come in handy). For a lot of stuff you don't need to understand the underlying algorithms to be able to use the software.

        SAMtools/Picard/GATK/BEDtools is a great place to start. Make sure you understand SAM files well from the SAM specification too.
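
        As a rough illustration of what "understanding SAM files" involves, here is a small Python sketch that splits one alignment line into the 11 mandatory fields defined by the SAM specification and decodes the bitwise FLAG. The record and the helper names are invented for the example; in real work you would more likely use samtools or an existing library than parse lines by hand.

```python
from collections import namedtuple

# The 11 mandatory SAM columns, plus any optional TAG:TYPE:VALUE fields.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)

# FLAG bits as defined in the SAM specification (the labels are informal).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line):
    """Parse one tab-separated alignment line (not a '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Invented example record: flag 99 = 0x1 + 0x2 + 0x20 + 0x40.
line = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(line)
print(record.rname, record.pos, record.cigar, decode_flag(record.flag))
```

        Working through a few real records this way (or with "samtools view") tends to make the FLAG and CIGAR columns much less mysterious.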



        • #5
          Thanks for your recommendations, I'll keep the advice in mind.



          • #6
            If you're doing de novo genome assembly, then the top programs are Allpaths-LG, SOAPdenovo, ABySS, SGA, Ray.

            For transcriptome assembly, Trinity seems to be the favorite as it's easy to use and pretty good. Depending on data types, though, Trans-ABySS and MIRA could be worth a look too.

            Looking up the original papers for Allpaths, SOAP, ABySS and Trinity and giving them a read is pretty useful too.

            I might add to Heisman's recommendations that if you're starting to pick up a scripting language, Python is the easiest pick. You can be doing useful scripting with it after just a couple of hours of reading. At least for me, that wasn't possible with Perl and I eventually gave up in favor of Python. Also, be sure to pay attention to BioPerl and Biopython. Many of the simple things you may want to write functions for in Perl/Python are likely already available as modules in BioPerl/Biopython.
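
            As an example of the kind of small, useful script meant here, the following plain-Python sketch reads FASTA records and reports the GC content and reverse complement of each. The function names and the toy sequences are made up for illustration, and it only handles A/C/G/T/N. Biopython already covers this ground (for instance "Bio.SeqIO.parse" for reading files and the "reverse_complement" method of its "Seq" objects), so treat this as a learning exercise rather than something to maintain.

```python
# Translation table for complementing DNA; only A/C/G/T/N are handled.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:  # emit the final record
        yield name, "".join(chunks)


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases; returns 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Toy input; with a real file you would pass an open file handle instead.
fasta_lines = [">seq1", "AACG", ">seq2", "GGCC", "AATT"]
for name, seq in read_fasta(fasta_lines):
    print(name, gc_content(seq), reverse_complement(seq))
```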

            And don't forget about simple bash commands/scripts.

            You'll likely be using clusters with TORQUE (which uses qsub to submit jobs to a scheduler), so it's nice to have some simple bash scripts to kick jobs off. And for some things sed/awk/grep will just be quicker than Python/Perl, so I suggest becoming at least minimally proficient in those too.
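
            For a sense of what such a job script looks like, here is a hedged Python sketch that writes out a minimal TORQUE/PBS script as text. The "#PBS" directives shown (-N, -l nodes/ppn, -l walltime, -j oe) and the "$PBS_O_WORKDIR" variable are standard, but the job name, resource values, file names and the helper function itself are placeholders; clusters differ, so check your site's documentation for queues and limits.

```python
def make_torque_script(job_name, command, nodes=1, ppn=1, walltime="01:00:00"):
    """Return the text of a minimal TORQUE/PBS job script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time (HH:MM:SS)
        "#PBS -j oe",                        # merge stderr into stdout
        "",
        'cd "$PBS_O_WORKDIR"',               # start in the directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


# Placeholder index and read file names; -p sets the Bowtie2 thread count.
script = make_torque_script(
    "align",
    "bowtie2 -p 4 -x ref_index -U reads.fastq -S aligned.sam",
    ppn=4,
    walltime="02:00:00",
)
print(script)
```

            Saving the printed text as, say, align.sh and running "qsub align.sh" would submit it. Many people simply write these files by hand in bash, which is just as good.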



            • #7
              For RNA-Seq and differential expression:

              TopHat
              STAR (if you have the computational resources)
              Cufflinks
              DESeq/DESeq2
              edgeR
              GOseq
