  • RNA-Seq quality controls: golden standard tool?

    Hi,
    I need to establish a protocol for checking the quality of RNA-Seq data. A few pipelines for this purpose have been published over the last few years (ShortRead, htSeqTools, ArrayExpressHTS), and I'm wondering which of these are commonly used and could be considered gold standards. The questions I ask in my QC are: what is the degradation level of the RNA, what is the quality of the sequences coming off the Illumina platform (HiSeq at the moment), and how does a given sample differ from the rest (is it a weird outlier)?
    I have other datasets that I would like to evaluate as well; they were produced on the GAII platform with different read lengths. So ideally, I am looking for a QC tool that:
    1) is flexible enough to handle different read lengths
    2) provides rigorous QC with nice graphs
    3) can use the output from the TopHat pipeline.

    I would very much appreciate any help, suggestions, or advice.
    Thanks!!!
    Anna

  • #2
    I might not be a great reference but I don't think there IS a standard at this point.

    FastQC is a nice tool for getting a set of quality assessment tests and graphs all at once for the raw reads (in FASTQ format) http://www.bioinformatics.babraham.a...ojects/fastqc/. You can use the FastQC output to help decide whether to trim bases off the 5' or 3' ends of your reads. Some aligners, like BWA, can do that trimming for you, and most aligners provide an option to set some kind of threshold on the base qualities accepted for alignment. So tools like FastQC are there for you to check the quality of your run; they aren't used directly to control what goes through the aligners.
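To make the trimming decision concrete, here is a minimal sketch of the per-position quality idea. It is not FastQC's implementation. It assumes plain 4-line FASTQ records and Phred+33 quality encoding (older Illumina pipelines used Phred+64, hence the `offset` parameter). The function names and the `min_mean` cutoff of 20 are made up for illustration.

```python
import io
from typing import Iterable, Iterator, List


def read_qualities(handle: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of every 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 3:
            yield line.rstrip("\n")


def mean_quality_per_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for q in quals:
        for pos, ch in enumerate(q):
            if pos == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def suggest_three_prime_trim(means: List[float], min_mean: float = 20.0) -> int:
    """Number of leading positions to keep after dropping the low-quality 3' tail."""
    keep = len(means)
    while keep > 0 and means[keep - 1] < min_mean:
        keep -= 1
    return keep


# Two toy reads of different lengths ('I' = Q40, '5' = Q20, '#' = Q2 in Phred+33).
example = io.StringIO(
    "@r1\nACGTAC\n+\nIIIII#\n"
    "@r2\nACGT\n+\nIII5\n"
)
means = mean_quality_per_position(read_qualities(example))
print(means)                            # [40.0, 40.0, 40.0, 30.0, 40.0, 2.0]
print(suggest_three_prime_trim(means))  # 5
```

Because the totals are tracked per position, reads of different lengths (e.g. GAII vs. HiSeq runs) can go through the same function.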

    As for determining how "any given sample differs from the rest": this question could be pretty complex to answer. You can look at SNPs, differential gene expression, or splice variant differences (from a novel transcript assembler like Cufflinks). You can use the "tuxedo" pipeline to assess differential expression and splicing variation between samples. For SNPs, I like to use samtools mpileup followed by bcftools for variant calling. After that, I use bedtools to compare the VCF outputs from bcftools and determine which SNPs are unique to which samples, which are shared, etc.
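The shared-versus-unique SNP comparison can also be sketched in plain Python. This is a stand-in for the bedtools step, not a reproduction of it. It assumes tab-separated VCF records with the standard CHROM, POS, ID, REF, ALT leading columns, keeps only single-base substitutions, and treats two calls as the same SNP when chromosome, position, REF and ALT all match. The function names and toy records are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def parse_vcf_snps(lines: Iterable[str]) -> Set[Variant]:
    """Collect single-base substitutions from VCF text, skipping headers and indels."""
    snps: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):  # multi-allelic sites list ALTs comma-separated
            if len(ref) == 1 and len(alt) == 1 and alt != ".":
                snps.add((chrom, pos, ref, alt))
    return snps


def split_shared_unique(samples: Dict[str, Set[Variant]]):
    """Return (SNPs present in every sample, SNPs private to each sample)."""
    shared = set.intersection(*samples.values())  # needs at least one sample
    unique = {}
    for name, snps in samples.items():
        others = set().union(*(s for n, s in samples.items() if n != name))
        unique[name] = snps - others
    return shared, unique


# Toy records standing in for two bcftools outputs.
sample_a = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",
    "chr1\t200\t.\tC\tT\t50\t.\t.",
    "chr2\t50\t.\tG\tGA\t50\t.\t.",  # insertion, ignored
]
sample_b = [
    "chr1\t100\t.\tA\tG\t60\t.\t.",
    "chr1\t300\t.\tT\tC\t60\t.\t.",
]
shared, unique = split_shared_unique(
    {"a": parse_vcf_snps(sample_a), "b": parse_vcf_snps(sample_b)}
)
print(shared)  # {('chr1', 100, 'A', 'G')}
print(unique)
```

With more than two samples the same function reports SNPs found in all of them and SNPs private to each one. SNPs shared by only some of the samples would need an extra step.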

    I've had good results clustering samples in R, using its hierarchical clustering function on gene expression output across multiple samples from multiple lines. However, determining why a sample clusters separately from the others (or, more specifically, producing the list of genes responsible for the clustering) has been neither straightforward nor "established", from what I can tell.
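For readers who want to see what the clustering step does, below is a small, self-contained sketch of agglomerative clustering with average linkage. It is not the R function mentioned above, and the choices in it are assumptions: expression values are log2(x + 1) transformed, and the distance between samples is 1 minus their Pearson correlation. The sample names and numbers are invented.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Merge samples bottom-up; return the merges as (cluster, cluster, distance)."""
    logged = {s: [math.log2(v + 1.0) for v in vals] for s, vals in profiles.items()}
    dist = {
        frozenset((a, b)): 1.0 - pearson(logged[a], logged[b])
        for a, b in combinations(logged, 2)
    }

    def cluster_distance(c1: Cluster, c2: Cluster) -> float:
        # Average linkage: mean of all pairwise sample distances between clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters: List[Cluster] = [(s,) for s in logged]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented expression values for five genes in four samples.
profiles = {
    "ctrl1": [10, 200, 30, 5, 80],
    "ctrl2": [12, 190, 28, 6, 85],
    "treat1": [100, 20, 300, 50, 8],
    "treat2": [110, 25, 280, 55, 9],
}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

A sample that only joins the tree at a large distance in the last merges is a candidate outlier. The sketch does not say which genes drive that separation.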
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    • #3
      There is a program called RNASeqQC which is more useful than FastQC for this purpose.
