Originally posted by Davis McC
Hi zorph
I am one of the developers of the Bioconductor package edgeR, which is designed for differential expression analysis of count data (such as RNA-seq). Check out the User's Guide for more details and for case studies showing how to use the package.
I'm not familiar with WIG files and can't tell what sort of analysis you've already carried out, but colleagues of mine suggest the following steps to go from raw RNA-seq short reads in FASTA files through to GO category testing. You may find this analysis pipeline useful.
Steps with required tools & files
To perform the entire analysis, the following steps and tools will be needed:
1. Get some short-read RNA-seq data for at least two different experimental conditions that you wish to compare.
2. Choose a reference to map against, and map your reads with a short-read mapper that outputs SAM format. We tend to use bowtie; other options include bwa, SOAP2, novoalign and shrimp.
3. Use SAMtools to convert the SAM output into the binary BAM format, which is smaller on disk and, once sorted, can be indexed for fast random access.
4. Summarize reads at the gene, transcript or exon level. We use R with the Rsamtools and GenomicFeatures packages.
5. Identify differentially expressed (DE) genes from the gene-level counts. We use the R package edgeR, which we have developed, although other tools are available. edgeR can account for biological variation in the data (using a negative binomial model), separate biological from technical variation, produce an MDS plot and perform exact tests.
6. Perform GO category testing on the results of the DE analysis using the R package goseq.
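Step 4 can be illustrated with a toy counting function. This is a minimal sketch, assuming reads have already been reduced to (chromosome, start, end) alignment intervals and each gene to a single interval; the gene names and coordinates are hypothetical, and setting aside reads that hit more than one gene is just one common convention, not necessarily what Rsamtools/GenomicFeatures do.

```python
from collections import Counter

# Hypothetical annotation: name -> (chromosome, start, end),
# 1-based inclusive coordinates.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 1, 50),
}


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, genes):
    """Count aligned reads per gene.

    A read overlapping exactly one gene is counted for that gene.
    Reads overlapping several genes go to "__ambiguous" and reads
    overlapping none go to "__no_feature".
    """
    counts = Counter({name: 0 for name in genes})  # keep zero-count genes
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


reads = [
    ("chr1", 110, 145),  # geneA only
    ("chr1", 160, 195),  # geneA and geneB: ambiguous
    ("chr1", 250, 290),  # geneB only
    ("chr2", 10, 45),    # geneC only
    ("chr3", 5, 40),     # no annotated gene
]
print(count_reads(reads, GENES))
```

Running this once per library gives one column of the gene-by-library count table that a DE tool such as edgeR takes as input.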
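The idea behind step 6 (is a GO category over-represented among the DE genes?) can be sketched with a plain hypergeometric test. Note that this is a swapped-in simplification: a key point of goseq is correcting for gene-length bias in RNA-seq DE calls, which this sketch does not do. Function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_de, n_in_category, n_genes):
    """P(X >= k) for X ~ Hypergeometric(n_genes, n_in_category, n_de).

    n_genes:       genes tested (the "universe")
    n_in_category: genes in the universe annotated to the GO category
    n_de:          genes called DE
    k:             DE genes that are in the category
    """
    upper = min(n_de, n_in_category)
    hits = sum(
        comb(n_in_category, i) * comb(n_genes - n_in_category, n_de - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_genes, n_de)


def category_pvalue(de_genes, category_genes, universe):
    """Over-representation p-value for one category (no length correction)."""
    universe = set(universe)
    de = set(de_genes) & universe
    cat = set(category_genes) & universe
    return hypergeom_upper_tail(len(de & cat), len(de), len(cat), len(universe))


# Toy example: 10 genes, 5 in the category, all 5 DE genes in the category.
genes = [f"g{i}" for i in range(10)]
print(category_pvalue(genes[:5], genes[:5], genes))  # 1/252
```

In a real analysis this would be repeated for every category, followed by a multiple-testing correction.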
Considerations for DE Analysis
Extra-Poisson variation (or overdispersion) is typical of RNA-seq data, especially if there is biological replication amongst your samples. If you only have technical replicates, this may not be an issue, but I would still recommend running your data through edgeR to get some idea of the inter-library variability. If your data are overdispersed, a Poisson model will *drastically* overestimate the amount of differential expression. Using a negative binomial (NB) model, as edgeR does, accounts for this extra variation and gives a much more reliable assessment of DE.
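The overdispersion point can be checked with a rough simulation. This sketch assumes hypothetical settings (2000 genes with no true DE, 3 replicates per condition, mean 100, NB dispersion 0.1, equal library sizes) and uses a simple normal-approximation Poisson test, not any edgeR routine. The moment-based dispersion estimate is only an illustration; edgeR uses more careful likelihood-based estimators.

```python
import numpy as np

# Hypothetical simulation settings (not from any real data set).
N_GENES = 2000  # genes, none of which is truly DE
N_REPS = 3      # biological replicates per condition
MU = 100.0      # mean count per library; equal library sizes assumed
PHI = 0.1       # NB dispersion: Var(Y) = MU + PHI * MU**2 = 1100

rng = np.random.default_rng(seed=1)


def nb_counts(mu, phi, size):
    """Draw NB counts with mean mu and dispersion phi.

    numpy parameterises the NB by (n, p); n = 1/phi and
    p = n / (n + mu) give mean mu and variance mu + phi * mu**2.
    """
    n = 1.0 / phi
    return rng.negative_binomial(n, n / (n + mu), size=size)


def poisson_test_call_rate(group_a, group_b, z_crit=1.96):
    """Fraction of genes a Poisson-based test calls DE at roughly 5%.

    With equal library sizes and Poisson counts,
    Var(sum_a - sum_b) = E(sum_a + sum_b), so this z statistic is
    approximately N(0, 1) when there is no true difference.
    """
    sum_a = group_a.sum(axis=1).astype(float)
    sum_b = group_b.sum(axis=1).astype(float)
    z = (sum_a - sum_b) / np.sqrt(sum_a + sum_b)
    return float(np.mean(np.abs(z) > z_crit))


def moment_dispersion(group):
    """Rough method-of-moments estimate of a common dispersion:
    (variance - mean) / mean**2 per gene, averaged over genes."""
    m = group.mean(axis=1)
    v = group.var(axis=1, ddof=1)
    return float(np.mean((v - m) / m**2))


shape = (N_GENES, N_REPS)
pois_rate = poisson_test_call_rate(rng.poisson(MU, shape), rng.poisson(MU, shape))
nb_a, nb_b = nb_counts(MU, PHI, shape), nb_counts(MU, PHI, shape)
nb_rate = poisson_test_call_rate(nb_a, nb_b)
phi_hat = (moment_dispersion(nb_a) + moment_dispersion(nb_b)) / 2

print(f"Poisson data: {pois_rate:.1%} of null genes called DE")
print(f"NB data:      {nb_rate:.1%} of null genes called DE")
print(f"moment estimate of phi: {phi_hat:.3f}")
```

With these settings the Poisson test should stay near its nominal 5% on Poisson data, but on NB data it calls a large fraction of genes (on the order of half) DE even though none truly are.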
In short, edgeR can handle overdispersion, estimate inter-library (including biological) variability and compute exact p-values for DE under the NB model.
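A simplified sketch of a conditional NB exact test for one gene and two groups (analogous to Fisher's exact test) follows. It is not edgeR's implementation: it assumes equal library sizes and a known dispersion `phi`, whereas edgeR estimates the dispersion from the data and handles unequal library sizes. The function name and example counts are hypothetical.

```python
from math import exp, lgamma, log


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def nb_exact_test(counts_a, counts_b, phi):
    """Two-sided exact test for one gene, assuming equal library sizes.

    Under H0 each library is NB with the same mean mu and dispersion phi.
    A group of n libraries sums to an NB with size n / phi, and,
    conditional on the total S, the group-A sum follows a beta-binomial
    distribution that does not involve mu.
    """
    if phi <= 0:
        raise ValueError("phi must be positive")
    size_a = len(counts_a) / phi
    size_b = len(counts_b) / phi
    obs = sum(counts_a)
    total = obs + sum(counts_b)

    # log P(sum_a = a | S), up to an additive constant.
    log_probs = [
        lgamma(a + size_a) - lgamma(a + 1)
        + lgamma(total - a + size_b) - lgamma(total - a + 1)
        for a in range(total + 1)
    ]
    log_obs = log_probs[obs]
    # Two-sided p-value: probability of all splits no more likely than
    # the observed one (small tolerance for rounding error).
    tail = [lp for lp in log_probs if lp <= log_obs + 1e-7]
    return exp(_logsumexp(tail) - _logsumexp(log_probs))


a = [100, 110, 95]
b = [300, 280, 320]
print(nb_exact_test(a, b, phi=0.1))   # NB with dispersion 0.1
print(nb_exact_test(a, b, phi=1e-4))  # close to Poisson
```

As `phi` approaches 0 the conditional distribution approaches a binomial, so the same counts give a far smaller p-value, which is the Poisson overconfidence described above seen from another angle.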
I hope that is helpful, and good luck with your data analysis. Please ask if you have any more questions I might be able to help with.
Best regards
Davis
Is edgeR suitable for data with biological replicates?