HTSeq to DESeq/edgeR to Heatmap

super0925 replied

10-07-2014, 04:09 AM
Hi D,
Just two questions.
1) I am doing an RNA-Seq analysis (reads mapped to the human genome) and want to find significantly differentially expressed human genes, but I also find that some read counts are assigned to miRNAs. Is that normal? Do I need to separate them and analyze them on their own (because I don't know whether they could affect the read distribution), or should I analyze everything together?
2) BTW, some genes have a log fold change of inf or -inf (I know what that means) but Q < 0.05. Are they useful for downstream analysis?
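A quick illustration of where inf/-inf fold changes come from: a fold change is a ratio of (normalized) mean counts, so a gene with zero counts in every sample of one condition produces a division by zero. The Python sketch below is a minimal illustration under that assumption, not the exact formula used by DESeq, DESeq2 or Cuffdiff; the helper name `log2_fold_change` and its `pseudocount` parameter are hypothetical.

```python
import math


def log2_fold_change(mean_control, mean_treated, pseudocount=0.0):
    """Return log2(treated / control) for two (normalized) mean counts.

    With pseudocount=0, a zero mean in one group yields +inf or -inf,
    which is how "infinite" fold changes show up in DE result tables.
    """
    a = mean_control + pseudocount
    b = mean_treated + pseudocount
    if a == 0 and b == 0:
        return float("nan")  # no reads in either group: ratio undefined
    if a == 0:
        return math.inf      # reads only in the treated group
    if b == 0:
        return -math.inf     # reads only in the control group
    return math.log2(b / a)


print(log2_fold_change(0, 12))                 # inf
print(log2_fold_change(8, 0))                  # -inf
print(log2_fold_change(0, 12, pseudocount=1))  # finite: log2(13 / 1)
print(log2_fold_change(4, 16))                 # 2.0
```

Adding a pseudocount makes the value finite, but its magnitude then depends on the arbitrary pseudocount chosen, so it should not be read as a real effect size.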
dpryan replied

08-29-2014, 05:25 AM
I don't think using cufflinks or not has anything to do with it. Every tool makes assumptions, it's just a matter of which tool's assumptions happen to match your particular dataset better.
super0925 replied

08-29-2014, 04:49 AM
Originally posted by dpryan

I expect that the tool that gets the most true positives will vary by dataset. This will also vary wildly by cuffdiff version.

Thank you D. Very useful! I've got it.
So do you think it is because I didn't run Cufflinks? I omitted that step because I don't need to find new genes or assemble transcripts. The count-based approach (edgeR/DESeq2) doesn't include the Cufflinks step either.
dpryan replied

08-28-2014, 11:53 PM
I expect that the tool that gets the most true positives will vary by dataset. This will also vary wildly by cuffdiff version.
super0925 replied

08-28-2014, 11:25 AM
Originally posted by dpryan

From the MDS plot, I'd guess that edgeR and DESeq2 are correct and there aren't any DE genes, but it's usually a good idea to not read too much into these. At the very least it's likely that the experiment is underpowered.

Hi D,
A quick question.
I showed you my comparison of results among these pipelines before:

TopHat-BAM-Cuffdiff2 (I don't use Cufflinks because I don't need to find new genes)
TopHat-HTSeq-DESeq2/edgeR

I found that Cuffdiff2 is more liberal than edgeR.

However, from the paper

A Comparative Study of Techniques for Differential Expression Analysis on RNA-Seq Data

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0103207

Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
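The abstract's suggestion of taking the intersection of DEGs from two or more tools is plain set arithmetic. A minimal Python sketch, assuming each tool's significant genes have already been exported as a collection of gene IDs; the helper `consensus_degs` and the gene IDs are hypothetical placeholders, not real results.

```python
from collections import Counter


def consensus_degs(calls_by_tool, min_tools=2):
    """Return genes called significant by at least `min_tools` tools.

    calls_by_tool maps a tool name to an iterable of gene IDs.
    min_tools=len(calls_by_tool) gives the strict intersection.
    """
    votes = Counter()
    for genes in calls_by_tool.values():
        votes.update(set(genes))  # set() so each tool votes once per gene
    return {gene for gene, n in votes.items() if n >= min_tools}


# Hypothetical gene lists, for illustration only.
calls = {
    "cuffdiff2": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "edger": {"GENE_A", "GENE_B"},
    "deseq2": {"GENE_A"},
}
print(sorted(consensus_degs(calls, min_tools=3)))  # ['GENE_A']
print(sorted(consensus_degs(calls, min_tools=2)))  # ['GENE_A', 'GENE_B']
```

Raising `min_tools` trades sensitivity for fewer likely false positives, which is the trade-off the abstract describes.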

it seems that edgeR gets the most true positives.

Could this (i.e. Cuffdiff2 being more liberal) be due to the different species? I ran it on bovine samples. Or is it a problem with my scripts?
I used the default parameters in Cuffdiff2 and edgeR/DESeq2.
Thank you!
Attached file: Untitled.jpg
dpryan replied

08-14-2014, 12:01 AM
From the MDS plot, I'd guess that edgeR and DESeq2 are correct and there aren't any DE genes, but it's usually a good idea to not read too much into these. At the very least it's likely that the experiment is underpowered.
super0925 replied

08-13-2014, 11:46 PM
Originally posted by dpryan

You can always change the threshold for significance a bit (0.1 is very common). I'm not familiar enough with the inner workings of cuffdiff2 to provide any insights there.

Thank you D. So do the MDS and MA plots look OK (given that the two groups are not completely separated)?

Last edited by super0925; 08-13-2014, 11:56 PM.
dpryan replied

08-13-2014, 09:38 AM
You can always change the threshold for significance a bit (0.1 is very common). I'm not familiar enough with the inner workings of cuffdiff2 to provide any insights there.
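To see why relaxing the cutoff from 0.05 to 0.1 can change the number of hits, it helps to look at how the adjusted p-values (Q / padj) are produced. DESeq2's `results()` and edgeR's `topTags()` both default to the Benjamini-Hochberg (BH) procedure. The Python sketch below reimplements BH only for illustration, with made-up p-values; in practice you would use the adjusted values the packages report (or R's `p.adjust(p, method = "BH")`).

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.9]
padj = benjamini_hochberg(raw)
for cutoff in (0.05, 0.1):
    hits = sum(q < cutoff for q in padj)
    print(f"padj < {cutoff}: {hits} genes")
# padj < 0.05: 2 genes
# padj < 0.1: 4 genes
```

The same raw p-values give twice as many calls at 0.1 here, which is why the chosen threshold should be reported alongside the gene list.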
super0925 replied

08-13-2014, 06:41 AM
Originally posted by dpryan

It's probably a bug in bowtie2 then. If you can subset your fastq file to a reasonable size and can still reproduce the issue then you can either post that and the gene you're aligning against and I'll have a look or you can then just directly submit a bug report to the bowtie2 authors.

BTW, you don't need to specify --local if you also specify --very-sensitive-local.

Hi teacher, another question.
I am analyzing 6 samples in 2 conditions.
From Cuffdiff2 I got ~700 DE genes (at the default threshold, Q < 0.05, as you know).
But with edgeR and DESeq2 I didn't find any DE genes at the default threshold (i.e. Q < 0.05).
Why? Do I need to change any settings or parameters? (Currently I use the defaults.)
Should I stick with the Tuxedo pipeline, or change a parameter (e.g. the P-value threshold) for the count-based methods?

I have attached the MA plot and MDS plot.

Thank you!
Attached files: Untitled.png, Untitled1.png
dpryan replied

07-31-2014, 09:31 AM
It's probably a bug in bowtie2 then. If you can subset your fastq file to a reasonable size and can still reproduce the issue then you can either post that and the gene you're aligning against and I'll have a look or you can then just directly submit a bug report to the bowtie2 authors.

BTW, you don't need to specify --local if you also specify --very-sensitive-local.
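Subsetting the FASTQ file, as suggested above, amounts to keeping the first N records, where each record is four lines (header, sequence, "+", qualities). A minimal Python sketch, assuming an uncompressed FASTQ without wrapped sequence lines; the function names `head_fastq` and `subset_fastq_file` and the file names are hypothetical. (For an uncompressed file, `head -n 400000` on the command line keeps the first 100,000 reads in the same way.)

```python
import itertools


def head_fastq(lines, n_reads):
    """Yield the first n_reads FASTQ records (4 lines each) from `lines`."""
    for line_no, line in enumerate(itertools.islice(lines, 4 * n_reads)):
        # every 4th line (0, 4, 8, ...) must be a record header
        if line_no % 4 == 0 and not line.startswith("@"):
            raise ValueError(f"line {line_no + 1} is not a FASTQ header")
        yield line


def subset_fastq_file(in_path, out_path, n_reads):
    """Write the first n_reads records of an uncompressed FASTQ file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(head_fastq(src, n_reads))


# Example (hypothetical file names):
# subset_fastq_file("bovine_sample1.fastq", "bovine_sample1.head.fastq", 100_000)
```

Halving the subset repeatedly until the problem disappears is a simple way to find a small file that still reproduces the issue.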
super0925 replied

07-31-2014, 05:28 AM
Originally posted by dpryan

You forgot the "-x" before "GQ2Bowtie2Index/GQ2". I presume that was just a typo in this post, though, so you'd have to run bowtie2 in a debugger to find the source of the error. It's likely that this sort of thing is a bug rather than you doing something wrong.

Sorry D, even when I add "-x", I still get the same error... T__T

Last edited by super0925; 07-31-2014, 05:30 AM.
dpryan replied

07-31-2014, 05:11 AM
You forgot the "-x" before "GQ2Bowtie2Index/GQ2". I presume that was just a typo in this post, though, so you'd have to run bowtie2 in a debugger to find the source of the error. It's likely that this sort of thing is a bug rather than you doing something wrong.
super0925 replied

07-29-2014, 06:22 AM
Originally posted by dpryan

We should set up a consultation contract

If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.

That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).

Hello D,
Sorry to trouble you again with this issue.
I want to see whether a sequence (say it is named "GQ2") is expressed in bovine cells. I followed the solution from our last discussion: 'first generate a bowtie2 index of "GQ2" from its FASTA sequence, then map my fastq file to that "GQ2" genome.'
But when I did it, I got an error.
After I started bowtie2, it seemed to run in an endless loop (obviously something is wrong). When I terminated bowtie2, I saw this error:
"bowtie2-align died with signal 2 (INT)"

I have googled the error but didn't find a solution. Some posts said it may be due to a memory problem, but I don't think that is my problem. So I am wondering: are my command lines wrong?

My steps:
(1) Generate the bowtie2 index of GQ2:

>GQ2
ATGGAGCACTTTCCCCGCTGTGTGCACGAGTCCTGGGGTTCCTCAAAGGA
GCCCCAGAAAACAGAGGTGCTGCAACTCTTGAGCTTAGCGGACCCTGAGG
.....
mkdir GQ2Bowtie2Index
cd GQ2Bowtie2Index
bowtie2-build GQ2.fa GQ2
ls
GQ2.1.bt2 GQ2.2.bt2 GQ2.3.bt2 GQ2.4.bt2 GQ2.fa GQ2.rev.1.bt2 GQ2.rev.2.bt2

(2) Map the reads onto the 'GQ2' genome:

bowtie2 --local --very-sensitive-local -p 8 ./GQ2Bowtie2Index/GQ2 -U bovine_sample1.fastq

Then I get the error!
I really don't know what I am doing wrong...
Thank you!
Attached file: Untitled.png
Last edited by super0925; 07-31-2014, 02:17 AM.
super0925 replied

07-17-2014, 01:38 AM
Originally posted by dpryan

We should set up a consultation contract

If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.

That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).

Thank you soooo much! You are not only my consultant, but also my teacher
I will try to do it.
dpryan replied

07-16-2014, 08:33 AM
Originally posted by super0925

Hi D
I have two samples (i.e. two .fastq files) from bovine. I want to see whether a sequence (say it is named "GQ2") is expressed in bovine cells (I have the FASTA of this sequence). The sequence is unannotated in the current bovine genome, so it might not have been tested in the analyses so far.

We should set up a consultation contract

Q1: How could I do it?

If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.

Q2: My solution (I don't know is it correct)
First I generate a bowtie/bowtie2 index of "GQ2" from the FASTA sequence, then map my fastq file to that "GQ2" genome. Is that correct?
Thank you!

That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).
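For the quick-and-dirty check, one way to enforce "perfect or near-perfect matches only" after alignment is to filter the SAM output: keep primary mapped reads (FLAG bits 0x4, 0x100 and 0x800 unset) whose NM tag (edit distance, which bowtie2 reports for aligned reads) is small and whose CIGAR contains no clipping, since clipped bases in local mode do not count toward NM. A minimal Python sketch under those assumptions; the function `count_near_perfect` and the default of at most one mismatch are hypothetical choices, not bowtie2 settings.

```python
import re


def count_near_perfect(sam_lines, max_mismatches=1):
    """Count primary mapped reads with NM <= max_mismatches and no clipping."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary or supplementary
        if re.search(r"[SH]", cigar):
            continue  # soft/hard clipped: not an end-to-end match
        # optional tags start at column 12, e.g. "NM:i:0" -> {"NM": "0"}
        tags = {f[:2]: f[5:] for f in fields[11:]}
        nm = int(tags.get("NM", max_mismatches + 1))  # no NM tag: reject
        if nm <= max_mismatches:
            count += 1
    return count
```

Usage, with a hypothetical file written by bowtie2's `-S` option: `with open("sample1_GQ2.sam") as fh: print(count_near_perfect(fh))`.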