Hi D,
Just two questions.
1) I am doing RNA-Seq analysis (reads mapped to the human genome) and want to find significantly differentially expressed human genes, but I also see some read counts mapped to miRNAs. Is that normal? Do I need to analyze them separately (I don't know whether they could affect the read distribution), or should I analyze everything together?
2) BTW, some genes have a log fold change of inf or -inf (I know what that means) but with Q < 0.05. Are they useful for downstream analysis?
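On the second question: an infinite log fold change arises when the mean expression in one condition is exactly zero. The following is only a minimal sketch of one common workaround, adding a small pseudocount so such genes can still be ranked or plotted. The gene names, expression values, pseudocount of 1 and the helper name `log2_fold_change` are all illustrative assumptions, not something taken from this thread or from any tool's output.

```python
import math

def log2_fold_change(mean_a, mean_b, pseudocount=0.0):
    """log2(mean_b / mean_a), with an optional pseudocount added to both means."""
    a = mean_a + pseudocount
    b = mean_b + pseudocount
    if a == 0 and b == 0:
        return float("nan")   # no expression in either condition
    if a == 0:
        return math.inf       # expressed only in condition B
    if b == 0:
        return -math.inf      # expressed only in condition A
    return math.log2(b / a)

# Hypothetical mean expression values (condition A, condition B).
genes = {"geneA": (0.0, 12.0), "geneB": (8.0, 0.0), "geneC": (4.0, 16.0)}
for name, (a, b) in genes.items():
    raw = log2_fold_change(a, b)
    finite = log2_fold_change(a, b, pseudocount=1.0)
    print(f"{name}: raw={raw}, with pseudocount={finite:.2f}")
```

The pseudocount changes the magnitude of every fold change slightly, so it is probably best used for ranking and visualization rather than for reporting.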
-
I don't think using cufflinks or not has anything to do with it. Every tool makes assumptions, it's just a matter of which tool's assumptions happen to match your particular dataset better.
-
Originally posted by dpryan: I expect that the tool that gets the most true positives will vary by dataset. This will also vary wildly by cuffdiff version.
So do you think it is because I didn't run Cufflinks? I omitted that step because I don't need to find new genes or assemble transcripts. The count-based approaches (edgeR/DESeq2) don't include a Cufflinks step either.
-
I expect that the tool that gets the most true positives will vary by dataset. This will also vary wildly by cuffdiff version.
-
Originally posted by dpryan: From the MDS plot, I'd guess that edgeR and DESeq2 are correct and there aren't any DE genes, but it's usually a good idea to not read too much into these. At the very least it's likely that the experiment is underpowered.
A quick question.
I showed you the comparison of my results across pipelines before:
tophat-bam-cuffdiff2 (I don't use cufflinks because I don't need to find new genes)
tophat-htseq-deseq2/edgeR
I found that Cuffdiff2 is more liberal than edgeR.
However, according to this paper:
Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
it seems that edgeR found the most true positives.
Could this (i.e., Cuffdiff2 being more liberal) be because of the different species? I ran it on bovine samples. Or is there a problem with my scripts?
I used the default parameters in Cuffdiff2 and edgeR/DESeq2.
Thank you!
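The abstract's suggestion of taking the intersection of DEGs from two or more tools can be sketched roughly as follows. This is only an illustration: the gene IDs are hypothetical placeholders, and `consensus_degs` is a made-up helper name rather than a function from any of the packages discussed.

```python
def consensus_degs(calls_by_tool, min_tools=2):
    """Return genes called DE by at least `min_tools` tools, sorted by name."""
    votes = {}
    for genes in calls_by_tool.values():
        for gene in set(genes):          # count each gene once per tool
            votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, n in votes.items() if n >= min_tools)

# Hypothetical DE calls from each pipeline.
calls = {
    "cuffdiff2": ["g1", "g2", "g3", "g4"],
    "edgeR": ["g2", "g3"],
    "DESeq2": ["g3", "g5"],
}
print(consensus_degs(calls, min_tools=2))  # genes supported by two or more tools
print(consensus_degs(calls, min_tools=3))  # strict intersection of all three
```

Raising `min_tools` trades sensitivity for fewer false positives, which matches the trade-off the abstract describes.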
-
From the MDS plot, I'd guess that edgeR and DESeq2 are correct and there aren't any DE genes, but it's usually a good idea to not read too much into these. At the very least it's likely that the experiment is underpowered.
-
Originally posted by dpryan: You can always change the threshold for significance a bit (0.1 is very common). I'm not familiar enough with the inner workings of cuffdiff2 to provide any insights there.
Last edited by super0925; 08-13-2014, 11:56 PM.
-
You can always change the threshold for significance a bit (0.1 is very common). I'm not familiar enough with the inner workings of cuffdiff2 to provide any insights there.
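To get a feel for how much the cutoff matters, here is a rough sketch that applies the Benjamini-Hochberg adjustment to a few made-up p-values and counts how many pass at 0.05 versus 0.1. The p-values and the function name `benjamini_hochberg` are illustrative assumptions. In practice edgeR and DESeq2 report adjusted p-values themselves, so this only shows the arithmetic behind the threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for six genes.
pvals = [0.001, 0.012, 0.02, 0.04, 0.3, 0.8]
qvals = benjamini_hochberg(pvals)
for cutoff in (0.05, 0.1):
    n_pass = sum(q < cutoff for q in qvals)
    print(f"adjusted p < {cutoff}: {n_pass} genes")
```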
-
Originally posted by dpryan: It's probably a bug in bowtie2 then. If you can subset your fastq file to a reasonable size and can still reproduce the issue then you can either post that and the gene you're aligning against and I'll have a look or you can then just directly submit a bug report to the bowtie2 authors.
BTW, you don't need to specify --local if you also specify --very-sensitive-local.
Hi teacher, another question.
I am analyzing 6 samples in 2 conditions.
From Cuffdiff2, I got ~700 DE genes (the default is Q < 0.05, as you know).
But with edgeR and DESeq2, I didn't find any DE genes at the default threshold (i.e., Q < 0.05).
Why? Do I need to change any settings or parameters? (Currently I use the defaults.)
Could I stick with Tuxedo, or should I change a parameter (e.g., the P-value cutoff) for the count-based methods?
I have attached the MA plot and MDS plot.
Thank you!
-
It's probably a bug in bowtie2 then. If you can subset your fastq file to a reasonable size and can still reproduce the issue then you can either post that and the gene you're aligning against and I'll have a look or you can then just directly submit a bug report to the bowtie2 authors.
BTW, you don't need to specify --local if you also specify --very-sensitive-local.
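One possible way to subset a FASTQ file to a reasonable size is sketched below. It assumes the common four-lines-per-read layout (no wrapped sequence lines), and `head_fastq` is a hypothetical helper name. The demo runs on a tiny in-memory file; with real data you would pass open file handles instead.

```python
import io
from itertools import islice

def head_fastq(src, dst, n_reads):
    """Copy the first `n_reads` records (4 lines each) from src to dst."""
    written = 0
    while written < n_reads:
        record = list(islice(src, 4))
        if len(record) < 4:
            break  # end of file or a truncated final record
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record near read {written + 1}")
        dst.writelines(record)
        written += 1
    return written

# Tiny in-memory demo. For real files, something like
# open("bovine_sample1.fastq") and open("subset.fastq", "w") would be passed in.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n")
out = io.StringIO()
print(head_fastq(demo, out, 2))
print(out.getvalue(), end="")
```

Taking the first reads is the simplest choice. If the problem only shows up with particular reads, the subset may need to be chosen differently.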
-
Originally posted by dpryan: You forgot the "-x" before "GQ2Bowtie2Index/GQ2". I presume that was just a typo in this post, though, so you'd have to run bowtie2 in a debugger to find the source of the error. It's likely that this sort of thing is a bug rather than you doing something wrong.
Last edited by super0925; 07-31-2014, 05:30 AM.
-
You forgot the "-x" before "GQ2Bowtie2Index/GQ2". I presume that was just a typo in this post, though, so you'd have to run bowtie2 in a debugger to find the source of the error. It's likely that this sort of thing is a bug rather than you doing something wrong.
-
Originally posted by dpryan: We should set up a consultation contract.
If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.
That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).
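As a rough sketch of the "perfect or near perfect matches" idea, the snippet below post-filters SAM records instead of changing the aligner settings, which is a swapped-in approach and not the one described in the quote. It assumes the aligner writes an `NM:i:` edit-distance tag, and it treats any soft or hard clipping as a partial match. The helper names, the edit-distance limit of 1 and the demo records are all hypothetical.

```python
import re

def is_near_perfect(sam_line, max_edit_distance=1):
    """True if a SAM record is mapped, unclipped, and within the edit-distance limit."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False                      # not a complete alignment record
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4:                        # 0x4 = read is unmapped
        return False
    if re.search(r"[SH]", cigar):         # clipped ends mean only part of the read matched
        return False
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:]) <= max_edit_distance
    return False                          # no NM tag, so the match cannot be verified

def count_near_perfect(sam_lines, max_edit_distance=1):
    """Count near-perfect alignments, skipping header lines that start with '@'."""
    return sum(
        is_near_perfect(line, max_edit_distance)
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )

# Hypothetical SAM records against a reference named GQ2.
demo_sam = [
    "@SQ\tSN:GQ2\tLN:1000",
    "r1\t0\tGQ2\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",    # perfect match
    "r2\t0\tGQ2\t5\t42\t2S2M\t*\t0\t0\tACGT\tIIII\tNM:i:0",  # soft-clipped
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",                # unmapped
    "r4\t16\tGQ2\t9\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:3",   # too many edits
    "r5\t0\tGQ2\t20\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",   # one edit
]
print(count_near_perfect(demo_sam))
```

As the quoted post notes, counts from mapping to a single gene are only a quick check, not exact read numbers.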
Hello D,
Sorry to trouble you again about this issue.
I want to see whether a sequence (call it "GQ2") is expressed in bovine cells. I followed the solution from our last discussion: 'firstly I will generate the bowtie2 index of this "GQ2" based on the fasta sequence, and then map my fastq file to that "GQ2" genome.'
But when I did it, I got an error.
After I started bowtie2, it seemed to go into an endless loop (so obviously something is wrong). When I terminated bowtie2, I got this error:
"bowtie2-align died with signal 2 (INT)"
I have googled the error but didn't find any solution. Some posts said it may be due to a memory problem, but I don't think that is my problem. Hence I am wondering whether my command lines are wrong.
My steps:
(1) generate the bowtie2 index of GQ2.
>GQ2
ATGGAGCACTTTCCCCGCTGTGTGCACGAGTCCTGGGGTTCCTCAAAGGA
GCCCCAGAAAACAGAGGTGCTGCAACTCTTGAGCTTAGCGGACCCTGAGG
.....
mkdir GQ2Bowtie2Index
cd GQ2Bowtie2Index
bowtie2-build GQ2.fa GQ2
ls
GQ2.1.bt2 GQ2.2.bt2 GQ2.3.bt2 GQ2.4.bt2 GQ2.fa GQ2.rev.1.bt2 GQ2.rev.2.bt2
(2) map the reads onto the 'GQ2' genome
bowtie2 --local --very-sensitive-local -p 8 ./GQ2Bowtie2Index/GQ2 -U bovine_sample1.fastq
Then I get the error!
I really don't know what I am doing wrong...
Thank you!
Last edited by super0925; 07-31-2014, 02:17 AM.
-
Originally posted by dpryan: We should set up a consultation contract.
If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.
That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).
Thank you soooo much! You are not only my consultant, but also my teacher
I will try to do it.
-
Originally posted by super0925:
Hi D,
I have two samples (i.e., two .fastq files) from bovine. I want to see whether a sequence (call it "GQ2") is expressed in bovine cells. (I have the FASTA of this sequence.) The sequence is unannotated in the current bovine genome, so it might not have been tested in the analyses thus far.
Q1: How could I do it?
Q2: My solution (I don't know whether it is correct):
First I generate the bowtie/bowtie2 index of "GQ2" from the FASTA sequence, and then map my fastq file to that "GQ2" genome. Is that correct?
Thank you!