Hi Everyone,
I am working on a project with two groups (A and B), each containing 5 samples. For each sample I have two RNA-seq FASTQ files, which I downloaded. My plan is to run tophat on each sample (for a total of 10 accepted_hits.bam files), then run cufflinks on each sample's .bam file to produce a total of 10 transcripts.gtf files.
My goal is to compare the differential gene expression of group A and group B. I am using the hg19 reference genome.
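To make the plan concrete, here is a rough sketch of the per-sample commands I have in mind, written as a small Python script that only builds and prints the command lines (it does not run anything). The sample names (A1..A5, B1..B5), the FASTQ file naming, the directory layout, the thread count, and the "hg19" index prefix are all placeholders I made up, and I am assuming the two FASTQ files per sample are paired-end mates.

```python
# Sketch only: builds the tophat and cufflinks command lines for each sample.
# All names below (sample IDs, paths, index prefix, thread count) are
# hypothetical placeholders, not real file names from my project.

SAMPLES = {
    "A": [f"A{i}" for i in range(1, 6)],
    "B": [f"B{i}" for i in range(1, 6)],
}
INDEX_PREFIX = "hg19"  # assumed prefix of the Bowtie index for hg19
THREADS = 4            # assumed number of threads per job


def build_commands(sample, fastq_dir="fastq", out_dir="results"):
    """Return (tophat_cmd, cufflinks_cmd) as argument lists for one sample."""
    # Assumption: the two FASTQ files are paired-end mates named *_1 and *_2.
    reads1 = f"{fastq_dir}/{sample}_1.fastq"
    reads2 = f"{fastq_dir}/{sample}_2.fastq"
    tophat_out = f"{out_dir}/{sample}/tophat"
    cufflinks_out = f"{out_dir}/{sample}/cufflinks"

    # tophat writes accepted_hits.bam into its output directory.
    tophat_cmd = ["tophat", "-p", str(THREADS), "-o", tophat_out,
                  INDEX_PREFIX, reads1, reads2]
    # cufflinks writes transcripts.gtf into its output directory.
    cufflinks_cmd = ["cufflinks", "-p", str(THREADS), "-o", cufflinks_out,
                     f"{tophat_out}/accepted_hits.bam"]
    return tophat_cmd, cufflinks_cmd


if __name__ == "__main__":
    for group, samples in SAMPLES.items():
        for sample in samples:
            for cmd in build_commands(sample):
                print(" ".join(cmd))
```

Running it prints 20 command lines (one tophat and one cufflinks command for each of the 10 samples), which I would otherwise be typing out by hand one after another.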
I have two questions. First, is there a way to run tophat/cufflinks on all the samples at once, or do I have to run one sample, wait until it's done, then run the next sample, and so on?
Second, once I have the transcripts.gtf files for each of the samples in groups A and B, is it necessary to merge them into a single .gtf file? I assume this would be done with cuffmerge, right? What exactly does cuffmerge do, and what are the advantages of using it? In addition, is there a need to use cuffcompare, since I am comparing the differential gene expression of group A and group B, or can I go straight to cuffdiff?
In general, my second question is asking what the pipeline should be after tophat and cufflinks, given what I am trying to do. It would be great if someone could suggest the order in which I should run these tools.
THANKS!