Hi, I am new to this and would like to feed my RNA-seq data into the VST algorithm from DESeq to normalize it. Since there is so much different information available, I am not sure whether I am doing it the right way. For each experiment/replicate, I do the following:
python tophat -p 8 -G genome.gtf -o output_dir genome.bt2 X1.fastq X2.fastq
This generates a directory containing accepted_hits.bam
I then sort this using
samtools sort -n input.bam out.sam
I then run cufflinks and use the transcripts.gtf file
cufflinks -p 8 -o newoutputDir output_dir/accepted_hits.bam
To generate the list of counts I then do:
htseq-count --stranded=no out.sam transcripts.gtf > output-htseq.txt
which then gives me a list like this:
CUFF.1 745
CUFF.10 14
CUFF.100 0
CUFF.1000 1
CUFF.10000 0
CUFF.10001 1
CUFF.10002 57
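As far as I understand, DESeq expects one table with a column of counts per replicate, so the per-replicate files would have to be merged first. Below is a minimal sketch of how that could look in Python. The function names `read_htseq_counts` and `merge_counts` and the sample labels are made up for illustration, and it assumes every replicate was counted against the same GTF file so that the feature IDs match between files.

```python
import io

# Summary rows that htseq-count appends after the feature counts
# (newer versions prefix them with "__", older versions do not).
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def read_htseq_counts(handle):
    """Parse htseq-count output: one 'feature_id <whitespace> count' pair per line."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        feature_id, count = line.rsplit(None, 1)
        # Skip the summary rows; they are not features.
        if feature_id.startswith("__") or feature_id in SUMMARY_ROWS:
            continue
        counts[feature_id] = int(count)
    return counts


def merge_counts(per_sample):
    """Combine {sample: {feature: count}} into a header and one row per feature.

    Features missing from a sample are filled with 0 (an assumption of this sketch).
    """
    samples = sorted(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["feature"] + samples
    rows = [[f] + [per_sample[s].get(f, 0) for s in samples] for f in features]
    return header, rows


# Example with in-memory data instead of real files (hypothetical values for rep2).
rep1 = read_htseq_counts(io.StringIO("CUFF.1\t745\nCUFF.10\t14\n__no_feature\t5\n"))
rep2 = read_htseq_counts(io.StringIO("CUFF.1\t700\nCUFF.10\t9\n"))
header, rows = merge_counts({"rep1": rep1, "rep2": rep2})
for row in [header] + rows:
    print("\t".join(str(x) for x in row))
```

With real data, each `io.StringIO(...)` would be replaced by an opened `output-htseq.txt` file for that replicate.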
Would this be a correct way of doing it? The counts are assigned to the arbitrary identifiers from the GTF file (CUFF.*), and I have not been able to find much about assigning the count values to gene identifiers. Does anyone have any experience with that?