Hi!
I have a problem with recovering added ERCC to my RNAseq samples. Briefly, I'm doing smart-seq2 on single human T cells and align with STAR to hg19+ERCC sequences. Both FASTA and GTF file have the ERCCs but when I look at gene count results they don't show up at all, regardless if I use STARs genecount option, HTseq-count or RSEM.
Let's assume I "forgot" to add the ERCC spike-ins and the count is actually 0... Shouldn't the gene names still appear in the downstream files but the value just be 0? If I run Samtools idxstats on the STAR output (sorted or unsorted bam file), it shows the ERCC "chromosomes".
I'm confused by all of this and can't even figure out where in the pipeline my mistake might be. Help!
Here are my commands:
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir indices/STAR --genomeFastaFiles path/to/genomeE.fa --sjdbGTFfile path/to/genesE.gtf --genomeChrBinNbits 12
STAR --runMode alignReads \
--genomeLoad NoSharedMemory \
--genomeDir indices/STAR \
--readFilesIn XX_R1_001.fastq.gz XX_R2_001.fastq.gz \
--outFileNamePrefix /results/ercc/$i \
--quantMode GeneCounts \
--twopassMode Basic \
--outSAMtype BAM Unsorted SortedByCoordinate \
--readFilesCommand zcat
htseq-count --mode=union --idattr=gene_name -f bam -order pos --stranded=no XX-Aligned.bam /path/to/genesE.gtf > XX-gene.count
### I tried stranded=yes or reverse but that didn't help either.
Any pointers highly appreciated!!
Alex
I have a problem with recovering added ERCC to my RNAseq samples. Briefly, I'm doing smart-seq2 on single human T cells and align with STAR to hg19+ERCC sequences. Both FASTA and GTF file have the ERCCs but when I look at gene count results they don't show up at all, regardless if I use STARs genecount option, HTseq-count or RSEM.
Let's assume I "forgot" to add the ERCC spike-ins and the count is actually 0... Shouldn't the gene names still appear in the downstream files but the value just be 0? If I run Samtools idxstats on the STAR output (sorted or unsorted bam file), it shows the ERCC "chromosomes".
I'm confused by all of this and can't even figure out where in the pipeline my mistake might be. Help!
Here are my commands:
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir indices/STAR --genomeFastaFiles path/to/genomeE.fa --sjdbGTFfile path/to/genesE.gtf --genomeChrBinNbits 12
STAR --runMode alignReads \
--genomeLoad NoSharedMemory \
--genomeDir indices/STAR \
--readFilesIn XX_R1_001.fastq.gz XX_R2_001.fastq.gz \
--outFileNamePrefix /results/ercc/$i \
--quantMode GeneCounts \
--twopassMode Basic \
--outSAMtype BAM Unsorted SortedByCoordinate \
--readFilesCommand zcat
htseq-count --mode=union --idattr=gene_name -f bam -order pos --stranded=no XX-Aligned.bam /path/to/genesE.gtf > XX-gene.count
### I tried stranded=yes or reverse but that didn't help either.
Any pointers highly appreciated!!
Alex
Comment