Hi all,
I have been receiving some confusing output with an RNA-seq dataset that I am trying to ultimately determine differential expression and fpkm values for.
Background: Library prep was stranded TruSeq (dUTP method) of mouse metatranscriptome and was given assembled contigs of transcriptome (Trinity) across 18 samples as well as RNA seq fastq files, performed on 4 lanes and two mate pair files, for each of these samples.
I ran HiSAT2 three times on one of the samples' pairs of reads with the following options (-fr, -rf and -ff) to make sure I am aligning using the correct strandedness (threw in -ff for funsies) but all gave the same output, with really only half maybe of the pairs aligning:
To further complicate things, I ran infer_experiment.py from RseQC to double check that the data is indeed stranded and all my results came back as suggesting unstranded paired reads:
But the sequencing core assured me that they were using a stranded protocol. I am thinking maybe the issues of low alignment as well as the [un]stranded issue may be related and due to something that I am missing here, or maybe there is not an issue here and the results are fine.
Any thoughts/suggestions would be much appreciated!
I have been receiving some confusing output with an RNA-seq dataset that I am trying to ultimately determine differential expression and fpkm values for.
Background: Library prep was stranded TruSeq (dUTP method) of mouse metatranscriptome and was given assembled contigs of transcriptome (Trinity) across 18 samples as well as RNA seq fastq files, performed on 4 lanes and two mate pair files, for each of these samples.
I ran HiSAT2 three times on one of the samples' pairs of reads with the following options (-fr, -rf and -ff) to make sure I am aligning using the correct strandedness (threw in -ff for funsies) but all gave the same output, with really only half maybe of the pairs aligning:
Code:
hisat2 -fr -x ../../contigs/Sample_120507/Sample_120507 -q -1 120507_CGATGT_S1_L001_R1_001.fastq -2 120507_CGATGT_S1_L001_R2_001.fa stq -S fr.sam RESULTS: 6185153 reads; of these: 6185153 (100.00%) were paired; of these: 5629996 (91.02%) aligned concordantly 0 times 475008 (7.68%) aligned concordantly exactly 1 time 80149 (1.30%) aligned concordantly >1 times ---- 5629996 pairs aligned concordantly 0 times; of these: 1575716 (27.99%) aligned discordantly 1 time ---- 4054280 pairs aligned 0 times concordantly or discordantly; of these: 8108560 mates make up the pairs; of these: 3511037 (43.30%) aligned 0 times 3273195 (40.37%) aligned exactly 1 time 1324328 (16.33%) aligned >1 times 71.62% overall alignment rate hisat2 -rf -x ../../contigs/Sample_120507/Sample_120507 -q -1 120507_CGATGT_S1_L001_R1_001.fastq -2 120507_CGATGT_S1_L001_R2_001.fastq -S rf.sam 6185153 reads; of these: 6185153 (100.00%) were paired; of these: 5629996 (91.02%) aligned concordantly 0 times 475008 (7.68%) aligned concordantly exactly 1 time 80149 (1.30%) aligned concordantly >1 times ---- 5629996 pairs aligned concordantly 0 times; of these: 1575716 (27.99%) aligned discordantly 1 time ---- 4054280 pairs aligned 0 times concordantly or discordantly; of these: 8108560 mates make up the pairs; of these: 3511037 (43.30%) aligned 0 times 3273195 (40.37%) aligned exactly 1 time 1324328 (16.33%) aligned >1 times 71.62% overall alignment rate hisat2 -ff -x ../../contigs/Sample_120507/Sample_120507 -q -1 120507_CGATGT_S1_L001_R1_001.fastq -2 120507_CGATGT_S1_L001_R2_001.fastq -S ff.sam RESULTS: 6185153 reads; of these: 6185153 (100.00%) were paired; of these: 5629996 (91.02%) aligned concordantly 0 times 475008 (7.68%) aligned concordantly exactly 1 time 80149 (1.30%) aligned concordantly >1 times ---- 5629996 pairs aligned concordantly 0 times; of these: 1575716 (27.99%) aligned discordantly 1 time ---- 4054280 pairs aligned 0 times concordantly or discordantly; of these: 8108560 mates make up the pairs; of these: 3511037 (43.30%) aligned 0 times 3273195 (40.37%) aligned exactly 1 time 1324328 (16.33%) aligned >1 times 71.62% overall alignment rate
Code:
$ infer_experiment.py -i ff.sam -r ../Sample_120507/fragGeneScan.gff.bed Reading reference gene model ../Sample_120507/fragGeneScan.gff.bed ... Done Loading SAM/BAM file ... Total 200000 usable reads were sampled This is PairEnd Data Fraction of reads failed to determine: 0.0005 Fraction of reads explained by "1++,1--,2+-,2-+": 0.5232 Fraction of reads explained by "1+-,1-+,2++,2--": 0.4764 $ infer_experiment.py -i rf.sam -r ../Sample_120507/fragGeneScan.gff.bed Reading reference gene model ../Sample_120507/fragGeneScan.gff.bed ... Done Loading SAM/BAM file ... Total 200000 usable reads were sampled This is PairEnd Data Fraction of reads failed to determine: 0.0005 Fraction of reads explained by "1++,1--,2+-,2-+": 0.5232 Fraction of reads explained by "1+-,1-+,2++,2--": 0.4764 $ infer_experiment.py -i fr.sam -r ../Sample_120507/fragGeneScan.gff.bed Reading reference gene model ../Sample_120507/fragGeneScan.gff.bed ... Done Loading SAM/BAM file ... Total 200000 usable reads were sampled This is PairEnd Data Fraction of reads failed to determine: 0.0005 Fraction of reads explained by "1++,1--,2+-,2-+": 0.5232 Fraction of reads explained by "1+-,1-+,2++,2--": 0.4764
Any thoughts/suggestions would be much appreciated!
Comment