Dear All,
There have been multiple questions about the ENCODE RNA-seq alignments on the UCSC portal. These alignments were generated by a 3-year-old version of STAR and use some non-conventional formatting (e.g. they are not compatible with Cufflinks).
To bring this data up to date, I have remapped it using the latest version of STAR. The new alignments use conventional formatting and should be compatible with most downstream software. Importantly, annotations were used to improve the mapping accuracy. The BAMs for all of the ENCODE phase 2 (2008-2012) long RNA-seq data can be downloaded here:
This is NOT an official ENCODE release. For all the metadata, please refer to the UCSC ENCODE portal:
To reduce file sizes, the quality scores were not recorded, and the read names were replaced with numbers.
The files are directly compatible with Cufflinks.
CSHL data are stranded (dUTP protocol), so Cufflinks has to be run with --library-type fr-firststrand.
Caltech and HAIB data are unstranded and can be run with the default --library-type (fr-unstranded).
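
For anyone scripting this over many files, here is a minimal sketch of how the per-lab setting could be applied when building the Cufflinks command line. The helper name `cufflinks_command`, the file name `sample.bam` and the output directory are hypothetical placeholders, not part of the release; only `--library-type` and the lab-to-strandedness mapping come from the notes above, and `-o` is the standard Cufflinks output-directory option.

```python
import shlex

# Library type per producing lab, as described in this post.
# CSHL is stranded (dUTP protocol); Caltech and HAIB are unstranded,
# which corresponds to the Cufflinks default, fr-unstranded.
LIBRARY_TYPE_BY_LAB = {
    "CSHL": "fr-firststrand",
    "Caltech": "fr-unstranded",
    "HAIB": "fr-unstranded",
}


def cufflinks_command(bam_path, lab, out_dir):
    """Build a Cufflinks argument list for one BAM (hypothetical helper).

    Raises KeyError for a lab that is not listed above, so an unknown
    strandedness is never guessed silently.
    """
    library_type = LIBRARY_TYPE_BY_LAB[lab]
    return ["cufflinks", "--library-type", library_type, "-o", out_dir, bam_path]


if __name__ == "__main__":
    # Placeholder paths; substitute the BAM you downloaded.
    cmd = cufflinks_command("sample.bam", "CSHL", "cufflinks_out")
    print(shlex.join(cmd))
```

The list returned by the helper can be passed directly to `subprocess.run(cmd, check=True)` once Cufflinks is on your PATH.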
STAR version: STAR_2.3.1u (2013/11/24)
Genome: hg19 + phiX + NIST ERCC spike-ins
Annotations: Gencode18
Please let me know if you have any issues or questions.
Cheers
Alex