  • ENCODE long RNA-seq remapped

    Dear All,

    there have been multiple questions about the ENCODE RNA-seq alignments on the UCSC portal. These alignments were generated by a 3-year-old version of STAR and use some non-conventional formatting (e.g. they are not compatible with Cufflinks).

    To bring this data up-to-date, I have remapped it using the latest version of STAR. The new alignments use conventional formatting and should be compatible with most downstream software. Importantly, annotations are used to improve the mapping accuracy. The BAMs for all of the ENCODE phase 2 (2008-2012) long RNA-seq data can be downloaded here:


    This is NOT an official ENCODE release. For all the metadata, please refer to UCSC ENCODE portal:


    To reduce file sizes, the quality scores were not recorded, and the read names were replaced with numbers.

    The files are directly compatible with Cufflinks.
    CSHL data is stranded (dUTP protocol), so Cufflinks has to be run with --library-type fr-firststrand.
    Caltech and HAIB data are unstranded and can be run with the default --library-type (fr-unstranded).
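
As a rough illustration of the library-type choice above, the following Python sketch builds a Cufflinks command line per lab. The file name, output directory and helper name are hypothetical; only the --library-type values come from the post, and -o is the standard Cufflinks output-directory option.

```python
import shlex

# Library types as described in the post: CSHL is stranded (dUTP),
# Caltech and HAIB are unstranded (fr-unstranded is the Cufflinks default).
LIBRARY_TYPE_BY_LAB = {
    "CSHL": "fr-firststrand",
    "Caltech": "fr-unstranded",
    "HAIB": "fr-unstranded",
}


def cufflinks_command(bam_path, lab, out_dir="cufflinks_out"):
    """Build (but do not run) a Cufflinks command for a BAM from the given lab.

    bam_path and out_dir are placeholders chosen for this example.
    """
    library_type = LIBRARY_TYPE_BY_LAB[lab]
    return ["cufflinks", "--library-type", library_type, "-o", out_dir, bam_path]


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(shlex.join(cufflinks_command("example_CSHL.bam", "CSHL")))
```

The command is returned as a list so it can be passed to subprocess.run unchanged once the paths are adapted.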

    STAR version: STAR_2.3.1u (2013/11/24)
    Genome: hg19 + phiX + NIST ERCC spike-ins
    Annotations: Gencode18

    Please let me know if you have any issues or questions
    Cheers
    Alex
    Last edited by alexdobin; 06-04-2014, 06:58 PM. Reason: replaced with up-to-date URL

  • #2
    Thanks Alex!

    Are these data normalized in some way?

    L
    Last edited by Liy; 12-06-2013, 02:51 AM.


    • #3
      Hi Liy,

      at the moment I have posted only the alignments - BAM files, so there is no normalization of any kind. I was contemplating also making the signal (wiggle) tracks - these can be made in many different ways (normalization, unique- vs multi-mappers, etc.).
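
To make the "many different ways" concrete, here is one possible sketch (not the method used for any ENCODE track): it separates unique from multi-mapping alignment records using the NH tag that STAR writes, and derives a per-million scale factor from the unique count. It counts SAM records rather than reads or pairs, and the function names and toy records are made up for this example.

```python
def nh_value(sam_line):
    """Return the NH tag (number of reported alignments) of a SAM record, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def count_unique_and_multi(sam_lines):
    """Count alignment records with NH == 1 (unique) and NH > 1 (multi-mapping)."""
    unique = multi = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        nh = nh_value(line)
        if nh == 1:
            unique += 1
        elif nh is not None and nh > 1:
            multi += 1
    return unique, multi


def per_million_scale(unique_count):
    """Scale factor that converts raw coverage to 'per million unique records'."""
    return 1e6 / unique_count


if __name__ == "__main__":
    # Toy records with placeholder SEQ/QUAL, mirroring BAMs without quality scores.
    toy = [
        "@HD\tVN:1.4",
        "1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
        "2\t0\tchr1\t200\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
        "3\t0\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    ]
    unique, multi = count_unique_and_multi(toy)
    print(unique, multi, per_million_scale(unique))
```

Whether multi-mappers are dropped, counted fully, or weighted by 1/NH is exactly the kind of choice that makes signal tracks differ from one another.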

      Cheers
      Alex


      • #4
        mouse long RNASeq

        Originally posted by alexdobin
        Dear All,

        there have been multiple questions about the ENCODE RNA-seq alignments on the UCSC portal. These alignments had been generated by a 3-year old version of STAR and use some non-conventional formatting (e.g. they are not compatible with Cufflinks).

        To bring this data up-to-date, I have remapped it using the latest version of STAR. The new alignments use conventional formatting and should be compatible with most downstream software. Importantly, annotations are used to improve the mapping accuracy. The BAMs for all of the ENCODE phase 2 (2008-2012) long RNA-seq data can be downloaded here: ftp://ftp2.cshl.edu/gingeraslab/trac...t/ENCODE2/BAM/

        This is NOT an official ENCODE release. For all the metadata, please refer to UCSC ENCODE portal:


        To reduce file sizes, the quality scores were not recorded, and the read names were replaced with numbers.

        The files are directly compatible with Cufflinks.
        CSHL data is stranded (dUTP protocol) and Cufflinks has to be run with --library-type fr-firststrand
        Caltech and HAIB data are unstranded and can be run with default --library-type.

        STAR version: STAR_2.3.1u (2013/11/24)
        Genome: hg19 + phiX + NIST ERCC spike-ins
        Annotations: Gencode18

        Please let me know if you have any issues or questions
        Cheers
        Alex

        Alex, does anything like that exist also for the mouse RNASeq dataset?

        I'm trying to get the counts per gene using HTSeq on files I generated from the ENCODE (CSHL long RNA-seq) BAM files (sorted and converted to SAM using samtools). However, only 13% of the reads actually map to features, regardless of the GTF file I use. Could the reason be the same?

        I appreciate any hints!

        Maike


        • #5
          Hi Maike,

          it's very likely that HTSeq has trouble with the old BAM format. The main problem is that in this old format the mates were assigned the same strand (for better viewability on the UCSC browser); however, this is not a standard convention for Illumina reads.
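
One way to check a file for this issue is to look at the SAM FLAG field. The Python sketch below assumes the same-strand assignment shows up in the standard FLAG bits 0x10 (read reverse strand) and 0x20 (mate reverse strand), which the post does not spell out; the function names are made up for illustration.

```python
PAIRED = 0x1         # read is part of a pair
READ_REVERSE = 0x10  # read aligned to the reverse strand
MATE_REVERSE = 0x20  # mate aligned to the reverse strand


def mates_same_strand(flag):
    """True if a paired record reports the read and its mate on the same strand."""
    if not flag & PAIRED:
        raise ValueError("FLAG does not describe a paired read")
    return bool(flag & READ_REVERSE) == bool(flag & MATE_REVERSE)


def fraction_same_strand(flags):
    """Fraction of paired records whose mates share a strand (0.0 if none are paired)."""
    paired = [f for f in flags if f & PAIRED]
    if not paired:
        return 0.0
    return sum(mates_same_strand(f) for f in paired) / len(paired)


if __name__ == "__main__":
    # 99 and 147 are the usual FLAGs of a properly paired Illumina fragment
    # (mates on opposite strands); 67 and 115 put both mates on one strand.
    print(fraction_same_strand([99, 147]))
    print(fraction_same_strand([67, 115]))
```

A fraction close to 1 over a sample of records would be consistent with the old formatting, while conventional Illumina paired-end alignments should give a value near 0.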

          I am remapping the ENCODE CSHL mouse data to mm10 and Gencode M2 annotations (just released!), and will post the BAMs early next week.

          Cheers
          Alex


          • #6
            Thank you Alex, for answering and doing the work!
            Maike


            • #7
              The re-mapped ENCODE2 mouse CSHL data is posted here:
              ftp://ftp2.cshl.edu/gingeraslab/trac...AM/Mouse_CSHL/


              • #8
                mouse encode rnaseq

                This is really helpful, thank you!


                • #9
                  Originally posted by alexdobin
                  The re-mapped ENCODE2 mouse CSHL data is posted here:
                  ftp://ftp2.cshl.edu/gingeraslab/trac...AM/Mouse_CSHL/
                  Dobin, regarding the mouse-remapping, are you using this reference ftp://ftp2.cshl.edu/gingeraslab/trac...GencodeM2.tgz?

                  Thanks.


                  • #10
                    Originally posted by Auction
                    Dobin, regarding the mouse-remapping, are you using this reference ftp://ftp2.cshl.edu/gingeraslab/trac...GencodeM2.tgz?

                    Thanks.
                    Yes, this is correct.


                    • #11
                      Dobin

                      The reference in ftp://ftp2.cshl.edu/gingeraslab/trac..._GencodeM2.tgz only provides the GTF file and STAR indexed reference. Where can we download the fasta files for both mm10 and ERCC markers?

                      Thanks.


                      • #12
                        Hi Alex,

                        I could not connect to the cshl ftp address and the link is broken. Could you please tell me where I can download the data now?

                        Many thanks,
                        Rui



                        • #13
                          Originally posted by rzhang
                          Hi Alex,

                          I could not connect to the cshl ftp address and the link is broken. Could you please tell me where I can download the data now?

                          Many thanks,
                          Rui
                          Hi Rui,

                          this is the new location of the ENCODE2 RNA-seq BAMs:


                          Cheers
                          Alex


                          • #14
                            Originally posted by alexdobin
                            Hi Rui,

                            this is the new location of the ENCODE2 RNA-seq BAMs:


                            Cheers
                            Alex
                            this is so very useful, thank you very much, for mouse ENCODE CSHL data in particular! I was aligning these data: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE39524, but it's been pretty painful since it's ABI SOLID platform...

                            BTW, Alex: have the mouse ENCODE data (CSHL long RNA-seq, the ones you have shared) been published yet?
                            Last edited by apredeus; 06-11-2014, 10:32 AM.


                            • #15
                              Originally posted by apredeus
                              this is so very useful, thank you very much, for mouse ENCODE CSHL data in particular! I was aligning these data: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE39524, but it's been pretty painful since it's ABI SOLID platform...

                              BTW, Alex: have the mouse ENCODE data (CSHL long RNA-seq, the ones you have shared) been published yet?
                              Hi @apredeus,
                              our mouse paper is under review. However, these mouse data were released by ENCODE in 2013 and are now free of any restrictions; you can check this in the last column of this table:
                              https://genome.ucsc.edu/ENCODE/dataSummaryMouse.html
