  • Sajna
    Member
    • Oct 2015
    • 14

    TCGA : RNASeq version 1 pipeline

    The following documentation provides good details of the pipeline:


    However, on visiting the project website at http://seqware.sf.net,
    I could not find any data under the Files menu. I also browsed the UNC website using UNC IDs, with no results.

    Could you please guide me? Where can I obtain the RNASeq version 1
    pipeline?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Are you looking for the TCGA data from UNC? That is available from the TCGA data portal: https://tcga-data.nci.nih.gov/tcga/

    Comment

    • Sajna
      Member
      • Oct 2015
      • 14

      #3
      I have a few Sequence Read Archive (SRA) studies. I want to perform gene quantification for these studies using the TCGA RNASeq version 1 pipeline.

      I need the script that runs the entire RNASeq version 1 pipeline on my SRA files.

      As I mentioned earlier, the following link provides details on obtaining the pipeline. However, on visiting the GitHub page, there is no data (RNASeq version 1 pipeline) listed! I do not want to use RNASeq version 2 right now; I want to reuse the TCGA RNASeq V1 pipeline!

      https://confluence.broadinstitute.or...=1363806109000

      Please advise.
      Last edited by Sajna; 10-27-2015, 08:22 PM.

      Comment

      • Sajna
        Member
        • Oct 2015
        • 14

        #4
        As I mentioned earlier, the following link provides details on obtaining the pipeline. However, on visiting the GitHub page, there is no data (RNASeq version 1 pipeline) listed! I do not want to use RNASeq version 2 right now; I want to reuse the TCGA RNASeq V1 pipeline!

        https://confluence.broadinstitute.or...=1363806109000

        Comment

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          At this point, trying to run SeqWare and version 1 of the TCGA RNAseq pipeline would at best be an exercise in futility. You may be better off using new versions of bwa and MapSplice.

          That said this file has additional details about software used in v.1 and v.2: https://tcga-data.nci.nih.gov/tcgafi...ESCRIPTION.txt

          All the data that was submitted under TCGA was reprocessed using v.2 of the pipeline and that is what should be considered current based on communication from UNC TCGA folks.

          Comment

          • Sajna
            Member
            • Oct 2015
            • 14

            #6
            Thanks Genomax. I will look into the details of version 2 and process the data using BWA, or consider MapSplice for quantification.
            Last edited by Sajna; 10-29-2015, 09:45 AM.

            Comment

            • Sajna
              Member
              • Oct 2015
              • 14

              #7
              TCGA Mapsplice RNASeqV2 pipeline : Error: check reads format failed

              Hi All,

              I am running MapSplice (v2.0). My FASTQ files are in Sanger/Illumina 1.9 format. I replaced the blank spaces in the read headers with ':' and removed the length= text, and the head of the file now looks like this:

              head ERR519523_1.fastq
              @ERR519523.1:1:100
              CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
              +ERR519523.1:1:100
              CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
              @ERR519523.2:2:65
              TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
              +ERR519523.2:2:65
              @@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
              @ERR519523.3:3:100
              GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT


              When I run the mapsplice.py script with the command below, I get this error:

              "pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
              @ERR519523.1:1:100
              [FAILED]
              Error: check reads format failed"

              Command and full output:
              python /opt/MapSplice_multi_threads_2.0.1.9/mapsplice.py -c /hg19_chromosomes/ -x /ebwt/humanchridx_M_rCRS -1 /ERR519523_1.fastq -2 ERR519523_2.fastq
              [Thu Oct 29 17:31:33 2015] Preparing output location mapsplice_out/

              [Thu Oct 29 17:31:33 2015] Beginning Mapsplice run (v2.0)
              -----------------------------------------------
              bin directory: [/opt/MapSplice_multi_threads_2.0.1.9/bin/]
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for files or directory
              [Thu Oct 29 17:31:33 2015] Checking for Bowtie index files
              [Thu Oct 29 17:31:33 2015] reads all chromo sizes
              [Thu Oct 29 17:31:42 2015] check reads format
              ERR519523_1.fastq is fastq format
              pairend read name not end with /1 or /2
              the 1th read in /ERR519523/ERR519523_1.fastq
              @ERR519523.1:1:100
              [FAILED]
              Error: check reads format failed

              Please help!!
              Last edited by Sajna; 10-29-2015, 09:45 AM.
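
The error above says that paired-end read names must end with /1 or /2. Before starting a long run, a quick pre-check along these lines may help confirm whether a file meets that expectation. This is only a sketch, not part of MapSplice: the function name `headers_missing_suffix` and the `limit` parameter are made up for illustration, and it assumes plain 4-line FASTQ records.

```python
import re

# Paired-end read names are expected to end in /1 or /2 (per the error message).
MATE_SUFFIX = re.compile(r"/[12]$")


def headers_missing_suffix(lines, limit=1000):
    """Return header lines that lack a trailing /1 or /2.

    Assumes plain 4-line FASTQ records, so every 4th line (starting at the
    first) is a header. Checks at most `limit` records.
    """
    bad = []
    for i, line in enumerate(lines):
        if i // 4 >= limit:
            break
        if i % 4 == 0:
            name = line.rstrip("\n")
            if not MATE_SUFFIX.search(name):
                bad.append(name)
    return bad
```

It could be used as `with open("ERR519523_1.fastq") as fh: print(headers_missing_suffix(fh)[:5])`. An empty list means the checked headers all carry a mate suffix.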

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                When you extracted the reads from the SRA file did you use the -F/--origfmt switch to preserve the illumina read ID?

                Comment

                • Sajna
                  Member
                  • Oct 2015
                  • 14

                  #9
                  I converted the .sra files to FASTQ format using the latest sratoolkit version with the command fastq-dump srafilenames.sra --split-3, since the data was paired-end.

                  No other options were specified.

                  Comment

                  • Sajna
                    Member
                    • Oct 2015
                    • 14

                    #10
                    When I converted the SRA file to FASTQ using fastq-dump, it looked like this:

                    @ERR519523.1 1 length=100
                    CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
                    +ERR519523.1 1 length=100
                    CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
                    @ERR519523.2 2 length=65
                    TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
                    +ERR519523.2 2 length=65
                    @@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
                    @ERR519523.3 3 length=100
                    GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT

                    Then I replaced the blank spaces with ':' and removed 'length=', and passed the FASTQ files to MapSplice, but I got the error below:

                    "pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
                    @ERR519523.1:1:100
                    [FAILED]
                    Error: check reads format failed"

                    Please help...

                    Comment

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      You should have used --split-files. Re-extract your data from the SRA file.

                      Edit: Let me look at that SRA#.

                      Edit 2: It appears that the submitters modified the original Illumina FASTQ read headers in this submission (or the original headers were never submitted to SRA, since the -F option only generates a number). After you split the files with just "--split-files", you will have to add /1 and /2 at the end of the FASTQ headers, since MapSplice expects them to be present.
                      Last edited by GenoMax; 10-29-2015, 04:39 AM.
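
The header fix suggested above (adding /1 and /2 to the read names) can be sketched in a few lines of Python. This is one possible approach under stated assumptions, not a tool from the MapSplice or TCGA pipeline: the function name `add_mate_suffix` is hypothetical, records are assumed to be plain 4-line FASTQ with fastq-dump style headers such as `@ERR519523.1 1 length=100`, and the `+` line is reduced to a bare `+`, which is valid FASTQ but should be checked against whatever the aligner accepts.

```python
def add_mate_suffix(lines, mate):
    """Yield FASTQ lines with '/<mate>' appended to each read name.

    Assumes 4-line records. Only the first whitespace-separated token of
    the header is kept as the read name; the '+' line becomes a bare '+'.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0 and line:
            name = line.split()[0]
            # Leave names that already carry a mate suffix untouched.
            if not name.endswith(("/1", "/2")):
                name = f"{name}/{mate}"
            yield name + "\n"
        elif i % 4 == 2:
            yield "+\n"
        else:
            yield line + "\n"
```

To rewrite a file, open the input and output and call `dst.writelines(add_mate_suffix(src, 1))` for the first mate file, then repeat with `mate=2` for the second.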

                      Comment

                      • Sajna
                        Member
                        • Oct 2015
                        • 14

                        #12
                        Alternatively, I tried the tool that the MapSplice pipeline uses (UNC ubu.jar) to prepare FASTQ files for MapSplice. The command to format a FASTQ file is as follows:

                        java -Xmx512M -jar ubu.jar fastq-format --phred33to64 --strip --suffix /1 --in raw_1.fastq --out working/prep_1.fastq > working/mapsplice_prep1.log

                        I tried that; however, I get the error: Fastq format not recognizable...

                        I will try out what you suggested tomorrow morning at work, and hopefully that will work. Let's see.
                        Last edited by Sajna; 10-29-2015, 09:55 AM.

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          That is correct.

                          Comment

                          • Sajna
                            Member
                            • Oct 2015
                            • 14

                            #14
                            Genomax, it worked!!!! Many Thanks and good day to you.

                            Comment

                            • Sajna
                              Member
                              • Oct 2015
                              • 14

                              #15
                              TCGA RSEM_ref files

                              I have used MapSplice to align all the SRA FASTQ samples successfully, and used the bedtools coverage function to retrieve the raw read counts. The next task was to combine level 3 data from TCGA with the MapSplice-aligned SRA samples for differential expression analysis. Having done that, I noticed that the number of DE genes is very high. Looking back, I understood that the "raw counts" reported by TCGA are expected counts from the RSEM software. Although the RSEM paper mentions that edgeR and DESeq can process RSEM counts, it appears that edgeR requires integers as input. So I have now decided to run RSEM on the SRA SAM/BAM files.

                              The TCGA mRNA_Seq pipeline detailed at the following URL requires the hg19_M_rCRS_ref.transcripts.fa file for running rsem-calculate-expression and for translating to transcriptome coordinates.



                              However, the file that should be available from the following URL is missing:



                              I also require the reference mapping file to run RSEM: https://webshare.bioinf.unc.edu/publ...ownToLocus.txt

                              The file is truncated on GitHub as well.

                              Where can I access the files?
                              Last edited by Sajna; 11-23-2015, 10:27 PM.
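
On the point above about RSEM expected counts not being integers: if a downstream tool does insist on integer input, one common workaround is to round the expected counts before analysis. The sketch below shows only that step and is an assumption, not something the thread or the TCGA pipeline prescribes. The function name `round_expected_counts` and the gene IDs in the usage note are made up, and rounding does not address differences between counting methods (for example, bedtools coverage counts versus RSEM estimates).

```python
def round_expected_counts(counts):
    """Round RSEM-style expected counts to non-negative integers.

    `counts` maps a gene ID to its expected count (a float).
    Returns a new dict with integer values.
    """
    rounded = {}
    for gene, value in counts.items():
        if value < 0:
            raise ValueError(f"negative count for {gene}: {value}")
        rounded[gene] = int(round(value))
    return rounded
```

For example, `round_expected_counts({"GENE_A": 10.4, "GENE_B": 3.7})` gives `{"GENE_A": 10, "GENE_B": 4}`.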

                              Comment
