**GenoMax** · 10-27-2015, 03:33 AM

Are you looking for the TCGA data from UNC? That is available from TCGA data portal: https://tcga-data.nci.nih.gov/tcga/

**Sajna** · 10-27-2015, 08:16 PM

I have a few Sequence Read Archive (SRA) studies. I want to perform gene quantification for these studies using the TCGA RNASeq version 1 pipeline.

I need the script that runs the entire RNASeq version 1 pipeline on my SRA files.

And as I mentioned earlier, the following link provides details on obtaining the pipeline. However, on visiting the GitHub page, there is no data (RNASeq version 1 pipeline) listed! I do not want to use RNASeq version 2 right now; I want to reuse the TCGA RNASeq V1 pipeline!

https://confluence.broadinstitute.or...=1363806109000

Please advise.

**GenoMax** · 10-28-2015, 03:08 AM

At this point, trying to run SeqWare and version 1 of the TCGA RNAseq pipeline would at best be an exercise in futility. You may be better off using new versions of bwa and MapSplice.

That said this file has additional details about software used in v.1 and v.2: https://tcga-data.nci.nih.gov/tcgafi...ESCRIPTION.txt

All the data that was submitted under TCGA was reprocessed using v.2 of the pipeline and that is what should be considered current based on communication from UNC TCGA folks.

**Sajna** · 10-28-2015, 04:08 AM

Thanks Genomax. I will get into details of version 2 and process using BWA or I will consider Mapsplice for quantification.

**Sajna** · 10-29-2015, 03:59 AM

TCGA Mapsplice RNASeqV2 pipeline : Error: check reads format failed

Hi All,

I am using Mapsplice run (v2.0). My fastq files have the Sanger/Illumina 1.9 format. I removed the blank spaces and also removed length= and now the head of the file looks like this:

head ERR519523_1.fastq
@ERR519523.1:1:100
CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
+ERR519523.1:1:100
CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
@ERR519523.2:2:65
TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
+ERR519523.2:2:65
@@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
@ERR519523.3:3:100
GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT

When I run mapsplice.py script using the following command, I encounter the error :

"pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
@ERR519523.1:1:100
[FAILED]
Error: check reads format failed"

COMMAND :
python /opt/MapSplice_multi_threads_2.0.1.9/mapsplice.py -c /hg19_chromosomes/ -x /ebwt/humanchridx_M_rCRS -1 /ERR519523_1.fastq -2 ERR519523_2.fastq
[Thu Oct 29 17:31:33 2015] Preparing output location mapsplice_out/

[Thu Oct 29 17:31:33 2015] Beginning Mapsplice run (v2.0)
-----------------------------------------------
bin directory: [/opt/MapSplice_multi_threads_2.0.1.9/bin/]
[Thu Oct 29 17:31:33 2015] Checking for files or directory
[Thu Oct 29 17:31:33 2015] Checking for files or directory
[Thu Oct 29 17:31:33 2015] Checking for files or directory
[Thu Oct 29 17:31:33 2015] Checking for Bowtie index files
[Thu Oct 29 17:31:33 2015] reads all chromo sizes
[Thu Oct 29 17:31:42 2015] check reads format
ERR519523_1.fastq is fastq format
pairend read name not end with /1 or /2
the 1th read in /ERR519523/ERR519523_1.fastq
@ERR519523.1:1:100
[FAILED]
Error: check reads format failed

Please help!!

**GenoMax** · 10-29-2015, 04:04 AM

When you extracted the reads from the SRA file did you use the -F/--origfmt switch to preserve the illumina read ID?

**Sajna** · 10-29-2015, 04:10 AM

I converted the .sra files to FASTQ using the latest sratoolkit, running fastq-dump srafilenames.sra --split-3 since the data was paired-end.

No other specifications were made.

**Sajna** · 10-29-2015, 04:15 AM

When I converted sra file to fastq using fastq-dump it looked like this :

@ERR519523.1 1 length=100
CAAACCAATGGCTCCACCCGTACCTGGCTCTGCCTCTACCCACCGACATTGCTCCTGTGGTCCTACTCAGAAGTAGTTCAGCACTCAGGACAGCTTCCAC
+ERR519523.1 1 length=100
CCCFFFFFHHHHHJJIJJJJGHIJGIIJIIJIIIGIGIHIIJJJJGHJJJIJFJIHHHHHDFFFFECCCEEDD>CCCCDEDDDDDD?CDABC@BDCCC3>
@ERR519523.2 2 length=65
TGCATAGAGATAGAAACAGAAAATAGAATGGTGGTTGCAGGGTCTGGAAAGAGAGGAGGAGCGCA
+ERR519523.2 2 length=65
@@@DDDDDHDDDHIIBHA@FEH@@C<EEEHCFHH)?FDC<DF9BDHG9B9B;D=BF=FG;C(:5'
@ERR519523.3 3 length=100
GGACGCATAAGAGTTACAGGCTCTATACACAGGGACTTTCCTTCCTGGAAACCCGGTAGGAAATCCCATTATGGCTGCCTGTTTGCCAAACTATTCCCTT

Then I replaced the blank spaces with ':' and removed 'length=', and passed the FASTQ files to MapSplice, but I got the error below:

"pairend read name not end with /1 or /2 the 1th read in /ERR519523/ERR519523_1.fastq
@ERR519523.1:1:100
[FAILED]
Error: check reads format failed"

Please help...

**GenoMax** · 10-29-2015, 04:24 AM

You should have used --split-files. Re-extract your data from the SRA file.

Edit: Let me look at that SRA#.

Edit 2: It appears that the submitters have modified the original illumina fastq read headers in this submission (or they were never submitted to SRA as -F option is only generating a number). After you split the files with just "--split-files" you are going to have to add the /1 and /2 at the end of the fastq headers since MapSplice expects them to be present.
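The header fix described above can be sketched in Python. This is a minimal sketch, not part of the TCGA pipeline: the function names `add_mate_suffix` and `convert_file` are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines). It keeps only the first whitespace-delimited token of each header and appends /1 or /2. The "+" line is written bare, which is valid FASTQ, although whether MapSplice's format check prefers a repeated ID there is not confirmed in this thread.

```python
def add_mate_suffix(lines, mate):
    """Yield FASTQ lines with each read ID trimmed to its first token plus /<mate>.

    Assumes strict four-line records. Headers are found by position, not by a
    leading '@', because quality strings can also start with '@'.
    """
    suffix = "/" + str(mate)
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError("expected a FASTQ header at line %d" % (i + 1))
            read_id = line.split()[0]  # drops e.g. ' 1 length=100'
            if not read_id.endswith(suffix):
                read_id += suffix
            yield read_id
        elif i % 4 == 2:
            yield "+"  # a bare separator line is valid FASTQ
        else:
            yield line


def convert_file(in_path, out_path, mate):
    """Rewrite one FASTQ file, appending /<mate> to every read ID."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in add_mate_suffix(src, mate):
            dst.write(out_line + "\n")
```

For a paired-end run this would be called once per file, for example convert_file("ERR519523_1.fastq", "prep_1.fastq", 1) and the same with mate 2 for the second file; the output file names here are placeholders.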

**Sajna** · 10-29-2015, 09:43 AM

Otherwise, I tried the tool that Mapsplice pipeline uses (UNC ubu.jar) for preparing fastq files for Mapsplice. Command to format fastq is as follows:

java -Xmx512M -jar ubu.jar fastq-format --phred33to64 --strip --suffix /1 --in raw_1.fastq --out working/prep_1.fastq > working/mapsplice_prep1.log

I tried that, but I get the error: Fastq format not recognizable...

I will try out what you suggested tomorrow morning at work, and hopefully that will work. Let's see.

**GenoMax** · 10-29-2015, 09:45 AM

That is correct.

**Sajna** · 11-02-2015, 12:20 AM

Genomax, it worked!!!! Many Thanks and good day to you.

**Sajna** · 11-23-2015, 10:24 PM

TCGA RSEM_ref files

I have used MapSplice to align all the SRA FASTQ samples successfully, and used the bedtools coverage function to retrieve the raw read counts. The next task was to combine level 3 data from TCGA with the MapSplice-aligned SRA samples for differential expression analysis. Having done that, I noticed that the number of DE genes is very high. Looking back, I understood that the "raw counts" reported by TCGA are expected counts from the RSEM software. Although the RSEM paper mentions that edgeR and DESeq can process RSEM counts, it appears that edgeR requires integers as input. So I have now decided to run RSEM on the SRA SAM/BAM files.

The TCGA mRNA_Seq pipeline detailed at the following URL requires the hg19_M_rCRS_ref.transcripts.fa file for running rsem-calculate-expression and for the translation to transcriptome coordinates.
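On the integer-input point, one commonly used workaround (not something confirmed in this thread) is to round the RSEM expected counts before handing them to a count-based tool. The sketch below assumes a tab-delimited RSEM results file with `gene_id` and `expected_count` columns; check the header of your own file, since those column names are an assumption here, and the function name is made up for illustration. Rounding only addresses the input format; it does not make bedtools-derived counts comparable with RSEM expected counts.

```python
import csv
import io


def rounded_expected_counts(handle, id_column="gene_id", count_column="expected_count"):
    """Map each gene ID to its expected count rounded to the nearest integer."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: int(round(float(row[count_column]))) for row in reader}


# Toy input with made-up values, only to show the call.
toy = io.StringIO(
    "gene_id\texpected_count\n"
    "geneA\t10.37\n"
    "geneB\t99.63\n"
)
counts = rounded_expected_counts(toy)
```

With a real file, the same function can be given an open file handle instead of the in-memory toy table.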

https://webshareex.bioinf.unc.edu/public/mRNAseq_TCGA/UNC_mRNAseq_summary.pdf

However, the file that should be available from the following URL is missing:

Error 404 - Unknown

https://webshare.bioinf.unc.edu/public/mRNAseq_TCGA/rsem_ref/hg19_M_rCRS_ref.transcripts.fa

Also I require the reference mapping file to run RSEM: https://webshare.bioinf.unc.edu/publ...ownToLocus.txt

The file is truncated on GitHub as well.

Where can I access the files?

TCGA : RNASeq version 1 pipeline

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News