
**lindseykelly** · 07-12-2012, 05:59 AM

This was the response from the Galaxy team, in case someone else has this question:

Yes, you have this correct. The general path would be to:

- join forward and reverse data per run
- run FASTQ Groomer & FastQC
(note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality
values, the datatype '.fastqsanger' can be assigned directly and the FASTQ Groomer
step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
please double check.)
- discard data as needed based on quality
- split forward and reverse data that passes QC
- concatenate all forward reads from a sample into one FASTQ file
- concatenate all reverse reads from a sample into one FASTQ file
- for each sample, run TopHat using the two concatenated FASTQ files
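
As a rough way to do the "double check" on quality encoding mentioned in the note above outside of Galaxy, the sketch below guesses the offset from the lowest quality character it sees. This is only a heuristic, not part of the Galaxy team's reply: the function name `guess_phred_offset` is hypothetical, it assumes plain four-line FASTQ records, and a file containing only high-quality Phred+33 reads would be misreported as Phred+64.

```python
def guess_phred_offset(handle, max_records=10000):
    """Guess the quality offset (33 or 64) from the lowest quality character seen.

    Assumes four lines per record (no wrapped sequence or quality lines).
    """
    lowest = None
    for i, line in enumerate(handle):
        if i // 4 >= max_records:
            break
        if i % 4 == 3:  # the fourth line of each record holds the qualities
            qual = line.rstrip("\n")
            if qual:
                low = min(map(ord, qual))
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality lines found")
    # Characters below ';' (ASCII 59) do not occur in +64-offset encodings,
    # so seeing one implies Phred+33. Otherwise +64 is only a likely guess.
    return 33 if lowest < 59 else 64
```

With a real file this could be called as `guess_phred_offset(open("sample_R1.fastq"))`, where the filename is just a placeholder. A result of 33 suggests the data can be assigned '.fastqsanger' directly; a result of 64 suggests running FASTQ Groomer.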

To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
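
For anyone doing the same step outside Galaxy, head-to-tail concatenation can be sketched in a few lines of Python. This is an illustration and not the Galaxy tool's implementation; `concatenate_fastq` and the file names in the usage note are hypothetical.

```python
def concatenate_fastq(input_paths, output_path):
    """Write the given FASTQ files head-to-tail into one output file."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as src:
                for line in src:
                    # Guard against a missing final newline so that the last
                    # record of one file does not fuse with the next header.
                    out.write(line if line.endswith("\n") else line + "\n")
```

For example, `concatenate_fastq(["run1_R1.fastq", "run2_R1.fastq"], "sample_R1.fastq")`. The runs should be listed in the same order for the forward and the reverse files, so that the Nth read in each concatenated file still belongs to the same pair.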

I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster instance of Galaxy would be the recommendation: http://getgalaxy.org

For reference:

http://tophat.cbcb.umd.edu/manual.html

http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

Hopefully this helps. Others are welcome to post comments/suggestions.

Jen
Galaxy team

**[email protected]** · 09-12-2012, 09:03 PM

This may be helpful to you:

http://www.illumina.com/documents/products/datasheets/datasheet_rnaseq_analysis.pdf

**mhkiani** · 10-29-2013, 12:44 PM

Broken paired reads

I got some paired-end 100 bp RNA-seq data, and when I ran the RNA-seq analysis in CLC, more than 50% of the reads were reported as broken pairs. I'm not sure why.

**sugo** · 01-16-2014, 10:59 AM

What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?

**Mike2188** · 07-30-2014, 01:09 PM

If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, but the files no longer line up. The alignments won't work correctly because of this.
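
To illustrate the point, here is a minimal sketch of filtering both files in lockstep, so a pair is kept only when both mates pass and the outputs stay the same length. It is one possible approach and not what the Galaxy joiner/splitter tools do internally; the function names, the mean-quality criterion, and the threshold of 20 are assumptions chosen for the example, and it assumes four-line records with Phred+33 qualities.

```python
from itertools import zip_longest

def read_fastq(handle):
    """Yield (header, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def filter_pairs(fwd, rev, out_fwd, out_rev, min_mean_q=20):
    """Keep a pair only if BOTH mates pass; return the number of pairs kept."""
    kept = 0
    for r1, r2 in zip_longest(read_fastq(fwd), read_fastq(rev)):
        if r1 is None or r2 is None:
            raise ValueError("inputs have different numbers of reads")
        if mean_quality(r1[2]) >= min_mean_q and mean_quality(r2[2]) >= min_mean_q:
            out_fwd.write("{}\n{}\n+\n{}\n".format(*r1))
            out_rev.write("{}\n{}\n+\n{}\n".format(*r2))
            kept += 1
    return kept
```

Because reads are dropped as pairs, the Nth read in the forward output always corresponds to the Nth read in the reverse output, which is what the aligner expects.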

Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy