
**GenoMax** · 01-28-2016, 09:16 AM

You may want to concatenate the files for each sample into one and then use the multiple threads option for tophat to achieve faster processing.

**TPH** · 01-28-2016, 10:24 AM

Thank you very much. I really appreciate your help.

**GenoMax** · 01-28-2016, 10:35 AM

I should clarify that you would want to concatenate all R1 pieces and all R2 pieces for each sample, and then use the resulting R1 and R2 files for the tophat runs.
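As a rough sketch of the concatenation described above: the function below joins all R1 chunks and all R2 chunks of one sample into a single pair of files. The function name `concat_pairs`, the `*_R1_*.fastq` / `*_R2_*.fastq` naming pattern, and the directory layout are assumptions for illustration, not something stated in the thread.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: concatenate the chunked FASTQ files of one sample.
# Assumes chunks are named like <anything>_R1_001.fastq / <anything>_R2_001.fastq
# and live together in one directory.
concat_pairs() {
    sample_dir=$1   # directory holding the chunk files
    out_prefix=$2   # prefix for the two combined files

    # Shell globs expand in sorted order, so the R1 and R2 chunks are joined
    # in the same order and the read pairs stay in sync across both outputs.
    # The output names end in _R1.fastq / _R2.fastq, which the globs below do
    # not match, so the outputs are never read back in as inputs.
    cat "$sample_dir"/*_R1_*.fastq > "${out_prefix}_R1.fastq"
    cat "$sample_dir"/*_R2_*.fastq > "${out_prefix}_R2.fastq"
}
```

Calling `concat_pairs sampleA_chunks sampleA` would produce `sampleA_R1.fastq` and `sampleA_R2.fastq`, which can then be given to a single tophat run as the two read files.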

**TPH** · 01-28-2016, 10:44 AM

Thanks again. I saw in a post that it is not recommended to concatenate the data but to run the pieces in parallel instead. It's totally clear how concatenated data can be used for the analysis, but I do not understand how running the individual files in parallel works, or what the downside of concatenating files is. Do you have any idea about that? It would be a great help.

**GenoMax** · 01-28-2016, 10:58 AM

There are many ways to skin a cat, and you could certainly do this in parallel (as Pierre suggests in the Biostars thread) with the original file pieces.

You would want to take into consideration the hardware resources you have available. If you are on a cluster with plenty of nodes/RAM, by all means process the individual paired chunks in parallel (with multiple threads). If you have limited hardware (i.e. a single server), you may want to either run the chunk jobs serially or combine the chunks and run them as one job. If you did the analysis in chunks, you would then use cuffmerge to merge your results.

**TPH** · 01-28-2016, 11:13 AM

I work on a cluster. I did the analysis by running the tophat command individually on each of the seven files with its paired file, without any concatenation. I realized later that the way I fed the data in was wrong, because it treated the data as seven different replicates. This is how I wrote the command, and I repeated it six more times:
tophat -p 8 -o tophat_out -G $genomeSeq $genomeIndex R1_001.fastq R2_001.fastq
If I want to process the data in parallel, what would be the best way to feed the data in? Could you please help me figure out the correct command for that?
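One option that avoids both physical concatenation and seven separate outputs, assuming the TopHat version in use accepts comma-separated lists of read files (as its manual describes), is to pass all chunks of a sample to a single run. The sketch below only prints such a command and does not execute it. The helper name `build_tophat_cmd` is hypothetical, and deriving R2 names by replacing `R1` with `R2` is an assumed naming convention; the remaining options are copied from the command above.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print (not run) one tophat command for all chunks
# of a sample. Arguments: output directory, then the R1 chunk files.
build_tophat_cmd() {
    out_dir=$1
    shift

    # Join the R1 file names with commas: a b c -> a,b,c
    r1_list=
    for f in "$@"; do
        r1_list="${r1_list:+$r1_list,}$f"
    done

    # Assumption: each R2 file has the same name as its R1 mate with
    # "R1" replaced by "R2" (this would also rewrite "R1" in directory names).
    r2_list=$(printf '%s' "$r1_list" | sed 's/R1/R2/g')

    # $genomeSeq and $genomeIndex are printed literally, as in the original
    # command; they are expected to be set in the shell that runs the output.
    printf 'tophat -p 8 -o %s -G $genomeSeq $genomeIndex %s %s\n' \
        "$out_dir" "$r1_list" "$r2_list"
}
```

For example, `build_tophat_cmd tophat_out R1_001.fastq R1_002.fastq` prints a command whose last two arguments are `R1_001.fastq,R1_002.fastq` and `R2_001.fastq,R2_002.fastq`. Inspect the printed line before running it on real data.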

**GenoMax** · 01-28-2016, 12:31 PM

I assume you have 7 separate tophat output directories for the 7 files of each condition because of how you ran the analysis? You could merge the "accepted_hits.bam" files for each condition into one, as Pierre suggested in the other thread. What are you going to use for the downstream analysis, cuffdiff?

**TPH** · 01-28-2016, 12:57 PM

Yes, that's the output I have. So using the "cat" command on the accepted_hits.bam files would work the same as concatenating the starting fastq files. Thank you very much.
Yes, I am using Cuffdiff for the final step.

**dpryan** · 01-28-2016, 01:57 PM

You can't concatenate BAM files with "cat", though you could with "samtools cat". I would strongly encourage you to "samtools merge" instead, though!
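A minimal sketch of the "samtools merge" route, assuming samtools is installed and each tophat run wrote its own `accepted_hits.bam`. The wrapper name `merge_condition` and the `SAMTOOLS` variable are hypothetical conveniences; the underlying call is the standard `samtools merge <out.bam> <in1.bam> <in2.bam> ...` form.

```shell
#!/bin/sh
set -eu

# Program to call; override (e.g. SAMTOOLS=echo) to preview the command
# without running it. The variable name is an assumption for this sketch.
SAMTOOLS=${SAMTOOLS:-samtools}

# Hypothetical wrapper: merge the per-chunk BAM files of one condition.
# Arguments: output BAM, then the input BAM files.
merge_condition() {
    out_bam=$1
    shift
    # samtools merge expects sorted inputs and writes one sorted BAM.
    "$SAMTOOLS" merge "$out_bam" "$@"
}
```

With hypothetical output directories named `tophat_out_1` to `tophat_out_7`, the call would be `merge_condition condition1.bam tophat_out_*/accepted_hits.bam`, and the merged file could then be passed on to Cuffdiff as a single sample.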

**TPH** · 01-28-2016, 01:59 PM

Thank you so much.


Tuxedo suite / Parallel Processing
