  • TPH
    Member
    • Jan 2016
    • 19

    Tuxedo suite / Parallel Processing

    Hello,
    I have a paired-end RNA-seq data set for two treatment conditions, without any replicates. I want to check isoform variation in a particular gene and gene expression differences in general. The two paired-end files for each sample were split into seven files each when the data was generated. I want to run these data in parallel using the Tuxedo suite.
    What I am not clear about is whether the tophat input command treats comma-separated files as replicates or as pieces of a single FASTQ file for each of the two paired-end files.
    tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
    And what would be the next steps in running the Tuxedo suite in parallel?
    Could anyone please help me?
    Thank you very much
    TPH
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    You may want to concatenate the files for each sample into one and then use the multiple threads option for tophat to achieve faster processing.


    • TPH
      Member
      • Jan 2016
      • 19

      #3
      Thank you very much. I really appreciate your help.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        I should clarify that you would want to concatenate all R1 pieces and all R2 pieces for each sample and then use resulting R1 and R2 files for tophat runs.
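A minimal sketch of that concatenation, assuming the pieces of one sample sit in a single directory and follow the R1_001.fastq / R2_001.fastq naming that appears later in this thread. The function name `concat_pieces` and the output prefix are hypothetical, not part of any tool.

```shell
# Concatenate all R1 pieces and all R2 pieces found in one sample directory.
# Assumption: pieces are named R1_001.fastq ... and R2_001.fastq ... (zero-padded).
# $1 = directory holding the pieces, $2 = output prefix (hypothetical layout).
concat_pieces() (
    dir="$1"; out="$2"
    # Globs expand in sorted order, so R1 and R2 pieces are joined in the
    # same order and the mates stay in sync between the two output files.
    cat "$dir"/R1_*.fastq > "${out}_R1.fastq" &&
    cat "$dir"/R2_*.fastq > "${out}_R2.fastq"
)

# Example call (paths are placeholders):
#   concat_pieces control_pieces merged/control
```

The two resulting files would then be passed to a single tophat run per sample. Pick an output prefix whose file names do not themselves match `R1_*.fastq` or `R2_*.fastq` inside the input directory.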


        • TPH
          Member
          • Jan 2016
          • 19

          #5
          Thanks again. I saw in a post that it is not recommended to concatenate data, but to run in parallel instead. It's totally clear how concatenated data can be used for the analysis, but I do not understand how running the individual files in parallel works, or what the downside of concatenating files is. Do you have any idea about that? It would be a great help.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            There are many ways to skin a cat, and you could certainly do this in parallel (as Pierre suggests in the Biostars thread) with the original file pieces.

            You would want to take into consideration the amount of hardware resources you have available. If you are on a cluster with plenty of nodes/RAM, by all means process the individual paired chunks in parallel (with multiple threads). If you have limited hardware (i.e. a single server), you may want to either run the chunk jobs serially or combine them and run them as one. If you did the analysis in chunks, then you would use cuffmerge to merge your results.
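One way to set up the per-chunk runs might look like the sketch below. It only prints one tophat command per chunk pair, each with its own output directory so the runs do not overwrite each other. The function name `chunk_commands`, the `tophat_out_<tag>` directory naming, and the R1_001.fastq / R2_001.fastq file naming are assumptions; `-p`, `-o` and `-G` are the tophat options already used in this thread. Paths containing spaces are not handled.

```shell
# Print one tophat command per chunk pair found in a sample directory.
# $1 = directory with the pieces, $2 = annotation file for -G,
# $3 = genome index base, $4 = threads per job (defaults to 8).
chunk_commands() (
    dir="$1"; gtf="$2"; index="$3"; threads="${4:-8}"
    for r1 in "$dir"/R1_*.fastq; do
        [ -e "$r1" ] || continue          # no pieces found: print nothing
        tag="${r1##*/R1_}"                # e.g. 001.fastq
        tag="${tag%.fastq}"               # e.g. 001
        r2="$dir/R2_${tag}.fastq"         # matching mate file
        echo "tophat -p $threads -o tophat_out_${tag} -G $gtf $index $r1 $r2"
    done
)

# Example (placeholders): chunk_commands control_pieces genes.gtf genome_index 8
```

On a cluster, each printed line could be submitted as its own job through whatever scheduler is available. On a single server, piping the output to `sh` would run the chunks serially.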


            • TPH
              Member
              • Jan 2016
              • 19

              #7
              I work on a cluster. I did the analysis by running the tophat command individually on each of the seven files with its paired file, without any concatenation. I realized later that the way I fed the data in was wrong, because it took the data as seven different replicates. This is how I wrote the command, and I repeated it six more times.
              tophat -p 8 -o tophat_out -G $genomeSeq $genomeIndex R1_001.fastq R2_001.fastq
              If I want to process the data in parallel, what would be the best way to put the data in? Could you please help me figure out the correct command for that?
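For reference, the usage line quoted in the first post also accepts comma-separated lists, and to my understanding tophat aligns all files in such a list as one sample into a single output directory rather than as replicates. A hedged sketch of building those lists follows; `join_by_comma` and `tophat_sample` are hypothetical helper names, and the file naming is assumed to match the command above.

```shell
# Join all arguments with commas, e.g. a.fastq b.fastq -> a.fastq,b.fastq
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Run tophat once for a whole sample, passing every R1 piece and every R2
# piece as comma-separated lists (hypothetical wrapper, defined but not run here).
# $1 = directory with the pieces, $2 = output directory,
# $3 = annotation file for -G, $4 = genome index base.
tophat_sample() (
    dir="$1"; out="$2"; gtf="$3"; index="$4"
    r1=$(join_by_comma "$dir"/R1_*.fastq)   # sorted glob keeps piece order
    r2=$(join_by_comma "$dir"/R2_*.fastq)   # same order as the R1 list
    tophat -p 8 -o "$out" -G "$gtf" "$index" "$r1" "$r2"
)
```

This gives one tophat job per sample, so the parallelism comes from `-p` threads and from running the two conditions side by side rather than from splitting a sample into seven jobs.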


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                I assume you have 7 separate directories for the tophat output for the 7 files for each condition because of how you ran the analysis? You could merge the "accepted_hits.bam" files for each condition into one, as Pierre suggested in the other thread. What are you going to use for the downstream analysis, cuffdiff?


                • TPH
                  Member
                  • Jan 2016
                  • 19

                  #9
                  Yes, that's the output I have. So using the "cat" command on the accepted_hits.bam files would work the same as concatenating the starting fastq files. Thank you very much.
                  Yes, I am using Cuffdiff for the final step.


                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    You can't concatenate BAM files with "cat", though you could with "samtools cat". I would strongly encourage you to "samtools merge" instead, though!
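A rough sketch of the merge step, assuming each chunk's tophat output sits in its own subdirectory of one per-condition directory. The function name `merge_hits`, the directory layout, and the `SAMTOOLS` override variable are hypothetical conveniences; the underlying call is plain `samtools merge <out.bam> <in1.bam> <in2.bam> ...`.

```shell
# Merge every accepted_hits.bam found one level below a condition directory.
# $1 = merged output BAM, $2 = directory holding the per-chunk tophat outputs.
# SAMTOOLS may be set to another executable (assumption, for dry runs).
merge_hits() (
    out="$1"; root="$2"
    set --                                     # collect inputs in "$@"
    for d in "$root"/*/; do
        if [ -f "${d}accepted_hits.bam" ]; then
            set -- "$@" "${d}accepted_hits.bam"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no accepted_hits.bam found under $root" >&2
        exit 1
    fi
    "${SAMTOOLS:-samtools}" merge "$out" "$@"
)

# Example (placeholders): merge_hits control_merged.bam control_runs
```

The merged BAM for each condition would then be what goes into Cuffdiff. `samtools merge` expects its inputs to be sorted the same way, so it is worth confirming that before merging.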


                    • TPH
                      Member
                      • Jan 2016
                      • 19

                      #11
                      Thank you so much.

