  • mascano
    Junior Member
    • Aug 2013
    • 8

    Tophat2 long_spanning_reads error -- cannot open xx.bam for reading

    This is what I am running:
    TopHat run (v2.0.9), Bowtie version: 2.1.0.0, Samtools version: 0.1.19.0

    This is what I entered:
    Code:
    tophat2 -p 24 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq
    And finally, here is my tophat.log:
    Code:
    [2013-08-20 12:28:14] Beginning TopHat run (v2.0.9)
    -----------------------------------------------
    [2013-08-20 12:28:14] Checking for Bowtie
    		  Bowtie version:	 2.1.0.0
    [2013-08-20 12:28:14] Checking for Samtools
    		Samtools version:	 0.1.19.0
    [2013-08-20 12:28:14] Checking for Bowtie index files (genome)..
    [2013-08-20 12:28:14] Checking for reference FASTA file
    [2013-08-20 12:28:14] Generating SAM header for /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome
    	format:		 fastq
    	quality scale:	 phred33 (default)
    [2013-08-20 12:28:47] Reading known junctions from GTF file
    [2013-08-20 12:28:51] Preparing reads
    	 left reads: min. length=101, max. length=101, 26861915 kept reads (78856 discarded)
    [2013-08-20 12:38:22] Building transcriptome data files..
    [2013-08-20 12:39:34] Building Bowtie index from genes.fa
    [2013-08-20 12:52:18] Mapping left_kept_reads to transcriptome genes with Bowtie2 
    [2013-08-20 13:03:00] Resuming TopHat pipeline with unmapped reads
    [2013-08-20 13:03:00] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2 
    [2013-08-20 13:36:56] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
    [2013-08-20 13:43:24] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
    [2013-08-20 13:51:47] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
    [2013-08-20 13:59:15] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
    [2013-08-20 14:10:43] Searching for junctions via segment mapping
    [2013-08-20 14:15:54] Retrieving sequences for splices
    [2013-08-20 14:18:21] Indexing splices
    [2013-08-20 14:19:02] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
    [2013-08-20 14:20:37] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
    [2013-08-20 14:22:42] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
    [2013-08-20 14:24:24] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
    [2013-08-20 14:26:36] Joining segment hits
    	[FAILED]
    Error running 'long_spanning_reads':Error: cannot open PMA_0hr_TotalRNA/tmp/left_kept_reads.m2g_um.bam for reading
    The output directory is created, as are the subdirectories. The tmp directory contains plenty of files, including "left_kept_reads.m2g_um.bam".
    That file is ~1 GB (and its permissions are me: read and write, staff: read only, everyone: read only).

    Help is appreciated.
  • mascano
    Junior Member
    • Aug 2013
    • 8

    #2
    A bit of an update

    Running 16 threads, instead of 24, allowed tophat to complete the run:
    Code:
    tophat2 -p 16 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq

    I have 2 x 2.4 GHz 6-core Xeons, so that's 12 physical cores plus 12 virtual ones with hyperthreading, which theoretically means I can assign '-p 24'.
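
    A quick way to confirm those counts from a terminal is sketched below. The `hw.physicalcpu` and `hw.logicalcpu` sysctl keys are the OS X ones; the function names and the `getconf`/`nproc` fallbacks are assumptions added for illustration so the snippet also runs on other systems.

```shell
# Sketch: report physical vs. logical (hyperthreaded) core counts.
# Function names are illustrative only.
physical_cpus() {
    # hw.physicalcpu is an OS X sysctl key; print "unknown" where it is unavailable
    sysctl -n hw.physicalcpu 2>/dev/null || echo "unknown"
}

logical_cpus() {
    # hw.logicalcpu includes hyperthreads on OS X; fall back to getconf or nproc elsewhere
    sysctl -n hw.logicalcpu 2>/dev/null ||
        getconf _NPROCESSORS_ONLN 2>/dev/null ||
        nproc
}

echo "physical cores: $(physical_cpus)"
echo "logical cores:  $(logical_cpus)"
```

    If the hardware matches the description above, the first line should report 12 and the second 24.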

    My guess is memory usage, but that is not entirely clear. I have 64 GB RAM (which is the maximum allowed until OS X Mavericks comes out).

    Any advice on how to assign 24 threads without TopHat2 failing? Would calling the '-mm' argument work?

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      Why is using all 24 threads so important?

      Have you considered the possibility that the storage subsystem on this machine is a bottleneck? (Look in Activity Monitor to see if you are maxing out the throughput.)

      So rather than having 24 cores in some sort of iowait round-robin state, it may be better to start with a smaller number of cores and experiment to find the optimal performance balance.

      • mascano
        Junior Member
        • Aug 2013
        • 8

        #4
        Thank you for the reply and suggestion. I had not considered that HDD I/O might be the bottleneck; I imagine an SSD would improve it. However, looking at the disk activity during a successful run (using 16 threads), I haven't seen it peak anywhere near 6 Gb/sec (about 750 MB/sec), which should be the bandwidth of my HDD (using the ICH10 bridge).

        I doubt it will skyrocket to that throughput ceiling with all 24 threads, no?

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          There is theoretical throughput and there is real-life performance. Since the TopHat suite is developed on Macs, it should be optimized for OS X.

          If you are interested, you could look at application-level stats by following the suggestions in this post: http://blog.yerkanian.com/2011/10/17...io-on-macos-x/

          Use the following in a terminal window to see CPU-level performance for various processes (adjust parameters as needed by looking at the man entry for top).

          Code:
          $ top -n10 -u
          Last edited by GenoMax; 08-21-2013, 10:03 AM.

          • mascano
            Junior Member
            • Aug 2013
            • 8

            #6
            Under -p 16 conditions:

            Memory usage peaked at 4 GB for the long_spanning_reads process, but HDD I/O did not exceed 20 MB/sec (read or write) in total. I used
            Code:
            sudo iotop -C 5
            as well as viewing memory usage and disk activity via Activity Monitor.

            I was monitoring during these log events, which is where the run would fail with -p 24:
            Code:
            [2013-08-21 14:59:08] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
            [2013-08-21 15:01:19] Joining segment hits
            [2013-08-21 15:05:52] Reporting output tracks

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              So we know that 16 threads work but 24 do not. OS X may need some cores to keep essential parts of the OS running. The next thing to try would be to step the thread count up from 16 towards 24 and see at what point the process fails.
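
              One way to run that stepping experiment is a small shell loop. This is only a sketch: `find_max_threads` and `run_tophat` are hypothetical helper names, not part of TopHat, and `GTF`, `INDEX` and `READS` stand in for the annotation, Bowtie2 index and fastq paths from the first post.

```shell
# Sketch: step the thread count upward and print the highest value that succeeds.
# Usage: find_max_threads START END COMMAND [ARGS...]
# COMMAND is invoked with the thread count appended as its last argument.
# The ( ... ) body runs in a subshell, so the loop variables stay local.
find_max_threads() (
    start=$1
    end=$2
    shift 2
    last_ok=0
    p=$start
    while [ "$p" -le "$end" ]; do
        # Send the command's own output to stderr so stdout carries only the result
        if "$@" "$p" >&2; then
            last_ok=$p
        else
            break   # first failure: stop stepping
        fi
        p=$((p + 1))
    done
    echo "$last_ok"   # 0 means even START failed
)

# Hypothetical wrapper around the real run; each thread count gets its own output directory
run_tophat() {
    tophat2 -p "$1" -G "$GTF" -o "tophat_p$1" "$INDEX" "$READS"
}

# Example (not run here): find_max_threads 16 24 run_tophat
```

              Each step is a complete TopHat run, so a coarser step or a smaller test fastq may be more practical than trying every value.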

              • mascano
                Junior Member
                • Aug 2013
                • 8

                #8
                So I can go up to 22 cores, either as a single run or as parallel runs that total 22. That said, I'm not being shy about using the computer simultaneously for other applications (MS Office, Chrome, Mail, etc.), so at least for my configuration, 22 threads is more than satisfactory.

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Since you did the experiment...

                  How much time (if any) is saved by going from 16 to 22 cores for the same job? A rough estimate is fine if you did not time the runs.

                  • mascano
                    Junior Member
                    • Aug 2013
                    • 8

                    #10
                    Not using the exact same fastq, but ones of similar size (~30 million reads):
                    I went from ~2 hrs at 16 cores, to ~1.5 hrs at 22 cores, to ~2.5 hrs at 11 cores. However, when I run two processes in separate shells at 11 cores each, one of them consistently takes around 2.5 hrs and the other around 3.5 hrs. I think I'll have to just wing this and figure out the best balance between the number of parallel processes and the number of cores per process.
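
                    To compare those configurations on an equal footing, the rough timings above can be turned into wall-clock hours per sample and core-hours per run. The helper names below are made up for illustration, and the inputs are the approximate figures quoted above, not fresh measurements.

```shell
# Sketch: compare run configurations using the rough timings quoted above.

# hours_per_sample WALL_HOURS SAMPLES_FINISHED
hours_per_sample() {
    LC_ALL=C awk -v h="$1" -v n="$2" 'BEGIN { printf "%.2f\n", h / n }'
}

# core_hours CORES WALL_HOURS
core_hours() {
    LC_ALL=C awk -v c="$1" -v h="$2" 'BEGIN { printf "%.1f\n", c * h }'
}

echo "one 22-core run, hours per sample:      $(hours_per_sample 1.5 1)"
# Two parallel 11-core runs: the slower one (3.5 hrs) sets the wall time for both samples
echo "two 11-core runs, hours per sample:     $(hours_per_sample 3.5 2)"

echo "core-hours at 16 cores: $(core_hours 16 2)"
echo "core-hours at 22 cores: $(core_hours 22 1.5)"
echo "core-hours at 11 cores: $(core_hours 11 2.5)"
```

                    On these numbers, running samples one after another at 22 cores (1.50 hrs per sample) comes out slightly ahead of two parallel 11-core runs (1.75 hrs per sample), although a single 11-core run uses the fewest core-hours. The timings are rough and come from different fastq files, so the gap may not hold up.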
