Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to change Bowtie settings under TopHat

    Dear all

    After setting bowtie and tophat under a path I ran tophat on 20MB fastq reads.
    And I found only few aligned reads and junctions in output. When I checked logs the results made me suspicious.

    After that
    when i ran just default bowtie command, 90% of my reads are failed to align. Does any one know why ?

    If I need to change any parameters in bowtie where and how could I change them so that tophat recognizes them every time when i ran tophat?

    My reads are 35bp single end illumina-solexa fastq reads (hg18). It would be great if you provide any parameters if I need to change in bowtie especially for these kinda reads.

    Thanx in advance
    Last edited by repinementer; 08-09-2010, 01:36 AM.

  • #2
    You may not be able to do it as TopHat needs Bowtie to run in a specific way. The only way to crack is to re-compile TopHat, changing the way it calls Bowtie, and you're at the risk of crashing TopHat.

    Comment


    • #3
      Actually, the 'tophat' command itself is a python script that could potentially be edited if you wanted to hack at that. Save a copy first in case a change makes it stop working.

      Comment


      • #4
        Here is the "call bowtie" part in tophat python script. From what I can see (by a quick search), only -k, -m, -v (or -n) are set (see the red part below), which means only output the reads that have mismatch number <= a specific number (set by -n or -v) and total number of qualified reads <= maxhit (set by -k and -m, in tophat it's -g/--max-multihits). So, by default, tophat call bowtie as "bowtie -v N -k 20 -m 20". You can change it if you want to use parameters like "-a --best --strata" (be aware that -a and -k are mutually exclusive).


        # Call bowtie
        def bowtie(params,
        bwt_idx_prefix,
        reads_list,
        reads_format,
        num_mismatches,
        mapped_reads,
        unmapped_reads,
        extra_output = "",
        t_mapping = False,
        multihits_out = None): #only --prefilter-multihits should activate this parameter for the initial prefilter search
        start_time = datetime.now()
        bwt_idx_name = bwt_idx_prefix.split('/')[-1]
        reads_file=reads_list[0]
        readfile_basename=getFileBaseName(reads_file)
        if t_mapping:
        print >> sys.stderr, "[%s] Mapping %s against transcriptome %s with Bowtie %s" % (start_time.strftime("%c"),
        readfile_basename, bwt_idx_name, extra_output)
        else:
        print >> sys.stderr, "[%s] Mapping %s against %s with Bowtie %s" % (start_time.strftime("%c"),
        readfile_basename, bwt_idx_name, extra_output)
        bwt_log = open(logging_dir + 'bowtie.'+readfile_basename+'.fixmap.log', "w")
        #bwt_mapped=mapped_reads
        unmapped_reads_out=unmapped_reads
        if use_BWT_FIFO:
        global unmapped_reads_fifo
        if os.path.exists(unmapped_reads_fifo):
        os.remove(unmapped_reads_fifo)
        try:
        os.mkfifo(unmapped_reads_fifo)
        except OSError, o:
        die(fail_str+"Error at mkfifo("+unmapped_reads_fifo+'). '+str(o))
        unmapped_reads_out=unmapped_reads_fifo

        # Launch Bowtie
        try:
        bowtie_cmd = [bowtie_path]

        if reads_format == "fastq":
        bowtie_cmd += ["-q"]
        elif reads_format == "fasta":
        bowtie_cmd += ["-f"]

        if params.read_params.color:

        bowtie_cmd += ["-C", "--col-keepends"]
        unzip_cmd=None
        if use_zpacker:
        unzip_cmd=[ params.system_params.zipper ]
        unzip_cmd.extend(params.system_params.zipper_opts)
        zip_cmd=unzip_cmd[:]
        unzip_cmd+=['-cd']
        zip_cmd+=['-c']

        fifo_pid=None
        if unmapped_reads:
        bowtie_cmd += ["--un", unmapped_reads_out]

        if use_BWT_FIFO:
        print >> run_log, ' '.join(zip_cmd)+' < '+ unmapped_reads_fifo + ' > '+ unmapped_reads + ' & '
        fifo_pid=os.fork()
        if fifo_pid==0:
        def on_sig_exit(sig, func=None):
        os._exit(os.EX_OK)
        signal.signal(signal.SIGTERM, on_sig_exit)
        subprocess.call(zip_cmd,
        stdin=open(unmapped_reads_fifo, "r"),
        stdout=open(unmapped_reads, "wb"))
        os._exit(os.EX_OK)

        fix_map_cmd = [prog_path('fix_map_ordering')]

        max_hits = params.max_hits
        if t_mapping:
        max_hits = params.t_max_hits
        if params.bowtie_alignment_option == "-n" and num_mismatches>3:
        num_mismatches=3
        bowtie_cmd += [params.bowtie_alignment_option, str(num_mismatches),
        "-p", str(params.system_params.num_cpus),
        "-k", str(max_hits),
        "-m", str(max_hits)]

        if multihits_out:
        bowtie_cmd += ["--max", multihits_out]
        else:
        bowtie_cmd += ["--max", "/dev/null"]


        bowtie_cmd += [ bwt_idx_prefix ]
        bowtie_proc=None
        shellcmd=""
        unzip_proc=None
        zip_proc=None
        if multihits_out:
        #special prefilter bowtie run: we use prep_reads on the fly
        #in order to get multi-mapped reads to exclude later
        prep_cmd = prep_reads_cmd(params, ",".join(reads_list))
        preplog_fname=logging_dir + "prep_reads_prefilter.log"
        prepfilter_log = open(preplog_fname,"w")
        unzip_proc = subprocess.Popen(prep_cmd,
        stdout=subprocess.PIPE,
        stderr=prepfilter_log)
        shellcmd=' '.join(prep_cmd) + "|"
        else:
        z_input=use_zpacker and reads_file.endswith(".z")
        if z_input:
        unzip_proc = subprocess.Popen(unzip_cmd,
        stdin=open(reads_file, "rb"),
        stdout=subprocess.PIPE)
        shellcmd=' '.join(unzip_cmd) + "< " +reads_file +"|"
        else:
        #must be uncompressed fastq input (e.g. unmapped reads from a previous run)
        bowtie_cmd += [reads_file]
        bowtie_proc = subprocess.Popen(bowtie_cmd,
        stdout=subprocess.PIPE,
        stderr=bwt_log)
        if unzip_proc:
        #input is compressed OR prep_reads is used as a filter
        bowtie_cmd += ['-']
        bowtie_proc = subprocess.Popen(bowtie_cmd,
        stdin=unzip_proc.stdout,
        stdout=subprocess.PIPE,
        stderr=bwt_log)
        unzip_proc.stdout.close() # see http://bugs.python.org/issue7678

        fix_map_cmd += ['-']
        shellcmd += ' '.join(bowtie_cmd) + '|' + ' '.join(fix_map_cmd)
        if use_zpacker:
        shellcmd += "|"+ ' '.join(zip_cmd)
        fix_order_proc = subprocess.Popen(fix_map_cmd,
        stdin=bowtie_proc.stdout,

        stdout=subprocess.PIPE)
        zip_proc = subprocess.Popen(zip_cmd,
        preexec_fn=subprocess_setup,
        stdin=fix_order_proc.stdout,
        stdout=open(mapped_reads, "wb"))
        fix_order_proc.stdout.close()
        else:
        fix_order_proc = subprocess.Popen(fix_map_cmd,
        preexec_fn=subprocess_setup,
        stdin=bowtie_proc.stdout,
        stdout=open(mapped_reads, "w"))
        bowtie_proc.stdout.close()
        shellcmd += " > " + mapped_reads
        print >> run_log, shellcmd
        if use_zpacker:
        zip_proc.communicate()
        else:
        fix_order_proc.communicate()
        if use_BWT_FIFO:
        if fifo_pid and not os.path.exists(unmapped_reads):
        try:
        os.kill(fifo_pid, signal.SIGTERM)
        except:
        pass
        except OSError, o:
        die(fail_str+"Error: "+str(o))

        # Success
        #finish_time = datetime.now()
        #duration = finish_time - start_time
        #print >> sys.stderr, "\t\t\t[%s elapsed]" % formatTD(duration)
        if use_BWT_FIFO:
        try:
        os.remove(unmapped_reads_fifo)
        except:
        pass
        if multihits_out and not os.path.exists(multihits_out):
        open(multihits_out, "w").close()
        return (mapped_reads, unmapped_reads)
        Last edited by sterding; 04-04-2012, 07:52 AM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Best Practices for Single-Cell Sequencing Analysis
          by seqadmin



          While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
          06-06-2024, 07:15 AM
        • seqadmin
          Latest Developments in Precision Medicine
          by seqadmin



          Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

          Somatic Genomics
          “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
          05-24-2024, 01:16 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 06-17-2024, 06:54 AM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-14-2024, 07:24 AM
        0 responses
        20 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-13-2024, 08:58 AM
        0 responses
        17 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 06-12-2024, 02:20 PM
        0 responses
        20 views
        0 likes
        Last Post seqadmin  
        Working...
        X