
  • bowtie 80 mer-long mapping


    I am trying to map relatively long reads (80-base single reads, FASTQ format) to hg19 using bowtie.
    Can anybody tell me which parameters should be modified for this kind of mapping?
    I used
    bowtie -p 4 --best --strata -m 1 --sam /index_hg19 -q 80_mer_read.fastq output.sam

    I may have forgotten some important parameters that need to be changed.


  • #2
    80 bp is not especially long, I would say.
    Which instrument was used to generate the data?
    Did you run a quality check, e.g. FastQC?


    • #3
      Thanks volks,
      This is Illumina GAIIx data.
      I just want to know whether I need to add or change any parameters based on the read length.
      Can I use the same command line for mapping 80 bp reads as I used for 35 bp reads? I feel I need to change some parameters based on the read length.
      Maybe I'm wrong.


      • #4
        Hi kenosaki,

        No need to adjust bowtie parameters for read length.



        • #5
          I've found a big improvement from trimming the reads, for aligning some of the longer Illumina reads. Here's a function in zsh that I've used for clipping everything from the right end of a read below a certain read quality.


          # ord.awk --- do ord and chr
          # taken from the gawk texinfo manual
          # therefore, this may be covered by the GNU Free Documentation License
          # the GFDL still allows commercial redistribution, however

          # Global identifiers:
          # _ord_: numerical values indexed by characters
          # _ord_init: function to initialize _ord_
          BEGIN { _ord_init() }

          function _ord_init( low, high, i, t)
          {
              low = sprintf("%c", 7) # BEL is ascii 7
              if (low == "\a") { # regular ascii
                  low = 0
                  high = 127
              } else if (sprintf("%c", 128 + 7) == "\a") {
                  # ascii, mark parity
                  low = 128
                  high = 255
              } else { # ebcdic(!)
                  low = 0
                  high = 255
              }

              for (i = low; i <= high; i++) {
                  t = sprintf("%c", i)
                  _ord_[t] = i
              }
          }

          function ord(str, c)
          {
              # only first character is of interest
              c = substr(str, 1, 1)
              return _ord_[c]
          }

          function chr(c)
          {
              # force c to be numeric by adding 0
              return sprintf("%c", c + 0)
          }

          #### test code ####
          # BEGIN \
          # {
          #     for (;;) {
          #         printf("enter a character: ")
          #         if (getline var <= 0)
          #             break
          #         printf("ord(%s) = %d\n", var, ord(var))
          #     }
          # }


          #! /usr/bin/zsh

          # the 'raw' version of this doesn't subtract 33 from the (raw)
          # qualities
          # also, it only trims the back end part of the read (not the front)
          # cuts off everything from the end less than $1
          # reads FASTQ on stdin, writes trimmed FASTQ to stdout
          # note: --source is a gawk option, so awk must be gawk here
          function trimReadsRaw() {
              local thresh=$1
              awk -f ord.awk \
              --source '{name=$0; getline; read=$0;
              getline; strand=$0; getline; qual=$0; len=length(qual);
              start=1; minEnd=start+20; end=0;
              # scan from the right for the last base at or above the threshold
              for (i=len; i>=minEnd; i--) {
                  if (ord(substr(qual,i,1)) >= '$thresh') { end=i; break; }
              }
              # drop reads with fewer than 21 bases left after trimming
              if ( (end-start) < 20 ) { next; }
              print name; print substr(read,start,end-start+1); print strand;
              print substr(qual,start,end-start+1);
              }' --
          }
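
A note on choosing the threshold: because the 'raw' variant compares the ASCII code of each quality character without subtracting an offset, the value passed to trimReadsRaw has to be the Phred score plus the encoding offset (33 for Sanger and Illumina 1.8+ files, 64 for older Illumina pipelines). The sketch below only illustrates that arithmetic. raw_threshold is a hypothetical helper name that is not part of the post above, and the file names in the final comment are placeholders.

```sh
# Hypothetical helper (not from the original post): convert a Phred score
# to the raw ASCII code that a quality character must reach.
# $1 = Phred score, $2 = encoding offset (assumed default: 33)
raw_threshold() {
    echo $(( $1 + ${2:-33} ))
}

raw_threshold 20       # Phred+33 encoding: prints 53
raw_threshold 20 64    # Phred+64 encoding: prints 84

# Possible use with the trimming function (file names are placeholders):
#   trimReadsRaw $(raw_threshold 20 64) < reads.fastq > trimmed.fastq
```

Running FastQC first, as suggested earlier in the thread, should show which encoding a given file uses.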



          • #6
            If you have genomic data I would use another aligner, because bowtie can't deal with indels. BWA is a good choice, for example, as is Novoalign.

            For transcriptome data you could try adjusting the following settings.

            -n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
            -e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
            -l/--seedlen <int> seed length for -n (default: 28)

            e.g. -n 3 -e 100 -l 40
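
For illustration, here is one way the suggested seed settings could be combined with the command from the first post. This is only a sketch: bowtie_cmd is a hypothetical wrapper name, the index and file names are the placeholders used in the original question, and whether these values actually improve mapping for a given data set is something to check empirically. The function prints the command instead of running it, so it can be inspected first.

```sh
# Hypothetical wrapper (not from the thread): print a bowtie command that
# adds the suggested -n/-e/-l values to the flags from the first post.
# $1 = index basename, $2 = FASTQ input, $3 = SAM output (all placeholders)
bowtie_cmd() {
    echo bowtie -p 4 --best --strata -m 1 -n 3 -e 100 -l 40 --sam -q "$1" "$2" "$3"
}

# Dry run: shows the command line without executing bowtie
bowtie_cmd /index_hg19 80_mer_read.fastq output.sam
```

Once the printed line looks right, it can be pasted into the shell to run the alignment.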

