Seqanswers

  • bowtie: mapping 80-mer reads

    Hi,

    I'm trying to map relatively long reads (80-base single-end reads, FASTQ format) to hg19 using bowtie.
    Could anybody tell me which parameters should be modified for this kind of mapping?
    I used:
    bowtie -p 4 --best --strata -m 1 --sam /index_hg19 -q 80_mer_read.fastq output.sam

    I may have forgotten to change some important parameters.

    Thanks,

  • #2
    80 bp is not especially long, I would say.
    Which instrument was used to generate the data?
    Did you run a quality check, e.g. with FastQC?
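
    If FastQC isn't at hand, a rough first look can be taken straight from the FASTQ file. The sketch below is a hypothetical helper (`fastq_summary` is not part of any tool) that assumes plain four-line FASTQ records; it prints the read-length distribution and the lowest and highest quality character codes. Codes below 59 only occur with phred+33 (Sanger-style) qualities, whereas older Illumina pipelines (roughly versions 1.3 to 1.7, typical for GAIIx runs) wrote phred+64, whose codes start at 64; bowtie assumes phred+33 unless told otherwise with `--phred64-quals`.

```shell
# Hypothetical helper: summarise a FASTQ file (plain 4-line records assumed).
# Prints "length <len> <count>" lines sorted by length, then the minimum and
# maximum ASCII code seen in the quality strings. Reads stdin if no file given.
fastq_summary() {
    awk '
    BEGIN {
        # printable ASCII 33..126, so index() + 32 maps a character to its code
        for (i = 33; i <= 126; i++) chars = chars sprintf("%c", i)
        min = 999; max = 0
    }
    NR % 4 == 2 { lens[length($0)]++ }          # sequence line
    NR % 4 == 0 {                               # quality line
        for (i = 1; i <= length($0); i++) {
            code = index(chars, substr($0, i, 1)) + 32
            if (code < min) min = code
            if (code > max) max = code
        }
    }
    END {
        for (l in lens) print "length", l, lens[l] | "sort -k2,2n"
        close("sort -k2,2n")                    # flush lengths before the rest
        print "qual_min", min
        print "qual_max", max
    }' "$@"
}
```

    For example, `fastq_summary 80_mer_read.fastq` should report a single `length 80` line for untrimmed 80-mer reads.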



    • #3
      Thanks, folks,
      The data are from an Illumina GAIIx.
      I just want to know whether I need to add or change any parameters because of the read length.
      Can I use the same command line for mapping 80 bp reads as I used for 35 bp reads? I feel some parameters should change with read length, but maybe I'm wrong.



      • #4
        Hi kenosaki,

        No need to adjust bowtie parameters for read length.

        Douglas



        • #5
          Trimming the reads gave me a big improvement when aligning some of the longer Illumina reads. Here's a zsh function (with a small gawk helper library, ord.awk) that I've used to clip the low-quality tail from the right end of each read, i.e. everything after the last base at or above a given quality.


```awk
# ord.awk --- do ord and chr
# taken from the gawk texinfo manual
# therefore, this may be covered by the GNU Free Documentation License
# the GFDL still allows commercial redistribution, however

# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_
BEGIN { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

#### test code ####
# BEGIN \
# {
#     for (;;) {
#         printf("enter a character: ")
#         if (getline var <= 0)
#             break
#         printf("ord(%s) = %d\n", var, ord(var))
#     }
# }
```

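```zsh
#! /usr/bin/zsh

# trimReadsRaw: the 'raw' version doesn't subtract 33 from the (raw) quality
# characters, so the threshold $1 is an ASCII code, e.g. 53 ('5') for Q20 with
# phred+33 qualities, or 84 ('T') for Q20 with phred+64.
# It only trims the back (3') end of the read, not the front: everything after
# the last base whose quality code is >= $1 is cut off. Reads that would be
# trimmed to 20 bases or fewer are dropped.
# Needs gawk (for --source) and ord.awk in the current directory.
# usage: trimReadsRaw 53 < reads.fastq > trimmed.fastq
function trimReadsRaw() {
    local thresh=$1
    gawk -v thresh="$thresh" -f ord.awk --source '{
        name = $0; getline; read = $0
        getline; strand = $0; getline; qual = $0
        len = length(qual); start = 1; minEnd = start + 20; end = 0
        # scan back from the right end for the last base at or above thresh
        for (i = len; i >= minEnd; i--) {
            if (ord(substr(qual, i, 1)) >= thresh) { end = i; break }
        }
        # no qualifying base at position minEnd or later: drop the read
        if ((end - start) < 20) { next }
        print name; print substr(read, start, end - start + 1); print strand
        print substr(qual, start, end - start + 1)
    }' --
}
```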
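
          trimReadsRaw relies on gawk (`--source` is a gawk extension) and on ord.awk to turn quality characters into numbers. As a hedged alternative, here is a sketch of the same right-end trimming rule in plain POSIX awk, which maps a character to its ASCII code with `index()` instead of ord.awk. The function name `trim3_fastq` and its optional minimum-length argument are hypothetical; with the default of 21 it should keep the same reads as trimReadsRaw, assuming four-line FASTQ records on standard input.

```shell
# Hypothetical POSIX-awk variant of trimReadsRaw (no gawk, no ord.awk).
# $1: raw ASCII quality threshold (e.g. 53 = Q20 with phred+33)
# $2: minimum trimmed length to keep (default 21, matching trimReadsRaw)
# Reads 4-line FASTQ on stdin and writes the trimmed FASTQ to stdout.
trim3_fastq() {
    awk -v thresh="$1" -v minlen="${2:-21}" '
    BEGIN {
        # printable ASCII 33..126; index() + 32 gives a character code
        for (i = 33; i <= 126; i++) chars = chars sprintf("%c", i)
    }
    {
        name = $0; getline read; getline plus; getline qual
        # walk back from the right end to the last base at or above thresh
        end = 0
        for (i = length(qual); i >= 1; i--) {
            if (index(chars, substr(qual, i, 1)) + 32 >= thresh) { end = i; break }
        }
        if (end < minlen) next            # too short after trimming: drop
        print name
        print substr(read, 1, end)
        print plus
        print substr(qual, 1, end)
    }'
}
```

          For example, `trim3_fastq 53 < reads.fastq > trimmed.fastq` mirrors `trimReadsRaw 53`. Passing the minimum length explicitly just makes the cut-off that trimReadsRaw hard-codes visible.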



          • #6
            If you have genomic data, I would use another aligner, because bowtie can't deal with indels. BWA is good, for example, as is Novoalign.

            For transcriptome data, you could try adjusting the following settings:

            -n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
            -e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
            -l/--seedlen <int> seed length for -n (default: 28)

            e.g. -n 3 -e 100 -l 40
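
            To see why -e in particular can matter for longer reads, here is a sketch of the arithmetic behind it. The bowtie manual describes -e as the maximum total of quality values at all mismatched read positions across the whole alignment, not just the seed, with each quality rounded to the nearest 10 and capped at 30 (as in Maq). The hypothetical `maq_err_sum` below only applies that documented rule to a list of Phred qualities; it is an illustration, not bowtie's code, and the way it rounds exact halves (e.g. Q25) is an assumption.

```shell
# Hypothetical helper illustrating bowtie's documented -e/--maqerr rule.
# Args: the -e limit, then the Phred qualities at the mismatched positions.
# Each quality is rounded to the nearest 10 (halves rounded up: an assumption)
# and capped at 30; prints the total and whether it stays within the limit.
maq_err_sum() {
    limit=$1; shift
    awk -v limit="$limit" -v quals="$*" 'BEGIN {
        n = split(quals, q, " ")
        sum = 0
        for (i = 1; i <= n; i++) {
            r = int((q[i] + 5) / 10) * 10   # round to the nearest 10
            if (r > 30) r = 30              # saturate at 30
            sum += r
        }
        print sum, (sum <= limit ? "ok" : "exceeds")
    }'
}
```

            For instance, `maq_err_sum 70 30 30 30` prints `90 exceeds`: under this rule, three high-quality mismatches anywhere in the read already go over the default limit of 70, while `-e 100` would accept them.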

