  • 2 questions on Bowtie

    Hello fellas

    I recently started using crossbow to process mouse genome sequence data. I come from a computer science background and know very little about biology.

    After some experiments and reading through the manual of bowtie, I still didn't get:
    1. The command for bowtie looks like this:
    "./bowtie index/index -p 2 --partition $partition_size -X $insert_size -m 1 -v 2 --best --strata --mm --12 --startverbose --mmsweep"
    It uses -m 1 in conjunction with "--best --strata". My understanding is that an alignment is reported only when there is exactly 1 reportable alignment in the best alignment "stratum".
    Bowtie discards the alignments when 2 or more happen to meet the same "best" criteria. Is this right?

    2. In the output of bowtie, the second-to-last column is called "hit". From the manual:
    "this column contains the number of other instances where
    the same sequence aligned against the same reference characters as
    were aligned against in the reported alignment. This is *not* the
    number of other places the read aligns with the same number of
    mismatches."
    I could not understand this statement. Can any biologist here give me an example of what it means?

    I would really appreciate it if you could help me out here.
    Cheers

  • #2
    Regarding 1.

    Yes, that's right: -m 1 --best --strata will not report any read that maps (within the best stratum of alignments) to more than one location.
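
To make the rule above concrete, here is a minimal Python sketch of the filtering logic as described in this thread. It is not bowtie's implementation: the function name `report_best_stratum`, the `(location, mismatches)` tuples and the example coordinates are all made up for illustration, and the sketch ignores the -k limit on how many alignments are printed.

```python
def report_best_stratum(alignments, m=1):
    """Toy model of -m <m> combined with --best --strata.

    alignments: list of (location, mismatches) tuples for ONE read
                (hypothetical representation, not bowtie's data model).
    Returns the alignments that would be reported, or [] if the read
    is suppressed.
    """
    if not alignments:
        return []  # read did not align at all

    # --best --strata: only the stratum with the fewest mismatches counts.
    fewest = min(mismatches for _, mismatches in alignments)
    best_stratum = [a for a in alignments if a[1] == fewest]

    # -m: suppress the read if more than m alignments are in that stratum.
    if len(best_stratum) > m:
        return []
    return best_stratum


# One perfect hit plus a worse hit: the perfect hit is reported.
unique = [("chr1:100", 0), ("chr2:50", 1)]
print(report_best_stratum(unique))

# Two hits tied in the best stratum (1 mismatch each): nothing is reported.
tied = [("chr1:100", 1), ("chr3:7", 1), ("chr2:50", 2)]
print(report_best_stratum(tied))
```

Note that the worse alignment in the first example does not count against the read, because only the best stratum is compared with the -m limit.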

    I am not sure where you got the --partition, --startverbose and --mmsweep parameters from? Maybe you are using some older version of bowtie? The new version certainly doesn't have them.

    Regarding 2., this is how I would interpret it. Others can correct me if I am wrong.

    Let's say read AGC was reported as mapping to a location whose sequence is ACC.

    The parameters allowed 1 mismatch.

    The "hit" column will then report the number of other locations with the exact sequence ACC to which the read (AGC) could map but which were not reported.

    However, this is different from the number of all other locations to which the read (AGC) could map allowing the 1 mismatch. AGC could also map to locations whose sequence is AGG. However this number is not included in the "hit" column since the reported alignment is to ACC.
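
The difference between the two counts can be checked on a toy example. The Python sketch below follows the interpretation given in this post, not bowtie's actual code: the 12-base `genome` string, the helper names and the forward-strand-only search are assumptions made purely for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def other_identical_sites(genome, pos, length):
    """Other positions whose reference characters are identical to the
    reference characters at the reported position (the "hit" idea)."""
    target = genome[pos:pos + length]
    return sum(
        1
        for i in range(len(genome) - length + 1)
        if i != pos and genome[i:i + length] == target
    )


def other_sites_within(genome, read, pos, max_mismatches):
    """Other positions where the READ aligns with at most max_mismatches."""
    n = len(read)
    return sum(
        1
        for i in range(len(genome) - n + 1)
        if i != pos and hamming(read, genome[i:i + n]) <= max_mismatches
    )


genome = "ACCTAGGTACCT"  # made-up reference, forward strand only
read = "AGC"
reported_pos = 0         # read reported against ACC at position 0

# ACC occurs once more (position 8), so the "hit"-style count is 1.
print(other_identical_sites(genome, reported_pos, len(read)))

# With 1 mismatch allowed the read also fits AGG (position 4),
# so the count of all other 1-mismatch locations is 2.
print(other_sites_within(genome, read, reported_pos, 1))
```

The first number only looks at copies of ACC, while the second also includes AGG, which is why the two values differ.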



    • #3
      Thank you very much for the reply, akundaje!

      I am not sure where you got the --partition, --startverbose and --mmsweep parameters from? Maybe you are using some older version of bowtie? The new version certainly doesn't have them.
      I am working with crossbow (version 0.1.3). The "--partition", "--startverbose" and "--mmsweep" options come from the script "crossbow.pl" (in the /local folder), which drives crossbow on a local hadoop cluster.

      Regarding 2., this is how I would interpret it. Others can correct me if I am wrong.

      Let's say read AGC was reported as mapping to a location whose sequence is ACC.

      The parameters allowed 1 mismatch.

      The "hit" column will then report the number of other locations with the exact sequence ACC to which the read (AGC) could map but which were not reported.

      However, this is different from the number of all other locations to which the read (AGC) could map allowing the 1 mismatch. AGC could also map to locations whose sequence is AGG. However this number is not included in the "hit" column since the reported alignment is to ACC.
      So if the reference contains, say, 5 exact copies of the sequence "AGC", then "hit" will be 4 in each of the 5 alignment records, provided they are reportable. And if there is another sequence "GGC", also with, say, 5 copies, "hit" will likewise be 4 in each of those 5 alignments. Am I right?

      Elton



      • #4
        I did not quite understand your example. Just to reiterate:

        Your read sequence is AGC.

        The line in the bowtie output for this read says that it maps to a location (say chr1, position 20) at which the sequence is ACC.

        Now the same sequence ACC (note: this is the sequence TO WHICH the read was mapped and may not be the same as the read sequence) exists in, say, 10 other locations in the genome (i.e. a total of 11 locations have the sequence ACC).

        Then for this line of bowtie output, the "hit" column should have a value of 10.
