  • Bowtie2: a specific use where the parameters are not respected

    Hello everyone,
    I'm writing this post because I observed some strange behavior with Bowtie2, and I suspect it mostly affects alignments against short reference sequences.

    I'm using Bowtie2 to map short reads (between 15 and 28 nt) against mature miRNA reference sequences (ranging from 17 to 28 nt in length).
    I don't want any mismatch to occur in an alignment, and luckily the maximum seed length in Bowtie2 is 28 nt, so I can forbid any mismatch in my alignments using the parameters -L 28 (the seed length), -N 0 (the number of mismatches allowed within the seed), and --no-1mm-upfront (to prevent any 1-mismatch alignment attempt before the multiseed heuristic is tried).

    Here is the command I'm using:
    Code:
    bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
    99.9% of the aligned reads don't have any mismatches. But for a few hundred of them, I can observe mismatches and insertions! These offending reads are 22 to 24 nt long, so the seed should cover the whole sequence and no mismatch should be accepted. Here is an IGV screenshot of such reads mapping with an insertion and a mismatch on a 21 nt reference sequence:



    Here is the fastq line for the first read with an insertion and a mismatch (all reads with that pattern look the same):
    Code:
    @NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
    TCACAGTGAACCGGTCTCTTTT
    +
    AAAAAEEEEEEEEEEEEEEEEE
    And here is the fasta line for the reference sequence:
    Code:
    >hsa-miR-128-3p
    TCACAGTGAACCGGTCTCTTT
    As a keen observer would note, the reference sequence matches the read, except that the reference is missing one 'T' at the 3' end (the read is 22 nt long, the reference 21 nt). So Bowtie2 should not allow this alignment, as we are in end-to-end mode.
    But it appears that Bowtie2 uses a devious strategy to satisfy the end-to-end rule: it creates one insertion and one mismatch, thus violating the no-mismatch rule.

    In local mode, this read would have been accepted, with a 1 nt long soft-clip on the right side.
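
    The relationship between the read and the reference can be checked directly in the shell. This is only a small sketch using the two sequences quoted above; the variable names are illustrative.

```shell
read_seq="TCACAGTGAACCGGTCTCTTTT"   # read sequence from the FASTQ record above
ref_seq="TCACAGTGAACCGGTCTCTTT"     # hsa-miR-128-3p from the FASTA record above

echo "read length: ${#read_seq}"
echo "ref length:  ${#ref_seq}"

# If the reference is a prefix of the read, strip it to get the 3' overhang.
case "$read_seq" in
    "$ref_seq"*)
        overhang=${read_seq#"$ref_seq"}
        echo "reference is a prefix of the read; 3' overhang: $overhang"
        ;;
    *)
        overhang=""
        echo "reference is not a prefix of the read"
        ;;
esac
```

    The single overhanging base is what a local alignment would soft-clip and what an ungapped, mismatch-free end-to-end alignment cannot accommodate.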

    I observed this behavior with other references, and it's always the same pattern: an insertion and a mismatch are created to fit a longer read onto a shorter reference sequence, even though I specified 0 mismatches in the parameters.
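
    One way to list the affected records is to scan the SAM output for alignments that are not perfect end-to-end matches. The sketch below assumes the output file is out.sam as in the command above; the function name list_imperfect is made up for illustration. Bowtie2 writes matches and mismatches alike as M in the CIGAR string, so the NM:i edit-distance tag is checked as well.

```shell
# Print read name, reference, CIGAR and edit distance for every aligned
# record that is not a single "<n>M" block with NM:i:0.
list_imperfect() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        $3 == "*" || $6 == "*" { next }     # skip unaligned reads
        {
            nm = ""
            for (i = 12; i <= NF; i++)      # optional tags start at field 12
                if ($i ~ /^NM:i:/) nm = substr($i, 6)
            if ($6 !~ /^[0-9]+M$/ || nm != "0")
                print $1 "\t" $3 "\t" $6 "\tNM=" nm
        }
    ' "$1"
}
```

    Usage would be list_imperfect out.sam; an empty result means every aligned read is an exact, ungapped match.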

    So I'm wondering: why is this happening? Am I missing something?
    Any help is very welcome!

    PS: my bowtie2 version is 2.2.4


    ******** UPDATE ********

    I continued my investigation and realized that the weird behavior I described cannot be observed if the reference file contains only the sequence mentioned previously (hsa-miR-128-3p). If you run Bowtie2 with this sequence only, everything works fine, meaning the read is not mapped (as expected).
    BUT if you add just one other reference sequence, and that sequence starts with the letter 'T', then Bowtie2 maps the read with an insertion and a mismatch.
    Then I tried with another reference starting with an 'A', and in this case the read is not mapped, which is very puzzling.

    You can try this at home by saving the following reference in a file named mirbase_hsa.fa:
    Code:
    >hsa-miR-128-3p
    TCACAGTGAACCGGTCTCTTT
    >miR-test
    T
    Then copy the following read to map (the same as before) into a file named my_trimmed_reads.fq:
    Code:
    @NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
    TCACAGTGAACCGGTCTCTTTT
    +
    AAAAAEEEEEEEEEEEEEEEEE
    Then you can run the following script:
    Code:
    bowtie2-build mirbase_hsa.fa mirbase_hsa
    bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
    You should see that the read maps when it should not. You can also change the second reference (miR-test) so that it starts with an 'A' instead of a 'T', and you should see that the read no longer maps.
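
    The 'T' versus 'A' comparison can be run side by side with a small loop. This is a sketch under assumptions: the file names ref_T.fa, ref_A.fa, out_T.sam and out_A.sam and the helper write_ref are invented for illustration, and the Bowtie2 steps only run if bowtie2 is found on the PATH. The Bowtie2 options are copied unchanged from the command above.

```shell
# Write the two-sequence reference; $1 is the sequence of the miR-test entry.
write_ref() {
    printf '>hsa-miR-128-3p\nTCACAGTGAACCGGTCTCTTT\n>miR-test\n%s\n' "$1" > "$2"
}

# The single test read, identical to the FASTQ record shown above.
printf '%s\n' \
    '@NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG' \
    'TCACAGTGAACCGGTCTCTTTT' \
    '+' \
    'AAAAAEEEEEEEEEEEEEEEEE' > my_trimmed_reads.fq

for base in T A; do
    write_ref "$base" "ref_$base.fa"
    if command -v bowtie2 >/dev/null 2>&1; then
        bowtie2-build "ref_$base.fa" "ref_$base" >/dev/null
        bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 \
            -i S,1,0.5 --norc -x "ref_$base" -q my_trimmed_reads.fq \
            -S "out_$base.sam"
        # Show read name, reference name and CIGAR for each record.
        echo "miR-test = $base:"
        grep -v '^@' "out_$base.sam" | cut -f 1,3,6
    fi
done
```

    An unmapped read shows up with '*' in both the reference and CIGAR columns, so the two runs can be compared at a glance.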

    I tried this with the latest version of Bowtie2 (2.3.1) and this behavior can still be observed.
    Last edited by FlorianT; 03-30-2017, 12:14 AM. Reason: Update with new informations
