  • bowtie: mapped read not in paired alignment

    Hi,
    I have a problem with bowtie and paired end reads:
    A read is being reported as unmapped because its mate cannot be mapped.
    What am I doing wrong?

    I have a test genome:
    Code:
    >chr1
    AAGCTGGGATCGATGCCACGTAGACGTAGTTTAGTGAAACTGATGATTCCCTGATAGTAGCTGACTGATGGGGCAGTAAAAGTCCCCAGTCAGTTGACCTGAC
    and two test paired-end read files. Reads m1a, m2a and m1b are taken from the genome; m2b is mutated and does not match:
    read1:
    Code:
    >m1a
    TGGGATCGATGCCACGTAGACG
    >m2a
    TGGGATCGATGCCACGTAGACG
    read2:
    Code:
    >m1b
    GGGCAGTAAAAGTCCCCAGTCAGTT
    >m2b
    GGGCAGTTTTTGTGGGGAGTCAGTT
    When I align each file separately, I get the expected results (three reads map, one does not):
    Code:
    @HD     VN:1.0  SO:unsorted
    @SQ     SN:chr1 LN:103
    @PG     ID:Bowtie       VN:0.12.5       CL:"bowtie/bowtie-0.12.5/bowtie -f -S genome r1.fasta"
    m1a     0       chr1    5       255     22M     *       0       0       TGGGATCGATGCCACGTAGACG  IIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:22 NM:i:0
    m2a     0       chr1    5       255     22M     *       0       0       TGGGATCGATGCCACGTAGACG  IIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:22 NM:i:0
    Code:
    @HD     VN:1.0  SO:unsorted
    @SQ     SN:chr1 LN:103
    @PG     ID:Bowtie       VN:0.12.5       CL:"bowtie/bowtie-0.12.5/bowtie -f -S genome r2.fasta"
    m1b     0       chr1    71      255     25M     *       0       0       GGGCAGTAAAAGTCCCCAGTCAGTT       IIIIIIIIIIIIIIIIIIIIIIIII       XA:i:0  MD:Z:25 NM:i:0
    m2b     4       *       0       0       *       *       0       0       GGGCAGTTTTTGTGGGGAGTCAGTT       IIIIIIIIIIIIIIIIIIIIIIIII       XM:i:0
    When I align them as paired reads, m2a suddenly no longer maps. I would have expected it
    to be reported as mapped, with its mate (m2b) unmapped:
    Code:
    @HD     VN:1.0  SO:unsorted
    @SQ     SN:chr1 LN:103
    @PG     ID:Bowtie       VN:0.12.5       CL:"bowtie/bowtie-0.12.5/bowtie -f -S -m1 --best --strata --ff genome -1 r1.fasta -2 r2.fasta"
    m1a     67      chr1    5       255     22M     =       71      91      TGGGATCGATGCCACGTAGACG  IIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:22 NM:i:0
    m1b     131     chr1    71      255     25M     =       5       -91     GGGCAGTAAAAGTCCCCAGTCAGTT       IIIIIIIIIIIIIIIIIIIIIIIII       XA:i:0  MD:Z:25 NM:i:0
    m2a     77      *       0       0       *       *       0       0       TGGGATCGATGCCACGTAGACG  IIIIIIIIIIIIIIIIIIIIII  XM:i:0
    m2b     141     *       0       0       *       *       0       0       GGGCAGTTTTTGTGGGGAGTCAGTT       IIIIIIIIIIIIIIIIIIIIIIIII       XM:i:0
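
    For readers following the FLAG column in the paired output above, here is a minimal Python sketch that decodes those values into the bit meanings defined by the SAM specification. The helper name `decode_flag` and the short labels are illustrative choices, not part of Bowtie or any library.

```python
# Bit values follow the SAM specification; the label strings are our own shorthand.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # FLAG values copied from the paired-end run in the post above.
    for read, flag in [("m1a", 67), ("m1b", 131), ("m2a", 77), ("m2b", 141)]:
        print(read, flag, ", ".join(decode_flag(flag)))
```

    Decoded this way, 77 and 141 mark both mates of the second pair as unmapped. The outcome hoped for in the post would correspond to something like 73 for m2a (paired, mate unmapped, first in pair) and 133 for m2b (paired, unmapped, second in pair), ignoring strand bits.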

  • #2
    Hi,

    Bowtie's paired-end mode is only for finding paired-end alignments. If you're after both paired and unpaired alignments for the same set of reads, you can run Bowtie in paired-end mode with --un <file>, then run Bowtie in unpaired mode with <file> as the input.

    Thanks,
    Ben
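
    The two-pass workaround described above could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, the function name `rescue_commands` and the output names (`paired.sam`, `rescued_1.sam`, `rescued_2.sam`, `unpaired.fa`) are hypothetical, and it assumes that in paired-end mode `--un` writes the two mates to files with `_1` and `_2` inserted before the extension, which should be checked against the manual for your Bowtie version.

```python
import os


def rescue_commands(index, r1, r2, un_file="unpaired.fa"):
    """Build argv lists for one paired-end run plus two unpaired follow-up runs.

    Assumption: with -1/-2, `--un unpaired.fa` produces unpaired_1.fa and
    unpaired_2.fa. All output file names here are made up for illustration.
    """
    base, ext = os.path.splitext(un_file)
    un1 = f"{base}_1{ext}"
    un2 = f"{base}_2{ext}"

    # Options taken from the command line in the original post.
    shared = ["bowtie", "-f", "-S", "-m", "1", "--best", "--strata"]

    # Pass 1: paired-end alignment; pairs that fail to align go to --un.
    paired = shared + ["--ff", "--un", un_file, index,
                       "-1", r1, "-2", r2, "paired.sam"]
    # Pass 2: align each set of leftover mates on its own, in unpaired mode.
    single1 = shared + [index, un1, "rescued_1.sam"]
    single2 = shared + [index, un2, "rescued_2.sam"]
    return [paired, single1, single2]


if __name__ == "__main__":
    for cmd in rescue_commands("genome", "r1.fasta", "r2.fasta"):
        print(" ".join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

    Note that the unpaired runs report reads without the paired-end FLAG bits (as in the single-file outputs shown in the first post), so the three SAM files still need to be combined by hand.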

    • #3
      But the SAM format is complex enough to capture this information. Would you consider implementing this if one files a bug report?
      To align the same input three times (paired, read1, read2) and then combine the outputs seems like a detour to me.

      Thank you very much for the quick answer,
      ido

      • #4
        I bet it's already in there as a feature request, but feel free to file one if it's not. Please don't file as a bug.

        Thanks,
        Ben
