  • reventropy
    Junior Member
    • Apr 2014
    • 7

    High discordant alignments

    I've set up a Galaxy workflow for paired-end, first-stranded RNA-seq, and I've gotten some odd summary results from the Tophat2 alignment. At least I think they're odd, as I'm new to this.

    Left reads:
    Input : 218685181
    Mapped : 193500858 (88.5% of input)
    of these: 14727362 ( 7.6%) have multiple alignments (40016 have >20)
    Right reads:
    Input : 218685181
    Mapped : 196263585 (89.7% of input)
    of these: 14724480 ( 7.5%) have multiple alignments (40380 have >20)
    Unpaired reads:
    Input : 5950944
    Mapped : 5300035 (89.1% of input)
    of these: 227937 ( 4.3%) have multiple alignments (142 have >20)
    89.1% overall read mapping rate.

    Aligned pairs: 173668750
    of these: 13863688 ( 8.0%) have multiple alignments
    170432898 (98.1%) are discordant alignments
    1.5% concordant pair alignment rate.
    Here's the flagstat output:


    490744296 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    490744296 + 0 mapped (100.00%:-nan%)
    486148534 + 0 paired in sequencing
    241299292 + 0 read1
    244849242 + 0 read2
    523372 + 0 properly paired (0.11%:-nan%)
    443477134 + 0 with itself and mate mapped
    42671400 + 0 singletons (8.78%:-nan%)
    418612688 + 0 with mate mapped to a different chr
    312416516 + 0 with mate mapped to a different chr (mapQ>=5)
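To put the flagstat counts above in proportion, here is a small arithmetic sketch. The counts are copied from the output above; the variable names and the `percent` helper are only labels chosen for illustration.

```python
# Counts copied from the flagstat output above.
paired_in_sequencing = 486_148_534
properly_paired = 523_372
both_mates_mapped = 443_477_134
mate_on_different_chr = 418_612_688


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"properly paired: "
      f"{percent(properly_paired, paired_in_sequencing):.2f}% of paired reads")
print(f"mate on a different chromosome: "
      f"{percent(mate_on_different_chr, both_mates_mapped):.1f}% "
      f"of reads with both mates mapped")
```

The first figure reproduces the 0.11% that flagstat reports. The second shows that the large majority of reads with both mates mapped have their mate on another chromosome, which is the kind of pattern one might expect if mates were being paired with unrelated reads.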
    Given the number of reads mapped, the concordant pairs seem extremely low. I'm wondering if I missed a parameter in Tophat or Bowtie? Notably, I have not set a read group identifier in Bowtie (is that necessary?), nor could I figure out how to from the Bowtie documentation. I also wonder if something could be awry with my fastq files, as they were concatenated from a split dataset. Here are the first few reads from the forward and reverse files, respectively.

    @HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA
    TTCATCTTTAGATAATGAATTATATCCAAGATCAGACTGGCCACCTGTACTAGATCTATCATCAGTAGCATATACTTTGATTAAACCCG
    +
    FF00B<<FFFFFFBBFFFBFIFBBF0BBFFFFBFFFFIF<FFF<FBFF7BBBB<<B<''<B<BBB<<BBBBBFFFBBF<<B<7B7<BBB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA
    GAGGGAGTATAGGGCTGTGACTAGTATGTTGAGTCCTGTAAGTAGGAGAGTGATATTTGATCAGGAGAACGTGGTTACTAGCACAGAGA
    +
    FIFIIBFBBFFFIIFFFFFFFFFFFBFFIIIFFFIIIFFFFFFFFFBF<BBBBF0BFFFBFFBFFFFFFFBFBFBFB<BBBBBBBBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1451:2106 1:N:0:TGACCA
    ACTGGGAAACGTTCACGCTGGGTCCAGCATTTGCCATGGACAAGATGCCAGGACCCGTATGCTTCAGGATGAAGTTCTTGTCATCAAAT
    +
    FIIFFBBFFFFFFBB7<7BBFFF77BBFFIFFFIFBFFFIFFIIF<B<0<BB7BBBBB<BBBBBBBB0BBBB0<7<BBBB0'0B<B<BB




    @HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA
    TTACCCCCATACTCCTTACACTATTCCTCATCANCCNACTAAAAATATTAAACACAAACTACCACCTACCTCCCTCACCAAAGCCCATA
    +
    FFFFFFFF7FFFIIIIIFFFFFFFIIFFFFFFB#0B#07<FFFIFFFFIFBFFIFFFFFFFFBFF<BB<BFFFFB<BBBBBFBFFB<BB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA
    AGTCATTCTCATAATCGCCCACGGGCTTACATCNTCNTTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCA
    +
    FFFIIFFFIIFIIFFBFBFFFIIIIFFFIFFFF#0<#07<BBFFFBBFBFFBBFFFFFBFFFFFFFFFFFFFBBBBFFBFFBBBFBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1409:2234 2:N:0:TGACCA
    ATCTCAGAAAAGAAGACATGGAATATGCCCTGNNTANACTGGATGACACCAAATTCCGCTCTCATGAGGGTGAAACTTCCTACATCCGA
    +
    <BFFFIFFIIIBBFFBFBBFFFFF7FFFFFII##07#07BFFBFFBFFFIFFFBF7BBFFBBBBBBB<BB0<B<'7<BBBBBBBBBBB<
    Thanks in advance for any help!

    -Jeremy
  • yueluo
    Member
    • Aug 2013
    • 82

    #2
    What options did you use when running tophat/bowtie ?
    Since you use stranded-data, you might want to check the '--library-type' option.


    • reventropy
      Junior Member
      • Apr 2014
      • 7

      #3
      Thanks for the response, yueluo. I ran it through a Galaxy wrapper, but I selected the first-strand option, so the wrapper should be passing the option on to Bowtie. I just spoke with a colleague who informed me that my paired-end reads appear to be out of order.

      For instance:

      Read1-forward:
      1101:1432:2038 1:N:0:TGACCA
      Read1-reverse:
      1101:1452:2018 2:N:0:TGACCA

      This may have happened when I concatenated the files, or it might just be how I received the sequencing data. Do you have any ideas about how I can re-sort by coordinates?
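One way to confirm whether two FASTQ files are out of sync is to compare read IDs record by record. The following is a minimal sketch, assuming strict 4-line FASTQ records and headers in the style quoted in this thread (the ID is the text before the first space). The function names are made up for illustration, and the sequence and quality lines in the demo are placeholders.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the ID (header text before the first space) of each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0:  # header lines are lines 0, 4, 8, ...
            yield line.rstrip("\n").split(" ", 1)[0]


def first_mismatch(forward, reverse):
    """Return (record_index, forward_id, reverse_id) for the first pair whose IDs differ.

    Returns None when both files hold the same IDs in the same order.
    A file that ends early shows up as an ID paired with None.
    """
    pairs = zip_longest(read_ids(forward), read_ids(reverse))
    for index, (forward_id, reverse_id) in enumerate(pairs):
        if forward_id != reverse_id:
            return index, forward_id, reverse_id
    return None


# Headers are the ones quoted in this thread; sequence and quality lines are
# one-character placeholders to keep the demo short.
PREFIX = "@HW-ST997:217:C3KKGACXX:4:1101:"
FORWARD_TEXT = "".join(
    f"{PREFIX}{xy} 1:N:0:TGACCA\nN\n+\n#\n"
    for xy in ("1432:2038", "1474:2051", "1451:2106")
)
REVERSE_TEXT = "".join(
    f"{PREFIX}{xy} 2:N:0:TGACCA\nN\n+\n#\n"
    for xy in ("1452:2018", "1474:2051", "1409:2234")
)

print(first_mismatch(io.StringIO(FORWARD_TEXT), io.StringIO(REVERSE_TEXT)))
```

For real data, the two `io.StringIO` objects would be replaced by open file handles (or `gzip.open(path, "rt")` for compressed files). Older header styles that end in `/1` and `/2` would need the suffix stripped before comparing.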


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.
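As a rough sketch of that order of operations (align each raw pair of files on its own, then merge the resulting BAM files), the snippet below only builds and prints the command lines; it does not run them. The lane file names, index name, and output directories are hypothetical, and `fr-firststrand` is assumed to be the library type that matches the first-strand option mentioned earlier in the thread.

```python
import shlex

# Hypothetical lane files; substitute the raw, unmodified FASTQ files.
LANES = [
    ("lane1_R1.fastq", "lane1_R2.fastq", "tophat_lane1"),
    ("lane2_R1.fastq", "lane2_R2.fastq", "tophat_lane2"),
]
INDEX_BASE = "genome_index"  # hypothetical Bowtie2 index base name


def tophat2_command(index_base, forward_fastq, reverse_fastq, out_dir):
    """Build a TopHat2 command line for one untouched pair of FASTQ files."""
    return [
        "tophat2",
        "--library-type", "fr-firststrand",  # assumed to match the first-strand option
        "-o", out_dir,
        index_base,
        forward_fastq,
        reverse_fastq,
    ]


def merge_command(merged_bam, out_dirs):
    """Build a samtools merge command over each run's accepted_hits.bam."""
    return ["samtools", "merge", merged_bam] + [
        f"{out_dir}/accepted_hits.bam" for out_dir in out_dirs
    ]


commands = [tophat2_command(INDEX_BASE, fwd, rev, out) for fwd, rev, out in LANES]
commands.append(merge_command("merged.bam", [out for _, _, out in LANES]))

for command in commands:
    print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the printed lines can be reviewed and pasted into a shell once the paths are correct.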


        • reventropy
          Junior Member
          • Apr 2014
          • 7

          #5
          Originally posted by Brian Bushnell
          I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.
          After looking into this some more, I'm not sure there is a way to feed multiple files into the Galaxy Tophat2 wrapper. Fortunately, it looks like they have a tool specifically for combining paired-end read files (which I swear I looked for before). We'll see if this works. As a backup, we'll run another instance of Tophat2 from the command line.

          You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Originally posted by reventropy
            You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.
            That's exactly why I made the suggestion; there are a lot of poorly-written tools that break read pairing, and that's usually the culprit.

            If you need to do quality or adapter trimming, I can suggest BBDuk, which is made to handle single or paired files, keeping reads together. It's extremely fast and uses a better quality-trimming algorithm than most alternatives, as well as being more sensitive in adapter trimming (you can specify the number of mismatches allowed). You can also use it for contaminant removal (phiX, E. coli, various spike-ins or vectors).
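To illustrate what "keeping reads together" means in practice, here is a toy sketch of pair-aware filtering: a pair is kept only if both mates pass, so the forward and reverse outputs never drift apart. This is not BBDuk's algorithm; it is a simple mean-quality filter, assuming Phred+33 quality strings, with made-up records and a hypothetical threshold.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def filter_pairs(forward_records, reverse_records, min_mean_quality=20):
    """Keep a pair only if BOTH mates pass; otherwise drop both.

    Records are (read_id, sequence, quality) tuples, and the two lists are
    assumed to be in the same order. Dropping both mates together is what
    keeps the two outputs in sync.
    """
    kept_forward, kept_reverse = [], []
    for fwd, rev in zip(forward_records, reverse_records):
        worst = min(mean_quality(fwd[2]), mean_quality(rev[2]))
        if worst >= min_mean_quality:
            kept_forward.append(fwd)
            kept_reverse.append(rev)
    return kept_forward, kept_reverse


# Made-up toy records: the reverse mate of pair2 has very low quality.
forward_records = [
    ("pair1", "ACGT", "IIII"),
    ("pair2", "ACGT", "IIII"),
    ("pair3", "ACGT", "IIII"),
]
reverse_records = [
    ("pair1", "TGCA", "IIII"),
    ("pair2", "TGCA", "####"),
    ("pair3", "TGCA", "IIII"),
]

kept_forward, kept_reverse = filter_pairs(forward_records, reverse_records)
print([record[0] for record in kept_forward])
print([record[0] for record in kept_reverse])
```

A tool that filtered each file independently would drop `pair2` from the reverse file only, and every read after it would then be paired with the wrong mate.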

