  • biznatch
    Senior Member
    • Nov 2010
    • 124

    Extending aligned sequences, plus/minus strand

    I'm getting confused trying to work out how sequence reads (50 bp) relate to the actual ChIP-seq fragments (~250 bp), specifically with regard to the plus/minus strand.
    I'm aligning a FASTQ file of ~20 million single-end reads using Bowtie, and then I want to find peaks using SISSRs. The first 5 columns of Bowtie output are described as:

    1. Name of read that aligned
    2. Reference strand aligned to, + for forward strand, - for reverse
    3. Name of reference sequence where alignment occurs, or numeric ID if no name was provided
    4. 0-based offset into the forward reference strand where leftmost character of the alignment occurs
    5. Read sequence (reverse-complemented if orientation is -)
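To make the strand convention concrete, here is a minimal Python sketch (an illustration, not part of the original post) that parses one line of Bowtie's default output, assumed to be tab-delimited with the five columns listed above first, and computes the 0-based reference coordinate of the read's 5' end. For a + read that is the leftmost offset; for a - read it is the rightmost aligned base, because the offset in column 4 always refers to the forward strand. The names `Alignment`, `parse_bowtie_line` and `five_prime_end` are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    name: str     # column 1: read name
    strand: str   # column 2: '+' or '-'
    ref: str      # column 3: reference sequence name
    offset: int   # column 4: 0-based leftmost position on the forward strand
    seq: str      # column 5: read sequence (reverse-complemented if strand is '-')


def parse_bowtie_line(line: str) -> Alignment:
    """Parse the first five tab-separated columns of a Bowtie output line."""
    fields = line.rstrip("\n").split("\t")
    return Alignment(fields[0], fields[1], fields[2], int(fields[3]), fields[4])


def five_prime_end(aln: Alignment) -> int:
    """0-based forward-strand coordinate of the read's 5' end (the fragment end it came from)."""
    if aln.strand == "+":
        return aln.offset
    # For '-' reads the 5' end is the rightmost aligned base on the forward strand.
    return aln.offset + len(aln.seq) - 1
```

For a 50 bp read reported at offset 100, the 5' end is 100 on the + strand but 149 on the - strand.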

    If I plot the Bowtie-aligned reads, I'd expect something like this at any given peak:


    Image link: http://i.imgur.com/pnrFW.png

    This is how I believe SISSRs expects things to look, based on their paper
    (http://nar.oxfordjournals.org/conten...1/F1.large.jpg)

    But I'm getting this (+ and - aligned reads more or less overlap):


    Image link: http://i.imgur.com/eXUcU.png

    Here's an example:


    Image link: http://i.imgur.com/vyxJ8.png

    Did something go wrong with the ChIP-seq? (This isn't my data; it's from a published dataset.)
    Given an alignment as in Fig. 1, I would extend the reads to simulate the actual fragments like this:


    Image link: http://i.imgur.com/OONAW.png

    SISSRs would estimate the fragment length from how far apart the + and - clusters are, and the peak would be called in the middle, between the + and - clusters.
    Since my data look like Fig. 2, I think SISSRs is underestimating the fragment sizes and placing the peaks slightly off from where they should be.
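A rough way to see the idea in the question is to compare where the + and - 5' ends sit. The sketch below is a deliberately simple stand-in, not the estimator SISSRs actually uses: it takes the distance between the mean + and mean - 5' positions in one peak region as the fragment length, and their midpoint as the binding site. The function name is hypothetical, and the inputs are assumed to be 0-based 5' positions of reads already assigned to a single peak.

```python
from statistics import mean


def estimate_fragment_and_center(plus_starts, minus_starts):
    """Shift-based estimate for one peak (simplified illustration, not SISSRs' method).

    plus_starts / minus_starts: 0-based 5' positions of + and - reads in the peak.
    Raises statistics.StatisticsError if either strand has no reads.
    """
    mp, mm = mean(plus_starts), mean(minus_starts)
    # A fragment spans from its + 5' end to its - 5' end inclusive, hence the +1.
    frag_len = mm - mp + 1
    center = (mp + mm) / 2
    return frag_len, center
```

If the + and - reads largely overlap, the two means move closer together and this kind of shift-based estimate shrinks, which is the underestimation the question is worried about.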

    How should I extend the reads to visualize the fragments more accurately, and do I need to modify the data before submitting it to SISSRs?
    Maybe like this?


    Image link: http://i.imgur.com/Hbpk1.png

    Or like this?


    Image link: http://i.imgur.com/CFhv8.png

    This is my first time analyzing ChIP-seq data; thanks for the help!
    Last edited by biznatch; 01-21-2011, 08:42 AM. Reason: Added links below each image in case they're not showing up.
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #2
    There seems to have been a problem with the images you put on your post (at least I can't see them).

    Generally, though, the picture you showed from the NAR paper is a somewhat extreme example of what we've seen in our ChIP experiments. The peaks we've seen do show a positional offset between the forward and reverse strands, but not a separation; the bulk of the peak overlaps. I suppose the amount of overlap will depend on the insert size in your library and the length of the reads you generate.
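The dependence on insert size and read length can be illustrated with an idealized model (an assumption, not a description of any real library): every fragment has the same length F, fragments covering a binding site start at every possible position with equal frequency, and each fragment yields one + read from its left end and one - read from its right end. The sketch counts per-strand coverage and reports the fraction of + coverage that is matched by - coverage. The function names are hypothetical.

```python
from collections import Counter


def strand_coverage(site, frag_len, read_len):
    """Per-strand coverage from one fragment at every start position that contains `site`."""
    plus, minus = Counter(), Counter()
    for start in range(site - frag_len + 1, site + 1):
        end = start + frag_len                         # fragment is [start, end)
        plus.update(range(start, start + read_len))    # + read: leftmost read_len bases
        minus.update(range(end - read_len, end))       # - read: rightmost read_len bases
    return plus, minus


def overlap_fraction(frag_len, read_len, site=1000):
    """Fraction of + coverage matched by - coverage at the same positions."""
    plus, minus = strand_coverage(site, frag_len, read_len)
    shared = sum(min(plus[x], minus[x]) for x in plus.keys() & minus.keys())
    return shared / sum(plus.values())
```

In this toy model, 50 bp reads from 250 bp fragments give 0.2, while 150 bp fragments give about 0.33, so shorter inserts relative to the read length mean more overlap. Real libraries have a spread of fragment sizes, so treat this as a qualitative picture only.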

    I've now moved to extending my single-end ChIP data to the average insert length of the library, in order to merge the forward and reverse peaks into a single peak that more closely approximates what was extracted from the library. I'm not sure this offers any more sensitivity than strand-aware peak detection, but it has certainly proved easier to interpret the results visually this way.
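A minimal sketch of this kind of extension, assuming each read is extended from its 5' end in the direction of its own strand (so + reads grow rightwards and - reads leftwards), as the later reply in this thread describes. Summing the extended intervals from both strands gives one merged coverage track per chromosome. `extended_coverage` is a hypothetical helper; reads are given as `(strand, five_prime)` pairs in 0-based coordinates.

```python
from itertools import accumulate


def extended_coverage(reads, frag_len, chrom_len):
    """Merged per-base coverage after extending every read to frag_len.

    reads: iterable of (strand, five_prime) with 0-based 5' positions.
    Uses a difference array so each read costs O(1) before the final prefix sum.
    """
    diff = [0] * (chrom_len + 1)
    for strand, five in reads:
        if strand == "+":
            start, end = five, five + frag_len          # extend rightwards
        else:
            start, end = five - frag_len + 1, five + 1  # extend leftwards
        start, end = max(start, 0), min(end, chrom_len)  # clip to the chromosome
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    return list(accumulate(diff[:chrom_len]))
```

A + read with its 5' end at 10 and a - read with its 5' end at 19, each extended to 10 bp, stack on exactly the same interval, which is how the two strand-specific peaks merge into one.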


    • biznatch
      Senior Member
      • Nov 2010
      • 124

      #3
      I added links to the .png files below each image; hopefully those will work.

      Originally posted by simonandrews:
      I've actually moved to now extending my single end ChIP data to the average insert length of the library in order to merge the forward and reverse peaks to get a single peak which more closely approximates what was extracted from the library.
      Which direction do you extend them? Do forward and reverse reads extend in different directions, or do you extend out from the middle of each read regardless of strand?


      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #4
        Look at the start points only: in your example on chr19, almost all reverse reads start to the right of the center and most of the forward reads start to the left, which is what you would expect. SISSRs uses only start positions and looks for changes in strand preference within a window. Most likely it will report (at least) two sites in your example.
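The strand-preference idea can be sketched as follows. This is a simplified illustration in the spirit of the description above, not SISSRs' own code: non-overlapping windows are scanned, the net count (+ starts minus - starts) is computed for each, and a candidate site is reported where a window dominated by + starts is followed by one dominated by - starts. The 20 bp window is an arbitrary assumption, and the function names are hypothetical.

```python
from bisect import bisect_left


def _count(sorted_pos, lo, hi):
    """Number of positions p with lo <= p < hi in a sorted list."""
    return bisect_left(sorted_pos, hi) - bisect_left(sorted_pos, lo)


def net_count_transitions(plus_starts, minus_starts, lo, hi, window=20):
    """Report positions in [lo, hi) where net tag count flips from positive to negative."""
    plus, minus = sorted(plus_starts), sorted(minus_starts)
    sites, last_pos_end = [], None
    for w in range(lo, hi - window + 1, window):
        net = _count(plus, w, w + window) - _count(minus, w, w + window)
        if net > 0:
            last_pos_end = w + window          # end of the latest +-dominated window
        elif net < 0 and last_pos_end is not None:
            # Transition from + preference to - preference; windows with net 0 are skipped.
            sites.append((last_pos_end + w) // 2)
            last_pos_end = None
    return sites
```

With several clusters of alternating strand preference in one region, this kind of scan naturally returns more than one site.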

        Normally most ChIP reads correspond to the ends of fragments, so you extend each read directionally from its start point to the estimated fragment size of the library in order to calculate overlaps and to visualize the data in wiggle files.
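As a sketch of the directional extension, the helper below turns one alignment into a BED6 interval (0-based, half-open) covering the estimated fragment: a + read is extended rightwards from its leftmost base, a - read leftwards from its rightmost base. Such intervals can then be passed to whatever tool you use to build wiggle or bedGraph tracks. `extended_bed` is hypothetical, `frag_len` is the estimated library fragment size, and clipping at the chromosome end is not handled because it needs chromosome sizes.

```python
def extended_bed(ref, strand, offset, read_len, frag_len, name="."):
    """BED6 line for a read extended directionally from its 5' end to frag_len.

    offset: 0-based leftmost aligned position on the forward strand (as Bowtie reports it).
    Only the chromosome start is clipped; the end may run past the chromosome length.
    """
    if strand == "+":
        start, end = offset, offset + frag_len   # 5' end is the leftmost base
    else:
        end = offset + read_len                  # 5' end is the rightmost base
        start = max(0, end - frag_len)
    return "\t".join([ref, str(start), str(end), name, "0", strand])
```

For example, a 50 bp - read at offset 400 extended to 250 bp becomes the interval 200 to 450 on the same chromosome.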
