  • Extending aligned sequences, plus/minus strand

    I'm getting confused trying to sort out how sequence reads (50 bp) relate to the actual ChIP-seq fragments (~250 bp), specifically with regard to plus/minus strand.
    I'm aligning a fastq file of ~20 million single-end reads using Bowtie, and then I want to find peaks using SISSRs. The first 5 columns of Bowtie output are described as:

    1. Name of read that aligned
    2. Reference strand aligned to, + for forward strand, - for reverse
    3. Name of reference sequence where alignment occurs, or numeric ID if no name was provided
    4. 0-based offset into the forward reference strand where leftmost character of the alignment occurs
    5. Read sequence (reverse-complemented if orientation is -)
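As a rough illustration of the column layout above, the sketch below parses the first five fields of one Bowtie output line and derives the read's 5' end, i.e. the read's start on its own strand. It assumes Bowtie's default tab-delimited output; the names `BowtieHit`, `parse_bowtie_line`, and `five_prime_position` are made up for this example and are not part of Bowtie.

```python
from typing import NamedTuple


class BowtieHit(NamedTuple):
    """First five columns of one Bowtie alignment line (hypothetical container)."""
    read_name: str
    strand: str      # "+" or "-"
    ref_name: str
    offset: int      # 0-based leftmost position on the forward reference strand
    sequence: str    # already reverse-complemented by Bowtie if strand is "-"


def parse_bowtie_line(line: str) -> BowtieHit:
    """Split a tab-delimited line and keep only the first five columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    name, strand, ref, offset, seq = fields[:5]
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return BowtieHit(name, strand, ref, int(offset), seq)


def five_prime_position(hit: BowtieHit) -> int:
    """0-based position of the read's 5' end on the forward reference strand.

    For a "+" read this is the leftmost aligned base; for a "-" read it is
    the rightmost aligned base, because the read runs right to left.
    """
    if hit.strand == "+":
        return hit.offset
    return hit.offset + len(hit.sequence) - 1


hit = parse_bowtie_line("read1\t-\tchr19\t100\tACGTACGTAC\n")
print(five_prime_position(hit))  # 109
```

The point to notice is that column 4 is always the leftmost coordinate, so for minus-strand reads the fragment end that was actually sequenced sits at the right edge of the alignment.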

    If I plot the Bowtie aligned reads I'd expect something like this, at any given peak:


    Image link: http://i.imgur.com/pnrFW.png

    This is how I believe SISSRs expects things to look, based on their paper:
    (http://nar.oxfordjournals.org/conten...1/F1.large.jpg)

    But I'm getting this (+ and - aligned reads more or less overlap):


    Image link: http://i.imgur.com/eXUcU.png

    Here's an example:


    Image link: http://i.imgur.com/vyxJ8.png

    Did something go wrong with the ChIP-seq? (This isn't my data, it's from published data).
    Given an alignment as in Fig.1, I would extend the reads to simulate the actual sequences like this:


    Image link: http://i.imgur.com/OONAW.png

    SISSRs would estimate the fragment length based on how far apart the + and - clusters are, and the peak would be found midway between the + and - clusters.
    Since my data looks like Fig. 2, I think SISSRs is underestimating the fragment sizes and placing the peaks slightly off from where they should be.
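To make that reasoning concrete, here is a deliberately simplified sketch of the idea: take the 5' start positions of the + and - reads around one peak, measure how far apart the two clusters are, and place the binding site midway between them. This is only an illustration of the intuition described above and not SISSRs' actual algorithm; the function name and the use of medians are assumptions made for the example.

```python
from statistics import median


def estimate_fragment_and_center(plus_starts, minus_starts):
    """Estimate fragment length and peak center from read 5' start positions.

    plus_starts  : 5' positions of forward-strand reads near one peak
    minus_starts : 5' positions of reverse-strand reads near the same peak
    Both are 0-based coordinates on the forward reference strand.
    """
    plus_mid = median(plus_starts)
    minus_mid = median(minus_starts)
    # Distance between the two cluster centers, counting both end bases.
    fragment_length = minus_mid - plus_mid + 1
    # The binding site is assumed to lie halfway between the clusters.
    center = (plus_mid + minus_mid) / 2
    return fragment_length, center


print(estimate_fragment_and_center([100, 110, 120], [300, 310, 320]))
# (201, 210.0)
```

Note that the inputs are start positions, not whole aligned reads: two piles of 50 bp reads can overlap heavily on a plot while their start positions are still clearly separated.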

    How should I extend the reads to visualize the fragments more accurately, and do I need to modify the data before submitting it to SISSRs?
    Maybe like this?


    Image link: http://i.imgur.com/Hbpk1.png

    Or like this?


    Image link: http://i.imgur.com/CFhv8.png

    This is my first time analyzing ChIP-seq data. Thanks for the help!
    Last edited by biznatch; 01-21-2011, 08:42 AM. Reason: Added links below each image in case they're not showing up.

  • #2
    There seems to have been a problem with the images in your post (at least I can't see them).

    Generally though the picture you showed from the NAR paper is a somewhat extreme example of what we've seen in our ChIP experiments. Peaks we've seen do show a positional variation between forward and reverse strands, but not a separation. The bulk of the peak will overlap. I suppose the amount of overlap will depend on the insert size in your library, and the length of sequences you generate.

    I've now moved to extending my single-end ChIP data to the average insert length of the library, in order to merge the forward and reverse peaks into a single peak which more closely approximates what was extracted from the library. I'm not sure this offers any more sensitivity than strand-aware peak detection, but it has certainly proved easier to interpret the results visually this way.
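A minimal sketch of why extension merges the two strand-specific peaks: once each read has been stretched to the fragment length, per-base coverage can be piled up regardless of strand. The function below assumes the intervals have already been extended, are 0-based and half-open, and all lie on one reference sequence; the name `coverage` is just for this example.

```python
def coverage(intervals):
    """Per-base depth for a list of (start, end) half-open intervals.

    Uses a difference array: +1 where an interval starts, -1 just past
    where it ends, then a running sum gives the depth at each base.
    """
    if not intervals:
        return []
    length = max(end for _, end in intervals)
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


# One extended forward fragment and one extended reverse fragment
# overlap in the middle, giving a single summit.
print(coverage([(0, 5), (3, 8)]))  # [1, 1, 1, 2, 2, 1, 1, 1]
```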

    • #3
      I added a link to the .png below each image; hopefully those will work.

      Originally posted by simonandrews:
      I've now moved to extending my single-end ChIP data to the average insert length of the library, in order to merge the forward and reverse peaks into a single peak which more closely approximates what was extracted from the library.
      Which direction do you extend them? Do forward and reverse reads extend in different directions, or do you extend out from the middle of each read regardless of strand?

      • #4
        Look at the start points only. In your example on chr19, almost all reverse reads start to the right of the center and most of the forward reads start to the left, which is what you would expect. SISSRs uses only start positions and looks for changes in strand preference within a window. Most likely it will report (at least) two sites in your example.

        Normally most ChIP reads correspond to the ends of fragments, so you extend each read directionally from its start point to the estimated fragment size of the library, both to calculate overlaps and to visualize the data in wiggle files.
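A short sketch of that directional extension, assuming Bowtie-style coordinates (0-based leftmost offset on the forward strand) and a fragment size you have estimated yourself. The function name and parameters are hypothetical, and the result is a 0-based half-open interval. Forward reads grow to the right from their leftmost base; reverse reads grow to the left from their rightmost base.

```python
def extend_read(strand, offset, read_length, fragment_length):
    """Extend an aligned read to the estimated fragment length.

    strand          : "+" or "-"
    offset          : 0-based leftmost aligned position (forward strand)
    read_length     : length of the aligned read in bases
    fragment_length : estimated library fragment size in bases
    Returns (start, end) as a 0-based half-open interval.
    """
    if strand == "+":
        # 5' end is the leftmost base; the fragment continues downstream.
        start = offset
        end = offset + fragment_length
    elif strand == "-":
        # 5' end is the rightmost base; the fragment continues leftward.
        end = offset + read_length
        start = max(0, end - fragment_length)  # do not run off the start
    else:
        raise ValueError(f"unexpected strand value: {strand!r}")
    return start, end


print(extend_read("+", 1000, 50, 250))  # (1000, 1250)
print(extend_read("-", 1000, 50, 250))  # (800, 1050)
```

This sketch only clamps at position 0; a real pipeline would also clamp at the end of the chromosome, which requires knowing the reference lengths.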
