  • qiudao
    Member
    • May 2008
    • 23

    Why do reads pile up at repeat regions?

    Hi, sorry if this is a silly question, but I really want to know why, after alignment, you find so many reads piling up at repeat regions of the genome. Sometimes people filter them out.
    So why do reads show this pattern around repeat regions? Do we really need to filter them out? Thanks.

  • clivey
    Member
    • Jul 2008
    • 24

    #2
    It depends on what they are.

    If they are polyA-rich, they are probably image artefacts (i.e. not derived from templates), often caused by flowcell edges on newer systems. Other imaging problems can also give you spurious reads with low-complexity sequences. Because repeats are also low-complexity, these spurious reads tend to cluster there, often with a few 'differences'. You need to filter them out. Unfortunately, because they are image artefacts they tend to have good base scores, so you have to use rules based on their sequence composition and/or their positions in the tiles. Some people have written tile-edge and 'bad region' detection methods, and these can be used to exclude reads that fall within them. I'm afraid the images still tell you a lot about your end data and how well your system is set up.
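A minimal sketch of a composition-based rule of the kind described above, assuming reads are plain strings. The two cutoffs (maximum fraction of any single base, minimum number of distinct dinucleotides) and the helper names are hypothetical and chosen only for illustration; they are not values from the thread.

```python
from collections import Counter


def max_base_fraction(seq: str) -> float:
    """Fraction of the read made up by its single most common base."""
    counts = Counter(seq.upper())
    return max(counts.values()) / len(seq)


def distinct_dinucleotides(seq: str) -> int:
    """Number of distinct 2-mers in the read; simple repeats use very few."""
    s = seq.upper()
    return len({s[i:i + 2] for i in range(len(s) - 1)})


def is_low_complexity(seq: str, max_frac: float = 0.7, min_dinucs: int = 5) -> bool:
    """Flag reads dominated by one base (e.g. polyA) or built from few 2-mers.

    max_frac and min_dinucs are hypothetical cutoffs for illustration only.
    """
    if len(seq) < 2:
        return True
    return (max_base_fraction(seq) > max_frac
            or distinct_dinucleotides(seq) < min_dinucs)


reads = [
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",  # polyA-like artefact
    "CACACACACACACACACACACACACACACA",  # dinucleotide repeat
    "ACGTTGCATGCAAGTCCGATAGGCTTACGA",  # ordinary read
]
kept = [r for r in reads if not is_low_complexity(r)]
```

Only the third read survives here. Note that a rule like this also removes genuine reads from simple repeats, so it trades sensitivity in those regions for fewer artefacts.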


    • qiudao
      Member
      • May 2008
      • 23

      #3
      clivey,
      Thank you for your answer. I have checked the reads; however, they are not polyA-rich. Could the reads at the repeat regions be real signal? How can we determine whether such a pattern is purely a technical artifact? Thanks.


      • clivey
        Member
        • Jul 2008
        • 24

        #4
        Interesting.

        Well, look at the X and Y coordinates of the reads: are they spatially correlated? When you look at the images, is there anything wrong with them? Have you managed to create primer-dimers or other strange constructs that would be over-represented? Is your library prep unbiased in terms of sampling statistics and complexity? It's impossible to say without seeing the data, and I can only point you to known artefacts.
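One simple way to ask "are they spatially correlated?" is to compare how many of the piled-up reads sit near a tile edge with how many would be expected if clusters were spread uniformly. This is a minimal sketch assuming you have already parsed (x, y) cluster coordinates for the reads in one pile-up; the tile width, height and edge margin below are hypothetical numbers, not real instrument dimensions.

```python
from typing import Iterable, Tuple


def edge_fraction(coords: Iterable[Tuple[float, float]],
                  width: float, height: float, margin: float) -> float:
    """Observed fraction of (x, y) cluster positions within `margin` of a tile edge."""
    pts = list(coords)
    near = sum(
        1 for x, y in pts
        if x < margin or y < margin or x > width - margin or y > height - margin
    )
    return near / len(pts)


def expected_edge_fraction(width: float, height: float, margin: float) -> float:
    """Edge fraction expected if clusters were spread uniformly over the tile."""
    inner = max(width - 2 * margin, 0) * max(height - 2 * margin, 0)
    return 1 - inner / (width * height)


# Hypothetical tile geometry and coordinates of reads in one pile-up.
pileup_xy = [(10, 500), (1950, 30), (50, 1990), (1000, 1000)]
observed = edge_fraction(pileup_xy, width=2000, height=2000, margin=100)
expected = expected_edge_fraction(2000, 2000, 100)
```

Here 0.75 of the pile-up reads are near an edge versus about 0.19 expected, which would point towards an imaging artefact rather than template-derived signal. With real data you would want many more reads and a proper statistical test before drawing conclusions.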


        • Chipper
          Senior Member
          • Mar 2008
          • 323

          #5
          Satellite repeats? They tend to give signal in almost every ChIP-seq experiment, but I thought that was because of gaps in the reference used for alignment, i.e. sequences will pile up (with more mismatches) if copies of the repeat are present in the genome but not in the reference you align to. This often occurs in repeats that have a high SNP density.
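If pile-ups come from unassembled repeat copies collapsing onto one reference copy, the stacked reads should carry more mismatches than uniquely mapped reads elsewhere. A minimal sketch of that check, assuming you have per-read edit distances (for example from the SAM `NM` tag, used here as a proxy for mismatches) and a fixed read length; the `fold` threshold and the function names are hypothetical.

```python
from typing import Sequence


def mismatch_rate(nm_values: Sequence[int], read_len: int) -> float:
    """Edits per aligned base, from per-read edit distances (e.g. SAM NM tags)."""
    return sum(nm_values) / (len(nm_values) * read_len)


def looks_collapsed(locus_nm: Sequence[int], background_nm: Sequence[int],
                    read_len: int, fold: float = 3.0) -> bool:
    """True if reads at the locus carry `fold` times more edits than background.

    `fold` is a hypothetical threshold chosen for illustration.
    """
    locus = mismatch_rate(locus_nm, read_len)
    background = mismatch_rate(background_nm, read_len)
    if background == 0:
        return locus > 0
    return locus / background >= fold


background_nm = [0, 1, 0, 0, 1, 0]   # uniquely mapped reads elsewhere
pileup_nm = [2, 3, 2, 1, 2]          # reads stacked at the repeat
print(looks_collapsed(pileup_nm, background_nm, read_len=36))  # True
```

An elevated rate is consistent with, but does not prove, the collapsed-copy explanation; a real SNP-dense region in the sample would look similar.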


          • H4lcyon
            Junior Member
            • Oct 2008
            • 1

            #6
            Originally posted by qiudao
            clivey,
            Thank you for your answer. I have checked the reads; however, they are not polyA-rich. Could the reads at the repeat regions be real signal? How can we determine whether such a pattern is purely a technical artifact? Thanks.
            Hi, some repeats actually harbour transcription factor binding sites. See the following paper regarding that observation: Bourque et al. (2008), "Evolution of the mammalian transcription factor binding repertoire via transposable elements," Genome Res., published online 2008 Oct 3.

            Maybe you're finding new examples of it.


            • ShaunMahony
              Member
              • Apr 2008
              • 27

              #7
              It is possible that some of these signals correspond to true instances of the TF binding to repeats. However, everyone working with ChIP-seq data has seen the larger problem: regions of the genome where many thousands of reads stack up on top of each other. These regions are typically associated with repeat elements such as LINEs and SINEs. They show up in the same genomic location no matter which TF you ChIP, and they also show up in controls (such as whole-cell extract, WCE), suggesting that they are not IP signal. Tim Danford in our lab calls these artifacts "towers".
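Because towers also appear in the control, one rough way to list them is to bin the control reads and flag bins whose counts are far above a typical bin. This is a minimal sketch, assuming read start positions on a single chromosome; `bin_size`, `fold` and `find_towers` are hypothetical choices for illustration, not the lab's actual method.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def find_towers(control_positions: Iterable[int], bin_size: int = 1000,
                fold: float = 20.0) -> List[int]:
    """Return indices of bins whose control read count exceeds `fold` x the median.

    The median is taken over non-empty bins only. bin_size and fold are
    hypothetical parameters for illustration.
    """
    counts = Counter(pos // bin_size for pos in control_positions)
    med = median(counts.values())
    return sorted(b for b, c in counts.items() if c > fold * med)


# One read in each of ten 1-kb bins, plus a 100-read stack in bin 5.
positions = [i * 1000 + 500 for i in range(10)] + [5200] * 100
print(find_towers(positions))  # [5]
```

Taking the median over non-empty bins keeps the example short; with sparse real data you would probably want to include empty bins or model the background count distribution explicitly.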

              We think that towers are related to repeat copy number. Imagine a type of repeat element (e.g. some type of SINE) that is present in only one copy in the reference genome. Of course, when you do the experiment you are not sampling from the reference genome; you are sampling from the genome of your cells. What if that same SINE is present 1000 times in your cells' genome? When you sequence, you may randomly pick up background signal along each of those thousand copies (no antibody is perfect). When you map these tags back to the reference genome, they all go to one place, since the reference has only one copy of the repeat. Therefore, you see those towers in every experiment that you do on those cells.
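The copy-number argument can be written as back-of-the-envelope arithmetic. Assume, as a simplification not stated in the post, that each of the $n_{\text{cell}}$ copies in the cells' genome collects on average $d$ background reads, and that all of those reads map onto the $n_{\text{ref}}$ copies present in the reference. The numbers on the right are hypothetical:

```latex
\text{reads per reference copy} \approx d \cdot \frac{n_{\text{cell}}}{n_{\text{ref}}},
\qquad d = 5,\; n_{\text{cell}} = 1000,\; n_{\text{ref}} = 1
\;\Rightarrow\; 5 \times \frac{1000}{1} = 5000 .
```

Because the ratio depends only on the cells' genome and not on the antibody, the same tower reappears in every IP and in the control.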

              The answer to your question "do we need to throw these things out?" is no, but you do need some sort of control run (WCE or something else) for each genotype that you work with.
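A minimal sketch of why a matched control lets you keep these regions: compare IP and control counts per bin after scaling the control to the IP library size. A tower present in both samples ends up with a ratio near 1, while true enrichment stands out. This simple ratio with a hypothetical pseudocount is only one illustrative approach, not the method of any particular peak caller.

```python
from typing import Dict


def enrichment(ip_counts: Dict[int, int], ctrl_counts: Dict[int, int],
               ip_total: int, ctrl_total: int, pseudo: float = 1.0) -> Dict[int, float]:
    """Per-bin IP/control ratio, with control counts scaled to the IP library size.

    `pseudo` is a hypothetical pseudocount that avoids division by zero.
    """
    scale = ip_total / ctrl_total
    bins = set(ip_counts) | set(ctrl_counts)
    return {
        b: (ip_counts.get(b, 0) + pseudo) / (ctrl_counts.get(b, 0) * scale + pseudo)
        for b in bins
    }


ip = {5: 5000, 7: 300, 9: 10}      # bin 5 is a tower, bin 7 a candidate peak
ctrl = {5: 4800, 7: 10, 9: 12}     # control from cells of the same genotype
ratios = enrichment(ip, ctrl, ip_total=1_000_000, ctrl_total=1_000_000)
```

Bin 5 comes out at about 1.04 and bin 7 at about 27. If the control came from a different genotype with a different repeat copy number, the tower would not cancel, which is why the control has to match the cells.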

