
  • unique reads for downstream analysis

    I have a general question about short, fixed-length reads. I am asking with Solexa data in mind, but it probably applies to SOLiD as well.

    Once you have your set of reads, do people prefer to reduce it to a unique, non-redundant set before downstream analysis such as SNP discovery, ChIP-seq, etc.? Do exactly identical reads carry any information?

    For de novo assembly, Velvet behaves slightly differently with a unique set of reads than with some of the reads repeated.
    --
    bioinfosm
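
A minimal sketch of what "taking a unique, non-redundant set of reads" can mean at the sequence level, assuming the reads are already loaded as plain strings. The function name `collapse_reads` and the example reads are hypothetical. Keeping the copy counts alongside the unique set means the information in the repeated reads is not thrown away.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse exactly identical reads.

    Returns the unique sequences in first-seen order and a Counter
    giving how many copies of each sequence were present.
    """
    counts = Counter(reads)
    # Counter is a dict subclass, so keys keep insertion order (Python 3.7+).
    unique_reads = list(counts)
    return unique_reads, counts


# Hypothetical toy input: three copies of one read, two singletons.
reads = ["ACGTACGT", "TTGACCAA", "ACGTACGT", "GGGTTTAA", "ACGTACGT"]
unique_reads, counts = collapse_reads(reads)
print(len(reads), "reads,", len(unique_reads), "unique")
print(counts.most_common(1))
```

Note that this compares read sequences only; it says nothing about where the reads map, so it treats reads from two identical repeats as redundant.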

  • #2
    I have a similar question. I have seen people recommend removing redundant reads for ChIP-seq to minimize the risk of amplification bias. I tried it, and it certainly affects peak finding a lot. I would guess that you should not reduce to unique reads for RNA-seq or small RNA-seq. I would be glad to hear about other people's experience with this.
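
For aligned data such as ChIP-seq, redundancy is usually judged by mapping position rather than by sequence. The following is a rough sketch under that assumption, not the behaviour of any particular peak caller: each alignment is a hypothetical tuple `(chrom, start, strand, read_id)`, and at most `max_per_position` reads are kept per start position and strand.

```python
def dedup_by_position(alignments, max_per_position=1):
    """Keep at most max_per_position reads per (chrom, start, strand).

    alignments: iterable of (chrom, start, strand, read_id) tuples.
    Reads are kept in input order; later reads at a full position are dropped.
    """
    seen = {}
    kept = []
    for aln in alignments:
        key = (aln[0], aln[1], aln[2])
        n = seen.get(key, 0)
        if n < max_per_position:
            kept.append(aln)
        seen[key] = n + 1
    return kept


# Hypothetical toy alignments: three reads pile up at chr1:100 on the + strand.
alignments = [
    ("chr1", 100, "+", "r1"),
    ("chr1", 100, "+", "r2"),
    ("chr1", 100, "-", "r3"),
    ("chr2", 100, "+", "r4"),
    ("chr1", 100, "+", "r5"),
]
kept = dedup_by_position(alignments)
print([aln[3] for aln in kept])
```

Raising `max_per_position` gives a softer filter, which may be a reasonable compromise when some pile-up is expected from real signal.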



    • #3
      I can imagine that the answer depends on the project.
      If, in a low-coverage sequencing project, I have many reads starting at the same position, this would suggest PCR duplicates, especially for paired-end reads (assuming that PE duplicates are those for which both reads are exactly duplicated).

      A highly expressed short gene in RNA-seq, on the other hand, will have many reads that start at the same position without them being PCR duplicates.
      Just removing them would then lead to an underestimate of expression level.

      The recent Sanger paper (Kozarewa et al.) calculated expected duplicate frequencies based on average coverage and read length for whole genome sequencing.
      This makes sense, but I think only if you have an even distribution of coverage across the genome.
      If something like mtDNA is present with excess coverage, I would overestimate the PCR duplicate frequency if I assumed that all of its duplicates were due to amplification bias.

      So, yes, if it were easy to distinguish between duplicates due to high coverage and PCR duplicates, it might be preferable to eliminate the PCR duplicates, at least for SNP calling and for RNA-seq, where counts matter...

      But again, in RNA-seq for example, how do you distinguish between duplicates due to high coverage and PCR duplicates? Maybe by calculating an expected duplicate frequency as in the Sanger paper, but on a gene-by-gene basis?
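
To make the idea of an "expected duplicate frequency" concrete, here is a simple back-of-the-envelope model. It is an assumption for illustration, not the calculation from Kozarewa et al., which is not reproduced in this thread. It assumes single-end reads whose start positions fall uniformly at random on either strand of a region, so the number of reads per start site is roughly Poisson with mean `lam = n_reads / (strands * region_length)`. The expected fraction of reads that share a start site with an earlier read is then `1 - (1 - exp(-lam)) / lam`. In terms of coverage `c` and read length `L`, `lam` is `c / (2 * L)`.

```python
import math


def expected_duplicate_fraction(n_reads, region_length, strands=2):
    """Expected fraction of reads that duplicate another read's start site
    by chance alone, under a uniform random-start (Poisson) model.

    n_reads: number of reads mapped to the region.
    region_length: number of possible start positions per strand
                   (approximated by the region length in bp).
    """
    if n_reads <= 0:
        return 0.0
    lam = n_reads / (strands * region_length)
    # Expected occupied start sites = sites * (1 - exp(-lam)); every read
    # beyond the first at a site counts as a duplicate.
    # -expm1(-lam) equals 1 - exp(-lam) but is accurate for small lam.
    return 1.0 + math.expm1(-lam) / lam


# Hypothetical numbers: the same read count on a large region vs. a small,
# deeply covered one (e.g. a short, highly expressed gene or mtDNA).
print(expected_duplicate_fraction(1000, 1_000_000))
print(expected_duplicate_fraction(100_000, 1000))
```

Applied per gene (reads on that gene, gene length), this gives a chance-only baseline; an observed duplicate rate well above it would hint at amplification bias, although real coverage is rarely uniform, so the baseline is only a rough guide.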

      May I ask what you mean by Velvet behaving slightly differently?



      • #4
        Thanks for the notes.

        Velvet does de novo assembly, and it gives different results with a non-redundant set of reads as input than with all the reads as input.
        Another de novo tool, Edena, reduces the input to a non-redundant set of reads before it proceeds with the de novo assembly...
        --
        bioinfosm
