
  • unique reads for downstream analysis

    I have a general question about short, fixed-length reads. I am asking with Solexa data in mind, but it may apply directly to SOLiD as well.

    After obtaining the set of reads, do people prefer to take a unique, non-redundant set of reads before doing analyses such as SNP discovery, ChIP-seq, etc.? Do exactly identical reads carry any information?

    For de novo assembly, Velvet behaves slightly differently with a unique set of reads than with some of the reads repeated.
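
To make "unique non-redundant set" concrete, here is a minimal sketch that collapses exactly identical read sequences while keeping their counts, so the redundancy information is not lost. The function name `collapse_reads` and the toy reads are hypothetical, and this is not how Velvet or any other tool handles duplicates internally.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse exactly identical reads (hypothetical helper).

    Returns the unique sequences in first-seen order, plus a Counter
    recording how many times each sequence occurred in the input.
    """
    counts = Counter(reads)
    # dict.fromkeys keeps the first occurrence of each sequence, in order.
    unique = list(dict.fromkeys(reads))
    return unique, counts


# Toy input: made-up sequences, not real data.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
unique, counts = collapse_reads(reads)
```

Keeping `counts` next to `unique` lets the same input feed both a non-redundant analysis and one where multiplicity matters.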
    --
    bioinfosm

  • #2
    I have a similar question. I have seen people recommend removing redundant reads for ChIP-seq to minimize the risk of amplification bias. I tried it, and it certainly affects peak finding a lot. I guess you should not take unique reads for RNA-seq or small RNA-seq. I would be glad to hear other people's experiences and comments on this.


    • #3
      I can imagine that the answer depends on the project.
      If, in a low-coverage sequencing project, I have many reads starting at the same position, this would suggest PCR duplicates, especially for paired-end reads (assuming that PE duplicates are those for which both reads are exactly duplicated).
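
As a rough illustration of that paired-end criterion, the sketch below flags a pair as a duplicate only when both mates match an earlier pair. It assumes each pair has already been reduced to a tuple of mapping coordinates (chromosome, start and strand of read 1, start and strand of read 2). That tuple layout and the name `flag_pair_duplicates` are assumptions for the example, not the format of any particular aligner.

```python
def flag_pair_duplicates(pairs):
    """Flag pairs whose coordinates match an earlier pair exactly.

    pairs: iterable of (chrom, start1, strand1, start2, strand2) tuples
           (a hypothetical layout chosen for this sketch).
    Returns a list of booleans, True meaning "duplicate of an earlier pair".
    """
    seen = set()
    flags = []
    for pair in pairs:
        # A pair counts as a duplicate only if BOTH mates match.
        flags.append(pair in seen)
        seen.add(pair)
    return flags


# Toy coordinates, made up for illustration.
pairs = [
    ("chr1", 100, "+", 280, "-"),
    ("chr1", 100, "+", 280, "-"),  # both mates identical to the first pair
    ("chr1", 100, "+", 305, "-"),  # same read 1 start, different mate
    ("chr2", 100, "+", 280, "-"),  # same offsets, different chromosome
]
flags = flag_pair_duplicates(pairs)
```

Requiring both mates to match is what makes the paired-end case more convincing: the third pair shares a read 1 start with the first but is not flagged.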

      A highly expressed short gene in RNA-seq, on the other hand, will have many reads that start at the same position without them being PCR duplicates.
      Just removing them would then lead to an underestimate of expression level.

      The recent Sanger paper (Kozarewa et al.) calculated expected duplicate frequencies based on average coverage and read length for whole genome sequencing.
      This makes sense, but I think only if you have an even distribution of coverage across the genome.
      If something like mtDNA is present with excess coverage, I could overestimate the PCR duplicate frequency by assuming that all duplicates are due to amplification bias.
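
To put rough numbers on this, here is a back-of-the-envelope sketch. It is not the calculation from Kozarewa et al.; it is a simple model that assumes read starts are Poisson-distributed and uniform across all start sites on both strands. With coverage C and read length L, the mean number of reads per start site is lam = C / (2L), and the fraction of reads that share a start site with another read by chance is 1 - (1 - exp(-lam)) / lam. The function name and the example values (36 bp reads, 1x versus 1000x) are made up.

```python
import math


def expected_duplicate_fraction(coverage, read_length, strands=2):
    """Expected fraction of reads duplicating another read's start site
    by chance alone, under a uniform Poisson model (an assumption of
    this sketch, not a published formula).
    """
    if coverage <= 0:
        return 0.0
    # Mean number of reads starting at any one (position, strand) site.
    lam = coverage / (read_length * strands)
    # Distinct occupied sites per site = 1 - exp(-lam); reads per site = lam.
    # expm1(-lam) = exp(-lam) - 1 keeps precision when lam is small.
    return 1.0 + math.expm1(-lam) / lam


# Hypothetical values: a 1x nuclear genome versus a 1000x mtDNA-like region.
low = expected_duplicate_fraction(coverage=1, read_length=36)
high = expected_duplicate_fraction(coverage=1000, read_length=36)
```

Under these assumptions, chance duplicates are rare at low coverage but make up most reads in a very deeply covered region, so applying a genome-wide expectation to such a region would attribute too many duplicates to PCR.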

      So, yes, if it were easy to distinguish duplicates due to high coverage from PCR duplicates, it might be preferable to eliminate the PCR duplicates, at least for SNP calling and for RNA-seq, where counts matter...

      But again, in RNA-seq for example, how do you distinguish duplicates due to high coverage from PCR duplicates? Maybe by calculating an expected duplicate frequency as in the Sanger paper, but on a gene-by-gene basis?

      May I ask what you mean by "velvet behaves slightly different" ?


      • #4
        Thanks for the notes.

        Velvet does de novo assembly and gives different results if you input a non-redundant set of reads than if you use all the reads as input.
        Another de novo tool, Edena, produces a non-redundant set of reads before it proceeds with the de novo assembly...
        --
        bioinfosm
