  • A really bad library?

    Hi guys,
    I am a beginner in RNA-Seq data analysis.
    I am currently trying to align 36-bp reads from an RNA-Seq experiment run on an Illumina GAIIx.
    As a first analysis, I am aligning the reads to the genome with Bowtie to find out how many reads map outside the coding regions of genes (in theory, I would expect almost none).
    For many samples I can map more than 70% of the reads, but 60-70% of these map outside annotated gene regions.
    Among the reads that map within genes, many (60-70%) are potential "PCR duplicates".
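
A minimal Python sketch of the two numbers described above: the fraction of mapped reads falling outside annotated genes, and the fraction of genic reads that share a start site with another read. It assumes the alignments have already been parsed into hypothetical (chrom, strand, start, length) tuples with 0-based leftmost start coordinates, and the annotation into half-open (start, end) gene intervals per chromosome. Two simplifications are assumptions, not a standard: a read counts as genic if its leftmost base falls inside a gene, and a "duplicate" is any read beyond the first with the same chromosome, strand and 5' position, which is a common but crude definition for single-end data.

```python
from bisect import bisect_right
from collections import Counter


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def in_gene(pos, merged):
    """True if 0-based position `pos` lies inside one of the merged half-open intervals."""
    # Index of the last interval whose start is <= pos.
    i = bisect_right(merged, (pos, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= pos < merged[i][1]


def summarize(reads, genes):
    """Hypothetical helper.

    reads: iterable of (chrom, strand, start, length), 0-based leftmost start.
    genes: dict mapping chrom -> list of half-open (start, end) gene intervals.
    """
    merged = {chrom: merge_intervals(iv) for chrom, iv in genes.items()}
    total = genic = 0
    starts = Counter()  # genic reads per (chrom, strand, 5' position)
    for chrom, strand, start, length in reads:
        total += 1
        if in_gene(start, merged.get(chrom, [])):
            genic += 1
            # The 5' end is the leftmost base on '+' and the rightmost base on '-'.
            five_prime = start if strand == "+" else start + length - 1
            starts[(chrom, strand, five_prime)] += 1
    return {
        "mapped": total,
        "fraction_outside_genes": 1 - genic / total if total else 0.0,
        # Every read beyond the first at a given start site counts as a duplicate.
        "genic_duplicate_fraction": 1 - len(starts) / genic if genic else 0.0,
    }
```

Merging the gene intervals first means each position needs only one binary search, and overlapping genes are not double-counted.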

    Evidently something went seriously wrong in the construction of these libraries, and I doubt they are usable for expression quantification.

    So, these are my questions:
    1) What is an acceptable percentage of reads (from a good library) that map to the genome outside regions annotated as genes?
    2) In your experience, for a 36-bp single-end (SE) library, what is an acceptable percentage of "PCR duplicates" (I saw on the forum that this is a much-debated topic)?
    3) For a read that maps to "n" places in the genome, above what value of "n" is it advisable to discard the read (in practice, I am referring to Bowtie's -m/-k options)?
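
For question 3, a sketch of what an -m style cutoff does. In Bowtie 1, -k <int> reports up to <int> valid alignments per read, and -m <int> suppresses all alignments for a read if more than <int> reportable alignments exist. The functions below use hypothetical names and assume you already have every alignment for each read (for example from a run with -a) as (read_id, location) pairs. They tabulate how many reads hit 1, 2, 3... places and then drop every read above a chosen n; looking at that histogram first is one way to pick n.

```python
from collections import Counter, defaultdict


def hits_per_read(alignments):
    """Group (read_id, location) pairs into a dict read_id -> list of locations."""
    hits = defaultdict(list)
    for read_id, location in alignments:
        hits[read_id].append(location)
    return dict(hits)


def multiplicity_histogram(hits):
    """Count how many reads align to exactly 1, 2, 3, ... places."""
    return Counter(len(locs) for locs in hits.values())


def drop_multimappers(hits, max_hits):
    """Keep only reads with at most max_hits alignments, discarding ALL
    alignments of reads above the cutoff (the idea behind Bowtie's -m)."""
    return {read_id: locs for read_id, locs in hits.items() if len(locs) <= max_hits}
```

Counting all hits before filtering, rather than filtering on the fly, shows how much data each candidate cutoff would throw away before you commit to one.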

    Thank you for your support!

    Francesco.

  • #2
    It really depends on how you make your library. If you use poly-T selection, you will see more reads mapping to exons. If you use ribo-depletion and random priming, you will see a lot of reads from transcribed non-coding regions that do not map to exons.

    • #3
      I would say that 60-80% of mapped reads falling in exons is pretty common (with a polyA+ selection protocol). It also depends on pre- or post-alignment filtering (whether multi-mappers are kept, the sequence quality threshold, how many mismatches are allowed, etc.), and of course on the reference annotation and the species: for instance, the RefSeq annotation describes reliable gene models but is not exhaustive, whereas Ensembl also includes automatic gene predictions.

      Regarding PCR duplicates: how do you define them? That is, how can you be sure that two identical reads come from the same mRNA molecule and not from different molecules? The higher the expression level, the higher the probability of getting identical reads from the same gene. However, something you may want to look at is "library complexity": check the paper by Levin et al. in Nat. Methods.
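
To illustrate why expression level matters, here is a rough occupancy estimate. If N reads from one transcript start independently and uniformly at random among L possible start positions on one strand, the expected number of distinct start positions is L * (1 - (1 - 1/L)^N), so about N minus that many reads would look like duplicates with no PCR involved at all. Uniform start positions are an idealizing assumption (real coverage is biased), the transcript length below is hypothetical, and this is a generic sketch, not the analysis from Levin et al.

```python
import math


def expected_distinct_starts(n_reads, n_positions):
    """Expected number of distinct start positions when n_reads start independently
    and uniformly at random among n_positions sites (a simple occupancy model):
        E[distinct] = L * (1 - (1 - 1/L)**N)
    log1p/expm1 keep the computation accurate when L is large."""
    return n_positions * -math.expm1(n_reads * math.log1p(-1.0 / n_positions))


def expected_chance_duplicates(n_reads, n_positions):
    """Reads expected to share a start site with another read, with no PCR involved."""
    return n_reads - expected_distinct_starts(n_reads, n_positions)


# Hypothetical transcript: 2,000 nt long, 36-bp reads, one strand -> 1,965 start sites.
positions = 2000 - 36 + 1
for n in (10, 1_000, 10_000):
    print(n, round(expected_chance_duplicates(n, positions)))
```

Under this model, with 10,000 reads on that transcript at least 8,035 of them must share a start position, simply because only 1,965 start sites exist on the strand. A high duplicate rate in a highly expressed gene is therefore not, by itself, evidence of PCR artifacts.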
