  • Characterizing the problem of pseudogene reads in mapping and parameter tuning

    We are interested in how pseudogenes, and reads derived from them, contribute to read-mapping errors, and in how different aligners handle them. We're testing the performance of three aligners on this specific problem: TopHat/Bowtie2, STAR and Subjunc.

    The challenge with pseudogenes and pseudogene reads is how each algorithm decides on a location when faced with an ambiguous situation. How each algorithm performs will depend on several parameters.

    As each algorithm approaches ambiguous mapping situations differently, I was hoping to crowd-source some ideas as to which parameters in each algorithm might make the most difference.

    For example, there is an option in STAR that allows you to more heavily weight the transcriptome in mapping decisions:

    sjdbScore 2

    int: extra alignment score for alignments that cross database junctions
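
    One way to see how much this option matters is to run the same reads at several values and compare where the pseudogene-derived reads end up. A minimal sketch that only builds the command lines, assuming a pre-built STAR index and paired-end FASTQ files; the paths, the score values and the helper names are hypothetical placeholders, not recommendations:

```python
import shlex


def star_command(sjdb_score, genome_dir="star_index",
                 reads=("reads_1.fastq", "reads_2.fastq"), threads=4):
    """Build one STAR command line for a given --sjdbScore value.

    genome_dir and reads are placeholder paths (assumptions).
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--sjdbScore", str(sjdb_score),
        # Separate output prefix per run so results are not overwritten.
        "--outFileNamePrefix", f"sjdbScore_{sjdb_score}_",
    ]


def sweep(scores=(0, 1, 2, 4)):
    """Return one command per score; the values are arbitrary examples."""
    return [star_command(score) for score in scores]


if __name__ == "__main__":
    for cmd in sweep():
        # Print the commands instead of running them.
        print(shlex.join(cmd))
```

    Printing rather than executing keeps the sketch safe to run anywhere; the commands can be pasted into a job script once the paths are adjusted.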


    In addition to identifying parameters and how they might affect mapping in this respect, any general ideas surrounding this problem are welcomed.

    Thanks in advance!

  • #2
    As pseudogenes are not really conserved and thus have a high mutation rate, you might have better luck mapping pseudogene reads to their real counterparts with BBMap, as it has higher sensitivity. Of course, it depends on your goal - when you have a read from a pseudogene that is not part of the reference genome, do you want it to map to a real gene, or nowhere at all? If it maps to a real gene, that could cause false variation calls, but on the other hand, by examining them closely, you can determine that the reads originated from a pseudogene and thus improve the reference.

    The way you can tell is that DNA reads originating from a pseudogene will map like RNA-seq reads (spanning introns). The reads that span introns will typically have a high SNP rate, with the same SNPs present in all reads spanning the intron but in none of the reads that map to both the intron and exon.
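
    The first half of that check (finding DNA reads that align as if spliced) could be sketched as below, assuming the aligner reports long reference skips as N operations in the SAM CIGAR string. The function name is made up for illustration, and the SNP comparison is not covered here:

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_gaps(pos, cigar):
    """Return reference intervals skipped by N operations.

    pos is the 1-based leftmost mapping position (SAM POS field).
    Intervals are 1-based and inclusive.
    """
    gaps = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            gaps.append((ref, ref + length - 1))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref += length
    return gaps


if __name__ == "__main__":
    # A 100 bp read split around a 200 bp skipped region.
    print(intron_gaps(100, "50M200N50M"))
```

    Reads from a DNA library that return a non-empty list whose gaps coincide with annotated introns would be the candidates to inspect for shared SNPs.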


    • #3
      Our goal is to better understand how several aligners handle the problem. Just as you say, BBMap, with its higher sensitivity, will behave differently than other aligners. We want to understand that difference and find the algorithm-specific parameters that adjust that sensitivity.

      We set this goal because we've seen more than one paper point out that pseudogene reads, and the existence of pseudogenes in the genome (and transcriptome), are likely causes of errors in RNA-seq read counts. I haven't seen anyone actually show to what extent this is the case, and we'd like to understand the size of the effect on a genome scale.

      Does that help to clarify our objectives?


      • #4
        Ah, I understand better now. Possibly, as a negative control, references should be used in which all known pseudogenes are masked, under the assumption that they should not be expressed in RNA-seq data.
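
        A rough sketch of the masking step, assuming the pseudogene coordinates are already available as 0-based, half-open intervals (as in BED files) for a single sequence. The function name is hypothetical, and for a whole genome an established tool would be the better choice:

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace every base inside the given intervals with mask_char.

    intervals: iterable of (start, end), 0-based and half-open.
    Overlapping or out-of-range intervals are tolerated.
    """
    seq = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds before masking.
        start = max(start, 0)
        end = min(end, len(seq))
        for i in range(start, end):
            seq[i] = mask_char
    return "".join(seq)


if __name__ == "__main__":
    # Toy sequence with one masked region covering positions 2 to 4.
    print(mask_intervals("ACGTACGTAC", [(2, 5)]))
```

        Mapping the same reads against the masked and unmasked references, then comparing counts for the parent genes, would give one estimate of how many reads the pseudogenes attract.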
