  • aquleaf
    Member
    • Mar 2010
    • 38

    A Question about Duplicated Genes

    Hi all.

    I have a question about duplicated genes that have several copies in the genome, such as Ccl21c. Tophat was used to map the RNA-Seq reads back to the genome, and HTSeq was used to count the reads that map to each gene. All of the reads stemming from the duplicated genes are discarded by HTSeq because they align to multiple places. Is there any way to quantify these genes? Or could I mask the genomic regions of the duplicated genes before running Tophat?
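To see how many reads are affected, one option is to split the alignments by the SAM `NH:i` tag (the number of reported alignments for a read). This is a minimal sketch that assumes the aligner writes `NH` on every record; the function names and the two example records are made up for illustration.

```python
def nh_value(sam_line):
    """Return the NH:i value of a SAM alignment line, or None if the tag is absent."""
    # Optional tags start after the 11 mandatory SAM columns.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[5:])
    return None


def split_by_uniqueness(sam_lines):
    """Separate alignment lines into (unique, multi) lists based on NH."""
    unique, multi = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        nh = nh_value(line)
        if nh is not None and nh > 1:
            multi.append(line)
        else:
            unique.append(line)
    return unique, multi


# Made-up records: r1 aligns once, r2 aligns to three places.
example_sam = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3\n",
]
unique, multi = split_by_uniqueness(example_sam)
```

The `multi` list holds the alignments that a unique-only counting step would skip, so its size gives a rough idea of how much signal the duplicated genes are losing.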

    Any suggestion will be much appreciated.

    Best
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    Don't mask the genome.

    The last thing you want is reads being forced to align to the wrong place, because you masked away the right place.

    If the genes are true duplicates, it's going to be nearly impossible to work out which reads came from where.


    • aquleaf
      Member
      • Mar 2010
      • 38

      #3
      Thanks very much for your reply. For a gene with several copies in the genome, we don't need to distinguish which locus the reads come from; we just want to count all the reads that map only to that gene. For example, if a gene has 4 copies in the genome, we want to count the reads that map to those regions and do not map to any other gene. Is there any way to achieve that?
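The counting rule described here can be sketched in a few lines. This assumes you already have, for each read, the list of genes overlapped by any of its alignments (for example from an aligner that reports multi-mappers), plus a table assigning gene copies to a family. `count_by_family`, the family label, and the toy reads are hypothetical, not part of HTSeq.

```python
from collections import Counter


def count_by_family(read_hits, gene_to_family):
    """Count a read once for a family if all of its alignments hit that family only.

    read_hits: dict of read ID -> iterable of gene IDs hit by the read's alignments.
    gene_to_family: dict of gene ID -> family label; unlisted genes form their own family.
    """
    counts = Counter()
    for read_id, genes in read_hits.items():
        families = {gene_to_family.get(gene, gene) for gene in genes}
        if len(families) == 1:  # reads spanning two families stay ambiguous and are skipped
            counts[families.pop()] += 1
    return counts


# Toy input: r1 hits three copies of one family, r3 hits two unrelated genes.
read_hits = {
    "r1": ["Ccl21a", "Ccl21b", "Ccl21c"],
    "r2": ["Ccl21c"],
    "r3": ["Ccl21a", "Actb"],
    "r4": ["Actb"],
}
gene_to_family = {"Ccl21a": "Ccl21", "Ccl21b": "Ccl21", "Ccl21c": "Ccl21"}
family_counts = count_by_family(read_hits, gene_to_family)
```

Here r1 and r2 are counted for the Ccl21 family, r4 for Actb, and r3 is dropped because it cannot be assigned to a single family.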

      One more request: could anyone recommend a tool for modifying the GTF file used in the RNA-Seq analysis? We used the reference GTF file from UCSC and found that lots of genes seem identical, such as Gm14430, Gm4724 and Gm14434. We wish to merge such genes into a single gene.
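If no ready-made tool fits, one simple approach is to rewrite the `gene_id` attribute so that the identical genes share one ID, which makes a per-gene counter treat them as a single gene. The sketch below assumes standard 9-column GTF lines; `merge_gene_ids`, the merged ID `Gm14430_group`, and the coordinates in the example are invented for illustration and are not real annotation.

```python
import re

# Matches the gene_id attribute only (not transcript_id or other *_gene_id keys).
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def merge_gene_ids(gtf_lines, merge_map):
    """Yield GTF lines with gene_id values replaced according to merge_map."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            yield line  # pass comments and malformed lines through unchanged
            continue
        fields[8] = GENE_ID_RE.sub(
            lambda m: 'gene_id "%s"' % merge_map.get(m.group(1), m.group(1)),
            fields[8],
        )
        yield "\t".join(fields) + "\n"


# Hypothetical grouping of the three genes named above under one ID.
merge_map = {
    "Gm14430": "Gm14430_group",
    "Gm4724": "Gm14430_group",
    "Gm14434": "Gm14430_group",
}
example_gtf = [
    'chr1\tucsc\texon\t100\t200\t.\t+\t.\tgene_id "Gm4724"; transcript_id "Gm4724";\n',
    'chr1\tucsc\texon\t500\t600\t.\t+\t.\tgene_id "Actb"; transcript_id "Actb";\n',
]
merged = list(merge_gene_ids(example_gtf, merge_map))
```

Only `gene_id` changes; `transcript_id` is left alone so the individual copies can still be told apart if needed later.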

      Thanks a bunch!

      Best


      • liux
        Member
        • Mar 2009
        • 30

        #5
        We mask all but one copy if the duplicates are exactly the same (~50% of genes) or differ by less than 1% (the other ~40%), and flag the rest.
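The post does not say how the masking is done. One common way, shown here only as a sketch, is to overwrite the extra copies with N so that nothing can align there; existing tools such as `bedtools maskfasta` do this on real FASTA files. `mask_intervals` and the toy sequence are made up for illustration.

```python
def mask_intervals(sequence, intervals):
    """Return sequence with each 0-based, half-open [start, end) interval replaced by N."""
    chars = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(chars):
            raise ValueError("interval (%d, %d) is outside the sequence" % (start, end))
        chars[start:end] = "N" * (end - start)  # same length, so coordinates are preserved
    return "".join(chars)


# Toy sequence with two identical 8-base copies; keep the first, mask the second.
toy = "ACGTACGT" + "TT" + "ACGTACGT"
masked = mask_intervals(toy, [(10, 18)])
```

Because the masked sequence keeps its length, existing annotation coordinates remain valid, but any reads from the masked copy will now pile up on the copy that was kept.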

        This is probably OK if you are just looking at gene expression. Recently we started to integrate ChIP-seq data with mRNA-seq data, and I can see this approach causing problems there.


        • aquleaf
          Member
          • Mar 2010
          • 38

          #6
          Thanks very much for your reply. How do you mask them? Is there any software to mask them?

          We will also combine the ChIP-Seq data with mRNA-Seq data later on. What problems would masking the duplicates cause there?

          Thanks very much!
