**swbarnes2** · 03-01-2013, 04:24 PM

Don't mask the genome.

The last thing you want is reads being forced to align to the wrong place, because you masked away the right place.

If the genes are true duplicates, it's going to be nearly impossible to separate out which reads came from where.

**aquleaf** · 03-01-2013, 05:15 PM

Thanks very much for your reply. We don't need to distinguish which locus the reads come from when a gene has several copies in the genome; we want to count all reads that map uniquely to that gene. For example, if a gene has 4 copies in the genome, we want to count the reads that map to those regions and to no other gene. Is there any way to achieve that?
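The counting rule described above can be sketched in a few lines. This is only a minimal illustration, assuming the aligner reports every hit for a multi-mapped read and that each hit has already been assigned to a gene. The function name `count_group_unique_reads`, the `(read_id, gene_id)` input format, and the group label `Gm_family` are all hypothetical.

```python
from collections import Counter, defaultdict


def count_group_unique_reads(alignments, gene_to_group):
    """Count reads whose hits all fall inside a single gene group.

    alignments: iterable of (read_id, gene_id) pairs, one per reported hit
                (hypothetical input format, not any tool's native output).
    gene_to_group: dict mapping a gene ID to the name of its copy group;
                   genes missing from the dict form their own group.
    """
    groups_per_read = defaultdict(set)
    for read_id, gene_id in alignments:
        groups_per_read[read_id].add(gene_to_group.get(gene_id, gene_id))

    counts = Counter()
    for groups in groups_per_read.values():
        # Keep a read only if every one of its hits lands in the same group.
        if len(groups) == 1:
            counts[next(iter(groups))] += 1
    return counts


# Illustrative data only; the grouping of these gene IDs is an assumption.
gene_to_group = {
    "Gm14430": "Gm_family",
    "Gm4724": "Gm_family",
    "Gm14434": "Gm_family",
}
alignments = [
    ("read1", "Gm14430"), ("read1", "Gm4724"),  # two copies, same group: counted
    ("read2", "Gm14434"),                       # single hit: counted
    ("read3", "Gm4724"), ("read3", "Actb"),     # also hits another gene: dropped
    ("read4", "Actb"),
]
counts = count_group_unique_reads(alignments, gene_to_group)
```

With this rule, a read that hits two copies of the same gene still counts once for the group, while a read that also hits an unrelated gene is discarded.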

One more request: could anyone recommend a tool for modifying the GTF file used in RNA-Seq analysis? We used the reference GTF file from UCSC and found that many genes seem identical, such as Gm14430, Gm4724 and Gm14434. We would like to merge such genes into a single gene.
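One way to approach the merge without a dedicated tool is to rewrite the `gene_id` attribute so that all copies share one ID. The following is a rough sketch, not an established tool: `merge_gene_ids`, the mapping, the merged name `Gm_family`, and the example GTF line (coordinates and transcript ID) are made up for illustration. Whether a given counting tool accepts one `gene_id` spread over several loci is something to check separately.

```python
import re

# Matches the gene_id attribute in the 9th GTF column; \b avoids matching
# attribute names that merely end in "gene_id".
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')


def merge_gene_ids(gtf_lines, gene_to_group):
    """Yield GTF lines with gene_id replaced by its group name, if it has one."""
    def replace(match):
        gene_id = match.group(1)
        return 'gene_id "%s"' % gene_to_group.get(gene_id, gene_id)

    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # leave comment/header lines untouched
        else:
            yield GENE_ID_RE.sub(replace, line)


# Hypothetical mapping and record, for illustration only.
gene_to_group = {"Gm14430": "Gm_family", "Gm4724": "Gm_family", "Gm14434": "Gm_family"}
example = ['chr2\tunknown\texon\t100\t200\t.\t+\t.\tgene_id "Gm4724"; transcript_id "T1";\n']
merged = list(merge_gene_ids(example, gene_to_group))
```

In practice the lines would be read from the original GTF file and written to a new one, leaving the source file unchanged.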

Thanks a bunch!

Best

**JackieBadger** · 03-02-2013, 03:50 AM

http://bioinformatics.oxfordjournals.org/content/28/21/2711.abstract

http://genome.cshlp.org/content/20/11/1613.abstract

**liux** · 03-02-2013, 10:09 AM

We mask all but one copy if the duplicates are exactly the same (~50% of genes) or differ by less than 1% (another 40%), and flag the rest.

That is probably OK if you are only looking at gene expression. Recently we started integrating ChIP-seq data with mRNA-seq data, and I can see this approach causing problems there.
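For readers unfamiliar with masking, the basic operation is replacing the bases of the unwanted copies with `N` so that reads cannot align there. Below is a minimal sketch of that idea, not necessarily the procedure used above; `mask_intervals` and the coordinates are hypothetical, and intervals are assumed to be 0-based and half-open. Existing tools such as `bedtools maskfasta` do the same thing from a BED file.

```python
def mask_intervals(sequence, intervals):
    """Return `sequence` with each 0-based, half-open [start, end) interval set to N."""
    bases = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(bases):
            raise ValueError("interval (%d, %d) is outside the sequence" % (start, end))
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


# Toy sequence; mask positions 2, 3 and 4.
masked = mask_intervals("ACGTACGTAC", [(2, 5)])
```

The masked sequence keeps its original length, so the coordinates of everything outside the masked regions stay valid.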

**aquleaf** · 03-02-2013, 04:13 PM

Thanks very much for your reply. How do you mask them? Is there any software to mask them?

We will also combine ChIP-Seq data with mRNA-Seq data later on. What problems arise if the duplicates are masked?

Thanks very much!


a Question about Duplicated Genes
