Hello All,
I would like to know whether repeat masking a genome or genome fragment before gene annotation is a good idea. The literature (http://www.nature.com/nrg/journal/v1...l/nrg3174.html, http://onlinelibrary.wiley.com/doi/1...eva.12178/full) and software such as MAKER and PASA advocate repeat masking before annotating genes. While this seems logical, I have the following questions:
1) If a gene that is not repeat-related contains a repetitive region (let's say part of a gag-pol domain is present in the gene, or an exon contains a satellite repeat), does masking software such as RepeatMasker mask these regions?
2) How many cases like this actually exist across different types of reference genomes, and what is their impact on gene calling/annotation?
3) For reference-guided transcriptome assembly, is it recommended to use a masked genome? I realize that for expression quantification, masking may lead to overestimation or underrepresentation in some cases.
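For question 2, one rough way to measure this on a given genome would be to soft-mask the assembly (repeats in lowercase, as RepeatMasker's -xsmall option produces) and check how much of each annotated exon falls in lowercase. Below is a minimal sketch of that idea, not a tested pipeline. The function names, the 0-based half-open exon coordinates, and the toy sequence are all my own assumptions for illustration.

```python
def masked_fraction(seq, start, end):
    """Fraction of bases in seq[start:end] that are lowercase (soft-masked).

    Assumes 0-based, half-open coordinates (hypothetical convention).
    """
    region = seq[start:end]
    if not region:
        return 0.0
    masked = sum(1 for base in region if base.islower())
    return masked / len(region)


def exons_overlapping_repeats(seq, exons, min_fraction=0.0):
    """Return (exon_name, masked_fraction) for exons whose masked
    fraction is strictly greater than min_fraction."""
    hits = []
    for name, start, end in exons:
        frac = masked_fraction(seq, start, end)
        if frac > min_fraction:
            hits.append((name, frac))
    return hits


# Toy data (hypothetical): lowercase bases stand for masked repeats.
toy_seq = "ACGTacgtACGTacgt"
toy_exons = [("exon1", 0, 4), ("exon2", 2, 6), ("exon3", 8, 12)]

for name, frac in exons_overlapping_repeats(toy_seq, toy_exons):
    print(f"{name}: {frac:.2f} of bases soft-masked")
```

On a real genome the sequence and exon coordinates would come from the soft-masked FASTA and the annotation file, which this sketch does not parse.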
Thank you in advance. Your answers will help me design a better annotation pipeline for my clients.
-Abhijit