We're currently grappling with the reference sequence name synonym issue. When we combine mapping or assembly results from different programs with features from yet other sources, we find that we have to allow a single reference sequence to have multiple names (IDs). The reference sequences may already exist (mapping) or may be generated as contigs (assembly). As a result, our mapped reads carry reference sequence IDs that we must either convert to a standard set of IDs before adding the reads to BAM files, or look up in a synonym table, running one BAM query for each name (ID) a reference sequence might have. GFF files raise analogous problems.
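To make the two options concrete, here is a minimal Python sketch of a synonym table. It assumes every alias belongs to exactly one reference sequence. The class `SynonymTable` and the contig names in the usage line are hypothetical, not taken from any existing tool. `canonical()` supports the "convert to a standard ID first" approach, and `names_for()` supports the "query once per name" approach.

```python
from collections import defaultdict


class SynonymTable:
    """Hypothetical helper: maps every alias of a reference sequence to one canonical ID."""

    def __init__(self):
        self._canonical = {}              # alias -> canonical ID
        self._aliases = defaultdict(set)  # canonical ID -> all its names (itself included)

    def add(self, canonical, *aliases):
        # Register a canonical ID and its aliases; refuse an alias that
        # already belongs to a different sequence (assumed to be an error).
        for name in (canonical, *aliases):
            existing = self._canonical.get(name)
            if existing is not None and existing != canonical:
                raise ValueError(f"{name!r} already maps to {existing!r}")
            self._canonical[name] = canonical
            self._aliases[canonical].add(name)

    def canonical(self, name):
        # Option 1: normalize a name before writing reads or features.
        # Unknown names raise KeyError rather than passing through silently.
        return self._canonical[name]

    def names_for(self, name):
        # Option 2: expand a query to every name the sequence may appear under.
        return sorted(self._aliases[self.canonical(name)])
```

For example, after `t.add("ctg0001", "scaffold_1", "NODE_1")`, `t.canonical("NODE_1")` gives `"ctg0001"`, and `t.names_for("scaffold_1")` lists all three names, one per BAM or GFF query.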
How have others dealt with this? Should we just get used to always having to perform a sed-like editing step to ensure a common naming convention for the reference sequences referred to by mapped reads and features?
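If the editing step is unavoidable, it can at least be format-aware rather than a blind sed substitution, so that a contig name appearing inside a read name or attribute is never touched. The sketch below is a hypothetical illustration (the function names are assumptions): it rewrites only the `SN:` field of SAM `@SQ` header lines, the RNAME and RNEXT columns of alignment lines, and column 1 plus `##sequence-region` directives in GFF3. Names missing from the mapping pass through unchanged, and reference names inside optional tags such as `SA` are deliberately not handled.

```python
def rename_sam_line(line, mapping):
    """Rewrite reference names in one SAM text line (header or alignment)."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@SQ"):
        # Header: only the SN:<name> field carries the reference name.
        fields = [
            "SN:" + mapping.get(f[3:], f[3:]) if f.startswith("SN:") else f
            for f in fields
        ]
    elif not line.startswith("@"):
        # Alignment: column 3 is RNAME, column 7 is RNEXT; "*" and "=" stay as-is.
        for i in (2, 6):
            if len(fields) > i and fields[i] not in ("*", "="):
                fields[i] = mapping.get(fields[i], fields[i])
    return "\t".join(fields)


def rename_gff_line(line, mapping):
    """Rewrite the seqid of a GFF3 feature or ##sequence-region line."""
    line = line.rstrip("\n")
    if line.startswith("##sequence-region"):
        parts = line.split()
        if len(parts) >= 2:
            parts[1] = mapping.get(parts[1], parts[1])
        return " ".join(parts)
    if not line or line.startswith("#"):
        # Other comments, directives, and blank lines are left untouched.
        return line
    fields = line.split("\t")
    fields[0] = mapping.get(fields[0], fields[0])  # column 1 is the seqid
    return "\t".join(fields)
```

For BAM files specifically, alignment records refer to references by their index in the header, so rewriting only the `@SQ` lines (for example with `samtools reheader`) is usually enough.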