Hi all,
I hope to get some help here. My PhD project involves identifying unique and conserved CDSs that overlap a specific epigenetic motif. I have used the bedtools intersect function to obtain all the CDSs that overlap those motifs.
I am now trying to find a way to compare all of these "annotation files", which are tab-delimited text files containing the following columns: start coordinate of the motif, end coordinate of the motif, start coordinate of the overlapped CDS, end coordinate of the overlapped CDS, strand of the CDS, function, nucleotide FASTA sequence, and amino acid FASTA sequence. There are 9 files in total that I aim to compare.
The preferred output is a file listing the conserved CDSs present across all of these files, with the same information as the input (start coordinate of the motif, end coordinate of the motif, start coordinate of the overlapped CDS, end coordinate of the overlapped CDS, strand of the CDS, function, nucleotide FASTA sequence, and amino acid FASTA sequence). However, if that is not possible, only the sequences detected to be conserved would be sufficient.
What tools, scripts, or methods would you recommend for this? I am currently planning to compare the nucleotide sequences, because I want to find overlapped CDSs that are homologous. Would comparing the amino acid sequences be a better idea? Would ACT or MAUVE be able to achieve this? Is it possible to do this with a text manipulation script, such as an awk script?
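To make the text-manipulation option concrete, here is a minimal sketch of what I imagine the simplest version would look like. It assumes the files have no header row, that the nucleotide sequence is the 7th column (index 6), and that "conserved" means an identical sequence in every file. The function names are hypothetical, and exact matching like this would not detect homologous CDSs that differ by even one base, which is why I am unsure whether it is enough.

```python
import csv

SEQ_COL = 6  # assumed 0-based index of the nucleotide sequence column


def read_table(path):
    """Read one tab-delimited annotation file into a list of rows."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def conserved_rows(tables, key_col=SEQ_COL):
    """Map each sequence found in every table to its (table index, row) pairs.

    Matching is by exact, case-insensitive string identity only.
    """
    if not tables:
        return {}
    # Sequences seen in each table, then the ones common to all tables.
    per_table = [{row[key_col].upper() for row in table} for table in tables]
    shared = set.intersection(*per_table)
    result = {seq: [] for seq in shared}
    for index, table in enumerate(tables):
        for row in table:
            seq = row[key_col].upper()
            if seq in shared:
                # Keep the full row so coordinates, strand and function survive.
                result[seq].append((index, row))
    return result
```

It would be called as conserved_rows([read_table(p) for p in paths]), where paths is the list of the 9 annotation files.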
Thank you in advance for any suggestions and kind help. Thanks a lot!