Hi all
I'd like to align RNA-seq data to the reference genome of a very closely related species (in tomato). The aim is to identify SNPs (and possibly InDels) in order to build a consensus and, if possible, perform other analyses such as Ka/Ks. Since this is my first time, I am looking for tips to get going in the right (or at least "a good") direction...
My idea is to align the data with TopHat against the genome + GFF3 file, and then build a consensus for each species (I have the impression that most people prefer GATK for this). A transcriptome file containing just the CDSs as a multi-FASTA is also available, though I am not sure whether the 5' and 3' ends would then be missed in the alignment. After that I could run some phylogenetic analyses with PAML (maybe yn00, to assess genes under selective pressure?).
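To make clear what I mean by "build a consensus", here is a minimal Python sketch of the step I have in mind. It is only an illustration under my own assumptions: the function name `apply_snps`, the toy sequence and the SNP dictionary are all made up, it handles substitutions only (no InDels), and it is not how GATK or any other tool actually does it.

```python
def apply_snps(reference, snps):
    """Return a consensus sequence with SNP alleles substituted in.

    reference: reference sequence as a string
    snps: dict mapping 1-based position -> (ref_base, alt_base)
    Hypothetical helper for illustration; InDels are not handled.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        # Sanity check: the called reference allele must match the genome.
        if seq[pos - 1].upper() != ref_base.upper():
            raise ValueError(
                f"Reference mismatch at position {pos}: "
                f"expected {ref_base}, found {seq[pos - 1]}"
            )
        seq[pos - 1] = alt_base
    return "".join(seq)


# Toy example: a made-up 12 bp CDS with two made-up SNPs.
reference = "ATGGCCAAGTGA"
snps = {4: ("G", "A"), 9: ("G", "C")}
consensus = apply_snps(reference, snps)
print(consensus)
```

The resulting per-species consensus CDSs are what I would then align codon-wise and feed to yn00.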
Am I going the right way and/or do you have some hints?
Many thanks,
Marco