Hi all,
Thought I would see if anyone has a different (better!) pipeline for collecting a set of SNPs from many BAM files for use with STRUCTURE or other population genetics software. The end goal is a matrix with SNPs as rows and samples as columns.
Example:
        x  y  z
Site 1  T  T  G
Site 2  G  A  C
Site 3  A  G  G
My current pipeline:
1. Create an mpileup file
2. Call SNPs (with your favorite software) for each sample
3. Create a merged SNP list, with the end goal of having every position where there is a SNP in at least one sample
4. Create a consensus FASTA file from the mpileup for each sample
5. Extract the consensus nucleotide at each position in the merged list (step 3) from each consensus FASTA (step 4)
6. Merge files
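To make steps 3, 5 and 6 concrete, here is a minimal Python sketch of what I mean. It assumes the SNP calls and consensus sequences have already been loaded into memory; the function names, the dictionary layout and the "N" placeholder for positions outside a consensus sequence are all made up for illustration and are not taken from any particular tool.

```python
# Hypothetical in-memory inputs (assumptions for illustration only):
#   snp_calls: sample -> set of (chrom, pos) where that sample has a SNP call
#   consensus: sample -> {chrom: consensus sequence string}

def merged_sites(snp_calls):
    """Step 3: every position that is a SNP in at least one sample, sorted."""
    sites = set()
    for positions in snp_calls.values():
        sites.update(positions)
    return sorted(sites)

def build_matrix(sites, consensus, missing="N"):
    """Steps 5-6: one row per site, one column per sample."""
    samples = sorted(consensus)
    rows = []
    for chrom, pos in sites:
        row = []
        for sample in samples:
            seq = consensus[sample].get(chrom, "")
            # Positions are treated as 1-based; out-of-range sites become `missing`.
            row.append(seq[pos - 1] if 0 < pos <= len(seq) else missing)
        rows.append(row)
    return samples, rows

# Toy data, not real calls.
snp_calls = {
    "x": {("chr1", 2)},
    "y": {("chr1", 2), ("chr1", 5)},
    "z": {("chr1", 7)},
}
consensus = {
    "x": {"chr1": "ATCGGTA"},
    "y": {"chr1": "AGCGATA"},
    "z": {"chr1": "ACCGCTG"},
}

sites = merged_sites(snp_calls)
samples, rows = build_matrix(sites, consensus)

print("site\t" + "\t".join(samples))
for (chrom, pos), row in zip(sites, rows):
    print(f"{chrom}:{pos}\t" + "\t".join(row))
```

Each row is a site from the merged list and each column is a sample, which is the layout shown in the example above.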
One problem is that SNP sites called in some samples may not pass quality filters in other samples. I would love to hear about other pipelines.
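One way I could imagine handling the filtering problem is to mask genotypes that fail a per-sample filter and then drop sites with too many masked samples. The sketch below is only an assumption about how that might look: the depth values, the `min_depth` threshold, the `max_missing` cutoff and the "N" placeholder are all hypothetical, and read depth is just a stand-in for whatever quality filter is actually used.

```python
def mask_low_quality(rows, depths, min_depth=8, missing="N"):
    """Replace a base with `missing` when that sample's depth at the site is too low."""
    return [
        [base if depth >= min_depth else missing for base, depth in zip(row, depth_row)]
        for row, depth_row in zip(rows, depths)
    ]

def drop_sparse_sites(rows, max_missing=1, missing="N"):
    """Keep only sites where at most `max_missing` samples are masked."""
    return [row for row in rows if row.count(missing) <= max_missing]

# Matrix from the example above (rows = sites, columns = samples x, y, z).
rows = [
    ["T", "T", "G"],
    ["G", "A", "C"],
    ["A", "G", "G"],
]
# Hypothetical per-sample read depth at each site, same shape as `rows`.
depths = [
    [20, 15, 3],
    [12, 9, 30],
    [2, 4, 25],
]

masked = mask_low_quality(rows, depths)
kept = drop_sparse_sites(masked)

for row in kept:
    print(" ".join(row))
```

Masking rather than dropping keeps a site usable when only a few samples fail the filter, at the cost of missing data in the final matrix.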
Thanks!