I am looking at paired-end reads from 20 bacterial samples (4 treatment groups, 5 replicates per treatment group), as well as from the ancestor.
Here are the steps I took so far:
1. Use FastQC to report overrepresented sequences
2. Use Cutadapt to trim adapter sequences
3. Obtain a reference genome, which was made via de novo assembly. An assembly evaluation was also conducted
4. Map reads using BWA
5. Process SAM files with Picard (clean SAM files, convert to BAM, sort, mark duplicates, add read groups, and merge the 5 replicates from each treatment group to create 4 BAM files)
6. Call SNPs with SAMtools
End result: 4 VCF files, one for each treatment group.
What do I do now?
How do I interpret these results?
Also, I haven't done anything with the ancestor sample yet. How should I go about including it?
Results that I am looking for:
1. Which genes exhibited change between ancestor and treatment groups?
2. Are the genes that changed relative to the ancestor the same across treatment groups?
3. What are the functions of those genes?
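To make questions 1 and 2 concrete, here is a minimal sketch of the kind of comparison they imply. It assumes the ancestor is run through the same steps to get its own VCF (which has not been done yet), and that a variant can be identified by contig, position, REF and ALT. The function names are hypothetical, and it only compares variant sites; linking sites to genes would additionally need an annotation of the assembly, which is not covered here.

```python
def variant_keys(vcf_lines):
    """Collect (contig, position, REF, ALT) tuples from VCF data lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and the #CHROM row).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A record may list several ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def derived_variants(treatment_lines, ancestor_lines):
    """Variants present in a treatment group but absent from the ancestor."""
    return variant_keys(treatment_lines) - variant_keys(ancestor_lines)


def shared_variants(list_of_variant_sets):
    """Variants found in every one of the given sets (e.g. all 4 groups)."""
    sets = list(list_of_variant_sets)
    return set.intersection(*sets) if sets else set()
```

With real files, each argument could simply be an open file handle, for example `derived_variants(open("treatment1.vcf"), open("ancestor.vcf"))`, where both file names are placeholders.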
Edit: I have also been tinkering with GATK, but the tutorial is very confusing. I am just trying to get some meaningful results, but it's difficult without some direction on how to obtain them.
Edit: So far I have only obtained the number of SNPs found within each treatment group, using the wc and grep commands in Linux. This is a very basic result, but I guess it's a start.
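For reference, the counting described above can be sketched in Python as well. This is only an illustration under two assumptions: the VCF is uncompressed and tab-separated, and a "SNP" means a record whose REF and ALT alleles are all single bases. The function names are made up for this example.

```python
def count_variant_records(vcf_lines):
    """Count data lines, i.e. everything that is not blank or a '#' header."""
    return sum(
        1 for line in vcf_lines
        if line.strip() and not line.startswith("#")
    )


def count_snps(vcf_lines):
    """Count records where REF and every ALT allele are a single base."""
    total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]              # column 4 of a VCF record is REF
        alts = fields[4].split(",")  # column 5 is ALT, comma-separated
        if len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts):
            total += 1
    return total
```

The first function mirrors counting non-header lines with grep and wc; the second excludes indels, which a plain line count would include.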