Hi Everyone,
Here is a brief summary of what I am trying to do with my project: essentially, I want to find out how a mutation in the mRNA sequence affects the amino acid sequence of a protein. I have whole exome sequencing data as a .sam file, and I am interested in finding the flanking sequence X nucleotides upstream and X nucleotides downstream of a specific mutation site. From there, I want to determine the amino acid sequence of that flanking sequence, but it has to be in the same reading frame as the original sequence.
Here are a few questions that I had in terms of using SAMtools and accomplishing these tasks:
1) I assume I need to find the consensus sequence for the reads in my whole exome sequencing data. How would I do this with SAMtools? I found the mpileup command, but what would the reference FASTA file be in my case? Is finding the consensus even needed?
2) My main issue is going from the .sam file reads to being able to pinpoint the location of interest and get the flanking sequence. What do I need to do to process the .sam exome sequencing file to be able to determine the flanking sequence?
3) Once I find the flanking sequence, how do I figure out the amino acid sequence and adjust accordingly to make sure it is in frame?
4) How do I account for the multiple transcripts that may exist for a particular gene because of alternative splicing?
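To make question 3 concrete, here is a rough sketch in plain Python of what I mean by "in frame". It is only an illustration under my own assumptions: I assume I already have the spliced coding sequence (CDS) of a single transcript, starting at the start codon, and the 0-based position of the mutation within that CDS (which is exactly the part I do not know how to get from the .sam file). The function names and the toy sequence are made up for this example.

```python
import itertools

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def in_frame_window(cds_length, mut_pos, flank):
    """Return (start, end) of a window around mut_pos, widened to codon boundaries.

    Assumes mut_pos is 0-based and relative to the first base of the start codon.
    The end coordinate is exclusive.
    """
    start = max(0, mut_pos - flank)
    end = min(cds_length, mut_pos + flank + 1)
    start -= start % 3                       # move back to the start of a codon
    end = min(cds_length, end + (-end) % 3)  # move forward to the end of a codon
    return start, end


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy example (hypothetical sequence, not real data).
reference_cds = "ATGGCCAAGTTTGGGTAA"  # M A K F G *
mut_pos, flank = 7, 2                 # middle base of the AAG codon
mutant_cds = reference_cds[:mut_pos] + "C" + reference_cds[mut_pos + 1:]

start, end = in_frame_window(len(reference_cds), mut_pos, flank)
reference_peptide = translate(reference_cds[start:end])
mutant_peptide = translate(mutant_cds[start:end])
print(start, end, reference_peptide, mutant_peptide)
```

The idea is that the requested +/-X window is widened outward to the nearest codon boundaries, so the same window can be translated for both the reference and the mutated sequence and compared residue by residue. For question 4, I imagine this would have to be repeated per transcript, since the position within the CDS differs between isoforms.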
Sorry for all the questions, it is my first time working in this area. I appreciate any help! Thanks in advance!