Hello all,
I've been looking all over but cannot seem to find the right answer to the following question:
Given a BAM file of paired-end reads (2x150 bp) mapped to the human genome (I sequenced a pool of various cDNA fragments of roughly 500 bp with Illumina and mapped them with bbmap), what would be a straightforward method to obtain the amino acid sequence covered by each read pair, including the sequence between the two mates but excluding introns? I'm not looking for a method to remap the reads to the human proteome (like here: https://www.biostars.org/p/150241/). I'd rather take the genomic positions of the beginning of R1 and of R2 from the BAM file, convert them to the corresponding amino acid positions in the corresponding protein, and output the protein sequence between these positions (not the entire gene; if there are multiple transcripts, the canonical one is fine).
I've attached a small drawing for clarity - the example basic output in this situation would be "ATHYPPGMEDDKYKPIPNG".
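To make the coordinate conversion concrete, this is roughly the logic I have in mind, as a minimal Python sketch rather than a working pipeline. Everything in it is a made-up illustration: the function names `genomic_to_cds_offset` and `protein_slice`, the toy exon coordinates, and the toy protein are all hypothetical. It assumes the coding exons of the canonical transcript are already available as 0-based half-open genomic intervals (e.g. parsed from an annotation file), that the CDS is complete and starts in frame, and that the fragment start and end have already been extracted from the BAM as plain integers.

```python
def genomic_to_cds_offset(pos, cds_exons, strand="+"):
    """Map a 0-based genomic position to a 0-based offset in the spliced CDS.

    cds_exons: list of (start, end) 0-based half-open coding intervals.
    Returns None when the position is intronic or outside the CDS.
    """
    exons = sorted(cds_exons)
    if strand == "-":
        # On the minus strand the transcript runs from high to low coordinates.
        exons = exons[::-1]
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start  # skip this exon; introns contribute nothing
    return None


def protein_slice(protein, frag_start, frag_end, cds_exons, strand="+"):
    """Return the amino acids whose codons overlap the fragment.

    frag_start and frag_end are the genomic positions of the first and last
    base of the fragment (both inclusive). Assumes codon 0 starts at CDS
    offset 0 (hypothetical: complete, in-frame CDS).
    """
    a = genomic_to_cds_offset(frag_start, cds_exons, strand)
    b = genomic_to_cds_offset(frag_end, cds_exons, strand)
    if a is None or b is None:
        return None  # fragment end falls in an intron or UTR
    lo, hi = sorted((a, b))
    return protein[lo // 3 : hi // 3 + 1]


# Toy data (hypothetical): two 12-nt coding exons = 8 codons.
EXONS = [(100, 112), (200, 212)]
PROTEIN = "MATHYPPG"

print(protein_slice(PROTEIN, 103, 205, EXONS))  # ATHYP
```

Codons that are only partially covered by the fragment are included here; dropping them would only require rounding `lo` up and `hi` down to codon boundaries.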
Could anyone put me on the right track here? Are there any handy tools that I'm unaware of?
Thanks.