Hello,
From the Ensembl FTP site, we can download a transcriptome file. It is a FASTA file in which each record has a header (transcript name, chromosome position, etc.) followed by the DNA sequence itself. Separately, I have a FASTQ file from an RNA-seq experiment.
What I want to do is generate a FASTA file like the Ensembl transcriptome from my reads. I think I read that this is called a consensus FASTA.
The steps I imagine for generating it are:
1. Align the reads to the transcriptome reference, which gives a SAM/BAM file
2. Assemble the aligned reads in the SAM/BAM file according to their coordinates
3. Resolve the occurrences of SNPs and indels
4. Generate a FASTA file with the header information and the sequence assembled in step 3
For step 1, I know I can use bowtie2. For step 2, I don't know of a tool, but I think I can write my own program. The problem is step 3: I don't know how to do it.
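To show what I mean by steps 3 and 4, here is a naive Python sketch. It assumes I already have, for each reference position, the list of bases that the aligned reads show there (the `pileup` dictionary, which is hypothetical, as are the function names and the `min_depth` cutoff). It just takes a majority vote per position, treats `-` as a deletion, ignores insertions and base qualities, and falls back to the reference base where coverage is low. I am sure a real tool does something smarter.

```python
from collections import Counter
import textwrap


def call_consensus(reference, pileup, min_depth=3):
    """Naive majority-vote consensus for one transcript.

    reference: reference transcript sequence (str)
    pileup:    dict mapping 0-based position -> list of observed bases,
               where '-' stands for a deletion in the read
    min_depth: below this coverage the reference base is kept (assumed cutoff)
    """
    consensus = []
    for pos, ref_base in enumerate(reference):
        bases = pileup.get(pos, [])
        if len(bases) < min_depth:
            # Not enough evidence: keep the reference base.
            consensus.append(ref_base)
            continue
        most_common_base, _count = Counter(bases).most_common(1)[0]
        if most_common_base != "-":
            consensus.append(most_common_base)
        # A majority of '-' means the base is deleted, so nothing is appended.
    return "".join(consensus)


def to_fasta(header, sequence, width=60):
    """Format one FASTA record with lines wrapped at `width` characters."""
    lines = textwrap.wrap(sequence, width)
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    ref = "ACGTAC"
    toy_pileup = {
        1: ["T", "T", "T", "C"],  # SNP: C -> T
        3: ["-", "-", "-"],       # deletion of T
        4: ["G", "G"],            # too shallow, reference base is kept
    }
    print(to_fasta("toy_transcript", call_consensus(ref, toy_pileup)), end="")
```

What I don't know is how to do this properly for real data, for example how to handle insertions, heterozygous sites, and base or mapping quality.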
Since I think this is a common thing to do, could you suggest a well-known pipeline for it?
What do you recommend? Thank you for your reply.