Hello everyone,
Like others, I am quite excited that kallisto produces a pseudoalignment in minutes, where a real alignment takes hours to compute. Now it would be useful to visualise the result in IGV.
So, from the .gdb file we extracted the CDS of our bacteria using Python scripts. The name of each sequence in the CDS file was the gene_id (which was the same as the transcript_id), exactly as we would expect.
I ran kallisto index on this CDS file and then produced a pseudobam file, following the kallisto manual (https://pachterlab.github.io/kallisto/manual.html):
kallisto quant -i cds.idx -o output -b 100 --single -l 100 -s 1 --pseudobam <all_RNAseq_reads.fq.gz> | samtools view -Sb - > pseudomap.bam
The .bam file was then sorted, indexed, and loaded into IGV together with the .fasta and .gtf files, which gave the following error:
File does not contain any sequence names which match the current genome.
File: *****S5_genome_87, S5_genome_88, S5_genome_89, S5_genome_90, ...
Genome: S5_genome,
S5_genome_XX are the gene_ids of our genome, and S5 is our genome. So I thought that IGV treats every transcript as a chromosome (based on a few related posts such as http://seqanswers.com/forums/archive...p/t-16407.html). I therefore created an alias file like this:
S5_genome_87 S5_genome
S5_genome_88 S5_genome
... ...
Now the file loads, but no reads are displayed at all. I guess I am missing something somewhere. IMHO the easiest way would be to somehow edit the .bam file (or the .sam file before it is converted to .bam) so that it refers to the single chromosome of the genome.
If you are still reading, thank you. Any help is appreciated.
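To make the idea of editing the .sam file concrete, here is a rough, untested-on-real-data sketch in Python. It assumes that every CDS is one contiguous feature in the .gtf (no introns), that the reads are single-end, and that the chromosome is called S5_genome as in the IGV message above. The function names (load_cds_coords, lift_record, lift_sam) are made up for illustration and are not part of kallisto or samtools. Reads on minus-strand genes are flipped, because the CDS sequence is the reverse complement of the genome there. Optional tags are passed through unchanged, which may not be correct for every tag.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def load_cds_coords(gtf_lines):
    """Map gene_id -> (start, end, strand) from CDS lines of a GTF.

    Assumes one contiguous CDS line per gene (hypothetical simplification).
    Coordinates are 1-based and inclusive, as in GTF.
    """
    genes = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "CDS":
            continue
        match = GENE_ID_RE.search(cols[8])
        if match:
            genes[match.group(1)] = (int(cols[3]), int(cols[4]), cols[6])
    return genes


def ref_span(cigar):
    """Number of reference bases consumed by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")


def lift_record(fields, genes, chrom="S5_genome"):
    """Convert one SAM record from CDS coordinates to genome coordinates."""
    flag = int(fields[1])
    if flag & 4 or fields[5] == "*":
        return list(fields)  # unmapped read: leave untouched
    start, end, strand = genes[fields[2]]
    pos, cigar = int(fields[3]), fields[5]
    out = list(fields)
    out[2] = chrom
    if strand == "+":
        out[3] = str(start + pos - 1)
    else:
        # CDS position t corresponds to genome position end - t + 1,
        # so the leftmost genome base is the last aligned CDS base.
        out[3] = str(end - (pos - 1) - ref_span(cigar) + 1)
        out[1] = str(flag ^ 16)  # toggle the reverse-strand bit
        out[5] = "".join(n + op for n, op in reversed(CIGAR_RE.findall(cigar)))
        out[9] = fields[9].translate(COMPLEMENT)[::-1]
        out[10] = fields[10][::-1]
    return out


def lift_sam(lines, genes, chrom, chrom_len):
    """Yield SAM lines with one @SQ header line and lifted records."""
    wrote_sq = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            if not wrote_sq:
                yield f"@SQ\tSN:{chrom}\tLN:{chrom_len}"
                wrote_sq = True
        elif line.startswith("@"):
            yield line
        else:
            yield "\t".join(lift_record(line.split("\t"), genes, chrom))
```

The output of lift_sam would then be converted to .bam, sorted, and indexed as before. The chromosome length passed as chrom_len has to match the length of the sequence in the .fasta file.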