visualisation of RNAseq (kallisto to IGV)

  • KamilSJaron
    replied
    I wrote a small Python script that converts the .sam produced by kallisto into a .sam readable by IGV, using a .gtf file. It is not perfect (I was in a bit of a rush when writing it): for all transcripts on the reverse strand, reads are displayed as if they were in the forward direction (the opposite of what they should be), but in the correct place (i.e. if you only want to check the coverage of transcripts, it is good enough).

    In case you are interested:

    https://github.com/KamilSJaron/Seque...m_convertor.py

    Usage:

    python3 kallisto_sam_convertor.py <pseudoalignment.sam> <annotation.gtf> | samtools view -bS - | samtools sort - -o <output.bam>

    The resulting .bam should be loadable in IGV.

    ---edit---
    I think that, to correct the script, the bit flag of reads mapping to transcripts on the reverse strand needs to be changed (forward reads to reverse reads and vice versa), and the position of each read needs to be recomputed (it should be mirrored around the middle of the transcript).
    Last edited by KamilSJaron; 12-01-2016, 11:35 AM. Reason: corrected the description of the problem that the script in this post has.
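
The correction described in the edit above can be sketched roughly as follows. This is not the code from the linked script; it is a minimal illustration that assumes each reverse-strand transcript is one contiguous interval on the genome (no introns, as for bacterial CDS) and that the caller passes only mapped reads. The function names and the `gene_end` parameter (the 1-based genomic end coordinate of the transcript, taken from the .gtf) are made up for this example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reference_length(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def reverse_cigar(cigar):
    """Reverse the order of CIGAR operations, e.g. 2S8M -> 8M2S."""
    return "".join(n + op for n, op in reversed(CIGAR_OP.findall(cigar)))


def flip_to_genome(flag, pos, cigar, seq, qual, gene_end):
    """Convert a read aligned to a reverse-strand transcript to genome coordinates.

    pos is the 1-based leftmost position on the transcript; gene_end is the
    hypothetical 1-based genomic end of that transcript. Transcript position p
    corresponds to genome position gene_end - p + 1, so the read's last
    transcript base becomes its leftmost genome base.
    """
    new_pos = gene_end - pos - reference_length(cigar) + 2
    new_flag = flag ^ 0x10                      # toggle the reverse-strand bit
    new_seq = seq.translate(COMPLEMENT)[::-1]   # reverse complement
    return new_flag, new_pos, reverse_cigar(cigar), new_seq, qual[::-1]
```

For example, with a transcript occupying genome positions 1001 to 1300 on the reverse strand, `flip_to_genome(0, 1, "10M", "AACCGGTTAC", "ABCDEFGHIJ", 1300)` places the read at position 1291 with the reverse-strand flag set. Paired-end data would also need the mate fields adjusted, which this sketch does not attempt.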


  • KamilSJaron
    started a topic visualisation of RNAseq (kallisto to IGV)

    Hello everyone,

    Like others, I am quite excited about the pseudoalignment that kallisto produces in minutes, instead of a real alignment that takes hours to compute. Now it would be useful to visualise it in IGV.

    From the .gdb file we extracted the CDS of our bacteria using Python scripts. The name of each sequence in the CDS file was the gene_id (which was the same as the transcript_id), exactly as we would expect.

    I indexed this CDS file with kallisto index and then, following the kallisto manual (https://pachterlab.github.io/kallisto/manual.html), produced a pseudobam file:

    kallisto quant -i cds.idx -o output -b 100 --single -l 100 -s 1 --pseudobam <all_RNAseq_reads.fq.gz> | samtools view -Sb - > pseudomap.bam

    The .bam file was then sorted, indexed, and loaded into IGV together with the .fasta and .gtf files, which gave the following error:

    File does not contain any sequence names which match the current genome.
    File: *****S5_genome_87, S5_genome_88, S5_genome_89, S5_genome_90, ...
    Genome: S5_genome,

    S5_genome_XX are the gene_ids of our genome, and S5 is our genome. So I thought that IGV treats every transcript as a chromosome (based on a few related posts such as http://seqanswers.com/forums/archive...p/t-16407.html), and I created an alias file like this:

    S5_genome_87 S5_genome
    S5_genome_88 S5_genome
    ... ...

    Now the file loads, but the reads are not visualised at all. I guess I am missing something somewhere. IMHO the easiest way would be to somehow edit the .bam file (or the .sam file before it is converted to .bam) so that it refers only to the single chromosome of the genome.

    If you are still reading, thank you. Any help is appreciated.
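
The kind of .sam edit suggested here could look roughly like the sketch below. It is only an illustration under several assumptions: the .gtf contains one CDS line per transcript with a `transcript_id` matching the sequence names in the kallisto index, each CDS is one contiguous interval, and the chromosome length is supplied by the caller. The function names are hypothetical. Only forward-strand transcripts are handled; reverse-strand ones would additionally need their positions mirrored and their strand flag flipped, so they are skipped.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def parse_gtf(lines, feature="CDS"):
    """Map transcript_id -> (chromosome, 1-based start, strand) from GTF lines."""
    coords = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            coords[match.group(1)] = (fields[0], int(fields[3]), fields[6])
    return coords


def convert_header(header_lines, chrom, chrom_length):
    """Replace all per-transcript @SQ lines with a single @SQ line for the genome."""
    out, written = [], False
    for line in header_lines:
        if line.startswith("@SQ"):
            if not written:
                out.append(f"@SQ\tSN:{chrom}\tLN:{chrom_length}")
                written = True
            continue
        out.append(line)
    return out


def convert_alignment(line, coords):
    """Shift one SAM record from transcript to genome coordinates.

    Returns None for unmapped reads, unknown transcripts, and transcripts
    that are not on the forward strand (not handled in this sketch).
    """
    fields = line.rstrip("\n").split("\t")
    flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
    if flag & 0x4 or rname not in coords:
        return None
    chrom, start, strand = coords[rname]
    if strand != "+":
        return None
    fields[2] = chrom                    # RNAME: the genome, not the transcript
    fields[3] = str(start + pos - 1)     # POS: offset by the gene start
    return "\t".join(fields)
```

The converted header and records can then be written out as a .sam and passed through samtools view, sort, and index before loading the result into IGV.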