Hi everyone,
I noticed that when I download all transcripts of the hg38 genome build from the UCSC Table Browser (104,178 transcripts, including all splice variants, miRNAs, piRNAs, rRNAs, etc.), I can BLAST my 1.3 million 50-mer Illumina RNA-seq reads against this large FASTA file using a standalone BLAST+ installation. It takes some time but maps ~500 reads/sec. Using the corresponding annotation file, I can then annotate all mappings.
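For illustration, the annotation step could look roughly like the sketch below. It assumes BLAST+ was run with tabular output (`-outfmt 6`, default columns) and that the annotation file has been reduced to a two-column, tab-separated table of transcript ID and gene name. That table layout, the example IDs, and the function names (`best_hits`, `load_annotation`, `annotate`) are assumptions made for this example, not the actual UCSC file format.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_handle):
    """Return {read_id: hit} keeping only the highest-bitscore hit per read."""
    best = {}
    for row in csv.reader(blast_handle, delimiter="\t"):
        if len(row) < len(BLAST_FIELDS) or row[0].startswith("#"):
            continue  # skip blank, truncated, or comment lines
        hit = dict(zip(BLAST_FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        # Ties keep the first hit seen; a read matching several splice
        # variants equally well is therefore assigned arbitrarily.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def load_annotation(annotation_handle):
    """Read the assumed two-column table: transcript ID, gene name."""
    reader = csv.reader(annotation_handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate(best, annotation):
    """Map each read to the gene name of its best-hit transcript."""
    return {read: annotation.get(hit["sseqid"], "unannotated")
            for read, hit in best.items()}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real BLAST output and annotation file.
    blast = io.StringIO(
        "read1\tNM_A\t100.0\t50\t0\t0\t1\t50\t10\t59\t1e-20\t93.5\n"
        "read1\tNM_B\t98.0\t50\t1\t0\t1\t50\t10\t59\t1e-18\t87.9\n"
    )
    table = io.StringIO("NM_A\tGENE1\nNM_B\tGENE1\n")
    print(annotate(best_hits(blast), load_annotation(table)))
```

With real data, the two `io.StringIO` objects would be replaced by `open(...)` calls on the BLAST output file and the annotation table.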
Are there any arguments against this approach? In other words, what are the disadvantages of mapping reads against all transcript variants versus mapping against the complete, intron-containing genome with TopHat?
Cheers,
Andrej