I am volunteering in a lab and have been tasked with analyzing sequences of short mRNA fragments. I used to be a programmer, but I have no experience in bioinformatics, so this is all new to me.
The plan is to use bowtie to align the cDNA fragments to the Riken mouse cDNA database. The problem is that when I look at that database (ftp://fantom.gsc.riken.jp/fantomdb/3.0/) it looks quite different from mRNA fragments that I download from the UCSC browser.
From the UCSC browser I can get sequences plus a table that has txStart/txEnd and cdsStart/cdsEnd, so I know which chromosome and where on the chromosome each fragment maps, and I know how long the 5' and 3' UTRs are. The Riken database doesn't seem to have that information. It occasionally has some cds location info, but generally not, and nothing about where the fragment maps onto the chromosome.
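To make concrete what I mean by getting UTR lengths from the UCSC table, here is a minimal sketch. It assumes the table also carries the strand and the exonStarts/exonEnds columns (comma-separated, as in the UCSC gene tables), and that coordinates are 0-based and half-open. The function name `utr_lengths` is my own, not part of any library.

```python
def utr_lengths(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return (five_prime_utr_len, three_prime_utr_len) in spliced bases.

    exon_starts / exon_ends are comma-separated strings such as "100,300,"
    (a trailing comma is tolerated). Coordinates are assumed to be
    0-based, half-open genomic positions on the forward strand.
    """
    def parse(field):
        return [int(x) for x in field.strip(",").split(",") if x]

    # Assumption: a transcript with cdsStart == cdsEnd has no annotated CDS,
    # so it has no UTRs in the usual sense.
    if cds_start >= cds_end:
        return (0, 0)

    left = 0   # exonic bases upstream of the CDS in genomic coordinates
    right = 0  # exonic bases downstream of the CDS in genomic coordinates
    for start, end in zip(parse(exon_starts), parse(exon_ends)):
        left += max(0, min(end, cds_start) - start)
        right += max(0, end - max(start, cds_end))

    # On the minus strand the genomic "left" side is the 3' end of the mRNA.
    return (left, right) if strand == "+" else (right, left)


if __name__ == "__main__":
    print(utr_lengths("+", 150, 380, "100,300,", "200,400,"))
```

Summing over exons matters because txStart to cdsStart is a genomic span that can include introns, so a plain subtraction would overestimate the UTR length of a spliced transcript.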
First, am I missing something about the Riken data that would make it more useful? If not, I will try to assemble a set of non-overlapping sequences from the UCSC browser.
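By "non-overlapping" I have in mind something like the rough sketch below: group transcripts that overlap on the same chromosome and keep one per group. Keeping the longest genomic span is just an assumption on my part, as is ignoring strand; the tuple layout and the name `pick_non_overlapping` are hypothetical.

```python
def pick_non_overlapping(transcripts):
    """Keep one transcript per cluster of overlapping transcripts.

    transcripts: iterable of (name, chrom, tx_start, tx_end) tuples with
    0-based, half-open coordinates. Within each cluster of transcripts
    that overlap on the same chromosome, the one with the longest
    genomic span is kept (an arbitrary choice for illustration).
    """
    kept = []
    best = None            # longest transcript seen in the current cluster
    cluster_chrom = None   # chromosome of the current cluster
    cluster_end = None     # rightmost end seen in the current cluster

    for t in sorted(transcripts, key=lambda t: (t[1], t[2], t[3])):
        name, chrom, start, end = t
        overlaps = (
            best is not None and chrom == cluster_chrom and start < cluster_end
        )
        if overlaps:
            cluster_end = max(cluster_end, end)
            if end - start > best[3] - best[2]:
                best = t
        else:
            # Close the previous cluster and start a new one.
            if best is not None:
                kept.append(best)
            best, cluster_chrom, cluster_end = t, chrom, end

    if best is not None:
        kept.append(best)
    return kept


if __name__ == "__main__":
    example = [
        ("a", "chr1", 100, 500),
        ("b", "chr1", 400, 1200),
        ("c", "chr1", 2000, 2500),
        ("d", "chr2", 100, 200),
    ]
    print([t[0] for t in pick_non_overlapping(example)])
```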
Second, is this a reasonable approach to analyzing short reads from RNA-seq? Would it be better to use something like cufflinks to align against the full mouse genome?
Thanks in advance for any help.