Background:
I have RNA-Seq data from 8 time points comparing study to control. Each sample was sequenced on its own lane. I do not trust the reference sequence. What I'm interested in are the most significantly changing transcripts.
Hypothesis:
The kmers associated with each transcript should change in a coherent manner as that transcript's expression changes. Comparing unique kmers between study and control first, and extracting the most significantly changing kmers, should enrich the resulting transcriptome for the genes that are changing the most.
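To make the hypothesis concrete, here is a minimal sketch of the selection step. It is only an illustration under stated assumptions: the function names `count_kmers` and `changing_kmers` are hypothetical, reads are held in memory as plain strings, counts are not normalized for library size, and a log2 fold change with a pseudocount stands in for a real significance test across the 8 time points. At real data volumes a dedicated k-mer counter would be needed instead.

```python
from collections import Counter
import math


def count_kmers(reads, k):
    """Count every k-length substring across an iterable of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def changing_kmers(study, control, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {kmer: log2 fold change} for kmers whose absolute log2 fold
    change between study and control meets the threshold.

    Assumption: fold change is a crude stand-in for statistical
    significance, and both Counters come from libraries of similar depth.
    """
    selected = {}
    for kmer in study.keys() | control.keys():
        # Counter returns 0 for kmers absent from one condition.
        ratio = (study[kmer] + pseudocount) / (control[kmer] + pseudocount)
        lfc = math.log2(ratio)
        if abs(lfc) >= min_abs_log2fc:
            selected[kmer] = lfc
    return selected
```

Calling `changing_kmers(count_kmers(study_reads, 25), count_kmers(control_reads, 25))` would give the candidate set; the threshold and k are placeholders to tune.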
Problem:
I have too much data to do a de novo assembly. The data are of quite good quality, but even after eliminating low-frequency reads (kmers) I still have too much to feed to an assembler like Trinity.
Question:
Assuming I can select a much smaller set of kmers that are significantly changing, how would I feed the resulting set to an assembler to generate a transcriptome of enriched genes?
Caveats:
I don't mind if the contigs created by this process are only partial exons of the genes that are changing. I can identify them later.
Soooo
1) How would you process a set of kmers to feed to an assembler, resulting in a fasta file of contigs?
2) If your answer suggests mapping the kmers back to their source reads -- can you also suggest how you would do that efficiently (realistically, in a decent time frame)?
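For reference on point 2, the straightforward approach I can think of is a single streaming pass over the reads with a hash-set lookup per kmer, roughly as sketched below. This is an assumption-laden illustration, not a tested pipeline: `reads_with_selected_kmers` and `min_hits` are hypothetical names, reads are plain strings rather than FASTQ records, and the selected kmer set is assumed to fit in memory.

```python
# Translation table for complementing bases; N is left as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reads_with_selected_kmers(reads, selected, k, min_hits=1):
    """Yield each read that contains at least `min_hits` kmers from
    `selected` (a set of strings), checking both strands.

    Each lookup is an average O(1) set membership test, so the cost is
    one pass over the reads and independent of how many kmers were
    selected.
    """
    for read in reads:
        hits = 0
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in selected or reverse_complement(kmer) in selected:
                hits += 1
                if hits >= min_hits:
                    yield read
                    break  # stop scanning this read once it qualifies
```

The reads kept this way could then be written back out and handed to the assembler as ordinary input, which is what I would like opinions on.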
All thoughts are welcome
Joe Carl