I'm planning to run RNA-seq on human cancer tissue transplanted into mice (so the tumor is surrounded by mouse stromal tissue), and I want to look at expression of both human and mouse transcripts. I expect roughly 90% human and 10% mouse tissue.
My primary question: is there any public data (GEO, SRA) that has samples like this? I'd like to use public data to assess feasibility and test a few strategies for mapping this data. Which leads to question #2:
Secondary question: what's the best way to map this data? I see a few options: (1) map all reads to human, then map the unmapped remainder to mouse (or vice versa), (2) map all reads to human, then map all reads to mouse, or (3) map all reads to a concatenated human+mouse reference index, discarding multimappers. I think #3 would be best, but it requires some extra legwork: creating combined FASTA files and a combined index, then disentangling which reads aligned to human versus mouse in the downstream alignment.
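A minimal, hypothetical sketch of the bookkeeping that option (3) implies. It assumes the human and mouse contigs are renamed with `hs_`/`mm_` prefixes when the FASTA files are concatenated, and that alignments have already been parsed (e.g. from the BAM) into `(read_name, contig, mapq)` tuples. The prefixes, the MAPQ cutoff of 10, and the function names are illustrative assumptions, not taken from any particular tool.

```python
"""Hypothetical sketch of option (3): a concatenated human+mouse reference.

Assumptions: contigs are prefixed "hs_" (human) / "mm_" (mouse) when the two
FASTA files are combined, and each alignment is reduced to a
(read_name, contig, mapq) tuple. MAPQ_MIN = 10 is illustrative only.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

HUMAN_PREFIX = "hs_"
MOUSE_PREFIX = "mm_"
MAPQ_MIN = 10


def prefix_fasta_headers(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Prefix every FASTA header ('>' line) so contig names stay unique and
    species-identifiable after the human and mouse files are concatenated."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def classify_reads(alignments: Iterable[Tuple[str, str, int]]) -> Dict[str, str]:
    """Assign each read to 'human', 'mouse', or 'ambiguous'.

    A read is assigned to a species only if all of its alignments hit contigs
    of that one species and its best MAPQ reaches MAPQ_MIN. Reads hitting both
    species, or with only low-MAPQ hits, are 'ambiguous' (i.e. discarded).
    """
    species_hits: Dict[str, Set[str]] = defaultdict(set)
    best_mapq: Dict[str, int] = defaultdict(int)
    for read, contig, mapq in alignments:
        if contig.startswith(HUMAN_PREFIX):
            species = "human"
        elif contig.startswith(MOUSE_PREFIX):
            species = "mouse"
        else:
            continue  # contig without a species prefix: ignore it
        species_hits[read].add(species)
        best_mapq[read] = max(best_mapq[read], mapq)

    calls: Dict[str, str] = {}
    for read, species_set in species_hits.items():
        if len(species_set) == 1 and best_mapq[read] >= MAPQ_MIN:
            calls[read] = next(iter(species_set))
        else:
            calls[read] = "ambiguous"
    return calls


if __name__ == "__main__":
    fasta = [">chr1 some description\n", "ACGT\n"]
    print("".join(prefix_fasta_headers(fasta, HUMAN_PREFIX)), end="")
    demo = [
        ("r1", "hs_chr1", 60),  # unique human hit
        ("r2", "mm_chr5", 60),  # unique mouse hit
        ("r3", "hs_chr2", 1),   # cross-species multimapper
        ("r3", "mm_chr2", 1),
        ("r4", "hs_chrX", 0),   # human-only but low MAPQ
    ]
    print(classify_reads(demo))
```

In a real pipeline the tuples would come from the aligner's BAM output. How MAPQ encodes multimapping differs between aligners, so the cutoff would need to be checked against the aligner's own documentation before trusting the human/mouse split.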