My data is actually RIP-Seq (RNA immunoprecipitation). However, I don't feel I am getting the answers I need asking around from an RNASeq perspective. I am hoping to get some insight from those familiar with ChIP-Seq.
I have RNA sequencing from mouse neural tissue where I IPed for a protein of interest (no crosslinking). There are three basic treatments. I have: Drug 1, Drug 1 + 2, and Vehicle in triplicate. There is a corresponding knockout sample for each treatment (instead of IgG/bead). In addition to this, I also have the inputs.
From here, there are two basic analyses I want to do:
Relative counts
Get relative counts for each gene in each condition. These data could be used for something like clustering or classification analysis. They answer the question: "Which pool of RNA bound to my protein of interest was immunoprecipitated in each condition, and how much (relatively) do I have?"
I can easily normalize using one of the many normalization schemes commonly employed in RNASeq. However, how do I factor in my knockouts for each condition? Or do I just use the input? Can I divide all counts by the totals? Essentially, how do I subtract out my background?
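To make the background question concrete, here is a minimal sketch of one possible approach: scale each library to counts per million (CPM), then take a per-gene log2 ratio of the wild-type IP over the knockout IP (or over the input). The gene names, counts, and the `pseudocount` parameter are hypothetical, and this is not an established RIP-Seq method. Dividing by library totals also assumes the libraries have comparable composition, which may not hold for a knockout IP.

```python
import math

def cpm(counts):
    """Scale raw counts to counts per million of that library's total."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

def log2_enrichment(ip_counts, background_counts, pseudocount=1.0):
    """Per-gene log2(IP CPM / background CPM).

    The pseudocount (an assumed value) avoids division by zero for genes
    that are absent from the background library.
    """
    ip = cpm(ip_counts)
    bg = cpm(background_counts)
    return {
        gene: math.log2((ip[gene] + pseudocount) / (bg.get(gene, 0.0) + pseudocount))
        for gene in ip
    }

# Hypothetical toy counts for one treatment: wild-type IP vs. knockout IP.
wt_ip = {"GeneA": 900, "GeneB": 50, "GeneC": 50}
ko_ip = {"GeneA": 100, "GeneB": 450, "GeneC": 450}

enrichment = log2_enrichment(wt_ip, ko_ip)
for gene, value in sorted(enrichment.items()):
    print(f"{gene}\t{value:+.2f}")
```

Positive values mean a gene is relatively more abundant in the wild-type IP than in the background. The same function could be called with an input library as `background_counts`, which is exactly where the KO-versus-input choice shows up.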
Differential expression
Option 1: Treat data as RNASeq. Then use some commonly applicable software (DESeq2, edgeR, limma for RNASeq) that models the data. Here, the knockouts are treated as an interacting factor in my model matrix. But what about the inputs? Can they be used as well?
Option 2: Treat data as RIP-Seq. Use RIPSeeker to analyze data. RIPSeeker uses a strategy similar to ChIP-Seq, based on peak-calling. But here it seems that I again have to choose between knockouts and input.
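For Option 1, the "knockout as an interacting factor" idea can be sketched as a design matrix with treatment, genotype, and treatment-by-genotype columns. This is roughly what an R formula such as `~ treatment * genotype` would expand to. The sample layout below (three wild-type IPs and one knockout IP per treatment) and the column names are assumptions made for illustration only.

```python
# Reference levels: Vehicle for treatment, KO for genotype.
columns = ["Intercept", "Drug1", "Drug1+2", "WT", "Drug1:WT", "Drug1+2:WT"]

def design_row(treatment, genotype):
    """Return one row of 0/1 indicators for a single sample."""
    d1 = int(treatment == "Drug1")
    d12 = int(treatment == "Drug1+2")
    wt = int(genotype == "WT")
    # Interaction columns are products of the main-effect indicators.
    return [1, d1, d12, wt, d1 * wt, d12 * wt]

# Assumed layout: 3 WT replicates and 1 KO sample per treatment.
samples = []
for treatment in ["Vehicle", "Drug1", "Drug1+2"]:
    for rep in (1, 2, 3):
        samples.append((treatment, "WT", rep))
    samples.append((treatment, "KO", 1))

design = [design_row(treatment, genotype) for treatment, genotype, _ in samples]

print("\t".join(["sample"] + columns))
for (treatment, genotype, rep), row in zip(samples, design):
    print("\t".join([f"{treatment}_{genotype}_{rep}"] + [str(v) for v in row]))
```

In this parameterization the `Drug1:WT` and `Drug1+2:WT` coefficients would capture drug effects on the IP signal beyond whatever the drug does to the knockout background. With a single knockout per treatment there is little information to estimate background variability, so the layout itself limits what such a model can support.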
Then there's this more broad question:
Creating a consensus list
Using input or KO in each of the scenarios above may produce somewhat different lists. In fact, even option 1 and option 2 of the differential expression analysis might produce different sets of genes. How do I know which is the "real" list? I am tempted to just see which list agrees with the literature, but this seems biased. How can I confidently select which list is "correct"?
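One way to at least quantify how much the lists disagree, without deciding which one is "correct", is to compute their pairwise overlap. The sketch below uses the Jaccard index on hypothetical gene lists. The list names and gene symbols are placeholders, not results.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical gene lists from the different analysis routes.
gene_lists = {
    "model_with_KO": {"GeneA", "GeneB", "GeneC", "GeneD"},
    "model_with_input": {"GeneA", "GeneB", "GeneE"},
    "peak_calling": {"GeneA", "GeneC", "GeneE", "GeneF"},
}

overlaps = {
    (x, y): jaccard(gene_lists[x], gene_lists[y])
    for x, y in combinations(gene_lists, 2)
}
for (x, y), score in overlaps.items():
    print(f"{x} vs {y}: {score:.2f}")

# Genes reported by every route; a conservative candidate consensus.
consensus = set.intersection(*gene_lists.values())
print(sorted(consensus))
```

The intersection is only a conservative starting point: agreement between methods does not by itself show that a gene is a true target.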
I just feel I have so many options in front of me and I want to ensure I am approaching this correctly.
Other relevant threads imply that input may not be as useful:
http://seqanswers.com/forums/showthread.php?t=12092
http://seqanswers.com/forums/showthread.php?t=35377
http://seqanswers.com/forums/showthread.php?t=6918
http://seqanswers.com/forums/showthread.php?t=4480
http://seqanswers.com/forums/showthread.php?t=8783