
**GenoMax** · 08-21-2017, 11:20 AM

You probably could, as long as the alignments work reasonably well and you use the same reference for all the samples being compared.

Were you otherwise thinking of assembling the data and then using that assembly as a reference?

**illuminaGA** · 08-21-2017, 11:45 AM

Thanks. I will try.

Yes, I was thinking about assembling the transcripts first using a reference-free strategy.

**aprice67** · 08-22-2017, 10:21 AM

I recently published a paper on this exact topic. It shows how alignment to non-native reference genomes influences outcomes and gives best practices for doing it. I used an E. coli data set in it too.

The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies

http://journals.plos.org/plosone/article/related?id=10.1371/journal.pone.0180904

Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are insignificant enough that the results of differential expression analysis will not be influenced by choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other because they are simply absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have available closed genomic sequence for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying regions that are most affected by non-native alignments, leading to false positives in differential expression analysis, and perform an in depth analysis identifying the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use.

I hope this helps, please let me know if I can help with anything!

**illuminaGA** · 08-22-2017, 11:18 AM

Thank you so much. Let me digest the paper.

**aprice67** · 08-22-2017, 11:38 AM

To sum up: it's okay to use a non-native reference from a closely related strain as long as your reads are reasonably long. Reads of 100 bp do pretty well, 150 bp do great, and 50 bp do badly. When you extract counts, be careful with edge cases (see Figs. 5 and 6); with HTSeq or featureCounts you can set options to avoid these false positives.

I'm happy to answer any specific questions you have. Good luck!
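The post above does not say which counting options are meant. One plausible reading is a stricter overlap rule, such as htseq-count's `--mode intersection-strict` or featureCounts' `--fracOverlap`, so that reads hanging over a gene boundary are not counted. The toy sketch below only illustrates that idea in plain Python; the gene coordinates, reads, and the `count_reads` helper are hypothetical, and it is not how either tool is implemented (for instance, it ignores strand and ambiguous reads).

```python
from collections import Counter

# Hypothetical toy annotation: gene -> (start, end), half-open coordinates.
GENES = {"geneA": (100, 200), "geneB": (300, 400)}


def count_reads(reads, genes, strict):
    """Count reads per gene.

    strict=False: a read counts if it overlaps the gene by at least one base.
    strict=True:  a read counts only if it lies entirely inside the gene.
    """
    counts = Counter({gene: 0 for gene in genes})
    for r_start, r_end in reads:
        for gene, (g_start, g_end) in genes.items():
            if strict:
                hit = g_start <= r_start and r_end <= g_end
            else:
                hit = r_start < g_end and g_start < r_end
            if hit:
                counts[gene] += 1
    return counts


# Two of these four reads straddle a geneA boundary (the "edge cases").
reads = [(110, 160), (180, 230), (90, 140), (320, 370)]
loose = count_reads(reads, GENES, strict=False)
strict = count_reads(reads, GENES, strict=True)
print("overlap by any base:", dict(loose))
print("fully contained:    ", dict(strict))
```

Under the strict rule the boundary-straddling reads drop out, which lowers raw counts, so the same rule should be applied to every sample being compared.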

**illuminaGA** · 08-23-2017, 08:16 AM

That's great information. Thank you so much.


Can I use E. coli K strains reference for B strains Differential Gene Exp analysis?
