  • illuminaGA
    Member
    • Dec 2012
    • 71

    Can I use an E. coli K-strain reference for differential gene expression analysis of B strains?

    Dear all, I currently have a project on differential gene expression analysis of E. coli B strains. I did some searching but only found reference genomes for K strains. Can I use an E. coli K-strain reference for the B-strain differential gene expression analysis, or should I just do a reference-free assembly?
    Thank you so much.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    You probably could, as long as the alignments work reasonably well and you use the same reference for all the samples being compared.

    Otherwise, were you thinking of assembling the data and then using the assembly as a reference?
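One rough proxy for whether "the alignments work reasonably well" is the fraction of each sample's reads that map to the shared reference. The Python sketch below computes that fraction from per-sample (mapped, total) read counts and flags samples below a cutoff. The 0.80 cutoff, the sample names, and the numbers are illustrative assumptions, not recommendations from this thread.

```python
# Hypothetical sketch: flag samples whose alignment rate to the shared
# (non-native) reference looks too low to trust.
# ASSUMPTION: the 0.80 default cutoff is illustrative, not a published threshold.
from typing import Dict, List, Tuple


def mapping_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return mapped/total per sample; counts maps sample -> (mapped, total)."""
    rates = {}
    for sample, (mapped, total) in counts.items():
        if total <= 0:
            raise ValueError(f"{sample}: total read count must be positive")
        rates[sample] = mapped / total
    return rates


def flag_low_mapping(counts: Dict[str, Tuple[int, int]],
                     min_rate: float = 0.80) -> List[str]:
    """Return the names of samples mapping below min_rate, sorted."""
    rates = mapping_rates(counts)
    return sorted(s for s, r in rates.items() if r < min_rate)


if __name__ == "__main__":
    # Made-up numbers for illustration only; in practice the counts could be
    # taken from e.g. `samtools flagstat` run on each sample's BAM file.
    example = {
        "B_rep1": (9_200_000, 10_000_000),
        "B_rep2": (7_100_000, 10_000_000),
        "B_rep3": (9_050_000, 10_000_000),
    }
    for sample, rate in sorted(mapping_rates(example).items()):
        print(f"{sample}: {rate:.1%}")
    print("low:", flag_low_mapping(example))
```

Because every sample is aligned to the same reference, a sample that maps much worse than its replicates is worth inspecting before its counts are compared with the others.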


    • illuminaGA
      Member
      • Dec 2012
      • 71

      #3
      Thanks. I will try.

      Yes, I was thinking of assembling the transcripts first with a reference-free strategy.


      • aprice67
        Member
        • Nov 2012
        • 49

        #4
        I recently published a paper on this exact topic. It shows how alignment to a non-native reference genome influences the results and gives best practices for doing it. I used an E. coli data set in it too.


        Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are insignificant enough that the results of differential expression analysis will not be influenced by choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other because they are simply absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have available closed genomic sequence for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying regions that are most affected by non-native alignments, leading to false positives in differential expression analysis, and perform an in-depth analysis identifying the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use.


        I hope this helps, please let me know if I can help with anything!
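The abstract describes using the genes common to all strains for differential expression and analyzing the remaining genes separately. A minimal Python sketch of that split follows. It assumes gene identifiers have already been harmonized across strains (for example through an ortholog table, which this sketch does not build); the strain names, gene IDs, and the helper `restrict_to_shared` are hypothetical placeholders.

```python
# Hypothetical sketch: split genes into a shared set (candidates for
# differential expression) and, per strain, genes not present in every strain
# (to be analyzed separately).
# ASSUMPTION: gene IDs are already comparable across strains.
from typing import Dict, Set, Tuple


def partition_genes(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (genes in every strain, per-strain genes not shared by all)."""
    if not strain_genes:
        return set(), {}
    shared = set.intersection(*strain_genes.values())
    not_shared = {strain: genes - shared for strain, genes in strain_genes.items()}
    return shared, not_shared


def restrict_to_shared(counts: Dict[str, int], shared: Set[str]) -> Dict[str, int]:
    """Keep only the count entries for genes in the shared set."""
    return {gene: n for gene, n in counts.items() if gene in shared}


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    genes_by_strain = {
        "K_ref": {"g1", "g2", "g3"},
        "B_strain": {"g1", "g3", "g4"},
    }
    shared, not_shared = partition_genes(genes_by_strain)
    print("shared:", sorted(shared))
    for strain, extra in not_shared.items():
        print(strain, "not in all strains:", sorted(extra))
```

With only two strains, "not shared by all" is the same as "present in only that strain"; with three or more, a gene missing from just one strain also lands in the not-shared sets.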


        • illuminaGA
          Member
          • Dec 2012
          • 71

          #5
          Thank you so much. Let me digest the paper.


          • aprice67
            Member
            • Nov 2012
            • 49

            #6
            To sum up, it's okay to use a non-native reference from a closely related strain as long as your reads are reasonably long: 100 bp reads do pretty well, 150 bp reads do great, and 50 bp reads do poorly. When you extract counts, watch out for edge cases (see Figs. 5 and 6); with HTSeq or featureCounts you can set options to avoid these false positives.

            I'm happy to answer any specific questions you have. Good luck!
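The thread does not spell out which edge cases Figs. 5 and 6 show. Assuming they include reads that only partly overlap an annotated gene, the toy Python sketch below (not HTSeq or featureCounts code) contrasts counting any overlap with requiring a minimum fraction of the read to fall inside the gene. In the real tools, the comparable settings are HTSeq-count's `--mode` (e.g. `union` vs `intersection-strict`) and featureCounts' `--fracOverlap`; check their documentation for the exact semantics.

```python
# Toy sketch (not HTSeq or featureCounts code): count reads per gene on one
# contig/strand using half-open [start, end) coordinates.
# A read is assigned to a gene only if at least `min_frac` of its bases lie
# inside that gene; min_frac=1.0 means the read must be fully inside.
# Reads with no qualifying gene, or more than one, are left unassigned.
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def count_reads(reads: List[Interval], genes: Dict[str, Interval],
                min_frac: float = 0.0) -> Counter:
    counts: Counter = Counter()
    for read in reads:
        length = read[1] - read[0]
        hits = []
        for name, gene in genes.items():
            ov = overlap(read, gene)
            if ov > 0 and ov / length >= min_frac:
                hits.append(name)
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            counts["__unassigned"] += 1  # no hit, or ambiguous
    return counts


if __name__ == "__main__":
    # Made-up coordinates; the second read straddles the end of geneA.
    genes = {"geneA": (100, 200), "geneB": (300, 400)}
    reads = [(110, 160), (180, 230), (310, 360), (500, 550)]
    print("any overlap: ", dict(count_reads(reads, genes, min_frac=0.0)))
    print("fully inside:", dict(count_reads(reads, genes, min_frac=1.0)))
```

Under the stricter rule, the straddling read moves from geneA to unassigned; whether that is the right trade-off for a non-native reference is exactly what the paper's figures are meant to inform.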


            • illuminaGA
              Member
              • Dec 2012
              • 71

              #7
              That's great information, thank you so much!
