
  • Can I use an E. coli K strain reference for differential gene expression analysis of B strains?

    Dear all, I am working on a differential gene expression analysis of E. coli B strains. I searched for a reference genome but only found ones for K strains. Can I use a K strain reference for the B strain analysis, or should I do a reference-free assembly instead?
    Thank you so much.

  • #2
    You probably could, as long as the alignments work reasonably well and you use the same reference for all the samples being compared.

    Were you thinking of assembling the data and then using that assembly as the reference?
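
One rough way to check that "the alignments work reasonably well" is to compare the fraction of mapped reads per sample against the shared reference. The Python sketch below assumes you already have mapped and total read counts for each sample (for example, copied from an aligner summary). The function names, the 0.90 cutoff, and the sample numbers are all hypothetical and only illustrate the idea; pick a threshold that suits your own data.

```python
def mapping_rates(counts):
    """Return {sample: mapped/total} for a dict of sample -> (mapped, total)."""
    rates = {}
    for sample, (mapped, total) in counts.items():
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no reads")
        rates[sample] = mapped / total
    return rates


def flag_low_mapping(counts, min_rate=0.90):
    """List samples whose mapping rate falls below min_rate.

    min_rate is an assumed, arbitrary cutoff, not a recommendation
    from this thread.
    """
    rates = mapping_rates(counts)
    return sorted(s for s, r in rates.items() if r < min_rate)


# Made-up numbers for illustration only: (mapped reads, total reads).
example_counts = {
    "B_rep1": (9_100_000, 10_000_000),
    "B_rep2": (8_200_000, 10_000_000),
    "K_rep1": (9_700_000, 10_000_000),
}

print(flag_low_mapping(example_counts))
```

A sample that is flagged here is not necessarily unusable, but a large gap in mapping rate between strains is a hint that the shared reference fits one strain much better than the other.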



    • #3
      Thanks. I will try.

      Yes, I was thinking of assembling the transcripts first with a reference-free strategy.



      • #4
        I recently published a paper on this exact topic. It shows how alignment to non-native reference genomes influences outcomes and gives best practices for doing it. I used an E. coli data set in it too.


        Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are insignificant enough that the results of differential expression analysis will not be influenced by choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other because they are simply absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have available closed genomic sequence for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying regions that are most affected by non-native alignments, leading to false positives in differential expression analysis, and perform an in depth analysis identifying the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use.


        I hope this helps, please let me know if I can help with anything!



        • #5
          Thank you so much. Let me digest the paper.



          • #6
            Basically, to sum up, it's okay to use a non-native reference from a closely related strain as long as your reads are reasonably long. Reads of 100 bp do pretty well, 150 bp do great, and 50 bp do badly. When you extract counts, be careful of edge cases (see Figs. 5 and 6 in the paper); htseq and featureCounts have options you can set to avoid these false positives.

            I'm happy to answer any specifics if you have questions. Good luck!
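
To see where a data set falls relative to the read lengths mentioned above, a quick check of the median read length in a FASTQ file can help. The following Python sketch assumes a plain four-line-per-record FASTQ stream. The function names and labels are hypothetical, and the cut points are only one reading of the "50 / 100 / 150 bp" summary in this post; lengths between 50 and 100 bp are not covered by it, so they are reported as "unknown".

```python
import io
from statistics import median


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of every record is the sequence
            yield len(line.strip())


def classify_read_length(median_len):
    """Map a median read length (bp) onto the rough guidance in this thread.

    The thresholds are an interpretation of the post, not values
    taken from the paper itself.
    """
    if median_len >= 150:
        return "good"
    if median_len >= 100:
        return "ok"
    if median_len > 50:
        return "unknown"
    return "poor"


# Tiny made-up FASTQ for illustration; real files would be opened with open().
toy_fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)

lengths = list(read_lengths(toy_fastq))
toy_median = median(lengths)
print(lengths, toy_median, classify_read_length(toy_median))
```

For gzipped files, `gzip.open(path, "rt")` from the standard library can be passed in place of the `io.StringIO` handle.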



            • #7
              That's great information. Thank you so much.
