
  • 2nd gen sequencing gene content digital MA

    Dear collective,
    I would like to use my second-generation Illumina sequencing data in population genomic investigations to do strain-to-strain comparative gene content assessment. This would be analogous to what used to be done with the old spotted DNA-DNA arrays, or what is currently done with Affy probe set arrays. My intention is to map the Illumina reads to a reference and use the depth of coverage to make a gene presence or absence call. Is anybody aware of a program or pipeline that has already been set up to do this? Does anybody have a suggestion for how to modify some other application to accomplish it? In my opinion this is analogous to the read counting that goes on in RNA-seq. Before going to the trouble of writing code of my own, I wanted to be sure that I had not simply missed software that has already been developed. I realize this is fairly straightforward in a spreadsheet when comparing only a handful of strains, but I need to apply the process to literally thousands of strains, so it has to be scripted.
    Thanks for any assistance.
    Steve Beres
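For readers following along, the presence or absence call described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an existing pipeline: the function name `call_presence`, the `min_ratio` threshold of 0.2, and the choice to normalize each gene's mean depth by the strain's median gene depth are all hypothetical and would need tuning on real data.

```python
from statistics import median


def call_presence(gene_depth, min_ratio=0.2):
    """Return {gene: 1 (present) or 0 (absent)} for one strain.

    gene_depth maps gene name -> mean depth of coverage for that gene.
    Each depth is divided by the strain's median gene depth so that
    strains sequenced to different depths remain comparable.
    min_ratio is an assumed cutoff, not a published value.
    """
    baseline = median(gene_depth.values())
    if baseline <= 0:
        # More than half of the genes have no coverage; the median is
        # not a usable baseline for this strain.
        raise ValueError("median gene depth is zero; cannot normalize")
    return {gene: int(depth / baseline >= min_ratio)
            for gene, depth in gene_depth.items()}


if __name__ == "__main__":
    strain = {"geneA": 50.0, "geneB": 48.0, "geneC": 1.0, "geneD": 55.0}
    print(call_presence(strain))
```

Running the function once per strain and stacking the resulting dictionaries gives the strains-by-genes 0/1 matrix that a clustering program would take as input.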
    Last edited by sbberes; 09-05-2013, 07:11 AM. Reason: To explain the scope of the comparisons

  • #2
    Search for an alignment tool. There are multiple alignment tools out there, e.g. BLAST, BWA, and Bowtie. BLAST is the most sensitive but also the slowest, so you should see which one best suits your needs.



    • #3
      Rick,
      I was probably not clear enough. I do not need an aligner; I already have .bam alignment files for about 3600 isolates. What I am looking for is a way to use this alignment data to make presence or absence calls for all 3000 genes in all 3600 strains, relative to the known gene content for this species. That way I can use clustering programs to identify the lineages within this population that differ in gene content, and I do not have to scan through 3600 BLAST files. If it takes only one minute to look at each file and you have to do it 3600 times, that is 60 hours, so every single piece of this analysis has to be automated. I have come to the conclusion that I can calculate the average depth of coverage across every gene in a genome, or in a multi-strain combined metagenome, using cufflinks with an appropriate .gff file, and then make the present or absent call.
      Thanks
      SBB
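As an alternative to cufflinks, the per-gene average depth can also be computed directly from per-base depth. The sketch below is a rough illustration, not a tested pipeline. It assumes three-column, tab-separated input (contig, 1-based position, depth) such as `samtools depth -a` writes, and gene coordinates already extracted from the .gff as 1-based inclusive `(name, contig, start, end)` tuples. The function names are hypothetical.

```python
from collections import defaultdict


def parse_depth(lines):
    """Parse 'contig<TAB>position<TAB>depth' lines into {contig: {pos: depth}}."""
    depth = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, d = line.split("\t")
        depth[contig][int(pos)] = int(d)
    return depth


def gene_coverage(depth, genes):
    """Return {gene: (mean_depth, breadth)} for 1-based inclusive intervals.

    breadth is the fraction of the gene's bases with depth > 0.
    Positions missing from the depth table are counted as depth 0.
    """
    result = {}
    for name, contig, start, end in genes:
        contig_depth = depth.get(contig, {})
        per_base = [contig_depth.get(p, 0) for p in range(start, end + 1)]
        length = len(per_base)
        mean_depth = sum(per_base) / length
        breadth = sum(1 for d in per_base if d > 0) / length
        result[name] = (mean_depth, breadth)
    return result


if __name__ == "__main__":
    depth_lines = ["ctg1\t1\t10", "ctg1\t2\t10", "ctg1\t3\t0", "ctg1\t4\t0"]
    genes = [("g1", "ctg1", 1, 2), ("g2", "ctg1", 3, 4)]
    print(gene_coverage(parse_depth(depth_lines), genes))
```

Reporting breadth alongside mean depth helps separate a gene that is evenly covered at low depth from one where a few repetitive bases attract many reads while the rest of the gene has none. Holding every position in a dictionary is fine for a bacterial genome, but a streaming approach would be kinder to memory across 3600 isolates.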



      • #4
        What you're really looking for is a way to detect structural variation based on depth of coverage. There are a good number of tools out there that can do this sort of thing; just have a look in PubMed (search for "structural variation read depth"). I've never needed to do this myself, so I can't give any specific recommendations. At the end, you'll simply want to intersect the deleted regions with your gene annotation to get a list of completely or partially deleted genes per isolate.
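The final intersection step can be sketched as follows. This is a hedged illustration only: it assumes the deleted regions have already been called by some read-depth tool and are available as `(contig, start, end)` tuples, that genes are `(name, contig, start, end)` tuples, and that all coordinates are 1-based and inclusive. The function names and the "complete"/"partial" labels are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def classify_deletions(genes, deletions):
    """Return {gene: 'complete' or 'partial'} for genes hit by a deletion.

    Genes that no deletion touches are left out of the result.
    """
    by_contig = {}
    for contig, start, end in deletions:
        by_contig.setdefault(contig, []).append((start, end))
    # Merging first prevents overlapping calls from being counted twice.
    merged = {c: merge_intervals(v) for c, v in by_contig.items()}

    calls = {}
    for name, contig, g_start, g_end in genes:
        deleted_bases = 0
        for d_start, d_end in merged.get(contig, []):
            overlap = min(g_end, d_end) - max(g_start, d_start) + 1
            if overlap > 0:
                deleted_bases += overlap
        if deleted_bases == 0:
            continue
        gene_length = g_end - g_start + 1
        calls[name] = "complete" if deleted_bases == gene_length else "partial"
    return calls


if __name__ == "__main__":
    genes = [("g1", "ctg1", 100, 200), ("g2", "ctg1", 300, 400)]
    deletions = [("ctg1", 50, 250), ("ctg1", 350, 450)]
    print(classify_deletions(genes, deletions))
```

With thousands of genes and many deletions per isolate, a sorted sweep or an interval tree would scale better than this nested loop, but the logic of the call stays the same.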
