  • Calculation of pan- and core-genome

    Hi,
    I was hoping someone here could point me in the direction of a good tool for calculating pan and core genomes in prokaryotes! I am looking for one or several tools/scripts that do a number of things:
    I have a bucket full of bacterial genome data (mostly contigs) from the same species and would like, based on various groupings of these isolates, to first determine the overall pan and core genomes.

    Besides getting just the number of genes in each group, it would also be very beneficial to get some sort of gene list as output for further analysis.

    Finally, I would like to see what difference there is between the calculated core/pan genome in 1 group compared to another defined set of isolates in another group - again not only a number of genes but an actual list of genes or gene sequences.

    The contigs have not been analysed for CDSs or annotated in any way, but I can do this in another pipeline prior to the pan/core calculation if needed.

    Thanks!!!
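One way to make the requested outputs concrete, assuming the contigs have already been annotated and the genes clustered into orthologous groups (a separate step, not shown), is to reduce everything to a presence/absence table and use plain set operations. The function names, table layout and toy isolates below are hypothetical; this is a minimal sketch, not any particular tool's method.

```python
"""Minimal core/pan genome sketch over a hypothetical gene presence/absence table."""
from typing import Dict, Iterable, Set

# Assumed input: isolate name -> set of gene-cluster IDs detected in that isolate.
# Producing this table (CDS calling + clustering of orthologous genes) is a
# separate step that is NOT shown here.
PresenceTable = Dict[str, Set[str]]


def core_genome(table: PresenceTable, isolates: Iterable[str]) -> Set[str]:
    """Gene clusters present in every listed isolate (empty set if none listed)."""
    gene_sets = [table[name] for name in isolates]
    return set.intersection(*gene_sets) if gene_sets else set()


def pan_genome(table: PresenceTable, isolates: Iterable[str]) -> Set[str]:
    """Gene clusters present in at least one listed isolate."""
    return set().union(*(table[name] for name in isolates))


def compare_groups(
    table: PresenceTable, group_a: Iterable[str], group_b: Iterable[str]
) -> Dict[str, Set[str]]:
    """Gene-level differences between the core genomes of two isolate groups."""
    group_a, group_b = list(group_a), list(group_b)  # iterated more than once
    core_a, core_b = core_genome(table, group_a), core_genome(table, group_b)
    return {
        "core_in_a_not_core_in_b": core_a - core_b,
        "core_in_b_not_core_in_a": core_b - core_a,
        "core_in_a_absent_from_all_b": core_a - pan_genome(table, group_b),
    }


# Toy data (hypothetical isolates and gene IDs), for illustration only.
EXAMPLE: PresenceTable = {
    "iso1": {"g1", "g2", "g3"},
    "iso2": {"g1", "g2", "g4"},
    "iso3": {"g1", "g5"},
    "iso4": {"g1", "g5"},
}

if __name__ == "__main__":
    group_a, group_b = ["iso1", "iso2"], ["iso3", "iso4"]
    print("core A:", sorted(core_genome(EXAMPLE, group_a)))
    print("pan A:", sorted(pan_genome(EXAMPLE, group_a)))
    for label, genes in compare_groups(EXAMPLE, group_a, group_b).items():
        print(label, sorted(genes))
```

Because the results are sets of gene IDs rather than counts, the gene lists fall out directly and `len()` gives the numbers; mapping IDs back to sequences is then a lookup in whatever annotation output you used.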

  • #2
    Can I just clarify - do you have a bunch of reads which are labelled with which isolate of the same species they come from, and on the basis of that you want to pull out the pan (everything) and core (shared) genomes?



    • #3
      Hi Zam,
      I have assembled the reads into contigs, and they have names_contigID to them indicating the species and specific isolate they come from. And yes, it is from them that I would like to extract the information.



      • #4
        Well then, one approach is to assemble a "multicoloured" graph of your data (one colour per isolate), and then dump contigs with information about how many isolates share each contig. Then you can split things however you like - pull out the contigs that everyone shares, 95% share, etc. Software for this is here:

        and the paper contains an example of something similar:


        > Finally, I would like to see what difference there is between the calculated core/pan genome in 1 group compared to another defined set of isolates in another group - again not only a number of genes but an actual list of genes or gene sequences

        You can do any comparisons you like between any subsets you like in this manner. Feel free to contact me directly (zam AT well.ox.ac.uk)
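The splitting step described in this post can be sketched as follows, assuming you have already dumped, for each contig, the set of isolates (colours) it was found in. This does not build the multicoloured graph itself; the input dictionary, function names and toy data are hypothetical stand-ins for whatever the graph software actually outputs.

```python
"""Split contigs by how many isolates ("colours") share them."""
from typing import Dict, Iterable, List, Set, Tuple

# Assumed input: contig ID -> set of isolate names in which that contig occurs,
# e.g. parsed from a per-colour dump of a multicoloured graph. The real dump
# format depends on the tool, so parsing it is not shown here.
ContigColours = Dict[str, Set[str]]


def split_by_sharing(
    contigs: ContigColours, n_isolates: int, min_fraction: float
) -> Tuple[List[str], List[str]]:
    """Return (shared, rest): contigs seen in >= min_fraction of all isolates, and the others."""
    shared: List[str] = []
    rest: List[str] = []
    for contig_id, colours in sorted(contigs.items()):
        target = shared if len(colours) / n_isolates >= min_fraction else rest
        target.append(contig_id)
    return shared, rest


def specific_to(
    contigs: ContigColours, subset: Iterable[str], others: Iterable[str]
) -> List[str]:
    """Contigs present in every isolate of `subset` and in none of `others`."""
    subset_set, others_set = set(subset), set(others)
    return sorted(
        contig_id
        for contig_id, colours in contigs.items()
        if subset_set <= colours and not (colours & others_set)
    )


# Toy data (hypothetical contigs and isolates A-D), for illustration only.
EXAMPLE: ContigColours = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}

if __name__ == "__main__":
    print(split_by_sharing(EXAMPLE, n_isolates=4, min_fraction=1.0))   # shared by everyone
    print(split_by_sharing(EXAMPLE, n_isolates=4, min_fraction=0.75))  # shared by >= 75%
    print(specific_to(EXAMPLE, subset=["A", "B"], others=["C", "D"]))
```

The threshold is a plain fraction so that "everyone shares", "95% share" and similar cut-offs are the same call with a different `min_fraction`.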



        • #5
          Good references for how to do the calculations are Kittichotirat W et al., PLoS ONE, July 2011, and Tettelin H et al., PNAS 2005, 102:13950-13955, if you want to try doing the analysis or scripting your own tools. There's also Panseq, which you can try, but I haven't really been able to get it to work all that well for my purposes.
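A simplified, permutation-based version of the core and pan genome size curves discussed in references like these: add isolates one at a time in a random order, track the intersection (core) and union (pan) of their gene sets, and average over many orders. The function name, parameters and toy table are hypothetical, and this is a sketch of the general idea rather than the exact procedure of either paper (Tettelin et al. additionally fit curves to such points; that step is not shown).

```python
"""Core/pan genome accumulation curves averaged over random isolate orders."""
import random
from statistics import mean
from typing import Dict, List, Set, Tuple


def accumulation_curves(
    table: Dict[str, Set[str]], n_perm: int = 100, seed: int = 0
) -> List[Tuple[int, float, float]]:
    """For k = 1..N isolates, return (k, mean core size, mean pan size).

    `table` maps isolate name -> set of gene-cluster IDs (assumed input).
    In each permutation the core shrinks by intersection and the pan grows
    by union as isolates are added.
    """
    if not table:
        raise ValueError("need at least one isolate")
    rng = random.Random(seed)  # seeded so results are reproducible
    names = list(table)
    n = len(names)
    core_sizes: List[List[int]] = [[] for _ in range(n)]
    pan_sizes: List[List[int]] = [[] for _ in range(n)]
    for _ in range(n_perm):
        order = names[:]
        rng.shuffle(order)
        core: Set[str] = set(table[order[0]])
        pan: Set[str] = set(table[order[0]])
        for k, name in enumerate(order):
            core &= table[name]
            pan |= table[name]
            core_sizes[k].append(len(core))
            pan_sizes[k].append(len(pan))
    return [(k + 1, float(mean(core_sizes[k])), float(mean(pan_sizes[k]))) for k in range(n)]


# Toy data (hypothetical isolates and gene IDs), for illustration only.
EXAMPLE = {
    "iso1": {"g1", "g2", "g3"},
    "iso2": {"g1", "g2", "g4"},
    "iso3": {"g1", "g5"},
    "iso4": {"g1", "g5"},
}

if __name__ == "__main__":
    for k, core_size, pan_size in accumulation_curves(EXAMPLE, n_perm=200):
        print(f"{k} isolates: core ~{core_size:.2f}, pan ~{pan_size:.2f}")
```

A pan curve that is still rising at the last point suggests that sequencing more isolates would keep adding genes.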



          • #6
            Thanks both of you!!

            And Zam, I may take you up on that offer. And congratulations on that paper.



            • #7
              At the risk of being accused of shameless self-promotion, I will point out that this is something that Mauve, and specifically progressiveMauve, has supported for years. Have a look at the .backbone file output (documentation here).
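A sketch of reading the backbone output to separate segments shared by all genomes from those found in only some of them. It assumes the file is tab-delimited with a header row and one left/right coordinate pair per genome on each line, that (0, 0) means the segment is absent from that genome, and that negative coordinates indicate the reverse strand; check these assumptions against the Mauve documentation before relying on it. The function names and toy table are hypothetical.

```python
"""Classify progressiveMauve backbone segments as core or accessory (sketch)."""
import csv
import io
from typing import Iterable, List, Set, Tuple

Segment = List[Tuple[int, int]]  # one (left, right) pair per genome


def parse_backbone(lines: Iterable[str]) -> Tuple[int, List[Segment]]:
    """Parse a tab-delimited backbone table (assumed layout, see lead-in)."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)  # assumed: seqN_leftend / seqN_rightend columns
    n_genomes = len(header) // 2
    segments: List[Segment] = []
    for row in reader:
        if not row:
            continue
        coords = [int(value) for value in row]
        segments.append([(coords[2 * i], coords[2 * i + 1]) for i in range(n_genomes)])
    return n_genomes, segments


def genomes_with(segment: Segment) -> Set[int]:
    """Indices of genomes in which the segment is present (assumed: (0, 0) = absent)."""
    return {i for i, (left, right) in enumerate(segment) if (left, right) != (0, 0)}


def segment_length(segment: Segment, genome: int) -> int:
    """Length of the segment in one genome, ignoring strand sign (0 if absent)."""
    left, right = segment[genome]
    return 0 if (left, right) == (0, 0) else abs(right) - abs(left) + 1


# Toy three-genome table (hypothetical coordinates), for illustration only.
EXAMPLE = (
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\tseq2_leftend\tseq2_rightend\n"
    "1\t500\t1\t480\t-10\t-490\n"
    "501\t800\t0\t0\t600\t900\n"
    "900\t1200\t700\t1000\t0\t0\n"
)

if __name__ == "__main__":
    n, segs = parse_backbone(io.StringIO(EXAMPLE))
    core = [s for s in segs if len(genomes_with(s)) == n]
    accessory = [s for s in segs if len(genomes_with(s)) < n]
    print(len(core), "core segments;", len(accessory), "accessory segments")
    print("core length in genome 0:", sum(segment_length(s, 0) for s in core))
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` example; `genomes_with` also makes group comparisons possible by testing which genome indices each segment covers.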



              • #8
                Koadman - Good for you! (I'm certainly in no position to criticise self-promotion)
                Stegger - thanks!



                • #9
                  Please self-promote all you can; that just allows me to come back with potential questions to the right people.



                  • #10
                    Originally posted by Zam
                    Well then, one approach is to assemble a "multicoloured" graph of your data (one colour per isolate), and then dump contigs with information about how many isolates share each contig. [...]

                    I am also trying to do a pan/core genome analysis, but the software mentioned above is for someone who is comfortable working on Linux-based systems.

                    Is there a simple way for a non-bioinformatician to do this kind of analysis?

                    Cheers !
                    Shashank
