
  • Genotype calling within your sample set instead of relative to the reference genome

    Hi there,

    I'm developing a workflow to call variants from a dataset of ~600 samples sequenced by genotyping-by-sequencing (GBS) for phylogenomic analyses. My reference genome is rather divergent, by around 20 million years. I'm interested in the variants among my samples, not the variants with respect to the reference genome, but the haplotype callers I'm checking (GATK, SAMtools, FreeBayes...) call variants with respect to the reference. Any suggestions for this problem?

    Thanks a lot guys.
    Last edited by Guillefriis; 11-04-2015, 07:11 AM.

  • #2
    You can create a reference by de novo assembly from your 600-sample data set, then align each sample to it to identify sample-specific variants.



    • #3
      Depending on how much data there is, 600 samples may be too much to try to assemble at one time. Perhaps take a sampling approach and compare the assemblies from those runs to estimate the differences?



      • #4
        I'm not sure I see a problem here. Let's assume you compare your samples against the reference and you see, for example, that sample1 has an A(ref)->T(s1) mutation at position 10 while sample2 has an A(ref)->C(s2) mutation at position 10. The variation between samples (here: C vs T) is easily extractable; you just use your reference as a backbone for the comparison.
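A minimal sketch of this idea, assuming each sample's call at a site has already been reduced to a single allele. The `calls` dictionary and the function name are hypothetical and are not the output format of any particular caller:

```python
from itertools import combinations

def between_sample_differences(calls):
    """Return pairs of samples whose alleles differ at one site.

    `calls` maps sample name -> allele called at that site. The reference
    only supplies the coordinate system; it is not compared here.
    """
    differences = []
    for (s1, a1), (s2, a2) in combinations(sorted(calls.items()), 2):
        if a1 != a2:
            differences.append((s1, a1, s2, a2))
    return differences

# Position 10 from the example above: reference A, sample1 T, sample2 C.
site_calls = {"sample1": "T", "sample2": "C"}
print(between_sample_differences(site_calls))
# [('sample1', 'T', 'sample2', 'C')]
```

Two samples carrying the same non-reference allele would produce an empty list, which is the case that matters when the reference is divergent.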



        • #5
          If @Guillefriis does what you are proposing, then where should the cut-off be set to say that a particular difference is due to divergence from the reference (present in > X% of samples) and so is not interesting?



          • #6
            @HESmith I wouldn't like to use a de novo assembly since I need the genomic positions of the variants provided by the zebra finch genome (planning to do a genome scan).

            @WhatsOEver (I don't know if it's good practice to answer two posts in one; please let me know if forum users prefer them separately.) I see your point, and I actually thought it could work as you say. It just seems a waste of computational time to call all the differences with respect to the reference (there are going to be a lot of them) and extract the between-sample variants afterwards. It looks like the GATK SelectVariants tool can do this, but I'm not sure exactly how. Has anybody used it? Also, I'm not sure how the callers behave at heterozygous positions: would a site that is variable between my samples be filtered out because both samples carry an allele matching the reference?
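On the heterozygosity question, a small sketch may help frame it. It assumes VCF-style `GT` strings (`0` is the reference allele; `1`, `2`, ... are alternate alleles) and treats a site as variable within the sample set whenever more than one allele is observed across the called samples. The function names are hypothetical, and this is not a description of how SelectVariants or any caller works internally:

```python
import re

def alleles_in_samples(gts):
    """Collect the allele indices observed across a list of GT strings.

    Accepts unphased ('0/1') and phased ('0|1') genotypes; missing
    alleles ('.') are skipped.
    """
    observed = set()
    for gt in gts:
        for allele in re.split(r"[/|]", gt):
            if allele != ".":
                observed.add(int(allele))
    return observed

def segregates_within_samples(gts):
    """True if more than one allele is present among the called samples."""
    return len(alleles_in_samples(gts)) > 1

# Every sample homozygous for the same alternate allele: a fixed difference
# from the reference, not variation within the sample set.
print(segregates_within_samples(["1/1", "1/1", "1|1"]))  # False

# One heterozygote among alternate homozygotes: the reference allele still
# segregates among the samples, so the site counts as variable.
print(segregates_within_samples(["0/1", "1/1", "1/1"]))  # True
```

Under this definition a heterozygous sample never causes a site to be dropped; only sites where all samples share a single allele are treated as uninformative.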


            @GenoMax I'm not sure I understood you. I'm not interested in reference-relative variants because my study is focused on phylogenomic relationships within an emberizid genus, while my reference is the zebra finch, which is used only for the mapping and downstream analyses.

            Thank you all, guys.
            Last edited by Guillefriis; 11-05-2015, 02:37 AM.



            • #7
              Originally posted by Guillefriis
              @GenoMax I'm not sure I understood you. I'm not interested in reference-relative variants because my study is focused on phylogenomic relationships within an emberizid genus, while my reference is the zebra finch, which is used only for the mapping and downstream analyses.
              You may not be interested in them but that is how you are going to pick them, right? Have you done a test to see what this result looks like? I am not an evolutionary biologist by a long shot so I don't know how ~20M year difference has affected the overall genome organization (# of chromosomes, sizes etc).

              With 600 samples you likely have enough data to try some assemblies with a random sampling of reads. That may prove to be a better reference.
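One way to draw such a random sample of reads without loading a whole file into memory is reservoir sampling. The sketch below is only an illustration: it assumes single-end FASTQ with strict four-line records (no wrapped sequence lines), and the function name and the toy reads are made up:

```python
import io
import random

def sample_fastq_records(handle, k, seed=0):
    """Reservoir-sample up to k four-line FASTQ records from a text handle."""
    rng = random.Random(seed)
    reservoir = []
    n = 0  # number of records seen so far
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        n += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = record
    return reservoir

# Toy input: ten tiny reads standing in for a real FASTQ file.
toy = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
subset = sample_fastq_records(io.StringIO(toy), k=3, seed=1)
print(len(subset))  # 3
```

For real data the handle would come from `open(...)` or `gzip.open(..., "rt")`, and the sampled records would be written out and passed to the assembler of choice.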

              It is late and my mind is wandering ...



              • #8
                I wonder if you have considered pyRAD:
                http://dereneaton.com/software/pyrad/



                • #9
                  Originally posted by GenoMax
                  You may not be interested in them but that is how you are going to pick them, right? Have you done a test to see what this result looks like? I am not an evolutionary biologist by a long shot so I don't know how ~20M year difference has affected the overall genome organization (# of chromosomes, sizes etc).

                  With 600 samples you likely have enough data to try some assemblies with a random sampling of reads. That may prove to be a better reference.

                  It is late and my mind is wandering ...
                  I think I'll try. I'll lose the genomic positions of the variants, but I may end up with a larger number of them, which is better in phylogenetic terms. I've never done an assembly, though!



                  • #10
                    Originally posted by nucacidhunter
                    I wonder if you have considered pyRAD:
                    http://dereneaton.com/software/pyrad/
                    You know, @nucacidhunter, I had a look and it seems pretty interesting. I think I'll take the intersection of the SNPs called by both GATK and pyRAD. Thanks, man.
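A rough sketch of such an intersection, assuming both call sets have already been expressed in the same coordinate system (which is not automatic for de novo loci) and reduced to `(chrom, pos, ref, alt)` tuples. The tuples and the function name are invented for illustration:

```python
def shared_snps(calls_a, calls_b):
    """Return SNPs present in both call sets, keyed by (chrom, pos, ref, alt)."""
    return sorted(set(calls_a) & set(calls_b))

# Hypothetical call sets; real ones would be parsed from each tool's output.
gatk_calls = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"), ("chr2", 40, "C", "T")]
pyrad_calls = [("chr1", 250, "G", "C"), ("chr2", 40, "C", "G"), ("chr3", 7, "T", "A")]

print(shared_snps(gatk_calls, pyrad_calls))
# [('chr1', 250, 'G', 'C')]
```

Matching on the alternate allele as well as the position means that a site called by both tools with different alleles, like `chr2:40` here, is not counted as shared.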
