  • Genotyping By Sequencing (GBS) and SNP calling

    Dear all,

    I am interested in using the GBS method and then performing SNP detection on Illumina reads.
    However, I am not sure which software would be appropriate for this task on:

    i) a de novo species
    ii) using a draft reference genome with multiple scaffolds


    What about using VarScan for the first case and TASR for the second?
    Does anyone have any experience with these? I also wonder whether STACKS will perform well on these data.

    Thanks in advance,
    Fernando

  • #2
    Hola!

    You could try this software for de novo variant calling with no reference at all - ideal for a de novo species
    software:

    paper in Nature Genetics here:


    This works for your case 1. It also works for case 2, and can either use or ignore your draft, as you prefer. The paper shows how well the method works, and also shows how you can get better results with a de novo species if you have data from multiple samples, rather than just from one.

    To be clear - I am biased, as I am an author :-)

    Good luck!

    Zam



    • #3
      Hi Zam,

      Cortex_var assembles the reads before calling variants, but reads generated by GBS are not expected to overlap. Is it OK to use cortex_var in this case? If it still works well, cortex_var would be a great tool for calling SNPs in GBS data.

      Thanks,

      swang



      • #4
        Thanks for your response Zam,

        I will consider your program and come back after reading more about it.

        I guess one way to overcome swang's concern would be to use paired-end reads to facilitate the assembly, wouldn't it?

        Cheers,
        fcr



        • #5
          Hi there. Thanks for pointing this out, swang. I must admit that, in my ignorance, I misunderstood and thought "genotyping by sequencing" was a generic term distinguishing genotyping by shotgun sequencing from genotyping with a chip/array. Anyway, the answer is that I don't know enough about how the reads from genotyping by sequencing are produced after the restriction cut. As swang said, Cortex will only call a SNP (or other variant) if there are enough reads to cover both alleles (i.e. k-1 bases before and after the variant, on both alleles).
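To make the k-1 flank requirement concrete, here is a minimal sketch in Python. It is a simplification and not how Cortex works internally: it assumes every read at a locus starts at the same position (for example, right at the cut site), so no other read can extend the flank, and it only asks which offsets within one read leave k-1 bases on both sides. The function name and the values of read length and k are illustrative assumptions.

```python
def callable_offsets(read_len: int, k: int) -> range:
    """Return the 0-based offsets in a read where a SNP would have
    at least k-1 bases before it and k-1 bases after it.

    Simplifying assumption: all reads at the locus start at the same
    position, so a single read must supply both flanks on its own.
    """
    first = k - 1          # need k-1 bases to the left of the SNP
    last = read_len - k    # need k-1 bases to the right of the SNP
    if last < first:
        return range(0)    # read too short for this k: nothing is callable
    return range(first, last + 1)


if __name__ == "__main__":
    # Illustrative values only: 100 bp reads and k = 31.
    offsets = callable_offsets(100, 31)
    print(len(offsets), offsets[0], offsets[-1])
```

Under these assumptions only the middle of the read can yield calls, and the window shrinks quickly as k grows or reads get shorter. With randomly sampled (shotgun) reads this restriction does not apply, because overlapping reads supply the flanks.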

          I know people are happily using Cortex with RAD-sequencing data, with results that look good - they have paired end reads where one end is at the tag and the other end is an insert away, and they use just the second read to look for variants. I also have seen Cortex used on restriction data where the number of SNP calls was lower than I expected.

          fcr - I don't think pairing is the answer to swang's question - it basically reduces to a question of how the reads are sampled. Do you get single-end reads that are precisely adjacent to the restriction cut site? Or do you get paired reads, where one end is at the cut site, in which case you could find SNPs using the second read?

          Hope that clarifies a bit.



          • #6
            Hi Zam,

            That's pretty interesting. But what is the reason for using the second read of the pair from a RAD fragment to call SNPs? My guess is that sequence quality is always better at the beginning of the read... and the interesting thing is to look for variation across the genome rather than precisely after the restriction enzyme target.

            Thanks a lot,
            fcr



            • #7
              Hi fcr - yes - the RAD target itself is not of interest - the idea is the first read gets the tag/restriction site, and hopefully is monomorphic. The second read is 200bp away (or whatever). Do this for a bunch of different samples, and at a fixed tag, each sample has a set of reads 200bp away from that site. If there are SNPs there, then you find them in those reads. It's just a way of looking for variants in a non-model genome where you don't have a reference, and want to ensure you get a bunch of reads from really different places.



              • #8
                The idea of using Cortex for GBS is quite interesting. I thought using Velvet/Oases to assemble the genome would be the best way. I will consider it and try Cortex tomorrow.

                In my case, I work with a diploid plant and multiple samples. I also plan to do SNP detection with Sequenom after SNP discovery. But I see an issue with obtaining SNP positions from a de novo assembly. Can I simply take the SNP position given by Cortex, or do I need a custom script to obtain the position?



                • #9
                  Cortex will produce SNP+other variant calls for you, and will genotype your samples. It will also produce flanking sequence for your calls, which should help you set up your Sequenom. That will all happen automatically, without any outside information.

                  Cortex will also give you a position relative to whichever reference you specify (it produces a VCF file), but it won't build that reference for you. If you have no such reference, you can still get a VCF-like file, without meaningful chr/pos. i.e. you don't need chr/position in order to get your calls and design your primers.
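As a rough illustration of why chr/pos is not needed for assay design, the sketch below builds a bracketed design string from a call's flanking sequence and alleles. This is only a sketch: the function name, the argument layout, and the `min_flank` check are hypothetical and do not reflect Cortex's actual output format or any Sequenom software requirement, so check what your design tool expects.

```python
def design_string(flank5: str, allele_a: str, allele_b: str, flank3: str,
                  min_flank: int = 0) -> str:
    """Build a string of the form 5'flank[A/B]3'flank for assay design.

    All inputs are assumed to be extracted beforehand from the variant
    caller's output (hypothetical layout). min_flank is an illustrative
    guard, not a documented requirement of any tool.
    """
    for label, flank in (("5' flank", flank5), ("3' flank", flank3)):
        if len(flank) < min_flank:
            raise ValueError(
                f"{label} has {len(flank)} bases, fewer than {min_flank}")
    return (f"{flank5.upper()}"
            f"[{allele_a.upper()}/{allele_b.upper()}]"
            f"{flank3.upper()}")


if __name__ == "__main__":
    # Toy sequences, not real data.
    print(design_string("acgtacgt", "A", "g", "ttgacca"))
```

The point is simply that a call plus its flanks is self-contained: no chromosome or coordinate appears anywhere in the design input.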

                  If your samples are from a single population, then Cortex can also accurately classify calls as repeat, variant or error, by comparing models for how coverage would behave on the two alleles.



                  • #10
                    Thanks Zam,

                    I also discussed this with other researchers. Some of them prefer to write custom scripts to call SNPs, but I am not skilled enough to write complicated scripts, so I still have to rely on variant-calling tools.
                    I am still wondering whether it is reasonable to expect to know which chromosome each SNP is on in the case of a de novo assembly. Can I get that from Cortex?



                    • #11
                      Dear all,

                      Sorry if I'm asking a silly question.
                      Can I clarify something about GBS with you? Is it specifically done with extracted DNA? What about extracted RNA? From my reading, the GBS approach uses restriction enzymes to reduce genome complexity. I'm confused now.

                      Does Cortex work with RNA-seq?

                      Hope anyone can explain here.
                      Thanks.



                      • #12
                        Hi there,
                        To answer your questions about Cortex
                        1. Cortex will only give you chr/pos coordinates if you give it a reference. It does not attempt to build a whole-genome assembly.
                        2. Cortex will work with RNA-seq, but all of the modelling work is tailored for DNA sequencing. Feel free to use it with RNA-seq as an experiment/exploration: it does provide the useful ability to compare multiple samples, and I am using it on RNA-seq data myself. However, the error-cleaning methods and model are (currently) not well tailored for RNA-seq data, and right now you are probably better off with other tools.



                        • #13
                          Anyone ever use this pipeline for GBS?



                          • #14
                            Hi Geneus,

                            I'm currently working with GBS data and that maizegenetics pipeline (TASSEL). It performed pretty well, in the end giving me SNPs for a set of individuals which had been sequenced paired-end on an Illumina HiSeq with multiplexing/pooling.

                            However, there are some issues with TASSEL that are suboptimal for my use case:
                            1. Reads are being cropped to 64bp by the pipeline. I'd like to use more of my original 100bp reads.
                            2. If you are mapping your reads against a reference (as I do) chromosome names in that reference have to be numeric. This seems like a somewhat random constraint but you need to account for it by renaming your chromosomes accordingly.
                              Also note that TASSEL itself does not include a mapper. I used BWA to do the mapping against the reference genome. Once you get the SAM file out of that you can use TASSEL to go on (e.g. calling SNPs).
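For the numeric chromosome name constraint, one possible workaround is to renumber the reference FASTA headers before mapping and keep a lookup table to translate results back. The sketch below is a minimal, hypothetical helper (the function name and the toy records are assumptions, not part of TASSEL or BWA); it works on any iterable of FASTA lines.

```python
from typing import Iterable, List, Tuple


def renumber_fasta(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace each FASTA header with a running integer (1, 2, 3, ...).

    Returns the rewritten lines and a list of (original_name, new_name)
    pairs so that numeric names in downstream output can be mapped back.
    The original name is taken as the first whitespace-separated token
    of the header line.
    """
    renamed: List[str] = []
    mapping: List[Tuple[str, str]] = []
    counter = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            counter += 1
            fields = line[1:].split()
            original = fields[0] if fields else ""
            mapping.append((original, str(counter)))
            renamed.append(">" + str(counter))
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping


if __name__ == "__main__":
    # Toy records, not real data.
    demo = [">scaffold_12 len=5", "ACGTA", ">chrUn", "GG"]
    new_lines, table = renumber_fasta(demo)
    print("\n".join(new_lines))
    print(table)
```

Mapping reads against the renumbered FASTA should then produce a SAM file whose reference names are already numeric, and the returned table lets you restore the original scaffold names in the final SNP list.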


                            The 64 bp constraint in particular bothers me a little, which is why I would be very interested to know the answer to the original question in this thread:

                            Is STACKS safe to use for GBS data?



                            • #15
                              Originally posted by Harremsis
                              [...] Is STACKS safe to use for GBS data?

                              I am pretty sure that Dr. Buckler's lab uses Novoalign as an alignment tool... at least, so I am told. I know Dr. Buckler quite well, and I am sure that if you reached out to him he would be happy to engage in a detailed conversation on the TASSEL pipeline and listen to any suggestions. He is brilliant yet open-minded, a very rare combination.

                              I cannot comment on STACKS...perhaps yet another question for Dr. Buckler?
