  • How to calculate RAD-Seq digestion sites?

    Hi, everyone!
    I'm new here and need your help.
    I have paired-end RAD-Seq data and want to calculate how many digestion sites have been covered. How can I do that?
    Thanks for your help!
    Last edited by fanwei; 08-19-2013, 06:34 PM.

  • #2
    Hi fanwei,

    You could run Stacks (http://creskolab.uoregon.edu/stacks/) for general RAD-Seq analysis, including how many RAD loci you have sequenced. If you have a reference genome, and are wondering how many of the in silico cut sites are present in your data, you could create a "RAD reference" of the cut sites + 100 bp adjacent DNA and align your reads against that.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
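
The "RAD reference" idea in the reply above can be sketched in Python. This is a minimal illustration, not part of Stacks or any other tool: the function names `rad_reference` and `write_fasta` are hypothetical, the genome is assumed to be already loaded as a dict of sequence strings, and each tag is taken as the recognition site plus the flanking bases on one side, which simplifies where the enzyme actually cuts.

```python
import re


def rad_reference(genome, site="TCGA", flank=100):
    """Return {name: sequence} with two tags per in silico cut site.

    genome: dict mapping chromosome name -> sequence string (assumed input).
    site:   recognition sequence; TCGA is palindromic, so scanning one
            strand finds every site.
    flank:  number of bases kept on each side of the site.
    """
    # Zero-width lookahead so that overlapping sites would also be reported.
    pattern = re.compile("(?=%s)" % re.escape(site))
    tags = {}
    for chrom, seq in genome.items():
        seq = seq.upper()
        for match in pattern.finditer(seq):
            start = match.start()
            end = start + len(site)
            # Tags near chromosome ends are simply shorter than site + flank.
            tags[f"{chrom}:{start}:L"] = seq[max(0, start - flank):end]
            tags[f"{chrom}:{start}:R"] = seq[start:end + flank]
    return tags


def write_fasta(tags, path):
    """Write the tags as a FASTA file that an aligner can index."""
    with open(path, "w") as out:
        for name, seq in tags.items():
            out.write(f">{name}\n{seq}\n")


# Toy example with a 4 bp flank instead of 100 bp.
toy_tags = rad_reference({"chr1": "AAAATCGAGGGG"}, flank=4)
# toy_tags == {"chr1:4:L": "AAAATCGA", "chr1:4:R": "TCGAGGGG"}
```

After aligning the reads against the resulting FASTA, the number of reference entries that receive at least one aligned read gives an estimate of how many cut-site tags were covered.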


    • #3
      Thank you!
      Yes, I have a reference genome. I have finished mapping with bwa and called SNPs with GATK, which found approximately 6,300 SNPs per sample. But when I look for SNPs that distinguish the two samples, very few turn up (fewer than 10). It seems the samples overlap very little. My sequencing depth is 3-4X.
      Can Stacks deal with this situation?
      Thanks for your help!


      • #4
        If the coverage is low, you probably aren't getting enough depth to call SNPs at most loci. At 3-4X, you won't even pick up most heterozygous SNPs. If the goal is to find SNPs specific to a particular sample, you need to sequence to a high depth, feel confident that you don't have missing data, and then compare.


        • #5
          Yes. Because heterozygous SNPs are not genetically stable, my goal is to find homozygous SNPs. Do you think the coverage is too low?
          The sequencing was done by a company, and they chose TaqαI (TCGA) to digest the genomic DNA. Now I'm wondering whether that was a reasonable choice, because there are so many digestion sites in the genome.


          • #6
            Was that for RAD or ddRAD or GBS, do you know? If it is RAD-Seq, then digesting with a 4-cutter enzyme will produce short fragments resistant to shearing, making library creation very inefficient. For any of the methods, a frequent cutter like that will produce 3-5 million tags for a moderate-sized genome of 500 Mb. So it is not surprising you have low coverage, unless they sequenced just 2 samples per HiSeq lane.
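
The 3-5 million figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a random genome with equal base frequencies, so a 4 bp site occurs on average once every 4^4 = 256 bp, and two tags (one on each side) per cut site. Real genomes deviate from both assumptions, and the function name is hypothetical.

```python
def expected_tags(genome_bp, site_len, tags_per_site=2):
    """Expected cut sites and tags, assuming random sequence with equal base frequencies."""
    sites = genome_bp / 4 ** site_len
    return sites, sites * tags_per_site


sites, tags = expected_tags(500_000_000, 4)
print(f"{sites:,.0f} sites, {tags:,.0f} tags")
# prints: 1,953,125 sites, 3,906,250 tags
```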

            I'm guessing they only sequenced a portion of the possible cut sites, and so you ended up with a semi-random set of tags in one sample versus the other, with little overlap between them. If it was ddRAD or GBS, you also have to worry if they were not careful in the size distribution selection, since then one sample may end up with a bigger size range of fragments and a different set of loci selected.

            Why was it paired-end sequenced? Tell me a little about the species, etc.

            If a locus is sequenced at 3X, and it is diploid, then 25% of the time you'll only sequence one chromosome or the other, missing the heterozygosity. So you will often think a locus is homozygous for one allele in one sample and homozygous for the other allele in the other sample, when it is really heterozygous in both.
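
The 25% figure follows from a simple model in which each read at a heterozygous locus comes from either chromosome with probability 1/2, so all n reads come from the same chromosome with probability 2 × (1/2)^n. A short sketch under that assumption (equal sampling of both chromosomes, no sequencing error; the function name is hypothetical):

```python
def p_miss_het(depth):
    """Probability that every read at a heterozygous diploid locus comes from one chromosome."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Two ways to miss: all reads from chromosome A, or all from chromosome B.
    return 2 * 0.5 ** depth


# p_miss_het(3) == 0.25, matching the 3X case in the post.
for depth in (3, 4, 5, 10, 20):
    print(f"{depth}X: {p_miss_het(depth):.4%}")
```

Under this model the chance of missing a heterozygote halves with each additional read, which is why higher depth matters so much for comparing samples.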


            • #7
              Thank you very much! Sorry for providing incomplete information. I quite agree with you!
              The species is rice, which is diploid, with a genome of about 400 Mb. We chose the paired-end RAD sequencing method. As previously mentioned, the sequencing depth is 3-4X and coverage is 8%. My goal is to find SNPs specific to each sample.
              Can you give me some suggestions?


              • #8
                If you got the amount of sequencing expected, then the experiment was designed poorly, since that amount of sequencing is guaranteed to give a bad outcome. If I am understanding you, only 8% of the sites are sequenced in a sample. The chance of having reads in both samples is then 0.08 × 0.08 = 0.0064, so less than 1% of the sites will be sequenced in both samples. On top of that, the low sequencing coverage of 3X at those sites guarantees that many SNPs will be miscalled.
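
The overlap estimate treats the two samples as independent random draws from the same pool of sites. A minimal sketch of that arithmetic follows. The function name is hypothetical, and independence is an assumption: in a real library, size selection makes some sites more likely to be sampled than others.

```python
def shared_fraction(*fractions):
    """Fraction of sites expected to be covered in every sample, assuming independence."""
    result = 1.0
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each fraction must lie between 0 and 1")
        result *= fraction
    return result


print(f"{shared_fraction(0.08, 0.08):.4f}")  # prints 0.0064
```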

                So, I don't see any way to rescue this experiment other than lots more sequencing. But it would probably be better to start over with a good design, unfortunately.


                • #9
                  Thank you very much! I'll redesign my work.


                  • #10
                    You should probably sequence a number of samples in each variety to assay the full genetic diversity of each. If you are looking for SNPs specific to a sample, it is easy to be misled when looking at a small number of individuals.

                    I don't know enough about your system, but a typical approach would be to sequence around 100,000 loci at moderate depth (5X) for a large number of individuals (here at SNPsaurus we work in 96-well plate units). You'll get high-quality genotype calls for homozygous alleles, and can multiplex 190 individuals in a lane.
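
As a rough check on that design, the number of reads needed per lane is loci × depth × individuals. The sketch below assumes perfectly even coverage across loci and individuals, which real libraries never achieve, so the result is a lower bound. The function name is hypothetical.

```python
def reads_needed(loci, depth, individuals):
    """Minimum reads for a lane, assuming perfectly even coverage."""
    return loci * depth * individuals


print(f"{reads_needed(100_000, 5, 190):,}")  # prints 95,000,000
```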


                    • #11
                      Hi, I'm trying to run Stacks. I have read the manual downloaded from the website but still ran into problems. It seems complex. Are you familiar with it? Could you kindly help me run Stacks?


                      • #12
                        Sorry, we use our own analysis software for nextRAD. There is a user community at https://groups.google.com/forum/#!forum/stacks-users that might be able to help.


                        • #13
                          You are very nice! Thank you!
