Has anyone used any SNP analysis pipelines other than the standard variant pipeline? Any suggestions on good ones to try?
Replying to cmm8cmm8: We use most of the libraries from Bioconductor.
The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.
There you will find libraries for many platforms, such as Affy, Agilent, and Illumina.
I hope this helps.
Replying to manoj.b: I think there is some confusion here between SNP genotyping on microarrays and SNP detection from sequencing data. I believe the Bioconductor packages are for the former, whereas the question was about the latter.
A question for the original poster: what do you consider "standard"? Only the vendors' pipelines? What about MAQ?
I'm testing NextGENe currently, which is supposedly designed for SNP detection (well, really for mutation detection). Would love to hear what other people use.
For SOLiD data, we use BFAST (admittedly my own aligner) [https://secure.genome.ucla.edu/index.php/BFAST]. The output of that is converted to SAM format (for use with samtools) [http://samtools.sourceforge.net/].
We then call SNPs with the MAQ consensus model as implemented in samtools, tuning the various parameters on known data to get the desired true-positive rate (TPR) and false-positive rate (FPR) for heterozygous calls.
Nils
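One way to read "train on known data" is to score the caller's heterozygous calls against sites with known genotypes (for example, from a genotyping array) and compute the TPR and FPR for each parameter setting. A minimal sketch, assuming the truth set and the calls have already been parsed into dictionaries keyed by (chromosome, position); the function `het_rates` is hypothetical and not part of samtools or MAQ.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def het_rates(truth: Dict[Site, bool], called_het: Dict[Site, bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) for heterozygous calls at sites with a known genotype.

    truth[site] is True if the known genotype (e.g. from an array) is heterozygous.
    called_het[site] is True if the caller reported a heterozygote there.
    Sites absent from called_het are treated as "not called heterozygous".
    """
    tp = fp = fn = tn = 0
    for site, is_het in truth.items():
        called = called_het.get(site, False)
        if is_het and called:
            tp += 1
        elif is_het:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical example: two known hets, two known homozygous sites.
truth = {("chr1", 100): True, ("chr1", 200): True,
         ("chr1", 300): False, ("chr1", 400): False}
calls = {("chr1", 100): True, ("chr1", 300): True}
print(het_rates(truth, calls))
```

Running this once per parameter setting gives one (FPR, TPR) point per setting, which can then be compared to pick the trade-off you want.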
Nilshomer,
I recognize that your data is SOLiD, but I was wondering about your method for consensus calling, in which you "train on known data" to find the best parameter settings.
I, too, am interested in doing such a thing. I have a 1M SNP Illumina array and Next-Gen data from the Illumina GA2 on the exome. What type of data did you train on?
Which parameters did you find needed the most tweaking?
Did you also find that the number of variants called by MAQ (or samtools, in your case) was very high? I get more than 180,000 variants in the cns.filter.snp file when using the parameters from easyrun. This seems like far too many, but I'm having difficulty distinguishing the real variants from the false positives.
Looking forward to hearing your input...
We are novice bioinformaticians, so we use CLC bio's Genomics Workbench. The DIP (deletion-insertion polymorphism) algorithm works well. The SNP algorithm does detect known SNPs, and we are optimizing its settings for the best sensitivity and specificity. So far, when we maximize specificity (judged on the X and Y chromosomes, where SNP calls in male DNA samples should be homozygous), sensitivity drops and we miss too many known SNPs. Relaxing the criteria improves sensitivity, but then we get too many false positives.
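The X/Y check can be turned into a single number to track while tuning settings. The sketch below computes the fraction of heterozygous calls on X and Y in a male sample, skipping the pseudoautosomal regions (PARs), where males are diploid and heterozygotes can be real. The function name, the (chrom, position, genotype) input tuples, and the GRCh37 PAR coordinates are assumptions for illustration; CLC's own export format is not modeled here.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: GRCh37 pseudoautosomal region coordinates. Replace them with the
# PARs of the reference build you actually aligned to.
PAR_GRCH37: Dict[str, List[Tuple[int, int]]] = {
    "X": [(60001, 2699520), (154931044, 155260560)],
    "Y": [(10001, 2649520), (59034050, 59363566)],
}


def male_sex_chrom_het_rate(
    calls: Iterable[Tuple[str, int, str]],
    par: Dict[str, List[Tuple[int, int]]] = PAR_GRCH37,
) -> float:
    """Fraction of genotyped X/Y calls outside the PARs that are heterozygous.

    calls: (chrom, 1-based position, VCF-style GT such as "0/1" or "1/1").
    In a male sample these sites should be hemizygous, so every heterozygous
    call counted here is treated as a likely false positive.
    """
    total = het = 0
    for chrom, pos, gt in calls:
        name = chrom[3:] if chrom.startswith("chr") else chrom  # "chrX" -> "X"
        if name not in ("X", "Y"):
            continue
        if any(start <= pos <= end for start, end in par.get(name, [])):
            continue  # inside a PAR: heterozygous calls can be genuine
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # skip missing genotypes
        total += 1
        if len(set(alleles)) > 1:
            het += 1
    return het / total if total else 0.0


calls = [
    ("chrX", 5_000_000, "1/1"),  # non-PAR, homozygous: expected
    ("chrX", 6_000_000, "0/1"),  # non-PAR, heterozygous: likely false positive
    ("chrX", 100_000, "0/1"),    # inside PAR1: ignored
    ("chr1", 1_000, "0/1"),      # autosome: ignored
    ("Y", 3_000_000, "1/1"),     # non-PAR, homozygous: expected
]
print(male_sex_chrom_het_rate(calls))
```

A lower rate at a given setting suggests higher specificity; pairing it with the fraction of known autosomal SNPs recovered gives both sides of the trade-off described above.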
Replying to erichpowell: I would plot an ROC curve over all of the samtools parameters, using the sites that the Illumina array genotyped as heterozygous as the truth set and assuming no genotyping error (in reality about 1 in 10,000). I found varying the "-r" parameter to be the most useful. Further filtering also helps a lot, such as requiring a variant to be seen on both strands with sufficient coverage and quality. We applied all of these methods in our paper (self-publicity).
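The two ideas above, an ROC curve over caller settings and a both-strands/coverage/quality filter, can be sketched in a few lines. This version sweeps a single quality threshold as a stand-in for re-running samtools with different parameter values (such as -r), and it assumes each candidate site has already been annotated with read depth, per-strand support for the variant allele, and a truth label from the array. The `Candidate` fields, the function names, and the default thresholds are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    qual: float        # variant quality reported by the caller
    depth: int         # total read depth at the site
    alt_fwd: int       # reads supporting the variant on the forward strand
    alt_rev: int       # reads supporting the variant on the reverse strand
    is_true_het: bool  # truth label, e.g. the array genotype is heterozygous


def passes_filters(c: Candidate, min_depth: int = 10, min_qual: float = 20.0,
                   min_per_strand: int = 1) -> bool:
    """Keep a candidate only if it has enough coverage and quality and the
    variant allele is seen on both strands. Thresholds are illustrative."""
    return (c.depth >= min_depth and c.qual >= min_qual
            and c.alt_fwd >= min_per_strand and c.alt_rev >= min_per_strand)


def roc_points(cands: List[Candidate]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for each distinct quality threshold,
    from the strictest threshold to the most permissive."""
    pos = sum(c.is_true_het for c in cands)
    neg = len(cands) - pos
    points = []
    for t in sorted({c.qual for c in cands}, reverse=True):
        tp = sum(c.is_true_het and c.qual >= t for c in cands)
        fp = sum((not c.is_true_het) and c.qual >= t for c in cands)
        points.append((t, fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return points


cands = [
    Candidate(50, 30, 8, 7, True),
    Candidate(40, 25, 12, 0, False),  # variant seen on one strand only
    Candidate(30, 20, 5, 6, True),
    Candidate(10, 15, 3, 4, False),   # low quality
]
print([passes_filters(c) for c in cands])
print(roc_points(cands))
```

Computing `roc_points` on the candidates before and after `passes_filters` shows how much the strand/coverage/quality filter moves the curve.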
Hi,
Without using LifeTech's BioScope/LifeScope, I think the following pipeline can be applied to SOLiD data for SNP/indel detection:
1) Alignment: *.csfasta + *.qual (or *.XSQ) -> SAM/BAM, using BFAST, BWA, or NovoalignCS
2) Variant calling: SAM/BAM -> SNPs/indels, using SAMtools or GATK (more accurate)
3) Annotation: GATK or ANNOVAR
I think SAMtools and GATK do not use color-space information to detect SNPs/indels. That is one of the advantages of BioScope/LifeScope.
Last edited by HiroMishima; 10-31-2011, 04:44 PM.
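Between steps 2 and 3 it can be convenient to separate SNPs from indels. Assuming the calls from step 2 are written as VCF (tab-separated, with REF and ALT in columns 4 and 5), a minimal sketch follows. `classify_variant` and `classify_vcf_lines` are hypothetical helpers, not part of SAMtools, GATK, or ANNOVAR; they classify by allele length only, and symbolic alleles such as `<DEL>` are simply reported as "other".

```python
from typing import Iterable, Iterator, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by allele length only
    (complex events with unequal lengths are not distinguished)."""
    if alt in (".", "*") or alt.startswith("<"):
        return "other"  # no ALT, spanning deletion, or symbolic allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"  # multi-nucleotide substitution
    return "insertion" if len(alt) > len(ref) else "deletion"


def classify_vcf_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, type) for every ALT allele of every record."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information lines and the header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):  # multi-allelic sites list ALTs with commas
            yield chrom, pos, ref, alt, classify_variant(ref, alt)


vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t40\tPASS\t.",
    "chr2\t300\t.\tC\tCTT,T\t30\tPASS\t.",
]
for record in classify_vcf_lines(vcf):
    print(record)
```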
Replying to HiroMishima: I was wondering about this, too. Thanks for the information!