Can anyone suggest a pipeline for analyzing exome-seq data?
-
One possibility:
1. Align with BWA
2. Call variants with samtools pileup
Comment
-
For SNPs and indels, do try Novoalign: it performs quite well in terms of accuracy, but it is slower than BWA.
We also have a Novoalign NGS guide that walks through this basic variant detection pipeline.
Basically:
1. Align with Novoalign
2. Sort alignments
3. Merge if you have multiple runs for the same library
4. Remove PCR duplicates with samtools or Picard
5. Run the samtools pileup variation caller
6. Filter
See the posted link for command line examples
Comment
-
I am also currently trying to find the best way to handle exome sequencing data (SureSelect, sequenced on the SOLiD).
It seems to me that most people map against the whole genome rather than an exome sequence, probably to reduce false-positive SNPs in the end. So my plan for a pipeline looks like this so far:
1. Align against the whole genome (with BioScope in my case)
2. Remove duplicates with Picard
3. Call SNPs with BioScope as well as samtools pileup (to compare results)
4. Keep only the SNPs that fall in the targeted regions
So far I'm not quite sure of the best way to do the filtering in the last step. I'd be very grateful for some suggestions. :-)
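One way to approach the target-region filter in the last step, as a minimal sketch rather than a recommendation. It assumes the capture targets are available as BED lines (0-based, half-open intervals) and that the SNP calls have been reduced to (chromosome, 1-based position) pairs. The function names `load_targets`, `in_target` and `filter_snps` are made up for illustration and do not come from any of the tools mentioned here.

```python
from bisect import bisect_right
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines into sorted, merged (start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and the usual BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or adjacent: extend the last interval
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_target(targets, chrom, pos):
    """True if a 1-based position lies inside a target interval."""
    intervals = targets.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_snps(snps, targets):
    """Keep only the (chrom, pos, ...) records that fall in a target interval."""
    return [snp for snp in snps if in_target(targets, snp[0], snp[1])]


if __name__ == "__main__":
    bed = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t10\t20"]
    snps = [("chr1", 100), ("chr1", 101), ("chr1", 300),
            ("chr1", 301), ("chr2", 20), ("chr3", 5)]
    print(filter_snps(snps, load_targets(bed)))
    # [('chr1', 101), ('chr1', 300), ('chr2', 20)]
```

The off-by-one between BED coordinates and 1-based variant positions is the main thing to watch. Padding the intervals by some flank before filtering is a common variation, but the size of the flank is a judgment call.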
Comment
-
-
We see some decent coverage in non-target regions. Has anyone looked at that? It is probably good data when one sees more than 10x coverage of coding regions, even those not targeted by the capture kit. However, the calls made there could certainly be false!
@jeckow, what's your experience with ANNOVAR? Could you comment on its usage, run time, efficiency, etc.?
--bioinfosm
Comment
-
Regarding "out of target" sequence freebies: be careful with pseudogenes and paralogs. The capture kits will pull down anything that can cross-hybridize with your baits.
This can also mess up your intended targets.
Has anyone come up with a way to handle these automatically, for example by ignoring genes affected by pseudogenes or paralogs?
Comment
-
I am trying to understand the exome-capture datasets we got for human and mouse (separate projects). To begin with, I am interested in estimating:
1. How much of the exome is covered by at least 1 (or N) base(s) (breadth of exome coverage).
2. The depth at which each exon is covered (depth of exome coverage).
Has anybody done this kind of analysis?
Please suggest any tools that I could use for this purpose.
Harsha
Comment
-
This information is easy to get using samtools on a BAM file produced by any decent aligner (like bwa). The "samtools mpileup" command reports the number of reads covering each reference position (depth), and you can simply use awk and BEDTools to generate your metrics.
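For anyone who prefers a script to awk, here is a rough sketch of the same calculation. It assumes the per-position depth has already been pulled out of the mpileup output (chromosome, 1-based position and read count are columns 1, 2 and 4) and that the exons come from a BED file with 0-based, half-open coordinates. `exon_coverage` and the default `min_depth=10` are illustrative choices, not part of samtools or BEDTools. Positions absent from the input are treated as depth 0.

```python
def exon_coverage(depths, exons, min_depth=10):
    """Per-exon mean depth and breadth of coverage.

    depths: iterable of (chrom, pos, depth) with 1-based positions.
    exons:  iterable of (chrom, start, end), BED-style 0-based half-open.
    Returns one dict per non-empty exon, in input order.
    """
    # Holding every position in a dict is fine for a sketch or a small
    # target set; a whole exome would want a streaming approach instead.
    depth_at = {(chrom, pos): depth for chrom, pos, depth in depths}
    rows = []
    for chrom, start, end in exons:
        # BED [start, end) corresponds to 1-based positions start+1 .. end.
        per_base = [depth_at.get((chrom, p), 0) for p in range(start + 1, end + 1)]
        if not per_base:
            continue  # skip zero-length intervals
        rows.append({
            "exon": (chrom, start, end),
            "mean_depth": sum(per_base) / len(per_base),
            "breadth": sum(d >= min_depth for d in per_base) / len(per_base),
        })
    return rows


if __name__ == "__main__":
    depths = [("chr1", 1, 12), ("chr1", 2, 8), ("chr1", 4, 20)]
    exons = [("chr1", 0, 4)]
    for row in exon_coverage(depths, exons, min_depth=10):
        print(row)
    # {'exon': ('chr1', 0, 4), 'mean_depth': 10.0, 'breadth': 0.5}
```

Here "breadth" is the fraction of bases in the exon with depth at or above `min_depth`, so calling it with `min_depth=1` gives the "covered by at least one read" figure.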
I would caution you against using a minimum of 1x coverage as your metric for coverage. For diploid genomes, a single read is completely useless. For coverage, we count bases at which we can call a genotype with >99.9% confidence. This usually works out to be somewhere in the 10-20x range, depending on the bases seen.
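To give a feel for why a threshold in that range is plausible, here is a deliberately simplified back-of-the-envelope model. It is not the genotype caller described above. It assumes a heterozygous site, reads drawn 50/50 from the two alleles, and no sequencing or mapping errors, and asks how likely it is that both alleles are seen at all. The function names are made up for illustration.

```python
from math import comb


def prob_both_alleles_seen(depth, min_reads_each=1):
    """Probability that each allele of a het site gets >= min_reads_each reads.

    Assumes each read comes from either allele with probability 0.5 and
    that there are no errors, so the count for one allele is Binomial(depth, 0.5).
    """
    # Count the outcomes where allele A has k reads and allele B has depth - k,
    # with both counts at least min_reads_each.
    favourable = sum(comb(depth, k)
                     for k in range(min_reads_each, depth - min_reads_each + 1))
    return favourable / 2 ** depth


def min_depth_for(confidence=0.999, min_reads_each=1, max_depth=1000):
    """Smallest depth at which the probability above exceeds `confidence`."""
    for depth in range(1, max_depth + 1):
        if prob_both_alleles_seen(depth, min_reads_each) > confidence:
            return depth
    return None


if __name__ == "__main__":
    print(prob_both_alleles_seen(1))             # 0.0
    print(min_depth_for(0.999))                  # 11
    print(min_depth_for(0.999, min_reads_each=2))  # 15
```

Under this toy model a single read can never reveal a heterozygote, and requiring one or two supporting reads per allele at 99.9% already pushes the depth to 11x or 15x. Real callers also account for base and mapping quality, so their thresholds depend on the bases actually observed.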
Hope that helps!
--Nancy
Comment
-
Estimating breadth and depth of coverage
After trying a couple of different approaches, I concluded that the coverageBed program in BEDTools is the easiest way to determine the breadth and depth of coverage.
coverageBed -abam reads.bam -b exons.bed -hist > result.txt
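When the run completes, look at the end of result.txt: the summary lines beginning with "all" give, for each depth, the number and fraction of target bases at that depth. The depth and fraction columns are the two you need to plot the histogram.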
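The summary at the end of result.txt can also be boiled down to single numbers. The sketch below assumes the tab-separated layout of those summary lines is `all`, depth, number of bases at that depth, total target size, fraction. It is worth checking this against your own result.txt, because the layout may differ between BEDTools versions. The function names are illustrative only.

```python
def read_all_histogram(lines):
    """Collect (depth, n_bases, total_size) from the 'all' summary lines."""
    hist = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "all":
            continue  # per-feature histogram line, not part of the summary
        hist.append((int(fields[1]), int(fields[2]), int(fields[3])))
    return hist


def breadth_at(hist, min_depth=1):
    """Fraction of target bases covered at depth >= min_depth."""
    if not hist:
        return None
    total = hist[0][2]
    covered = sum(n for depth, n, _ in hist if depth >= min_depth)
    return covered / total


def mean_depth(hist):
    """Mean depth over all target bases."""
    if not hist:
        return None
    total = hist[0][2]
    return sum(depth * n for depth, n, _ in hist) / total


if __name__ == "__main__":
    example = [
        "chr1\t0\t100\t0\t10\t100\t0.1000000",  # a per-feature line, ignored
        "all\t0\t10\t100\t0.1000000",
        "all\t1\t30\t100\t0.3000000",
        "all\t12\t60\t100\t0.6000000",
    ]
    hist = read_all_histogram(example)
    print(breadth_at(hist, 1), breadth_at(hist, 10), mean_depth(hist))
    # 0.9 0.6 7.5
```

In practice you would pass `open("result.txt")` instead of the toy list.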
Comment
-
Hey, that's great!
From bedtools: "New 'per base depth feature' (-d) added to coverageBed. This reports the per base coverage (1-based) of each feature in file B based on the coverage of features found in file A. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b)."
I guess I will have to try it out to see what it really looks like!
--bioinfosm
Comment
-
Hi!
I am also working on exome annotation. I have 454 sequencing data, so I am planning to use MOSAIK to align the reads to the reference chromosomes, then convert the sorted alignments to SAM/BAM format and analyze them with samtools.
Is this the right pipeline for SNP finding in 454 exome data?
Does bwa support the long reads (>=200 bp) of 454 data? Can it handle them?
Thanks,
Comment