  • Shruti Madhiwalla
    Junior Member
    • Sep 2010
    • 4

    Exome Sequencing

    Can anyone suggest a pipeline for the analysis of exome-seq data?
  • NGSfan
    Senior Member
    • Apr 2009
    • 181

    #2
    Originally posted by Shruti Madhiwalla:
    "Can anyone suggest a pipeline for the analysis of exome-seq data?"
    Are you looking for SNPs, mutations, or indels? What kind of data do you want at the end of the analysis?


    • Shruti Madhiwalla
      Junior Member
      • Sep 2010
      • 4

      #3
      I am looking for SNPs in the data.


      • svl
        Member
        • Sep 2009
        • 43

        #4
        One possibility:
        1. Align with BWA
        2. Call variants with samtools pileup



        • zee
          NGS specialist
          • Apr 2008
          • 249

          #5
          For SNPs and indels, do try Novoalign: it performs quite well in terms of accuracy, but it is slower than BWA.

          We also have a Novoalign NGS guide that walks through this basic variant detection pipeline.

          Basically

          1. Align with novoalign
          2. Sort alignments
          3. Merge if you have multiple runs for the same library
          4. Remove PCR duplicates with samtools or Picard
          5. Run the samtools pileup variation caller
          6. Filter

          See the posted link for command line examples


          • bpetersen
            Member
            • Mar 2010
            • 20

            #6
            I am also currently trying to find the best way to handle exome sequencing data (SureSelect capture, sequenced on the SOLiD).
            It seems to me that most people map against the whole genome rather than an exome reference, probably to reduce false-positive SNPs in the end. So my plan for a pipeline looks like this so far:

            1. Align against the whole genome (with BioScope in my case)
            2. Remove duplicates with Picard
            3. Call SNPs with BioScope as well as samtools pileup (to compare results)
            4. Filter the SNPs down to those in the targeted regions

            So far I'm not quite sure of the best way to filter in the last step. I'd be very grateful for some suggestions. :-)
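
A minimal sketch of that last filtering step, using plain awk. It assumes the SNP calls are tab-separated with the chromosome in column 1 and a 1-based position in column 2 (as in VCF), and that the capture targets are a BED file with 0-based, end-exclusive intervals. The function name `filter_to_targets` is hypothetical, not part of any tool; in practice BEDTools' intersectBed is the usual way to do this, and the sketch only shows the coordinate logic.

```shell
# filter_to_targets TARGETS_BED < variants.tsv
# Hypothetical helper: prints the variant lines whose position falls inside
# any interval of TARGETS_BED. Assumes a non-empty BED file without
# track/browser header lines.
filter_to_targets() {
  awk -F'\t' '
    NR == FNR {                        # first input: the BED targets
      n[$1]++
      lo[$1, n[$1]] = $2 + 0           # 0-based start
      hi[$1, n[$1]] = $3 + 0           # end, exclusive in 0-based terms
      next
    }
    /^#/ { next }                      # skip header lines such as VCF headers
    {
      pos = $2 + 0                     # 1-based variant position
      for (i = 1; i <= n[$1] + 0; i++)
        if (pos > lo[$1, i] && pos <= hi[$1, i]) { print; break }
    }
  ' "$1" -
}
```

Usage would look like `filter_to_targets targets.bed < snps.vcf > snps.on_target.vcf`, where both file names are placeholders. The linear scan over intervals is fine for a sketch but slow for a full exome; a sorted-interval approach or BEDTools scales better.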


            • Jeckow
              Junior Member
              • Sep 2009
              • 4

              #7
              Align with BWA.

              Then I suggest you use GATK. You can analyse multiple samples at once, it gives a robust set of calls, and it allows for a whole-exome-specific pipeline. Cool!

              Once the GATK analysis is completed, you can annotate the called variants with ANNOVAR.

              That's all!


              • bioinfosm
                Senior Member
                • Jan 2008
                • 483

                #8
                We see some decent coverage in non-target regions. Has anyone looked at that? It's probably good data when one sees more than 10x coverage of coding regions, even when they are not targeted by the capture kit. However, calls there could certainly be false!

                @Jeckow, what's your experience with ANNOVAR? Could you comment on its usage, run time, efficiency, etc.?
                --
                bioinfosm


                • NGSfan
                  Senior Member
                  • Apr 2009
                  • 181

                  #9
                  Regarding "out of target" sequence freebies: be careful with pseudogenes and paralogs. The capture kits will pull down anything that can cross-hybridize with your baits.

                  This can also mess up your intended targets.

                  Has anyone come up with a way to handle these automatically, i.e. to ignore genes affected by pseudogenes or paralogs?


                  • hrajasim
                    Member
                    • Aug 2009
                    • 27

                    #10
                    I am trying to understand the exome-capture datasets we got for human and mouse (separate projects). To begin with, I am interested in estimating:

                    1. How much of the exome is covered by at least 1 (or N) read(s) (breadth of exome coverage).
                    2. The depth at which each exon is covered (depth of exome coverage).

                    Has anybody done this kind of analysis?
                    Please suggest any tools that I could use for this purpose.
                    Harsha


                    • nhansen
                      Junior Member
                      • Sep 2009
                      • 6

                      #11
                      This information is easy to get using samtools on a BAM file produced by any decent aligner (like BWA). The "samtools mpileup" command will report the number of reads covering each reference position (depth), and you can simply use awk and BEDTools to generate your metrics.

                      I would caution you against using a minimum of 1 base coverage as your metric for coverage. For diploid sequences, one read is completely useless. For coverage, we count bases at which we can call a genotype with >99.9% confidence. This usually works out to be somewhere in the 10-20x range, depending on the bases seen.

                      Hope that helps!
                      --Nancy
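
A rough awk sketch of the metric described above: the fraction of target bases covered at or above a chosen depth. It assumes a three-column, tab-separated input (chromosome, 1-based position, depth), already restricted to the target regions, and a BED file of non-overlapping targets. Positions missing from the depth input are treated as zero coverage, which is why the denominator comes from the BED file rather than from the number of input lines. The names `target_size` and `breadth_at` are hypothetical.

```shell
# target_size TARGETS_BED
# Total number of bases in the BED intervals (assumes no overlaps).
target_size() {
  awk -F'\t' '{ total += $3 - $2 } END { print total + 0 }' "$1"
}

# breadth_at MIN_DEPTH TARGET_BASES < depth.tsv
# Fraction of target bases with depth >= MIN_DEPTH.
breadth_at() {
  awk -F'\t' -v min="$1" -v size="$2" '
    ($3 + 0) >= (min + 0) { covered++ }
    END { printf "%.4f\n", (size > 0 ? covered / size : 0) }
  '
}
```

For example, `breadth_at 10 "$(target_size exons.bed)" < depth.tsv` would report the fraction of the exome at 10x or more; the file names are placeholders, and the 10x cutoff is only the low end of the 10-20x range mentioned above.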


                      • hrajasim
                        Member
                        • Aug 2009
                        • 27

                        #12
                        Estimating breadth and depth of coverage

                        After trying a couple of different approaches, I concluded that using the coverageBed tool in BEDTools is the easiest way to determine the breadth and depth of coverage.

                        coverageBed -abam reads.bam -b exons.bed -hist > result.txt

                        When the run completes, look at the end of the result.txt file for the two-column data needed to plot the histogram.
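
A small sketch for summarizing that histogram with awk. It assumes, based on my reading of the BEDTools documentation, that the summary rows at the end of the -hist output start with the literal word `all`, followed by depth, number of bases at that depth, total size of the targets, and fraction of targets at that depth. The function name `hist_summary` is hypothetical.

```shell
# hist_summary MIN_DEPTH < result.txt
# Reads coverageBed -hist output and prints the mean depth over all targets
# plus the fraction of target bases with depth >= MIN_DEPTH.
hist_summary() {
  awk -F'\t' -v min="$1" '
    $1 == "all" {                      # per-feature rows are ignored
      bases    += $3                   # bases observed at this depth
      weighted += $2 * $3              # depth times bases, for the mean
      if ($2 + 0 >= min + 0) at_least += $3
    }
    END {
      if (bases > 0)
        printf "mean_depth=%.2f\tfraction_ge_%d=%.4f\n", weighted / bases, min, at_least / bases
    }
  '
}
```

Usage would be `hist_summary 10 < result.txt`. Columns 2 and 5 of the `all` rows are presumably the pair to plot as the histogram.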


                        • bioinfosm
                          Senior Member
                          • Jan 2008
                          • 483

                          #13
                          Hey, that's great!

                          From BEDTools: "New 'per base depth feature' (-d) added to coverageBed. This reports the per base coverage (1-based) of each feature in file B based on the coverage of features found in file A. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b)."

                          I guess I'll have to try it out to see what it really looks like!
                          --
                          bioinfosm
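
As a sketch of what one might do with that per-base output, the awk function below averages depth per target. It assumes each line of the -d output repeats the feature's columns from file B (here taken to start with chromosome, start, end), followed by the 1-based position within the feature and the depth as the last column. That layout is my reading of the quoted description, not something verified here, and `per_feature_mean` is a hypothetical name.

```shell
# per_feature_mean < per_base.txt
# Prints chrom, start, end and the mean depth of each feature,
# in the order the features first appear in the input.
per_feature_mean() {
  awk -F'\t' '
    {
      key = $1 FS $2 FS $3             # identify the feature by its coordinates
      if (!(key in sum)) order[++n] = key
      sum[key] += $NF                  # depth is assumed to be the last column
      count[key]++
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%.2f\n", order[i], sum[order[i]] / count[order[i]]
    }
  '
}
```

This gives one mean-depth value per exon, which speaks to the earlier question about the depth at which each exon is covered.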


                          • ketan_bnf
                            Member
                            • Oct 2010
                            • 59

                            #14
                            Hi!

                            I am also working on exome annotation. I have 454 sequencing data, so I am planning to use MOSAIK to align the reads to the reference chromosomes, then convert the sorted alignments to SAM/BAM format and analyze them with samtools.

                            Is this the right pipeline for SNP finding and annotation in 454 exome data?
                            Does BWA support the long reads (>= 200 bp) of 454 sequencing data?

                            Thanks,
