Hiseq 2000 paired-end capture data analysis problem-too many variants!

  • Hiseq 2000 paired-end capture data analysis problem-too many variants!

    We are trying to analyze HiSeq 2000 paired-end whole-exome capture sequencing data. The data quality is very good: average depth of coverage is around 120x, and the FASTQ files look perfect. We used BWA for paired-end alignment, Picard to remove duplicates, and SAMtools for variant calling. The problem is that we are getting too many SNV and indel variants: around 140,000 SNVs and indels after filtering (mapping quality >= 45, read depth >= 10, and the standard varFilter in SAMtools). Is anybody else on this board doing similar data analysis? How many SNVs and indels did you get? Can BWA and SAMtools be used on HiSeq data, or is there other software we should try? Is there anything else we should know about HiSeq data?
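For readers who want to reproduce the two numeric cutoffs mentioned above (mapping quality >= 45, read depth >= 10), here is a minimal Python sketch. It assumes the variant calls have already been parsed into simple records; the `Variant` class, its field names, and the helper functions are hypothetical and are not part of SAMtools, and the sketch does not reproduce what varFilter itself does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical parsed variant record (not a SAMtools data structure)."""
    chrom: str
    pos: int       # 1-based position
    ref: str
    alt: str
    depth: int     # read depth at the site
    map_qual: int  # mapping quality reported for the site


def passes_filters(v, min_map_qual=45, min_depth=10):
    """Return True if the variant meets both thresholds from the post."""
    return v.map_qual >= min_map_qual and v.depth >= min_depth


def filter_variants(variants, min_map_qual=45, min_depth=10):
    """Keep only the variants that pass both thresholds."""
    return [v for v in variants if passes_filters(v, min_map_qual, min_depth)]


if __name__ == "__main__":
    calls = [
        Variant("chr1", 1000, "A", "G", depth=120, map_qual=60),
        Variant("chr1", 2000, "C", "T", depth=8, map_qual=60),    # too shallow
        Variant("chr2", 3000, "G", "A", depth=50, map_qual=30),   # low mapping quality
    ]
    kept = filter_variants(calls)
    print(f"{len(kept)} of {len(calls)} variants pass the filters")
```

Counting how many calls each threshold removes on its own can help show whether the cutoffs are doing any real work at 120x coverage.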


  • #2
    Have you checked your calls against dbSNP? Have you filtered your candidates down to just those within known exons after alignment? Oftentimes PE reads "splash over" into intronic regions, where variation is likely more liberally tolerated.
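The two checks suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: exon coordinates are given as 1-based inclusive `(chrom, start, end)` tuples, variants and dbSNP entries are `(chrom, pos, ref, alt)` tuples already loaded into memory, and all function names are hypothetical rather than taken from any existing tool.

```python
from bisect import bisect_right
from collections import defaultdict


def build_exon_index(exons):
    """Group exons by chromosome and merge overlapping or adjacent intervals.

    exons: iterable of (chrom, start, end), 1-based inclusive (an assumption).
    Returns a dict mapping chrom -> (sorted starts, matching ends).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1] + 1:
                # Overlaps or touches the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_exon(index, chrom, pos):
    """Return True if pos falls inside any merged exon interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def split_by_dbsnp(variants, index, known):
    """Keep exonic variants, then split them into known (in dbSNP) and novel.

    variants: iterable of (chrom, pos, ref, alt) tuples.
    known: set of (chrom, pos, ref, alt) tuples loaded from dbSNP.
    """
    exonic = [v for v in variants if in_exon(index, v[0], v[1])]
    known_hits = [v for v in exonic if v in known]
    novel = [v for v in exonic if v not in known]
    return known_hits, novel
```

Merging the intervals first lets each lookup use a single binary search, which matters when there are on the order of 140,000 candidate calls to check.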

    EDIT: I'm actually really curious to hear how many reads you got, what read length you used, and how many lanes you ran the sample on. We just got our HiSeq 2000 installed last week and we're running our first samples through it. Details would be fantastic.
    Last edited by Lee Sam; 08-11-2010, 12:29 PM.