
**Bukowski** · 12-07-2011, 08:35 AM

Originally posted by mrfox:

Hi All,

My collaborators are interested in detecting SNPs in some cancer samples. Exome sequencing seems to be a good start, but we do not have much knowledge of exome sequencing and analysis. We would appreciate any advice on the following questions:

1) Should we use the Illumina or the SOLiD platform? We would like to use the one with the better sequencing QUALITY.
2) What read length should we use? The longer the better?
3) I am not sure whether paired-end information is useful for SNP detection, but I guess we had better use paired-end.
4) Could you recommend good software for identifying potential SVs from the exome-seq data?

Thank you very much.

1) I don't think it matters, but Illumina is supported by more tools and used by more people.
2) Yes. We do 100 bp PE on a HiSeq 2000, for instance.
3) Yes, but you will find it more useful for detecting indels. Lots of tools expect paired-end data, and there is no reason not to use it.
4) Samtools and GATK are both popular tools for calling SNPs after alignment. SNVMix might be more appropriate for cancer samples. For annotation, try ANNOVAR, SnpEff, or Ensembl's VEP.

Consider doing a paired tumour/normal study if possible.
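
To illustrate why a matched normal sample helps, here is a minimal Python sketch, not the method of any particular caller. It assumes variants have already been called separately in the tumour and in the matched normal, and keeps only the tumour calls that are absent from the normal. The `Variant` type, the `somatic_candidates` function, and the example positions are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def somatic_candidates(tumour, normal):
    """Return variants present in the tumour calls but not in the matched normal."""
    normal_set = set(normal)
    # Sorting by (chrom, pos, ref, alt) gives a stable, readable order.
    return sorted(v for v in set(tumour) if v not in normal_set)


# Made-up example calls, purely for illustration.
tumour_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr7", 5500, "G", "A"),
]
normal_calls = [
    Variant("chr1", 1000, "A", "G"),  # germline variant, seen in both samples
]

candidates = somatic_candidates(tumour_calls, normal_calls)
for v in candidates:
    print(v.chrom, v.pos, v.ref, v.alt)
```

A plain set difference like this is only a first pass: a variant can be missing from the normal calls simply because coverage was low there, which is one reason dedicated cancer callers look at the read evidence in both samples.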

**mrfox** · 12-07-2011, 08:42 AM

Thank you for your advice, Bukowski! One more question: if we want to do CNV analysis using the exome-seq data, what tool do you recommend? I know CNV detection is more challenging with exome data alone than with whole-genome data.

**Bukowski** · 12-07-2011, 08:43 AM

I would probably be looking at ExomeCNV for that:

CRAN: Package ExomeCNV

http://cran.r-project.org/web/packages/ExomeCNV/index.html

And I'm pretty sure that will require paired tumour/normal data, but check.
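
As a rough illustration of why a matched normal is useful for exome CNV work, the sketch below computes a per-exon log2 ratio of tumour to normal read depth after scaling each sample by its total depth. This is a generic depth-ratio idea in Python, not ExomeCNV's actual algorithm or API; the function name `log2_ratios`, the `pseudocount` parameter, and the depth values are assumptions made up for the example.

```python
import math


def log2_ratios(tumour_depth, normal_depth, pseudocount=1.0):
    """Per-exon log2(tumour/normal) after scaling each sample by its total depth.

    Both arguments map an exon identifier to a read depth. The pseudocount
    avoids division by zero and log(0) for exons with no reads.
    """
    t_total = sum(tumour_depth.values())
    n_total = sum(normal_depth.values())
    ratios = {}
    for exon, n in normal_depth.items():
        t = tumour_depth.get(exon, 0.0)
        t_norm = (t + pseudocount) / t_total
        n_norm = (n + pseudocount) / n_total
        ratios[exon] = math.log2(t_norm / n_norm)
    return ratios


# Made-up depths: exon3 has roughly twice the coverage in the tumour.
normal = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100}
tumour = {"exon1": 100, "exon2": 100, "exon3": 200, "exon4": 100}

ratios = log2_ratios(tumour, normal)
for exon in sorted(ratios):
    print(exon, round(ratios[exon], 2))
```

Exons with a clearly higher ratio than the rest are candidate gains and clearly lower ones candidate losses. Because capture efficiency varies a lot between exons, comparing each exon against the same exon in the normal sample is what makes the ratio interpretable.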

**Jayu** · 01-22-2012, 08:44 AM

Can anyone tell me the pipeline for exome sequencing data analysis?

**gringer** · 01-23-2012, 12:44 AM

Originally posted by Jayu:

Can anyone tell me the pipeline for exome sequencing data analysis?

That depends on how you want to do the analysis.

Depending on how paranoid or pedantic you are, you can do a readjustment of read sequences based on the original intensity data. After that, you can do some pre-filtering or trimming of reads to exclude unlikely sequences.

Your happiness with the current exon boundary annotation of your genome will determine if you can go straight to mapping, or if there needs to be some sort of assisted (or possibly de-novo) assembly first.

If you care about isoforms, you will need to use a tool that can identify and distinguish different isoforms and estimate isoform proportions. This may be better achieved with a genome mapping with something that can split reads with very large gaps (something like Tophat). Otherwise you could map to the transcriptome, bearing in mind that isoform identification is much more difficult in that case.

Once you have reads (or estimated reads), they need to be normalised to account for sampling variation and other types of random and systematic error. After that you can finally get around to the actual data analysis, which will generally be up to the researcher.

**Bukowski** · 01-23-2012, 01:18 AM

Originally posted by gringer:

That depends on how you want to do the analysis.

Depending on how paranoid or pedantic you are, you can do a readjustment of read sequences based on the original intensity data. After that, you can do some pre-filtering or trimming of reads to exclude unlikely sequences.

Your happiness with the current exon boundary annotation of your genome will determine if you can go straight to mapping, or if there needs to be some sort of assisted (or possibly de-novo) assembly first.

If you care about isoforms, you will need to use a tool that can identify and distinguish different isoforms and estimate isoform proportions. This may be better achieved with a genome mapping with something that can split reads with very large gaps (something like Tophat). Otherwise you could map to the transcriptome, bearing in mind that isoform identification is much more difficult in that case.

Once you have reads (or estimated reads), they need to be normalised to account for sampling variation and other types of random and systematic error. After that you can finally get around to the actual data analysis, which will generally be up to the researcher.

That sounds an awful lot like a recipe for RNA-Seq analysis, not exome analysis. The poster (who shouldn't be tacking questions on to other people's threads) might be interested in http://seqanswers.com/wiki/How-to/exome_analysis

**gringer** · 01-23-2012, 01:25 AM

Originally posted by Bukowski:

That sounds an awful lot like a recipe for RNA-Seq analysis not exome analysis.

Er, yes. Sorry, I got a little carried away there....

**Aman Mahajan** · 01-23-2012, 01:30 AM

I have a question not related to the thread though..

I assembled my Illumina data using SOAP, and now I want to carry out expression analysis using the Rseq tool. It accepts only SAM format, so I downloaded SAMtools to convert my SOAP output to SAM. Can anyone tell me how to run it and do the conversion? The tutorial has been of no use so far!

**gringer** · 01-23-2012, 01:32 AM

I have a question not related to the thread though..

This was just recently posted in this thread:

The poster (who shouldn't be tacking questions on to other people's threads)

Please try to do what this comment suggests and start new threads for unrelated questions. It makes searching the forums much easier for other future browsers of questions and answers.

**Aman Mahajan** · 01-23-2012, 01:36 AM

This is actually my 1st post, can't figure out how to start a new thread. I'll try and post it there. Thanks.

If this has been answered before, kindly pass me the link to the thread.

**gringer** · 01-23-2012, 01:45 AM

This is actually my 1st post, can't figure out how to start a new thread

From the SEQAnswers home page, click on the red 'Forums' link at the left, then click on the forum name, then click on the 'New Thread' button. You can also click on the link at the top of a thread page (SEQanswers > Bioinformatics > Bioinformatics) to go to the forum page.

Exome sequencing: Illumina? SOLiD? Read length? Pair-Ended?
