Seqanswers Leaderboard Ad

**epigen** · 07-28-2010, 08:13 AM

Your pipeline is almost the same as mine. I used different aligners (SOLiD data) and required Phred score >=20 and additionally 20<= number of reads<= 100 for the SNP position. Also excluded positions with indels and >1/4 at the end of a read.

I read some papers that mentioned the bias for the reference sequence but ended up ignoring it because it seemed there is no easy and reliable way to get rid of it. How bad the bias is was left unclear to me.
The papers are:
Heap et al. 2010, Hum Mol Genet 19:122
Tuch et al. 2010, PLoSOne 0009317
Degner et al. 2009, Bioinformatics 25:3207
Chepelev et al. 2009, Nucl Acids Res 37:e105

Since was interested in alleles that are ~50:50 in the treated sample but biased for one allele in the control, I had rather few candidate SNPs and can't give advice on multiple SNPs per transcript. But I'd be very interested in learning more!

Barbara

**GoneSouth** · 08-25-2010, 04:06 AM

If I understand your pipeline correctly, you will not get the SNPs which are the most differentiated between the two samples.
If you have only allele eg: 'T' in one sample and only allele 'A' in the other?

**epigen** · 08-25-2010, 04:16 AM

As I mentioned, we wanted heterozygous SNPs in the treated sample (e.g. A and T allele present at ~50:50) that were homozygous in the control sample (only A or T allele present). So a total switch from T to A was not interesting for me.

**sparks** · 11-03-2010, 01:41 AM

Hi, Novoalign can align reads against a reference genome with IUB ambiguous NA codes. This should remove any allelic bias, you just need to code SNPs into the genome using the IUB codes.
Colin

**polarise** · 01-15-2011, 03:22 AM

Hi,

I hope it's not too late to comment on this post.

Two things: one, I think there needs to be a threshold or number of reads that align to a particular site for it to be called 'potential ASE'. There is a paper by Fontanillas et al. (2010) in which they show that the number of reads affects the power to detect ASE.

Secondly, I propose that, where possible, there should be reference sequences biased to the population such that there would not be a bias in the Degner- sense (see Degner et a. 2009 paper).

Otherwise, I agree with the pipeline.

P. Korir

**dariober** · 01-17-2011, 05:58 AM

Originally posted by polarise View Post

I hope it's not too late to comment on this post.

No, it's not! Many thanks to everyone for sharing their comments and expertise!

Dario

**fkrueger** · 01-17-2011, 07:45 AM

Since this thread has been revived not too long ago I thought I might ask a question about allele specific alignments as well.

We were thinking of taking a slightly different approach to analyse sequencing data in an allele-specific manner and I would like to know if someone can spot any obvious problems with it.

We are mainly working with mouse models from back-crossed strains for which we have a known set of genome-wide SNPs available, so the approach I'll describe here is probably only useful if detailed genome information is available. To avoid some of the pitfalls Dario mentioned above (especially biasing alignments towards the reference alleles without SNPs) we wrote a program which takes in sequencing file(s) and aligns the sequence (or sequence pair) to two reference genomes in parallel (using Bowtie). In order for this to work we made up individual versions of mouse (or yeast) genomes and indexed them separately (this should also work with e.g. the normal mouse genome and one chromosome of a different strain).

We then analyse the alignment outputs on the fly and decide whether an alignment is specific for one or the other genome (i.e. has a uniquely best alignment for one of them) or if it aligns to both genomes equally well (e.g. if it doesn't contain a SNP which allowing us to say something about the origin). It will then print the alignments to different files (unique 1, unique 2, aligns to both) with a number of useful information (mapping positions, genomic sequences and mismatch information). All downstream analysis would then start from these files (such as percent sequence from which allele, basecall/alignment qualities at positions of SNPs and so on).

In its initial version ASAP (for allele specific alignment pipeline) is intended to be used for ChIP-Seq data, but we were thinking of extending this to RNA-Seq data as well. This of course would have the disadvantage that - at least in its current form - it doesn't work for spliced mapping, however it should still work for the majority of reads which do not span splice junctions anyway.

I would be happy to know if people think this is a useful tool or if you can see major flaws which did not occur to us so far. I should mention that ASAP is not fully ready yet, but single-end alignments are nearly working and the paired-end option is yet to be implemented (overall progress is I'd say around 87%. Roughly).

thanks,
Felix

**epigen** · 01-19-2011, 03:50 AM

candidates not allele-specific

Finally we got the results from the validation of our candidates - and a big disappointment: almost all turned out to be expressed perfectly 50:50 instead of allele-specific!
Since in some papers they only consider reads with different start sites, I next tried duplicate removal with Picard, which left me with less than half of the reads (~20 Mio remaining per sample), and used the new samtools pileup version to call SNPs. This resulted in only a handful of new candidates, which are currently being checked in the lab.
I guess we'll have to sequence more because with 20 Mio reads left, we can only detect highly expressed genes. The problem I see in duplicate removal is that true PCR duplicates can't be distinguished from reads that align to the same position by chance just because the gene/allele is so highly expressed. And duplicate removal will of course keep the read with the highest mapping score, which will be the one with the reference SNP.

Has any of you worked out a successful approach?

@fkrueger:
To me ASAP sounds like a clever idea. Unfortunately it's not applicable for humans (my data). Even if you sequence the individual's DNA first to find the SNPs, you never know which SNP belongs to which chromosome so you can't figure out the correct combinations ...

**metheuse** · 07-17-2015, 12:46 PM

I'm also trying to measure allele-specific-expression by exome-seq and RNA-seq, with the former to find heteroSNPs and alleles, and the latter to calculate ASE on the identified positions. I found this post to be quite helpful and it's just what I was proposing to do.

I also did some method search and found some programs specifically designed for ASE (MMSeq, eXpress, AlleleSeq, etc), but they all require construction of haplotypes (two versions of reference genome/transcriptome), which makes a lot of sense and should be the most accurate way to avoid mapping bias and help integrating SNP-level ASE into gene-level ASE. However in most of the cases, people don't have sequencing data from trio. There are several tools for haplotyping from exome-seq data, like GATK's ReadBackedPhasing and HapCompass, but they can only phase some blocks of the variants which are close to each other and thus were covered by continuous reads.

I got to know an aligner called GSNAP that can align RNA-seq reads with tolerance of SNPs. I'm gonna try this to see how the output is different from Tophat alignment.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 21 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

RNAseq: Pipeline to detect allele specific expression

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News