
**ffinkernagel** · 02-10-2014, 11:53 AM

You can try assembling the reads and BLASTing the resulting contigs.

Possibly it's fish DNA from the bead blocking.

**mmaiensc** · 02-18-2014, 09:02 AM

Similar problem

I am having a similar issue with mouse ChIP-seq data (HiSeq, SE, 50 bp). In particular, alignment rates appear to be very antibody-specific: for one protein ~20% of reads align, for a second ~40%, for a third ~65%, and for non-IP input ~90% (these approximate percentages hold in two replicates for each sample).

Contamination does not seem to be an issue: FastQC did not show any adaptors left on the reads, and not much in the way of over-represented sequences. I used fastq_screen to check against human, rat, mouse, fly, yeast, C. elegans, E. coli, staph, and phiX, and the best matches were still to mouse, by far. BLAST showed a similar mix of things, as fmadriles noted. At any rate, if it were contamination I would expect to see similar issues in all the samples, rather than a rate that depends so strongly on the antibody/protein of interest.

Short of attempted assembly on the unmapped reads, which I may try, does anyone have any other suggestions about what the issue could be, or other things to try? Has anyone else seen this kind of thing in ChIP-seq data before?
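
For anyone who wants to try the assembly route, here is a minimal sketch of pulling the unmapped reads out of an aligner's SAM output and checking the alignment rate. It relies only on the SAM FLAG bits (0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary); the function names and the example records are made up for illustration, and in practice a tool such as samtools would normally do this job.

```python
def primary_records(sam_lines):
    """Yield split fields of primary SAM records, skipping headers and
    secondary (0x100) / supplementary (0x800) alignments."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # count each read once
        yield fields


def unmapped_to_fastq(sam_lines):
    """Yield FASTQ-formatted strings for reads with the 0x4 (unmapped) bit set."""
    for fields in primary_records(sam_lines):
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


def mapped_fraction(sam_lines):
    """Return the fraction of reads that aligned (0.0 if there are no reads)."""
    total = mapped = 0
    for fields in primary_records(sam_lines):
        total += 1
        if not int(fields[1]) & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical two-read example: r1 aligned, r2 unmapped.
    example = [
        "@HD\tVN:1.0\n",
        "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\t#####\n",
    ]
    print(mapped_fraction(example))
    print("".join(unmapped_to_fastq(example)), end="")
```

The FASTQ records written this way can be fed to an assembler or to BLAST; for single-end data like this there is no mate information to worry about.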

**MU Core** · 02-19-2014, 08:54 AM

Possibly chimeric sequences from amplification.

**fmadriles** · 02-24-2014, 04:35 AM

For mmaiensc especially.
In the end an expert helped me, ran some analyses, and concluded:

I did a species screen with the original data and found (as you did) that
most of the sequence comes from human and mouse (more mouse than
human). Most of the reads map uniquely and there is a
bit of overlap between mouse and rat (as you’d expect). There are however
around 35% of reads which don’t map to any of the genomes or contaminants
we screen for. I extracted these to a new dataset and assembled them
with Velvet. It’s not a great assembly since the reads are short, but it
gave some extra information.

I’ve included the set of contigs of at least 100bp and have sorted these
both by coverage and length. All of the high coverage contigs appear to
be human alpha satellite DNA or general AT rich repeats.
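
The contig filtering and sorting described above could be sketched roughly as follows. This is not the expert's actual script; it assumes Velvet-style FASTA headers of the form `NODE_<id>_length_<n>_cov_<x>`, and because the length in that header is counted in k-mers, the sketch uses the actual sequence length for the 100 bp cutoff. The AT fraction is added only as a quick way to spot AT-rich repeats; all function names are hypothetical.

```python
import re

# Assumed Velvet contig header, e.g. "NODE_12_length_340_cov_57.3"
HEADER = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_contigs(lines, min_len=100):
    """Keep contigs of at least min_len bp and record coverage and AT fraction."""
    rows = []
    for name, seq in read_fasta(lines):
        match = HEADER.match(name)
        if match is None or not seq or len(seq) < min_len:
            continue
        upper = seq.upper()
        at_fraction = (upper.count("A") + upper.count("T")) / len(seq)
        rows.append({
            "name": name,
            "length": len(seq),
            "coverage": float(match.group(3)),
            "at_fraction": at_fraction,
        })
    return rows


def by_coverage(rows):
    """Highest-coverage contigs first."""
    return sorted(rows, key=lambda r: r["coverage"], reverse=True)


def by_length(rows):
    """Longest contigs first."""
    return sorted(rows, key=lambda r: r["length"], reverse=True)
```

To use it on a real assembly, pass an open file handle, e.g. `summarize_contigs(open("contigs.fa"))`, then BLAST the top entries from each ordering.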

For the long contigs, a bunch of these turned out to be rRNA from both
mouse and human, so a chunk of your extra sequence comes from these. In
addition there is a large set of sequences which come from a
bacterial source. However, you don’t appear to have the whole genome
present, only a region of the genome around an integrase gene. This
strongly suggests that either you have a high copy number transgene in
your mouse, or this sequence has contaminated one of
the reagents in your library prep process.

I think this is as far as I can justify taking this analysis. I’ve
included the sequences and contigs I generated if you really want to
pursue this, but I suspect the satellite sequence, the rRNA and the
bacterial DNA should account for a significant chunk of the previously
unknown sequence, and there really isn’t anything else I consistently
found in there. If anything the contamination with really high levels of
human sequence should probably be more of a concern in your case since
this is certainly something we shouldn’t expect to see.

I hope this is as useful for other people as it has been for me!


orphan reads - any advice?
