Multi-Genome Alignment for QC...

james hadfield

Moderator
Cambridge, UK

08-17-2010, 07:51 AM

In a previous post on our HiSeq I mentioned that we were running a multi-genome alignment (MGA) as a QC tool. Comments made me think it would be an interesting topic to post in the Bioinformatics section, not one I usually post in!

The work for this was done by Matt Eldridge, our head of bioinformatics. Big thanks to him for doing it!

The MGA takes a sample of sequence reads from a lane and aligns the first 36bp of each using Bowtie. The sampling allows the MGA to run fast, and it is part of our normal data pipeline: we get to see the report in our LIMS alongside the Gerald report (which I think we will soon be ditching entirely).
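To make the sampling step concrete, here is a minimal sketch of how a fixed-size sample of reads could be drawn from a lane and trimmed to the first 36 bases before being handed to an aligner. This is not our pipeline code: the function names, the use of reservoir sampling and the fixed seed are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def sample_and_trim(records: Iterable[FastqRecord], n: int,
                    trim_len: int = 36, seed: int = 0) -> List[FastqRecord]:
    """Reservoir-sample up to n records in one pass, then keep the first
    trim_len bases (and matching qualities) of each sampled read."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatable QC
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return [(h, s[:trim_len], q[:trim_len]) for h, s, q in reservoir]
```

Reservoir sampling is used here only because it needs a single pass over the lane and does not require knowing the read count up front. The trimmed reads would then be written out and aligned separately.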

Of course reads can align to multiple genomes (conserved regions). If this happens we assign the read to whichever of those genomes has the most reads aligned to it. This approach should show up cases of genome contamination and maximise the difference between the first and second genomes in the list.
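The assignment rule can be sketched roughly as follows. The input format (a mapping from read ID to the set of genomes it aligned to), the way genome totals are counted and the alphabetical tie-break are all assumptions for the sake of the example, not a description of the actual implementation.

```python
from collections import Counter
from typing import Dict, Set


def assign_reads(hits: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each aligned read to exactly one genome.

    hits maps a read ID to the set of genomes it aligned to. A read that
    hits several genomes goes to the one with the most aligning reads
    overall. Reads with no hits are left out of the result.
    """
    # Total number of reads aligning to each genome, counting multi-hits.
    totals = Counter(g for genomes in hits.values() for g in genomes)
    assignment: Dict[str, str] = {}
    for read_id, genomes in hits.items():
        if genomes:
            # Highest total wins; ties are broken alphabetically (assumption).
            assignment[read_id] = min(genomes, key=lambda g: (-totals[g], g))
    return assignment
```

For example, a read hitting both human and mouse in a lane dominated by human reads ends up counted as human, which is what pushes the first and second genomes in the list apart.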

We also use Exonerate to identify sequences containing Illumina adapters.
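For anyone without Exonerate to hand, a much cruder stand-in is a plain string search for the start of the adapter. The sketch below is not what Exonerate does (it performs proper alignment and tolerates mismatches). The adapter sequence used is the common start of Illumina TruSeq adapters and is an assumption, as is the minimum overlap of 8 bases.

```python
# Common start of Illumina TruSeq adapters; an assumption for this sketch,
# the adapters relevant to a given library prep may differ.
ADAPTER = "AGATCGGAAGAGC"


def contains_adapter(read: str, adapter: str = ADAPTER,
                     min_overlap: int = 8) -> bool:
    """Return True if the read contains the adapter exactly, or ends with
    at least min_overlap bases of the adapter's start (adapter running off
    the 3' end of the read). Mismatches are not tolerated."""
    read = read.upper()
    if adapter in read:
        return True
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return True
    return False
```

Exact matching will miss adapters with sequencing errors, so this undercounts compared with a real aligner.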

Currently we run against: Human, Mouse, Rat, Xenopus, Arabidopsis, C. elegans, Yeast, Bacteria and Viruses (the last two being amalgamations of >1500 genomes each). There are other genomes as well which are specific to projects in our lab. I guess at some level it would be possible to run against all genomes?

The output is a list of genomes in descending order of the number of aligned reads, expressed as a percentage. The top genome is hopefully the one the user was expecting! We did have a case about three years ago where one user accidentally sequenced, to 80x coverage, the genome of an organism that was also growing in his lab. It took a little time to work out what was wrong with his experiment and I believe the data was handed over to that community. Serendipity at its best!

There are often unaligned reads, and the initial assumption was that these were junk, low-quality reads. Running this kind of aligner might allow us to see if that assumption is true, but we have not looked at this yet.
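As a rough illustration of the report, the sketch below turns per-read genome assignments into a descending list with percentages, plus a final row for unaligned reads. Taking the percentage relative to the number of sampled reads, and the row layout itself, are assumptions. The real report in our LIMS looks different.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mga_report(assignment: Dict[str, str],
               n_sampled: int) -> List[Tuple[str, int, float]]:
    """Return (genome, read count, percent of sampled reads) rows in
    descending order of read count, followed by an 'unaligned' row."""
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    counts = Counter(assignment.values())
    rows = [(genome, c, 100.0 * c / n_sampled)
            for genome, c in counts.most_common()]
    unaligned = n_sampled - sum(counts.values())
    rows.append(("unaligned", unaligned, 100.0 * unaligned / n_sampled))
    return rows


if __name__ == "__main__":
    # Hypothetical example: 5 sampled reads, 4 of which were assigned.
    example = {"r1": "human", "r2": "human", "r3": "human", "r4": "phiX"}
    for genome, count, pct in mga_report(example, n_sampled=5):
        print(f"{genome:<12}{count:>6}{pct:>8.1f}%")
```

Keeping the unaligned fraction as its own row makes it easy to spot lanes where it is unusually high.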

The reason I wanted this MGA in our pipeline was to see how much PhiX was in lanes where we had not actually put it. The assumption was that any sloppy practices in the lab where all flowcells are set up would show up here. It was immediately clear that the level of PhiX ‘contamination’ from lane to lane was very low. We identified two or three flowcells where there was a potential issue, but this was out of many hundred. We were also able to get run reports and data from another large centre nearby and they had similar results. All in all I am very happy with the low contamination from lane to lane and that the protocols are reasonably robust.

PhiX must be being breathed in as aerosols in labs the world over. Might we get some Cronenberg-style PhiX-Human hybrid? Let me know if you see one...

Let me know what you think.