**westerman** · 11-19-2008, 11:11 AM

Alex:

I just finished this type of analysis with a 454 Titanium run on E. coli, except that I started with a de novo assembly of the reads and then mapped the resulting contigs to the E. coli genome, instead of running the Mapper and then assembling the remaining reads as you did. (I did also run the Mapper as a separate trial.) The statistics from the de novo assembly:

There are 651 "large" (>= 500 bp) contigs.

Of these, a whopping 545 do not match E. coli W3110. However, none of these non-matching contigs is very long; they range from 500 to 2,988 bp. By comparison, the 106 matching contigs tend to be long, ranging from 531 bp to 222,307 bp.
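A minimal Python sketch of the length split described above, assuming the contigs sit in a FASTA file and that the set of contig IDs with a hit to the reference (E. coli W3110 here) has already been obtained, for example from BLAST. The function names `parse_fasta_lengths` and `summarize` are hypothetical, and this is not necessarily how the numbers above were produced.

```python
def parse_fasta_lengths(text):
    """Return {contig_id: length} from FASTA-formatted text."""
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize(lengths, matched_ids, min_len=500):
    """Count 'large' contigs (>= min_len bp) and give (count, min, max)
    lengths for the reference-matching and non-matching groups."""
    large = {cid: n for cid, n in lengths.items() if n >= min_len}
    groups = {"matching": [], "non_matching": []}
    for cid, n in large.items():
        key = "matching" if cid in matched_ids else "non_matching"
        groups[key].append(n)
    summary = {"large": len(large)}
    for key, vals in groups.items():
        summary[key] = (len(vals), min(vals), max(vals)) if vals else (0, None, None)
    return summary


if __name__ == "__main__":
    demo = (">ctg1 len=600\n" + "A" * 600 + "\n>ctg2\n" + "C" * 450
            + "\n>ctg3\n" + "G" * 1200 + "\n")
    lengths = parse_fasta_lengths(demo)
    # ctg2 (450 bp) falls below the 500 bp cutoff and is not counted.
    print(summarize(lengths, matched_ids={"ctg3"}))
    # {'large': 2, 'matching': (1, 1200, 1200), 'non_matching': (1, 600, 600)}
```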

So it is obvious that the non-matching contigs are not very good. Nevertheless, it is curious what the non-matching contigs do match.

Of the 545 contigs:

36 do not significantly match anything in GenBank.

137 match many entries in GenBank.

348 match Bacillus licheniformis genomes.

3 match a B. licheniformis plasmid.

9 match P. fluorescens.

2 match K. pneumoniae.

The remaining 10 I did not bother to characterize, since they did not hit the same GenBank entries.
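As a sanity check on this breakdown, a short Python sketch that tallies one category label per contig and confirms that the categories account for all 545 non-matching contigs. The category names are shorthand for the lines above; `tally` and `check_breakdown` are hypothetical helpers.

```python
from collections import Counter

# Counts for the 545 non-matching contigs, as listed in the post.
REPORTED = {
    "no significant GenBank match": 36,
    "many GenBank entries": 137,
    "B. licheniformis genome": 348,
    "B. licheniformis plasmid": 3,
    "P. fluorescens": 9,
    "K. pneumoniae": 2,
    "not characterized": 10,
}


def tally(labels):
    """Count category labels, one label per contig."""
    return Counter(labels)


def check_breakdown(counts, expected_total):
    """True if the per-category counts add up to the expected contig total."""
    return sum(counts.values()) == expected_total


if __name__ == "__main__":
    print(check_breakdown(REPORTED, 545))  # True: 36+137+348+3+9+2+10 = 545
    share = REPORTED["B. licheniformis genome"] / 545
    print(f"{share:.1%} of non-matching contigs hit B. licheniformis genomes")
```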

---------------------------------------

So what conclusions can be, tentatively, drawn?

A) We did not have wholesale contamination; otherwise, the contigs that do not match E. coli would have been long.

B) Perhaps E. coli is picking up strands of DNA from its environment?

C) Perhaps strands of DNA from the environment are getting into our experiment, due either to poor sterile technique in the laboratory or to DNA stuck on new or reused equipment?
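Point A rests on contig length. One common way to put a single number on "long" versus "short" contigs is the N50; the sketch below assumes the contig lengths of each group are already in Python lists (for example from a FASTA parse). N50 is a swapped-in metric here, not one used in the post.

```python
def n50(lengths):
    """Return the N50: sort contig lengths longest first and return the
    length at which the running total first reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if 2 * running >= total:
            return n
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # 5: 6 + 5 = 11 bases, at least half of 20
```

Computed separately for the matching and non-matching groups, a much higher N50 for the matching group would be consistent with point A, though it does not by itself rule out a low-abundance contaminant.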

I suspect that NextGen sequencers will uncover a lot of this low-level contamination. We are dealing with so many reads that, in my mind, some are bound to arise from external sources.

As to your particular case, you mentioned that in your case (B) you were able to map the contigs back to your human reference sequence but that the contigs looked strange. It is possible that you are finding traces of human contamination: either the cells being sequenced had trace rogue DNA in them, or trace DNA 'fell in' to the prep during handling. It is an idea.

I am looking forward to analyzing our next Titanium run.

**Chuckytah** · 03-25-2011, 10:32 AM

Originally posted by westerman (the 11-19-2008 post above)

What program/software did you use to obtain those statistics?
Thanks.

**westerman** · 03-25-2011, 11:08 AM

Originally posted by Chuckytah

What program/software did you use to obtain those statistics?
Thanks.

Hmm, you're making me think about a project done over two years ago. That is forever in NGS time! I cannot remember exactly, but I probably used BLAST to get the statistics. The E. coli genome is small enough that BLASTing the contigs against it would not be onerous.
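A minimal sketch of that BLAST step, assuming NCBI BLAST+ `blastn` is installed and that the contigs and the reference genome are in local FASTA files (the file names are placeholders, and the e-value cutoff is an arbitrary choice). The command is only built here; running it with `subprocess.run(cmd, check=True)` requires BLAST+ on the PATH. The hypothetical `best_hits` parser keeps the highest-bitscore hit per contig from the default 12-column tabular output (`-outfmt 6`).

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                   "gapopen", "qstart", "qend", "sstart", "send",
                   "evalue", "bitscore"]


def blastn_command(query, subject, out, evalue=1e-10):
    """Build (but do not run) a blastn argv; file names are placeholders."""
    return ["blastn", "-query", query, "-subject", subject,
            "-outfmt", "6", "-evalue", str(evalue), "-out", out]


def best_hits(tsv_text):
    """Return {qseqid: row}, keeping the highest-bitscore hit per contig."""
    best = {}
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(OUTFMT6_COLUMNS, fields))
        row["bitscore"] = float(row["bitscore"])
        q = row["qseqid"]
        if q not in best or row["bitscore"] > best[q]["bitscore"]:
            best[q] = row
    return best


if __name__ == "__main__":
    print(" ".join(blastn_command("contigs.fa", "ecoli_ref.fa", "hits.tsv")))
```

Contigs that never appear in `best_hits(...)` are the ones with no significant hit to the reference, i.e. the "non-matching" group.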

**Chuckytah** · 03-25-2011, 12:54 PM

Originally posted by westerman

Hmm, you're making me think about a project done over two years ago. That is forever in NGS time! I cannot remember exactly, but I probably used BLAST to get the statistics. The E. coli genome is small enough that BLASTing the contigs against it would not be onerous.

Sorry, I didn't see the dates, lol.
Thanks anyway.

**Jeremy** · 03-28-2011, 02:57 AM

Originally posted by Alex Clop

It is worth mentioning that (i) most of the contigs in scenario (B) are around 200–500 bp and none exceed 1300 bp, and (ii) whilst the coverage in CONS is around 80-fold, most of the ctg contigs have a coverage of only 2- to 3-fold, and few of them exceed 10-fold.

I would appreciate it if anyone who has observed these kinds of reads/contigs in their 454 analysis could let me know.
Alex

The process of ligating adapters to the DNA fragments also produces chimeric sequences, in which two DNA fragments ligate together. The ratio of primers to DNA is designed to limit this, but it does happen, and more often than not it is repetitive DNA that ligates. If these chimeric sequences then get the correct primers on each end, they will amplify in the subsequent PCR steps, producing more copies. That's why the sequences you describe have low sequence coverage and behave like paired-end tags: they are an artefact of the ligation process.
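A rough heuristic sketch of how such ligation chimeras might be flagged, assuming each contig's alignments to the reference are available as (qstart, qend, sstart, send) tuples, e.g. from BLAST tabular output. The function names and thresholds (`min_gap`, `max_overlap`, `ratio`) are hypothetical choices, not part of the explanation above: a contig is flagged if two non-overlapping pieces of it align far apart on the reference, and its coverage is compared against the consensus coverage (around 80-fold in Alex's case).

```python
from itertools import combinations


def looks_chimeric(hits, min_gap=10_000, max_overlap=20):
    """Flag a contig whose alignments cover different parts of the contig
    but land far apart on the reference (hypothetical heuristic).

    hits: (qstart, qend, sstart, send) tuples for ONE contig against ONE
    reference sequence, 1-based inclusive coordinates as in BLAST output.
    """
    for (qs1, qe1, ss1, se1), (qs2, qe2, ss2, se2) in combinations(hits, 2):
        # Overlap of the two intervals on the contig, in bases
        # (zero or negative when they do not overlap).
        overlap = min(qe1, qe2) - max(qs1, qs2) + 1
        if overlap > max_overlap:
            continue  # same part of the contig, e.g. a repeat hit
        # Distance between the two reference loci, ignoring strand.
        ref_gap = abs(min(ss1, se1) - min(ss2, se2))
        if ref_gap >= min_gap:
            return True
    return False


def low_coverage(contig_cov, consensus_cov, ratio=0.1):
    """True if contig coverage is below `ratio` times the consensus coverage."""
    return contig_cov < ratio * consensus_cov


if __name__ == "__main__":
    halves = [(1, 250, 1000, 1249), (251, 500, 900000, 900249)]
    print(looks_chimeric(halves), low_coverage(3, 80))  # True True
```

Contigs that are both low-coverage and split across distant loci would be candidates for the ligation artefact described above; repetitive sequence can produce similar patterns, so this is only a screening step.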

454: Unmapped contigs after Reference Assembly
