Illumina DNA sequence specific strand bias involving orphan reads

Tally

Member

Join Date: Aug 2011

Posts: 12
#1

06-20-2012, 10:26 PM

I have whole-genome paired-end DNA sequence data from an Illumina HiSeq 2000. I aligned the reads to the reference genome with BWA v0.5.9 using the default paired-end parameters.

I have detected an unusual region in the alignment. The region is around 430 base pairs long, has excessive coverage (>40-fold in a genome sample sequenced to ~6x), has an excessive number of orphan reads (~40%), and includes only one known repeat (an RNA repeat covering around 1/3 of the length of the region). GC content is 54%. Each side of this region is demarcated by partially mapped reads truncated at the same base position (clipped at the start of the read at the 5' end of the region and clipped at the end of the read at the 3' end of the region). The clipped, unmapped portions all agree with each other in sequence, and BLAT matches them to repeat elements.
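For anyone wanting to check this kind of boundary outside of tview, here is a rough Python sketch that tallies where clipped reads start and stop. It assumes the truncated reads are recorded as soft clips (`S`) in the CIGAR string and that the input is plain SAM text, for example the output of `samtools view` on the region. The function name `clip_boundaries` is made up for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_boundaries(sam_lines):
    """Tally reference positions where soft-clipped alignments begin or end.

    Returns (left, right): Counters keyed by the 1-based reference position of
    the first aligned base (reads clipped at their start) and of the last
    aligned base (reads clipped at their end).
    """
    left, right = Counter(), Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped reads carry no usable CIGAR
        # Hard clips are dropped so a soft clip next to one is still detected.
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
        if not ops:
            continue
        if ops[0][1] == "S":
            left[pos] += 1
        if ops[-1][1] == "S":
            ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
            right[pos + ref_len - 1] += 1
    return left, right
```

If the region really is bounded the way it looks in tview, `left.most_common(1)` and `right.most_common(1)` should each be dominated by a single position.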

Here is what I am puzzled about:
The 5' end of this alignment, as viewed in Samtools tview, shows 100% of the orphans mapped to the reverse strand. Of the non-orphan reads, ~70% map to the forward strand. The 3' end of this region shows the opposite trend: 100% of the orphans are mapped to the forward strand, and ~70% of the non-orphans map to the reverse strand. The unmapped mates of the orphan reads all include repeat sequence (usually simple DNA repeats, some LINE elements).
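The percentages above can be reproduced from the SAM FLAG field instead of being read off tview. The sketch below is one possible way to do it, not the method used in this post. It treats a read as an orphan when it is paired and mapped but its mate is flagged unmapped (bit 0x8), and takes the strand from bit 0x10. The function name `strand_tally` is hypothetical.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification
PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE = 0x1, 0x4, 0x8, 0x10


def strand_tally(sam_lines):
    """Count mapped paired reads by (category, strand).

    category is "orphan" (mate unmapped) or "paired" (mate mapped);
    strand is "+" (forward) or "-" (reverse).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if not flag & PAIRED or flag & UNMAPPED:
            continue  # only mapped reads from paired-end data are counted
        category = "orphan" if flag & MATE_UNMAPPED else "paired"
        strand = "-" if flag & REVERSE else "+"
        counts[(category, strand)] += 1
    return counts
```

Running this separately on the 5' and 3' halves of the region (for example by passing `sys.stdin` while piping in `samtools view` output for each coordinate range) gives the four counts needed to compare orphan and non-orphan strand bias directly.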

I can understand that sequence-specific strand bias may exist due to technicalities of the library prep and sequencing process. What I don't understand is why I have a seemingly opposite bias between orphan reads and non-orphan reads.

All comments greatly appreciated.
pmiguel

Senior Member

Join Date: Aug 2008

Posts: 2328
#2

06-21-2012, 05:26 AM

I would guess that the reads demonstrating this bias do not really belong there and should be in a different area (or several different areas) of the genome.

LINEs are frequently 5' truncated. So it may be that a highly truncated LINE insertion allowed an uninterrupted assembly to traverse its entire length, whereas some other, longer insertions could not be fully assembled. This might just be a function of your insert lengths. If they are, for example, 500 bp, then paired ends might be usable by a sophisticated assembly engine to traverse a repetitive area maybe 750 bp in length (or a little longer), using the unique sequence of one read of a pair to "anchor" the repetitive sequence of the other read. Then your orphan reads map where they can.

You could try pulling out all the orphan reads and their mates and attempting a mini-assembly with just them. Possibly you could generate a mini-assembly of a section of a large element. You might even be able to iteratively add read pairs to reach the termini of this element. Lots of possible obstacles, though.
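As a rough sketch of the "pulling out" step, the Python below collects the orphan reads and their unmapped mates from SAM text and turns them back into FASTQ records that could be handed to an assembler. It assumes the unmapped mate is present in the same input as its mapped partner (aligners commonly place it at the partner's coordinates, so a region query usually returns both). The function name `orphan_pairs_as_fastq` is made up for illustration.

```python
PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE, FIRST = 0x1, 0x4, 0x8, 0x10, 0x40
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment records
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def orphan_pairs_as_fastq(sam_lines):
    """Return FASTQ records for orphan reads and their unmapped mates."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        name, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if flag & PAIRED and not flag & NOT_PRIMARY:
            records.append((name, flag, seq, qual))

    # An orphan is mapped itself while its mate is flagged as unmapped.
    orphan_names = {name for name, flag, _, _ in records
                    if not flag & UNMAPPED and flag & MATE_UNMAPPED}

    fastq = []
    for name, flag, seq, qual in records:
        if name not in orphan_names:
            continue
        if flag & REVERSE:
            # SAM stores reverse-strand reads reverse-complemented;
            # undo that to recover the read as it was sequenced.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        mate = "1" if flag & FIRST else "2"
        fastq.append(f"@{name}/{mate}\n{seq}\n+\n{qual}\n")
    return fastq
```

The records come back in input order, so they would need to be split by the `/1` and `/2` suffix if the assembler expects separate files for each end.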

--
Phillip