Thanks, Brian. This is where I am showing my ignorance, I am sure, but how did the reads become so short? Looking at what I pulled out of the sam file, the first few matches are full-length (300bp) reads, but then they become those little buggers as well.
-
BWA-mem produces 'chimeric alignments'. This is actually a really neat feature in some cases, and a big pain in other cases - in my opinion, it should be disabled by default.
If you look at the sam lines you posted, most of them have a bitflag (the second column) of 2048 or higher, which means the 0x800 bit is set. The SAM specification calls this a supplementary alignment; in practice it marks one piece of a chimeric alignment. BWA-mem appears to do multiple local alignments on each read: if there is a really good match for the first 20% somewhere, that is presented as one line in the sam file, and if there is a really good match for the middle 40% somewhere else, that is presented as a different line, and so on. So a single read can generate a large number of lines in the sam file. The goal is to correctly map reads that really are chimeric (such as reads from a cancer sample with two chromosomes randomly fused together).

Apparently, though, it does not work well on extreme-GC genomes. Most mappers are designed for human and mouse genomes, which have moderate GC content (roughly 40%), because those organisms account for the majority of genetic research. Since I work at a place that deals strictly with microbial, plant, and fungal genomes, BBMap (which was originally designed for human) is now developed for and tested on a much wider array of organisms than most mappers.
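To make the flag check concrete, here is a minimal Python sketch of the test described above. The function names and the three example records are made up for illustration (they are not the lines from this thread); the only fact taken from the SAM format is that bit 0x800 (decimal 2048) marks a supplementary alignment. This is the same bit that `samtools view -F 2048` excludes.

```python
SUPPLEMENTARY = 0x800  # decimal 2048, the "supplementary alignment" bit


def is_supplementary(flag: int) -> bool:
    """True if the 0x800 bit is set in a SAM FLAG value."""
    return bool(flag & SUPPLEMENTARY)


def drop_supplementary(sam_lines):
    """Yield header lines and every record whose 0x800 bit is NOT set."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through untouched
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if not is_supplementary(flag):
            yield line


# Hypothetical records: one full-length hit plus two chimeric pieces.
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t300M\t*\t0\t0\t*\t*",
    "read1\t2048\tchr1\t500\t60\t221H79M\t*\t0\t0\t*\t*",
    "read1\t2064\tchr1\t700\t60\t80M220H\t*\t0\t0\t*\t*",  # 2048 + 16 (reverse strand)
]

kept = list(drop_supplementary(EXAMPLE_SAM))
```

Testing the bit rather than comparing against 2048 also catches flags such as 2064, where the supplementary bit is combined with others.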
BWA's chimeric alignments are local and hard-clipped. For example, this cigar string from the second line you posted - "221H79M" - means that the first 221 bases were clipped off and only the last 79 bases are included in the alignment (and in the sequence stored on that line). Of course, this will wreak havoc with something like FastQC, where all reads are weighted equally regardless of length. Rather than a length filter (which would unnecessarily exclude reads that had been adapter- or quality-trimmed), I think you should simply use samtools to filter out the lines with the chimeric flag set, for example with samtools view -F 2048.
Last edited by Brian Bushnell; 08-02-2014, 12:00 PM.
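As a rough illustration of how a cigar string like "221H79M" breaks down, here is a small Python sketch. The function name `cigar_summary` and the keys of the returned dictionary are my own choices for the example; the operation letters and the rule that hard-clipped (H) bases are absent from the stored sequence come from the SAM format.

```python
import re

# One CIGAR operation: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar: str) -> dict:
    """Summarize clipping and read length implied by a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    hard = sum(n for n, op in ops if op == "H")
    soft = sum(n for n, op in ops if op == "S")
    # M, I, S, = and X consume bases that are stored in the SEQ column.
    stored = sum(n for n, op in ops if op in "MIS=X")
    return {
        "hard_clipped": hard,
        "soft_clipped": soft,
        "stored_bases": stored,             # what a tool reading SEQ sees
        "original_length": hard + stored,   # length of the read before clipping
    }


summary = cigar_summary("221H79M")
```

For "221H79M" this gives 221 hard-clipped bases, 79 stored bases, and an original length of 300, which matches the 300bp reads: the read was not shortened before mapping, only the reported piece is short.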
-
Originally posted by Genomics101:
Thanks very much, GenoMax. Indeed, it is MiSeq data, but I never had this problem with MiSeq before (that was with 250bp PE reads, these are 300s). Can you tell me more about the particular pathology with MiSeq? Is this a problem with library construction? And, goodness, what is an adapter lawn?