
**winsettz** · 10-15-2013, 11:32 AM

All the annotations are indeed with respect to the read in question.

However, knowing something about the mate can be important. For instance, if one read of a pair is mapped and its mate is unmapped, it may point to something like a viral insertion (for instance, your reference is E. coli, a virus has inserted itself into the host genome, and one read maps to the reference while its mate does not). If you want to work with "clean" reads that map properly to the reference, you would filter out the pairs where one read mapped and the other did not. And if you were mapping FASTQ reads to a reference to eliminate contaminating reads from that reference, you would keep the pairs in which neither read mapped.
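The mapped/unmapped cases above come down to two FLAG bits: 4 (0x4, read unmapped) and 8 (0x8, mate unmapped). As a rough sketch only, the hypothetical helper `pair_status` below decodes them with shell arithmetic. With samtools itself, the same selections would be `samtools view -f 4 -F 8` (read unmapped, mate mapped), `-f 8 -F 4` (read mapped, mate unmapped), `-F 12` (both mapped) and `-f 12` (both unmapped).

```shell
#!/bin/sh
# Hypothetical helper (illustration only): classify one read of a pair
# using SAM FLAG bits 0x4 (read unmapped) and 0x8 (mate unmapped).
pair_status() {
  local flag=$1
  local read_unmapped=$(( flag & 4 ))
  local mate_unmapped=$(( flag & 8 ))
  if [ "$read_unmapped" -ne 0 ] && [ "$mate_unmapped" -ne 0 ]; then
    echo "both_unmapped"
  elif [ "$read_unmapped" -ne 0 ]; then
    echo "read_unmapped_mate_mapped"
  elif [ "$mate_unmapped" -ne 0 ]; then
    echo "read_mapped_mate_unmapped"
  else
    echo "both_mapped"
  fi
}

# Example FLAG values, chosen for illustration
pair_status 99    # 1+2+32+64   -> both_mapped
pair_status 73    # 1+8+64      -> read_mapped_mate_unmapped
pair_status 133   # 1+4+128     -> read_unmapped_mate_mapped
pair_status 77    # 1+4+8+64    -> both_unmapped
```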

Read reverse strand and mate reverse strand: for example, if you are working with DNA regions that have been inverted (through DNA break repair?), then instead of a standard paired-end layout

->-------
-------<-

You might have
R1
->-------
------->-
R2
If you are R1, your mate is /not/ on the reverse strand (it actually looks like the forward strand of the reference, since your sample has the inversion). Or, if you are R2, /you/ are /not/ on the reverse strand. Perhaps it is more accurate to say "read /looks like the reverse strand of the reference/" or "read's mate /looks like the reverse strand of the reference/".
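The two strand bits discussed above are 16 (0x10, read reverse-complemented) and 32 (0x20, mate reverse-complemented). The hypothetical helper `strand_info` below is only a sketch that reports them. In a typical innie pair, R1 often carries FLAG 99 and R2 FLAG 147; for the inverted layout drawn above, neither strand bit would be set (for example FLAG 65 for R1 and 129 for R2, assuming no other bits such as proper-pair are set).

```shell
#!/bin/sh
# Hypothetical helper (illustration only): report read and mate strand
# from SAM FLAG bits 0x10 (read reverse) and 0x20 (mate reverse).
strand_info() {
  local flag=$1 read_strand="+" mate_strand="+"
  if [ $(( flag & 16 )) -ne 0 ]; then read_strand="-"; fi
  if [ $(( flag & 32 )) -ne 0 ]; then mate_strand="-"; fi
  echo "read:${read_strand} mate:${mate_strand}"
}

# Typical forward/reverse pair
strand_info 99    # R1 -> read:+ mate:-
strand_info 147   # R2 -> read:- mate:+
# Same-orientation pair, as in the inversion sketch
strand_info 65    # R1 -> read:+ mate:+
strand_info 129   # R2 -> read:+ mate:+
```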

Edit: "First in pair" is usually your first read, often called "R1"; "second in pair" is your second read, often called "R2". You can double-check by parsing a SAM file as follows:

Code:

samtools view -bS -f 64 sam.sam | bamtools convert -format fastq -in - -out sam_f64.fastq

Explanation: samtools view -f 64 keeps only the reads with the 64 bit set (first in pair, i.e. everything that is R1); -bS takes SAM input and writes BAM, which is fed to bamtools and converted to FASTQ.

Code:

samtools view -bS -f 128 sam.sam | bamtools convert -format fastq -in - -out sam_f128.fastq

Explanation: samtools view -f 128 keeps only the reads with the 128 bit set (second in pair, i.e. everything that is R2); -bS takes SAM input and writes BAM, which is fed to bamtools and converted to FASTQ.
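The -f 64 / -f 128 filters above test FLAG bits 0x40 (first in pair) and 0x80 (second in pair). As a small sketch, the hypothetical helper `which_read` applies the same test to a single FLAG value, which can help when spot-checking the second column of `samtools view` output by eye.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): which read of a pair a FLAG describes.
which_read() {
  local flag=$1
  if [ $(( flag & 64 )) -ne 0 ]; then
    echo "R1"         # 0x40: first segment in the template
  elif [ $(( flag & 128 )) -ne 0 ]; then
    echo "R2"         # 0x80: last segment in the template
  else
    echo "unpaired"   # neither bit set, e.g. single-end data
  fi
}

which_read 99    # R1
which_read 147   # R2
which_read 0     # unpaired
```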

You can run head on your FASTQ files to double-check that the flags really do separate read one from read two. If your FASTQ read names end in /1 and /2, you can grep for the suffix you don't want to see (e.g. /2 in the R1 file); it should find nothing.
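One caveat with a plain grep: quality strings can contain characters such as @, / and 2, so a pattern like /2 at the end of a line may also hit quality lines. A sketch that only inspects header lines (line 1 of each four-line record) is below. It assumes unwrapped four-line FASTQ records and /1, /2 name suffixes; `count_suffix` is a hypothetical name, and the tiny test file stands in for something like sam_f64.fastq.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): count FASTQ header lines whose
# read name ends in a given suffix. Assumes plain four-line records.
count_suffix() {
  local fastq=$1 suffix=$2
  awk -v s="$suffix" 'NR % 4 == 1 {
      # compare the last length(s) characters of the read name with the suffix
      if (substr($1, length($1) - length(s) + 1) == s) n++
    }
    END { print n + 0 }' "$fastq"
}

# Tiny illustrative R1 file
tmp=$(mktemp)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n' > "$tmp"
count_suffix "$tmp" "/2"   # 0: no R2 names in the R1 file
count_suffix "$tmp" "/1"   # 2
rm -f "$tmp"
```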

Edit: here is an older thread of mine where I asked for help on the subject and received few answers:

Parsing reads from SAM/BAM by orientation - SEQanswers

http://seqanswers.com/forums/showthread.php?t=33873


Edit 2: What are you trying to do with your reads? Is the reference "bad" and you want the reads that don't map to it? Is the reference "good" and you want the reads that map to it? Are your data mate-pair/jumping reads with outie orientation? (In that case, default mappers can sometimes flag the pairs as improper, especially if they assume innie is proper.)

**prs321** · 10-17-2013, 10:15 AM


I am just trying to compare the mapping results for raw reads, reads processed with cutadapt, and reads processed with Scythe.

I'm not too sure how good the assembly is. I've been told that it could be better.

What are innie and outie?

**winsettz** · 10-17-2013, 10:31 AM

Innie and outie refer to the orientation of the reads. In standard paired-end sequencing, the DNA is sheared and each strand is read from its 5' end towards its 3' end. Since the 5' ends are on the outside on both strands, the reads point towards the middle (-> <-).

With mate-pair libraries, a long stretch of DNA is circularized and the region around the join is then excised, so that each read pair spans a long genomic distance. But because of how the junction fragment is cut, the reads actually point outwards.

5'start-----------------------------end3'
circle------end3'5'start------circle
snip

R1>----end3'5'start----<R2

which, when compared to the actual genome, is:
5'start-----<R2---------------R1>-------end3'
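The innie/outie distinction can be phrased in terms of mapping positions and strands. The hypothetical helper below is only a sketch, not how any particular aligner decides: given each read's leftmost mapping position and whether it is reverse-complemented (FLAG bit 0x10), it calls a pair innie when the leftmost read is forward and the rightmost is reverse (-> <-), outie for the opposite (<- ->), and same-strand otherwise, as in the inversion example earlier. It ignores read length and overlapping reads.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): classify pair orientation.
# Arguments: pos1 rev1 pos2 rev2, where rev is 1 if FLAG bit 0x10 is set, else 0.
pair_orientation() {
  local pos1=$1 rev1=$2 pos2=$3 rev2=$4 left_rev right_rev
  # Put the read with the smaller mapping position on the left
  if [ "$pos1" -le "$pos2" ]; then
    left_rev=$rev1; right_rev=$rev2
  else
    left_rev=$rev2; right_rev=$rev1
  fi
  if [ "$left_rev" -eq 0 ] && [ "$right_rev" -eq 1 ]; then
    echo "innie"        # -> <-
  elif [ "$left_rev" -eq 1 ] && [ "$right_rev" -eq 0 ]; then
    echo "outie"        # <- ->
  else
    echo "same-strand"  # -> -> or <- <-
  fi
}

pair_orientation 1000 0 1300 1   # innie
pair_orientation 1000 1 4000 0   # outie
pair_orientation 4000 0 1000 1   # outie (R1 is the rightmost read, as in the diagram)
pair_orientation 1000 0 1300 0   # same-strand
```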

If you're assessing how well your reads mapped, samtools flagstat is your friend.

Code:

samtools flagstat mybam.bam

Example output

Code:

in total (QC-passed reads + QC-failed reads)	18808442
duplicates	0
mapped	345583
paired in sequencing	18808442
read1	9404221
read2	9404221
properly paired	329846
with itself and mate mapped	332866
singletons	12717
w mate mapped to different chr	636
w mate mapped to different chr (mapQ >= 5)	188
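To turn counts like these into rates, divide by the total. The snippet below only does that arithmetic with awk on numbers copied by hand from the example output above; it does not parse flagstat itself, since the output layout has changed between samtools versions. `rate` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: percentage of a total, to two decimals
# (awk handles the floating-point division).
rate() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

# Counts copied by hand from the example flagstat output
total=18808442
echo "mapped: $(rate 345583 "$total")%"            # mapped: 1.84%
echo "properly paired: $(rate 329846 "$total")%"   # properly paired: 1.75%
```

As a consistency check on the same example, "with itself and mate mapped" plus "singletons" (332866 + 12717) adds up to the 345583 mapped reads.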

Very confused about the FLAG in SAM files
