Mate-pair orientation in Illumina

Originally posted by mastal

Do you mean paired-end or mate pair? For Illumina technology, the orientation of the two reads relative to each other is different for paired-end and mate-pair libraries.

The read identifiers in your SAM file don't look like typical Illumina read IDs.

Anyway, if your reads are paired-end Illumina reads, it is random whether the reads in the /1 file align to the + strand or the - strand: some /1 reads will align to one strand and some to the other.
Each read in the /2 file will align to the opposite strand from its mate in the /1 file.
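The point about strands can be made concrete with a small sketch: if "left" and "right" are defined by alignment position on the reference rather than by the /1 or /2 label, the orientation of a pair follows from the strand (SAM flag bit 0x10) of its leftmost and rightmost reads. The function name `pair_orientation` and the FR/RF/FF/RR labels are illustrative choices, not part of any tool mentioned in this thread.

```python
def pair_orientation(pos_a, rev_a, pos_b, rev_b):
    """Classify a read pair by the strands of its leftmost and rightmost reads.

    pos_* are 1-based leftmost alignment positions (SAM POS);
    rev_* are True when the read aligned to the reverse strand (flag 0x10).
    The /1 or /2 label is deliberately ignored.
    """
    # Order the two reads by position so "left" and "right" come from the
    # reference, not from which FASTQ file the read was in.
    (_, left_rev), (_, right_rev) = sorted([(pos_a, rev_a), (pos_b, rev_b)])
    left = "R" if left_rev else "F"
    right = "R" if right_rev else "F"
    return left + right


# One read reversed at 4547, its mate forward at 1607:
print(pair_orientation(4547, True, 1607, False))  # FR
```

With this convention, inward-facing pairs come out as FR and outward-facing pairs as RF; which of the two counts as "correctly oriented" depends on the library type.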

I was told they are mate pairs (I also received written instructions).
Well... if there's no relation between /1 /2 and orientation, I really wonder how I can decide which orientation my mate pairs have...
As for the reads, I can't rule out that they were generated artificially (simulated from a sample genome).
Thanks for the reply

Hi everyone,
I'm working on a university project on "resequencing" a small genome (the reference genome is laidlawii).
I have the reference genome and two FASTQ files containing the reads of an Illumina mate-pair library from a target genome.
To get to the point: I've been asked to generate an IGV track representing the "percentage of oriented mates", and I simply can't work out which read in each pair is the left one and which is the right one.
Each read in the two FASTQ files has an ID and is tagged /1 or /2: one file contains all the /1 reads and the other all the /2 reads.
The question is whether there is a strict relation between the tag and whether the read is the left or the right one.
For alignment I use PASS (pass.cribi.unipd.it), which outputs a SAM file with various information, including whether a read was aligned reverse-complemented (flag bit 0x10 set).
In almost every pair, one read is aligned left->right while the other is aligned reverse-complemented (maybe Illumina sequences the two ends from different strands?).
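For reference, the FLAG values discussed in this thread can be decoded with a few lines of Python. The bit values come from the SAM specification; the short labels are my own, and reading "first segment" (0x40) as the /1 file and "last segment" (0x80) as the /2 file assumes the aligner sets these bits from the input files, as the flags in this thread suggest.

```python
# Bit values from the SAM specification (FLAG column); labels are shorthand.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first segment",
    0x80: "last segment",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


for flag in (83, 163, 99, 147):
    print(flag, decode_flag(flag))
# 83 ['paired', 'proper pair', 'reverse strand', 'first segment']
# 163 ['paired', 'proper pair', 'mate reverse strand', 'last segment']
# 99 ['paired', 'proper pair', 'mate reverse strand', 'first segment']
# 147 ['paired', 'proper pair', 'reverse strand', 'last segment']
```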

To put it simply: can I say that every mate pair with /1 aligning left->right and /2 aligning reverse-complemented is oriented left->right on the reference?
And, in the opposite case, that the mate pair aligns reversed on the reference?
(The assumption to prove is that /1 is always the left mate, or always the right one.)
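One possible way to build the requested IGV track, sketched under explicit assumptions: each pair is counted in the fixed-size bin containing its leftmost read, and it counts as "oriented" when its leftmost/rightmost strand pattern matches an expected pattern (FR by default; that default is an assumption, and a library expected to give outward-facing pairs would use RF). The output is bedGraph, which IGV can load. The function name, bin size and input tuple layout are hypothetical.

```python
from collections import defaultdict


def oriented_mates_bedgraph(pairs, bin_size=1000, expected="FR"):
    """Return bedGraph lines: percentage of 'oriented' pairs per genomic bin.

    pairs: iterable of (chrom, pos1, rev1, pos2, rev2) with 1-based SAM POS
    values and reverse-strand booleans (flag bit 0x10), one tuple per pair.
    A pair is assigned to the bin holding its leftmost read and counts as
    oriented when its left/right strand pattern equals `expected`.
    """
    totals = defaultdict(int)
    oriented = defaultdict(int)
    for chrom, pos1, rev1, pos2, rev2 in pairs:
        # Order the reads by reference position, ignoring the /1 /2 label.
        (left_pos, left_rev), (_, right_rev) = sorted([(pos1, rev1), (pos2, rev2)])
        pattern = ("R" if left_rev else "F") + ("R" if right_rev else "F")
        key = (chrom, (left_pos - 1) // bin_size)  # 0-based bin index
        totals[key] += 1
        if pattern == expected:
            oriented[key] += 1

    lines = []
    for chrom, b in sorted(totals):
        pct = 100.0 * oriented[(chrom, b)] / totals[(chrom, b)]
        # bedGraph columns: chrom, 0-based start, exclusive end, value.
        lines.append(f"{chrom}\t{b * bin_size}\t{(b + 1) * bin_size}\t{pct:.1f}")
    return lines


# The five pairs from the PASS output in this thread (strand from flag 0x10).
example = [
    ("Chromosome", 4547, True, 1607, False),
    ("Chromosome", 3842, True, 1612, False),
    ("Chromosome", 220, False, 1612, True),
    ("Chromosome", 4420, True, 1612, False),
    ("Chromosome", 4456, True, 1617, False),
]
for line in oriented_mates_bedgraph(example):
    print(line)
```

Writing these lines to a `.bedgraph` file is enough for IGV to display them as a track; a real run would build `pairs` from the whole SAM file and would need a policy for multi-mapping reads.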

Thank you,
I hope I've made myself understood (English is not my first language and I'm a poor computer scientist :P)

Edit: to be as thorough as possible, here is a situation that is driving me crazy:

Code:

sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5	83	Chromosome	4547	50	50M	=	1607	-2990	GACTACATCGGTTCCGGAGGGGAAACGAAGTATTTTTTATATGAGCATAA	
sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5	163	Chromosome	1607	49	5M1D45M	=	4547	2990	ACTCGTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTA	
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe	83	Chromosome	3842	50	50M	=	1612	-2280	ATACCCGGATACAGCAAAAATCATACCTGTTAATTTTCCTACTGTCATTA	
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe	163	Chromosome	1612	49	49M	=	3842	2280	GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAA
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e	99	Chromosome	220	49	9M1D41M	=	1612	1442	TAATAAATTGTCGTTTCTTATGCTATCATAGTTTTACATAAATTATTAAC	
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e	147	Chromosome	1612	50	50M	=	220	-1442	GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAA	
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b	83	Chromosome	4420	50	50M	=	1612	-2858	AAGCGTTAAAAAGTGCGCTTTTTTACTTATATTATGTTATAATATAATAG	
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b	163	Chromosome	1612	50	50M	=	4420	2858	GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAA
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d	83	Chromosome	4456	50	50M	=	1617	-2889	TTATAATATAATAGGTAGGTGAATGAAGCGTATGAATCATTTTGAGTTAG	
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d	163	Chromosome	1617	50	50M	=	4456	2889	CAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAAAATGA

This is the output of the PASS aligner, with mates sorted by ID.
As you can see, the first two mate pairs align so that the "first segment in the template is reverse-complemented" (flag = 83, with bits 0x10 and 0x40 set according to the SAM specification) and the "last segment is aligned forward" (flag = 163). The same holds for the hundreds of pairs preceding this point, so for the first 1612 bases I have /2 aligned forward and /1 reverse-complemented.
The third mate pair in the example is different: flag = 99 means "this is the first segment and it is aligned forward", and flag = 147 means "this is the last segment and it is reverse-complemented". So in this case /2 is reversed and /1 is forward.
After that, everything returns to normal...
This example makes me think that there's no strict relation between the /1 /2 label and whether a read is the left or the right one.
If there were, how could I explain that /2 of the second mate pair aligns forward at position 1612 while /2 of the third mate pair (flag 147) aligns reversed at the same position?
The only possible explanation would be another region of my genome with the same sequence reversed, but in that case I'd have multiple alignments for those reads, and I don't (the reads are sorted by ID, so I would notice...).
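The assumption can be tested directly on the records above. A minimal sketch, assuming one alignment per read and that bit 0x40 marks the /1 read: it groups the records by QNAME, uses bit 0x10 for strand, and reports which segment is leftmost together with the strand pattern (leftmost read first). Only the first four SAM columns are kept, since nothing else is needed.

```python
from collections import defaultdict

# QNAME, FLAG, RNAME and POS of the ten records above (other columns dropped).
RECORDS = """
sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5\t83\tChromosome\t4547
sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5\t163\tChromosome\t1607
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe\t83\tChromosome\t3842
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe\t163\tChromosome\t1612
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e\t99\tChromosome\t220
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e\t147\tChromosome\t1612
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b\t83\tChromosome\t4420
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b\t163\tChromosome\t1612
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d\t83\tChromosome\t4456
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d\t163\tChromosome\t1617
"""


def summarize_pairs(sam_text):
    """Map each QNAME to (leftmost segment, strand pattern of left+right read).

    Assumes exactly one alignment per read (no multi-mapping) and that
    flag bit 0x40 marks the /1 read, 0x80 the /2 read.
    """
    reads = defaultdict(dict)
    for line in sam_text.strip().splitlines():
        qname, flag, _rname, pos = line.split("\t")[:4]
        flag, pos = int(flag), int(pos)
        segment = "/1" if flag & 0x40 else "/2"
        reads[qname][segment] = (pos, bool(flag & 0x10))  # 0x10 = reverse strand

    summary = {}
    for qname, segs in reads.items():
        # "Left" and "right" are defined by position on the reference.
        left, right = sorted(segs, key=lambda seg: segs[seg][0])
        pattern = ("R" if segs[left][1] else "F") + ("R" if segs[right][1] else "F")
        summary[qname] = (left, pattern)
    return summary


for qname, (left, pattern) in summarize_pairs(RECORDS).items():
    print(qname, "leftmost:", left, "orientation:", pattern)
```

On these five pairs the leftmost read is /2 four times and /1 once, yet every pair comes out FR, that is, leftmost read forward and rightmost read reversed.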
Here is an example of a read pair with multiple alignments:

Code:

sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68	99	Chromosome	73696	50	50M	=	76178	2532	ATTTATCGGTTTAAGAGGGGTCTGCGGCGCATTAGTTAGTTGGTGGGGTA
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68	147	Chromosome	76178	49	50M	=	73696	-2532	AATATATGCTAAGTGGAAACGGAAGTAGAGATGCACAAACAGCCAGGAGG
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68	83	Chromosome	1204898	50	50M	=	1202206	-2742	TACCCCACCAACTAACTAATGCGCCGCAGACCCCTCTTAAACCGATAAAT	
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68	163	Chromosome	1202206	49	50M	=	1204898	2742	CCTCCTGGCTGTTTGTGCATCTCTACTTCCGTTTCCACTTAGCATATATT

In this case the same mate pair aligns correctly to two different parts of the genome: in the first alignment /1 is forward and /2 reversed, while in the second /1 is reversed and /2 forward.
This is plausible, since the sequence around position 1,200,000 is probably (I'm fairly sure it is) the reverse complement of the sequence around position 70,000.
But if I don't know which of /1 and /2 is the left mate, I can't say where in my target genome the inversion is.
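The reverse-complement claim can be checked from the SEQ fields of the multi-mapping pair above. SAM stores reverse-strand reads reverse-complemented, so SEQ always matches the forward reference strand; if the two loci are inverted copies, the SEQ reported for a read at one locus should equal the reverse complement of the SEQ reported at the other. A minimal sketch; `reverse_complement` is a helper written for this example.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A<->T, C<->G, N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# segment: (SEQ near 73,696/76,178, SEQ near 1,204,898/1,202,206)
alignments = {
    "/1": ("ATTTATCGGTTTAAGAGGGGTCTGCGGCGCATTAGTTAGTTGGTGGGGTA",
           "TACCCCACCAACTAACTAATGCGCCGCAGACCCCTCTTAAACCGATAAAT"),
    "/2": ("AATATATGCTAAGTGGAAACGGAAGTAGAGATGCACAAACAGCCAGGAGG",
           "CCTCCTGGCTGTTTGTGCATCTCTACTTCCGTTTCCACTTAGCATATATT"),
}

for segment, (first_site, second_site) in alignments.items():
    # True means the second locus carries the reverse complement of the first.
    print(segment, reverse_complement(first_site) == second_site)
# /1 True
# /2 True
```

Both comparisons hold, which is consistent with the sequence around positions 73,696 and 76,178 having inverted copies near 1,204,898 and 1,202,206, at least over the 50 bp each read covers.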

Anyway, do you know whether that read identifier can be split to get more information? I've noticed a certain regularity, as if the first two "boolean" values indicated which read is the left one (if "_0_1_" meant that the second read is the left one and "_1_0_" that the first read is, my question would be solved). However, I have no documentation about this, and it does not match the Illumina FASTQ naming conventions.
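The identifier can at least be split mechanically. A minimal sketch, assuming the name is underscore-delimited; the field labels are hypothetical, because there is no documentation for them here. In the first five pairs above, the second and third fields are close to the two alignment positions (for example 1607 and 4547), and the next two fields are 0 and 1 in one order or the other, which is the regularity described above.

```python
def split_read_id(qname):
    """Split an underscore-delimited read name into fields.

    The meaning of the fields is undocumented; the keys below are
    hypothetical labels chosen only for illustration.
    """
    fields = qname.split("_")
    return {
        "prefix": fields[0],
        "num_a": int(fields[1]),
        "num_b": int(fields[2]),
        "bool_a": int(fields[3]),
        "bool_b": int(fields[4]),
        "rest": fields[5:],
    }


# Compare the fields with where PASS placed each read of the pair.
print(split_read_id("sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5"))
# /1 aligned reversed at 4547, /2 aligned forward at 1607
print(split_read_id("sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e"))
# /1 aligned forward at 220, /2 aligned reversed at 1612
```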
