-
Mate-pair reads should be in RF (reverse-forward) orientation. That being said, mate-pair (MP) libraries generally contain significant contamination from paired-end (PE) fragments. Check out NextClip and the Illumina technical bulletin on MP library analysis.
-
Originally posted by mastal:
Do you mean paired-end or mate pair? For Illumina technology, the orientation of the two reads relative to each other differs between paired-end and mate-pair libraries.
The read identifiers in your SAM file don't look like typical Illumina read IDs.
Anyway, if your reads are Illumina paired-end reads, whether a read in file /1 aligns to the + strand or the - strand is random: some of the /1 reads will align to one strand and some to the other.
Each read in file /2 will align to the opposite strand from its mate in file /1.
Well... if there's no relation between the /1 /2 tags and orientation, I really wonder how I can decide which orientation my mate pairs have...
About the reads, I can't exclude that they were generated artificially (by simulation on a sample genome).
Thanks for the reply.
-
Do you mean paired-end or mate pair? For Illumina technology, the orientation of the two reads relative to each other differs between paired-end and mate-pair libraries.
The read identifiers in your SAM file don't look like typical Illumina read IDs.
Anyway, if your reads are Illumina paired-end reads, whether a read in file /1 aligns to the + strand or the - strand is random: some of the /1 reads will align to one strand and some to the other.
Each read in file /2 will align to the opposite strand from its mate in file /1.
-
Mate Pair orientation in illumina
Hi everyone,
I'm working on a university project about "resequencing" a small genome (the reference genome is laidlawii).
I have the reference genome and two FASTQ files containing the reads of an Illumina mate-pair library from a target genome.
To get to the point: I have a problem when asked to generate an IGV track representing the "percentage of oriented mates". I simply can't work out which read in each pair is the left one and which is the right one.
Each read in the two FASTQ files has an ID and is also marked with a /1 or /2 tag: one file contains all the /1 reads and the other contains all the /2 reads.
The question is whether there is a strong relation between the tag and whether the read is the left or the right one.
For alignment I use PASS (pass.cribi.unipd.it), which outputs a SAM file with various information, including whether the alignment is reverse complemented (flag bit 0x10 set).
In almost every pair one read is aligned left->right while the other is aligned reverse complemented (maybe Illumina sequences the two ends from different strands?).
To put it simply: can I say that every mate pair with /1 aligning left->right and /2 aligning reverse complemented is left->right oriented on the reference?
And that in the opposite case the mate pair aligns reversed on the reference?
(The assumption to prove is that /1 is always the left (or right) mate.)
Thank you,
I hope I've made myself understood (English is not my first language and I'm a poor informatician :P)
edit: to be as exhaustive as possible, here is a situation that is driving me crazy:
Code:
sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5 83 Chromosome 4547 50 50M = 1607 -2990 GACTACATCGGTTCCGGAGGGGAAACGAAGTATTTTTTATATGAGCATAA
sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5 163 Chromosome 1607 49 5M1D45M = 4547 2990 ACTCGTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTA
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe 83 Chromosome 3842 50 50M = 1612 -2280 ATACCCGGATACAGCAAAAATCATACCTGTTAATTTTCCTACTGTCATTA
sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe 163 Chromosome 1612 49 49M = 3842 2280 GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAA
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e 99 Chromosome 220 49 9M1D41M = 1612 1442 TAATAAATTGTCGTTTCTTATGCTATCATAGTTTTACATAAATTATTAAC
sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e 147 Chromosome 1612 50 50M = 220 -1442 GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAA
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b 83 Chromosome 4420 50 50M = 1612 -2858 AAGCGTTAAAAAGTGCGCTTTTTTACTTATATTATGTTATAATATAATAG
sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b 163 Chromosome 1612 50 50M = 4420 2858 GTTGTCAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAA
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d 83 Chromosome 4456 50 50M = 1617 -2889 TTATAATATAATAGGTAGGTGAATGAAGCGTATGAATCATTTTGAGTTAG
sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d 163 Chromosome 1617 50 50M = 4456 2889 CAAAAAAATAGATTCACCATTATTAAAGTGATAAATGTTTATAAAAATGA
You see... the first two mate pairs align so that the "first segment in the template" is reverse complemented (flag = 83, with bits 0x10 and 0x40 set according to the SAM spec) and the "second segment" is forward aligned (flag = 163). This is also the case for the hundreds of pairs preceding that point, so for the first 1612 bases I have /2 forward aligned and /1 reverse complemented.
Then the third mate pair in the example is different: flag = 99 means "this is the first segment and it is forward aligned", and flag = 147 means "this is the second segment and it is reverse complemented". So in this case /2 is reversed and /1 is forward.
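For reference, the four flag values discussed above can be decoded bit by bit. This is only a minimal Python sketch: the bit meanings follow the SAM specification, while the names `SAM_FLAG_BITS` and `decode_flag` are made up for illustration.

```python
# Bit values and meanings as defined for the FLAG field in the SAM specification.
# The short names on the right are illustrative labels, not official identifiers.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The four flags that appear in the records above.
for flag in (83, 163, 99, 147):
    print(flag, decode_flag(flag))
```

With this decoding, 83 and 147 carry the `reverse` bit (0x10), while 163 and 99 do not; 83 and 99 are the first segment (/1), 163 and 147 the second (/2).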
After that, everything returns to normal...
This example makes me think that there's no strong relation between the /1 /2 tag and whether a read is the left or the right one.
If there were, how could I explain that the /2 read of the second mate pair aligns forward at position 1612 while the /2 read of the third mate pair (flag 147) aligns reversed at the same position?
The only possible explanation is that another region of my genome contains the same sequence reversed, but in that case I'd have multiple alignments for the read, and this is not the case (the reads are sorted by ID, so I would notice...).
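One way to make the left/right question concrete is to ignore the /1 /2 tag at first, sort the two mates by POS, and only then ask which tag ended up on the left and what the strand pattern is. The sketch below does that for the second and third pairs from the example; `parse_sam_line` and `describe_pair` are hypothetical helper names, and POS alone (leftmost aligned base) is assumed to be good enough to decide which mate is leftmost.

```python
def parse_sam_line(line):
    """Extract the fields needed to orient one aligned read."""
    fields = line.split()
    flag = int(fields[1])
    return {
        "name": fields[0],
        "mate": "/1" if flag & 0x40 else "/2",  # 0x40 = first segment
        "pos": int(fields[3]),                   # leftmost aligned base
        "strand": "-" if flag & 0x10 else "+",   # 0x10 = reverse complemented
    }


def describe_pair(line_a, line_b):
    """Return (leftmost mate tag, orientation), e.g. ("/2", "FR")."""
    left, right = sorted(
        (parse_sam_line(line_a), parse_sam_line(line_b)),
        key=lambda rec: rec["pos"],
    )
    letter = {"+": "F", "-": "R"}
    return left["mate"], letter[left["strand"]] + letter[right["strand"]]


# Second and third pairs from the example above (SEQ column omitted).
second = (
    "sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe 83 Chromosome 3842 50 50M = 1612 -2280",
    "sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe 163 Chromosome 1612 49 49M = 3842 2280",
)
third = (
    "sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e 99 Chromosome 220 49 9M1D41M = 1612 1442",
    "sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e 147 Chromosome 1612 50 50M = 220 -1442",
)

print(describe_pair(*second))
print(describe_pair(*third))
```

On these two pairs the sketch reports the same strand pattern (left mate forward, right mate reverse) and differs only in which tag is leftmost: /2 for the second pair, /1 for the third.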
An example of a read with multiple alignments is this:
Code:
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68 99 Chromosome 73696 50 50M = 76178 2532 ATTTATCGGTTTAAGAGGGGTCTGCGGCGCATTAGTTAGTTGGTGGGGTA
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68 147 Chromosome 76178 49 50M = 73696 -2532 AATATATGCTAAGTGGAAACGGAAGTAGAGATGCACAAACAGCCAGGAGG
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68 83 Chromosome 1204898 50 50M = 1202206 -2742 TACCCCACCAACTAACTAATGCGCCGCAGACCCCTCTTAAACCGATAAAT
sq_76677_74195_1_0_0_0_0:1:0_0:0:0_61c68 163 Chromosome 1202206 49 50M = 1204898 2742 CCTCCTGGCTGTTTGTGCATCTCTACTTCCGTTTCCACTTAGCATATATT
This is plausible, considering that the sequence around position 1200000 is probably (in fact I know it is) the same as the sequence around position 70000, but reverse complemented.
But if I don't know which of /1 and /2 is the left mate, I can't say where in my target genome the inversion is.
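The reverse-complement relation between the two placements can be checked directly on the SEQ columns of the four records above. A minimal Python sketch, where `reverse_complement` is an illustrative helper and only the bases A, C, G, T and N are assumed to occur:

```python
# Translation table for complementing DNA bases (uppercase only, by assumption).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ columns copied from the four records above, keyed by (flag, POS).
seq_99_73696 = "ATTTATCGGTTTAAGAGGGGTCTGCGGCGCATTAGTTAGTTGGTGGGGTA"
seq_147_76178 = "AATATATGCTAAGTGGAAACGGAAGTAGAGATGCACAAACAGCCAGGAGG"
seq_83_1204898 = "TACCCCACCAACTAACTAATGCGCCGCAGACCCCTCTTAAACCGATAAAT"
seq_163_1202206 = "CCTCCTGGCTGTTTGTGCATCTCTACTTCCGTTTCCACTTAGCATATATT"

# The /1 read placed forward near 73696 versus placed reversed near 1204898.
print(reverse_complement(seq_99_73696) == seq_83_1204898)
# The /2 read placed reversed near 76178 versus placed forward near 1202206.
print(reverse_complement(seq_147_76178) == seq_163_1202206)
```

Both comparisons hold for the sequences quoted here, which is what one would expect if the same pair was placed once on each strand, since SAM stores SEQ in the orientation of the reference forward strand.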
Anyway, do you know if that read identifier can be split to gain more information? I've noticed a regularity, as if the first two "boolean" values could say which read is the left one (if "_0_1_" meant that the second read is the left one and "_1_0_" that the first read is the left one, my question would be solved). However, I have no documentation about that, and it does not match the Illumina FASTQ standards.
Last edited by d3mux; 08-10-2014, 05:36 AM.
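Without documentation for the read names, the "_0_1_" / "_1_0_" idea can at least be tested empirically by tallying those two fields against which mate is actually leftmost. The sketch below is hypothetical throughout: the meaning of the ID fields is unknown, `id_pair_fields` and `tally` are invented names, and the five input tuples are just the uniquely aligned pairs quoted in the first example, as (read name, POS of /1, POS of /2).

```python
from collections import Counter


def id_pair_fields(read_name):
    """Return the two 'boolean' fields that follow the two numbers in the ID."""
    parts = read_name.split("_")
    return parts[3], parts[4]


def tally(pairs):
    """Count how often each ID field combination goes with each leftmost mate."""
    counts = Counter()
    for name, pos_mate1, pos_mate2 in pairs:
        leftmost = "/1" if pos_mate1 < pos_mate2 else "/2"
        counts[(id_pair_fields(name), leftmost)] += 1
    return counts


# (read name, POS of the /1 read, POS of the /2 read) from the first example.
pairs = [
    ("sq_1607_4547_0_1_0_0_0:0:1_0:0:0_3f6b5", 4547, 1607),
    ("sq_1610_3842_0_1_0_0_0:0:1_0:0:0_6dbfe", 3842, 1612),
    ("sq_1611_220_1_0_0_0_0:0:1_0:0:1_2059e", 220, 1612),
    ("sq_1611_4420_0_1_0_0_0:0:1_0:0:0_35f5b", 4420, 1612),
    ("sq_1617_4456_0_1_0_0_0:0:0_0:0:0_3e90d", 4456, 1617),
]

for key, count in tally(pairs).items():
    print(key, count)
```

Five pairs prove nothing by themselves; the tally is only informative when run over every pair in the SAM file, and reads with multiple alignments would need separate handling because the same name then appears with more than one placement.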