-
Originally posted by westerman:
2 mismatches is great for SNP discovery, since any given read is unlikely to have more than 1 SNP in it. Anything else can be discarded as error.
0 84.08%
1 13.02%
2 2.30%
3 0.40%
4 0.10%
5 0.03%
...
so make your own judgment.
On the other hand, some of us have to deal with DNA from species only partially related to our known (and often incomplete) reference sequence. We then use larger mismatch parameters and are thankful for whatever information we do get back.
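To make the judgment call above a bit easier, here is a minimal sketch that turns the posted distribution into cumulative percentages. It assumes the two columns are "number of mismatches" and "percent of reads", which the post does not state explicitly; the names `DISTRIBUTION` and `cumulative_percent` are made up for this illustration.

```python
# Distribution as posted; interpreting it as mismatches -> percent of reads
# is an assumption. The original list is truncated ("..."), so the
# cumulative values stop short of 100%.
DISTRIBUTION = {0: 84.08, 1: 13.02, 2: 2.30, 3: 0.40, 4: 0.10, 5: 0.03}


def cumulative_percent(dist):
    """Return {k: percent of reads with at most k mismatches}."""
    running = 0.0
    result = {}
    for k in sorted(dist):
        running += dist[k]
        result[k] = round(running, 2)
    return result


if __name__ == "__main__":
    for k, pct in cumulative_percent(DISTRIBUTION).items():
        print(f"<= {k} mismatches: {pct:.2f}%")
```

Under that reading, a cutoff of 2 mismatches keeps all but well under 1% of the listed reads, which is why a small cutoff is attractive when the reference is a close match.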
-
Originally posted by Sheila:
In the configuration file you can choose between "all" or "unique".
all = all mapping positions
unique = unique mapping positions
-
Originally posted by fishtank:
I am wondering how you came to the conclusion that the last bases of the miRNA, close to the adaptor, have a high error rate. Could these be due to miRNA editing?
It is known that the last bases close to the adaptor have a higher error rate. So I would not use 0 mismatches: first, because you would not detect any isomiR with a 1 nt difference (polymorphic or not), and second, because of the higher error rate at the end of the sequences.
I'm still playing with the parameters; it's hard to define what's best.
S.
-
I am trying to figure out how *.csfasta_extend.counts.35.6 gets generated from .csfasta_extend.ma.35.6. In .csfasta_extend.ma.35.6, what does
>1_17_829_F3,220_-79.6.21
T13100202312110020020101102011303111
mean? I saw some documents that say it should be
>TAG_ID,LOCATION,MISMATCHES.
So 1_17_829_F3 is the TAG_ID.
Is 6 the number of mismatches? And how do I decode the location part?
Thanks.
-
Using rna2map, it seems to me that the start/end chromosome coordinates in *.csfasta_extend.counts.35.6 are offset by 1 relative to the reads, i.e. to view the read sequence correctly, I have to enter chr:start-1 to end-1 in the UCSC Genome Browser.
But if I take the chromosome location specified in the generated mirBase.13.0.fasta, I don't need the offset to view the reference sequence. Why the difference?
Can someone confirm this?
-
Originally posted by fishtank:
I am trying to figure out how *.csfasta_extend.counts.35.6 gets generated from .csfasta_extend.ma.35.6. In .csfasta_extend.ma.35.6, what does
>1_17_829_F3,220_-79.6.21
T13100202312110020020101102011303111
PANEL_XCOORD_YCOORD_[F3/BC],FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH
where FASTASEQNUMBER is the 1-indexed sequence number in your multi-entry fasta file.
-
Originally posted by OneManArmy:
PANEL_XCOORD_YCOORD_[F3/BC],FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH
where FASTASEQNUMBER is the 1-indexed sequence number in your multi-entry fasta file.
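For anyone who wants to decode these header lines programmatically, here is a rough sketch of a parser for the layout described above. The names `Hit` and `parse_ma_header` are hypothetical, and two things are assumptions rather than facts from this thread: that a header may carry several comma-separated hits, and that a negative LOCATION marks a reverse-strand hit (the sign is simply kept as-is).

```python
import re
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    fasta_seq_number: int  # 1-indexed entry in the multi-entry FASTA reference
    location: int          # sign kept as-is; negative is assumed to mean reverse strand
    mismatches: int
    length: int


# FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH
_HIT_RE = re.compile(r"^(\d+)_(-?\d+)\.(\d+)\.(\d+)$")


def parse_ma_header(line: str) -> Tuple[str, List[Hit]]:
    """Split a '>TAG_ID,hit[,hit...]' header into the tag ID and its hits."""
    fields = line.strip().lstrip(">").split(",")
    tag_id = fields[0]
    hits = []
    for field in fields[1:]:
        match = _HIT_RE.match(field)
        if match is None:
            raise ValueError(f"unrecognised hit field: {field!r}")
        hits.append(Hit(*(int(g) for g in match.groups())))
    return tag_id, hits


if __name__ == "__main__":
    print(parse_ma_header(">1_17_829_F3,220_-79.6.21"))
```

Applied to the header from the question, this yields tag 1_17_829_F3 with one hit on FASTA entry 220 at location -79, with 6 mismatches over a length of 21.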
-
I can provide some statistics on the small RNA matching pipeline from AB.
I used a purified human small RNA sample in a barcoding experiment with 7.3M reads.
I ran the pipeline many times with different parameters:
- SeedMM: 0, 1, 2, 3
- ExtendMM: 1, 3 or 6
- ReadType: random or unique
Run names follow the pattern R_0_6 = Random, 0 seed MM and 6 extend MM.
The columns are tag count, total beads and uniquely placed beads:
_____________Tags________Total_____Unique
R_0_6 : __983.679____1.023.809____527.973
R_1_6 : 1.377.737____1.433.096____752.479
R_2_6 : 1.677.397____1.739.800____925.693
R_3_6 : 1.762.540____1.834.924____981.906
R_0_1 : __441.813______469.826____162.466
I do not perform genome mapping, but between 13% and 24% of usable reads map to a miRNA reference (the more mismatches we allow, the more reads map to miRs).
Note that the share of uniquely placed beads does not change much (~55%), whereas I would expect that the more MM we allow, the more likely a read is to match multiple reference miRs and so not be uniquely mapped... Any idea where I'm wrong?
Anyway, in the later analysis miR expression does not seem to be clearly affected by the parameters used to run the pipeline (hopefully).
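The percentages quoted above can be rechecked from the table with a few lines of Python. This is only a sketch: it assumes the "13% to 24%" figure is the Tags column divided by the 7.3M input reads (rounded in the post), and that the unique share is Unique divided by Total. `RUNS` and `summarize` are names invented for the example.

```python
TOTAL_READS = 7_300_000  # "7.3M reads" as stated in the post (rounded)

# run name -> (tags, total beads, uniquely placed beads), copied from the table
RUNS = {
    "R_0_6": (983_679, 1_023_809, 527_973),
    "R_1_6": (1_377_737, 1_433_096, 752_479),
    "R_2_6": (1_677_397, 1_739_800, 925_693),
    "R_3_6": (1_762_540, 1_834_924, 981_906),
    "R_0_1": (441_813, 469_826, 162_466),
}


def summarize(runs, total_reads):
    """Return {run: (tags / total_reads, unique / total beads)}."""
    return {
        name: (tags / total_reads, unique / total)
        for name, (tags, total, unique) in runs.items()
    }


if __name__ == "__main__":
    for name, (mapped, unique_share) in summarize(RUNS, TOTAL_READS).items():
        print(f"{name}: mapped {mapped:.1%}, uniquely placed {unique_share:.1%}")
```

Computed this way, the unique share for the four extend-6 runs comes out slightly above 50% and creeps upward with more seed mismatches, while R_0_1 is noticeably lower, so it may be worth looking at the ratio per run rather than at a single rounded figure.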
-
Hi:
I was wondering what people are doing with their miRNA data to quantitate miR and miR* from their sequence reads, especially novel miR*.
(I wish they would change the miR* nomenclature to the more sensible 3p/5p one.)
Is anybody aware of any computational approaches to automate miR vs miR* quantitation?
Also, I was wondering how people are addressing sense/antisense mapping issues related to ds regions in pre-miRs.
We are still not sure how the small RNA pipeline handles strand information, or how it counts reads when they map to both strands (it looks like it double-counts them).
And how do we summarize read counts efficiently in table form (not as a GB track) with strand information preserved?
Thanks
-
Realistic miRNA mapping from SREK
Hi. I have long experience in miRNA identification from 454 data, and since March of this year I have been grinding my teeth on SOLiD SREK results. I am using SHRiMP and custom-made scripts both for genome mapping and for mapping against the miRBase reference (both mature and hairpin).
Even biologically, the claim of 50% miRNAs in a sample is unbelievable. I think this number includes tRNAs (yes, there are many tRNA stem fragments very similar to miRNAs), snoRNAs, etc. I am very cautious and conservative in this classification. I would say that the mapping percentage of small RNAs from a SREK experiment against the Hs genome will be between 50% and 60% of the reads. Known (i.e. well-established) miRNAs will be 5% to 15% of the total beads, i.e. 10% to 30% of the mappable reads. You should also be well aware of the danger of false positives in known miRNA identification. More details on request.
Originally posted by fishtank:
But according to this ABI document, they are getting 50% of reads mapped to miRNA, so 0.3% is worrying. And it is already enriched for small RNA. I am skeptical of the 50% claim though; I wonder what other people are getting?
-
How mismatches are calculated
In an ideal world, I would expect the rna2map pipeline to report the number of mismatches between "the part of the read that aligns to the reference" and "the reference sequence". That is, in the old BLAST way of doing things, mismatches between high-scoring pairs.
However, after digging into the code of the rna2map pipeline and analyzing mapping results, I have discovered that rna2map stupidly includes the mismatches found against the adaptor sequence in the alignment's mismatch count as well.
Therefore, if the alignment reads as follows
>TagID1,1_1000.6.22
>TagID2,1_1000.6.22
it means that there are six errors in total for each of the two tags.
Now consider this: in the first case, your miRNA aligns with 0 mismatches to the reference over 22 bp (which is great) but the adaptor aligns with 6 mismatches (who cares).
And in the second case, your miRNA aligns with 6 mismatches to the reference over 22 bp (which is not so great) but the adaptor aligns with 0 mismatches (who cares).
If we looked only at the alignment file and not at the reads that actually align, we would be tempted to use both reads with equal weight. In the real world, however, it would not be such a great idea to use a read with six mismatches over 22 bp (~73% match).
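To illustrate the two cases, here is a toy sketch that counts mismatches separately for the miRNA part and the adaptor part of a read. It is not how rna2map works internally: it uses plain base-space strings (real SOLiD reads are in colour space), assumes an ungapped alignment with the adaptor starting right after the reference-length insert, and the function names and sequences are invented for the example.

```python
def count_mismatches(a: str, b: str) -> int:
    """Count positions that differ, over the shorter of the two strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def split_mismatches(read: str, reference: str, adaptor: str):
    """Return (mismatches vs reference, mismatches vs adaptor).

    Assumes the first len(reference) bases of the read are the insert
    and everything after that is adaptor.
    """
    insert_len = len(reference)
    insert_mm = count_mismatches(read[:insert_len], reference)
    adaptor_mm = count_mismatches(read[insert_len:], adaptor)
    return insert_mm, adaptor_mm


def identity(length: int, mismatches: int) -> float:
    """Fraction of matching positions, e.g. 6 mismatches over 22 bp."""
    return (length - mismatches) / length


if __name__ == "__main__":
    reference, adaptor = "ACGTAC", "GGAA"  # made-up toy sequences
    print(split_mismatches("ACGTACGGTT", reference, adaptor))  # errors only in adaptor
    print(split_mismatches("ACCTACGGAA", reference, adaptor))  # error only in insert
    print(f"{identity(22, 6):.1%}")
```

Both toy reads would get a non-zero combined mismatch count, but only the second one actually disagrees with the reference, which is the distinction the combined number hides.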
Has anybody looked into this kind of thing before, or accounted for it in their analysis?
Please share your views and opinions so we can discuss it further.
cheers
hardip
Post-doctoral Fellow
John Curtin School of Medical Research
Australian National University, Canberra, ACT, Australia