-
Originally posted by rskr
Accuracy and recall aren't interchangeable with sensitivity and specificity. Sensitivity is for binary classifiers and recall is for a database. Suppose you framed your search as a binary classifier, where every object in the database was classified as returned or not returned. Since there are so few returnable objects compared to the ones that are not returnable, sensitivity might as well be irrelevant. I.e., you could mark everything as not returnable and be 100.00% accurate, since only one answer in three billion is correct. This is why it makes more sense to frame the evaluation in a relevance framework, e.g. accuracy and recall.
FWIW, medical science uses positive and negative predictive value to account for extreme chances of correct/incorrect classifications. Wikipedia tells me that PPV is equivalent to precision, while sensitivity is equivalent to recall.
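As an illustration only, the class-imbalance point can be sketched in Python with made-up counts. The function name `classification_metrics` and both example classifiers are hypothetical; the only figure taken from the thread is the one correct answer in three billion.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common metrics computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity is the same quantity as recall.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        # PPV is the same quantity as precision; undefined if nothing is returned.
        "ppv": tp / (tp + fp) if tp + fp else None,
    }


# Hypothetical search: one correct location among three billion candidates.
CANDIDATES = 3_000_000_000

# Classifier A marks everything as "not returned".
return_nothing = classification_metrics(tp=0, fp=0, tn=CANDIDATES - 1, fn=1)

# Classifier B returns the correct location plus nine wrong ones.
return_ten = classification_metrics(tp=1, fp=9, tn=CANDIDATES - 10, fn=0)

for name, metrics in (("return nothing", return_nothing),
                      ("return ten hits", return_ten)):
    print(name, metrics)
```

Both classifiers score an accuracy and specificity that are indistinguishable from 1, so only sensitivity/recall and PPV/precision separate them.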
Comment
-
Originally posted by gringer
Understood, thanks for the clarification. I don't deny that accuracy and recall work well for what has been done in the paper, it's just that they're not biology-friendly.
FWIW, medical science uses positive and negative predictive value to account for extreme chances of correct/incorrect classifications. Wikipedia tells me that PPV is equivalent to precision, while sensitivity is equivalent to recall.
Comment
-
Originally posted by Bernt.Popp
Hey Wei,
I am trying to align SOLiD colorspace reads with subread (1.4.0).
The commands used are:
1)
subread-buildindex -c -o human_g1k_v37_decoy human_g1k_v37_decoy.fasta
2)
subread-align -T 16 -I 16 -b -i $ref -r $myfilename".csfasta" -o $mydnaID.$myslide.subread.sam
3) adding readgroup information, sorting and converting to BAM with picard.
Unfortunately, either there is a bug in the conversion from colorspace to basespace (option -b) or I am doing something wrong, as the alignments are totally messy when viewed in IGV (although the reads seem to be at the right position).
Here is an example with a comparison to CUSHAW2 and novoalignCS alignments:
https://www.dropbox.com/s/4vgi0c7ev1...%20subread.jpg
Do you have any idea what could be wrong?
Also the new Indel feature does not emit any variants for the colorspace exomes analyzed...
Cheers,
Bernt
We found a problem with color base conversion for those reads mapped to negative strand. We are now investigating this and will fix it with a patch.
Thanks for reporting this.
Wei
Comment
-
Originally posted by shi
Hi Bernt,
We found a problem with color base conversion for those reads mapped to negative strand. We are now investigating this and will fix it with a patch.
Thanks for reporting this.
Wei
Best,
Wei
Comment
-
Originally posted by shi
We have fixed the bug. Please update your Subread to the latest version (1.4.0-p1) and rerun your alignments.
Best,
Wei
Error persists for me, alignment with version 1.4.0-p1:
https://www.dropbox.com/s/zr0zhrtsqx...or_subread.jpg
I did not rebuild the index though, should I?
Maybe the dynamic programming approach described in Li H, Durbin R Bioinformatics (2009) could help in solving the conversion problem?
Cheers,
Bernt
Comment
-
Dear Bernt,
I think the alignment results on SOLiD data are much improved in subread-1.4.0-p1. In your screenshot, most reads have their full length, or a substantial part of it, mapped correctly to the reference genome. Looking closely, I found that the reads with a mismatched part very likely have one wrong color in the middle, which ruins the rest of the read in the color->base conversion.
There were also a few reads that were entirely mismatched. This is because Subread does not compare SOLiD data base by base, but color by color, and it trims off the first two characters from the read before mapping (as bowtie does). If the first base in the SOLiD read is wrong, all the bases in the read are distorted.
If you convert those highly mismatched reads into colors, you may find that all these reads match the genome very well in color space.
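A minimal Python sketch of that check, assuming the standard SOLiD two-base encoding (each color is the XOR of the 2-bit codes of two adjacent bases). The helper names and the two short sequences are hypothetical and are not taken from Subread.

```python
# 2-bit base codes; the color of a dinucleotide is the XOR of its two codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def bases_to_colors(seq):
    """Encode a base-space sequence as a string of colors (one per base pair)."""
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


reference = "ATGATT"
distorted_read = "CGTCGG"  # same colors, decoded from a wrong first base

base_mismatches = count_mismatches(reference, distorted_read)
color_mismatches = count_mismatches(bases_to_colors(reference),
                                    bases_to_colors(distorted_read))

print("base-space mismatches: ", base_mismatches)
print("color-space mismatches:", color_mismatches)
```

In this toy case every base differs, yet the two sequences are identical in color space.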
By the way, if the data is from RNA-seq, it may contain junctions that our subjunc program can discover. Subjunc also works on SOLiD reads, so maybe it's worth a try.
Cheers,
Yang
Originally posted by Bernt.Popp
Error persists for me, alignment with version 1.4.0-p1:
https://www.dropbox.com/s/zr0zhrtsqx...or_subread.jpg
I did not rebuild the index though, should I?
Maybe the dynamic programming approach described in Li H, Durbin R Bioinformatics (2009) could help in solving the conversion problem?
Cheers,
Bernt
Last edited by yangliao; 10-25-2013, 01:57 PM.
Comment
-
Originally posted by yangliao
... Subread does not compare SOLiD data base by base, but color by color, and it trims off the first two characters from the read before mapping (as bowtie does). If the first base in the SOLiD read is wrong, all the bases in the read are distorted.
If you convert those highly mismatched reads into colors, you may find that all these reads matched the genome very well in the color space.
Code:
.31230
ATGATT
CGTCGG
GCAGCC
TACTAA
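The four base strings in the code block are consistent with decoding the colour string 31230 from each possible first base. A short Python sketch that reproduces them, assuming the standard SOLiD two-base encoding; the function name `colors_to_bases` is hypothetical.

```python
# 2-bit base codes; decoding XORs the previous base's code with each colour.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def colors_to_bases(first_base, colors):
    """Decode a colour string into bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


# One colour string, four possible translations depending on the first base.
translations = {first: colors_to_bases(first, "31230") for first in "ACGT"}

for first, decoded in translations.items():
    print(first, decoded)
```

The same colours give four base-space reads that disagree at every position, which is why a wrong first base distorts the whole read.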
Last edited by gringer; 10-25-2013, 02:03 PM.
Comment
-
Yes, I agree the color to base conversion caused a lot of trouble for SNP calling although the reads seem to be mapped to the correct locations. I also agree that the color representations of the alignments are not intuitive and it is hard to see if they match with the reference or not.
One way to get around this issue is possibly to convert the color-space reads to base-space reads before carrying out alignments. This may reduce the number of mapped reads, but it should considerably reduce the number of mismatched bases due to the issue with color to base conversion.
Wei
Comment
-
Originally posted by shi
One way to get around this issue is possibly to convert the color-space reads to base-space reads before carrying out alignments. This may reduce the number of mapped reads, but it should considerably reduce the number of mismatched bases due to the issue with color to base conversion.
However, when representing an alignment in base-space, you need to consider the base-space representation of the reference sequence, and modify the aligned colour-space sequence to fix any colour-shift errors.
edit: Note that it is always the case that a single colour-space difference between read and reference sequence is an instrument read error, and will cause a base-shift error in any base-space representation. A single SNP will modify two consecutive colours, and an INDEL will shift all subsequent colours (in the same fashion as in base-space) as well as (possibly) changing the colour at the site of the INDEL.
Last edited by gringer; 10-25-2013, 10:01 PM.
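The SNP and read-error cases can be checked with a small Python sketch, assuming the standard SOLiD two-base encoding. The sequences and helper names are made up for illustration.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def bases_to_colors(seq):
    """Encode bases as colours: XOR of the 2-bit codes of adjacent bases."""
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))


def colors_to_bases(first_base, colors):
    """Decode colours into bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


def diff_positions(a, b):
    """Return the 0-based positions at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "ATGATTCAG"
ref_colors = bases_to_colors(reference)

# Case 1: a true SNP (T -> C at base position 4) changes two adjacent colours.
snp_read = "ATGACTCAG"
snp_color_diffs = diff_positions(ref_colors, bases_to_colors(snp_read))

# Case 2: a single wrong colour (position 3 read as 1 instead of 3)
# shifts every base downstream of it once decoded.
error_colors = ref_colors[:3] + "1" + ref_colors[4:]
error_base_diffs = diff_positions(reference, colors_to_bases("A", error_colors))

print("reference colours:         ", ref_colors)
print("SNP colour differences at: ", snp_color_diffs)
print("error base differences at: ", error_base_diffs)
```

A lone colour mismatch therefore points to a read error, and correcting it in colour space before decoding avoids the run of shifted bases.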
Comment