  • haolili
    replied
    I am also getting ready to analyze AB SOLiD data with TopHat and then Cufflinks, and I hope to see a successful example here.


  • Haneko
    replied
    Hi clariet,

    Actually, I've never considered this column as a filtering criterion, so I can't give you a definite answer. But I think it would be good to filter out those low-quality mappings; perhaps someone could recommend a threshold?
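
    A minimal sketch of such a filter, assuming a tab-separated SAM file with MAPQ in column 5. The cutoff `min_mapq=10` and the function names are hypothetical, not a recommendation from this thread. A MAPQ of 255 means "not available" in the SAM manual, so those reads are handled separately instead of being compared against the cutoff.

```python
MAPQ_UNAVAILABLE = 255  # SAM manual: 255 means the mapping quality is not available


def keep_alignment(sam_line, min_mapq=10, keep_unavailable=True):
    """Return True if one SAM line passes the mapping-quality filter.

    min_mapq is an illustrative cutoff only (an assumption, not a recommendation).
    """
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5 (index 4) holds MAPQ
    if mapq == MAPQ_UNAVAILABLE:
        return keep_unavailable
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=10, keep_unavailable=True):
    """Yield only the SAM lines that pass keep_alignment."""
    for line in lines:
        if keep_alignment(line, min_mapq, keep_unavailable):
            yield line
```

    To use it on a file, pass an open SAM file handle to `filter_sam` and write the yielded lines to a new file. Since the spliced reads in this thread all carry 255, `keep_unavailable=True` keeps them.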


  • clariet
    replied
    Thank you very much for the reply. I was referring to the 5th column. It IS the mapping quality according to the SAM manual. But I guess this mapping quality must be correlated with the alignment score in some way.

    I have seen a lot of alignments with low mapping quality (0 or 1). For the input to Cufflinks, do you usually filter out these low-quality reads? What cutoff do you usually use?

    Thanks

    Originally posted by Haneko View Post
    Hi there,

    That is actually not the score, but the mapping quality (I'm assuming you're referring to column 5 for 255 "score"). For calculation of score, you will have to take the alignment in colorspace (XL:Z) and the number of colorspace mismatches (XU:Z), then use SOLiD's formula.

    A score of 10 shouldn't be appearing in the output. The 25 bp seed mapping with at most 2 mismatches will give you the lowest possible score for an alignment to be reported, which in my case is 18.


  • nilshomer
    replied
    Originally posted by Xi Wang View Post
    According to the SAM manual, column 5 is for mapping quality. I just refer to this column as mapping quality, and I think the two - "alignment score" and "mapping quality" - are the same.
    Alignment score and mapping quality are not the same. The former is a measure of the distance (or log-odds; see BLOSUM) between the observed sequence and the reference sequence, while the latter is generally the Phred-scaled probability of mismapping. In the SAM format, the alignment score can be stored in the "AS" tag, while the fifth column is the mapping quality.
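
    To make the distinction concrete, here is a small sketch, assuming tab-separated SAM records. A Phred-scaled MAPQ of Q corresponds to a mismapping probability of 10^(-Q/10), and the alignment score, when the aligner writes it, sits in the optional "AS:i:" tag. The function names are hypothetical.

```python
def mismap_probability(mapq):
    """Convert a Phred-scaled MAPQ to a mismapping probability.

    Returns None for 255, which the SAM manual reserves for "not available".
    """
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


def mapq_and_alignment_score(sam_line):
    """Return (MAPQ, AS) for one SAM record; AS is None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5: mapping quality
    alignment_score = None
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("AS:i:"):
            alignment_score = int(tag[5:])
    return mapq, alignment_score
```

    For example, MAPQ 20 corresponds to a 1% chance that the read is placed at the wrong location, regardless of how many mismatches the alignment itself contains.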


  • Haneko
    replied
    Sorry, I made it a bit confusing.

    Yes, the alignment score I'm referring to is self-calculated, not the mapping quality in column 5.


  • Xi Wang
    replied
    Originally posted by Haneko View Post
    Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.
    Ok, some things were mixed up here. The alignment score you mentioned is not the 5th column of SAM files, right?
    But I really want to point out that the 255 means the mapping quality is not available. (Refer to the SAM manual: http://samtools.sourceforge.net/SAM1.pdf)


  • Haneko
    replied
    Hmm, I guess it depends on how you see it. I noted that the column gives a value of 255 for spliced reads, so it's not really helpful when it comes to spliced alignments. And we've always been dependent on the alignment score (using alignment length and mismatches) since WTAP1.2, so I tend to favor that.


  • Xi Wang
    replied
    Originally posted by Haneko View Post
    I'm not sure about that. We've never really looked into mapping quality. The score of 10 I was referring to was actually the alignment score.
    According to the SAM manual, column 5 is for mapping quality. I just refer to this column as mapping quality, and I think the two - "alignment score" and "mapping quality" - are the same.


  • Haneko
    replied
    I'm not sure about that. We've never really looked into mapping quality. The score of 10 I was referring to was actually the alignment score.


  • Xi Wang
    replied
    Originally posted by Haneko View Post
    Hi there,

    A score of 10 shouldn't be appearing in the output. The 25 bp seed mapping with at most 2 mismatches will give you the lowest possible score for an alignment to be reported, which in my case is 18.
    I just think that if a read hits multiple locations in the genome, the mapping quality should also be a small number, even with few or no mismatches. Is that right?


  • Haneko
    replied
    Hi there,

    That is actually not the score, but the mapping quality (I'm assuming you're referring to column 5 for 255 "score"). For calculation of score, you will have to take the alignment in colorspace (XL:Z) and the number of colorspace mismatches (XU:Z), then use SOLiD's formula.

    A score of 10 shouldn't be appearing in the output. The 25 bp seed mapping with at most 2 mismatches will give you the lowest possible score for an alignment to be reported, which in my case is 18.


  • clariet
    replied
    From the lines below, the scores of these alignments are all the same: 255. But in my Bioscope output, most of the alignments have a score below 10. Should I filter out these alignments?


    [QUOTE=Haneko;15679]I'm getting the following using your code:

    1206_912_423 16 chrX 148852770 255 10H10M101N30M * 0 0 CTCCCGTAGCCTTGATGGTCTGCTGCTTCCGTCTGTCACT ,GA%%:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII CS:Z:T32112112213020231231221013210203231320221310310031 XJ:Z:K CQ:Z:<<::9@9=:?==;:=>>>=:>9>695;;773:885&%*80,/&7&())6( XL:Z:39,39 XU:Z:3,1 IH:i:2 HI:i:2 MD:Z:40 XS:A:-
    922_1240_1515 16 chrX 119563391 255 10H10M1029N30M * 0 0 TGATCATGATCATTTGTCTGCAATGGTTTTGCCAGCATCT "C?H?'';?&&A?"""IIIIIIIIIIIIIIIIIIIIIIII CS:Z:T32231321031000101301312213103133211123213222112001 XJ:Z:K CQ:Z::>>:>:?<>;==::<9=;>>9><:&4,6&.2*',45+9()50)'&*&2 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    1297_662_654 0 chrX 153279920 255 10H10M102N26M4H * 0 0 CTTCGGTGTGCCACTGAAGATCCTGGTGTCGCCATG 1IIEIIIIC?III&&III&&4?I:;BDI=+.;I=3% CS:Z:T20331231203202301111301111202132021011123301313032 XJ:Z:K CQ:Z:@@96564=5/919428;7>&:78=&:585&+*66%7,98&&)38&.%8,+ XL:Z:30,35 XU:Z:2,2 IH:i:2 HI:i:2 MD:Z:36 XS:A:+
    1289_854_1683 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCGCCATTCCTGCAGCTCAGGGGAAGGGATCAAT '<A;5<IB9;@IDH((IIHIIIIGGIIIIIIIIIIIIIII CS:Z:T33012320020200021223213122203103322110313223332222 XJ:Z:K CQ:Z:AA;A>;9>>?;?6:3:.:;4872:7(=,98)3'<7&0,6')1'5/1.)4/ XL:Z:39 XU:Z:1 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    1409_132_757 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCTACATTCCTGCAGCTCAGGGGAAGGGATCAAT "9:<?;G%###"IF##GFIIIAGIECIIIIIIIIIIIIII CS:Z:T33012320020200021223213121203213222110311113332022 XJ:Z:K CQ:Z:?><@;9<>?8:>5<31553/<7526#619&#/%71+5(3'$&%:4&&-44 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:8TA30 XS:A:-
    1125_1188_1449 16 chrX 53458535 255 10H10M110N30M * 0 0 GAAGAACCTCCTACAATGACACGGGCAAAGGTACGGTCCT &-<I<?##E@)/<>"""/:?IIIIIIIIIIIIIIIIIIII CS:Z:T32021031310200130031112113113102231022021112101031 XJ:Z:K CQ:Z:;=:<=?A:?>@<=8=5:==<.2)'/*7(5/)8.#:&75(&*6#)9$8$#8 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    .
    .
    .[/QUOTE]


  • xguo
    replied
    My mistake. Bioscope does output a spliced alignment as one record. The pre-Bioscope WT pipeline generates two records in the GFF file for reads mapped to a splice junction.

    Thanks for the reply.


  • Haneko
    replied
    Hi,

    I don't think BioScope outputs 2 entries; it should output only 1 entry for each alignment (continuous or spliced), unless there is more than one alignment for that read. It shouldn't be necessary to merge any 2 lines.

    Did you find any such cases in your data?


  • xguo
    replied
    Originally posted by damiankao View Post
    I am using Bioscope mapping output .bam files as input to Cufflinks. You have to first convert to a .sam file, clean it up, and add the strand information by parsing the bitwise flag.

    I was able to run this cleaned-up version of the .sam file through Cufflinks with pretty good results. The only problem I am having is that most of the output is not showing any strand information.

    I think Cufflinks is only using strand information for spliced reads and ignoring the strand of unspliced reads? So all the genes assembled with spliced reads have strand information, but the others don't?
    Bioscope output includes two separate records in the SAM file if a read aligns to a splice junction. I suppose what Cufflinks expects for spliced reads is one SAM record per junction read with the CIGAR string ##M###N##M. Thus, reads mapped to splice junctions in Bioscope output are not treated as spliced reads by Cufflinks, and strand information is not used at all. As Cufflinks author Cole stated, "You should do your best to feed Cufflinks spliced alignments that are stranded with the XS". It may be necessary to merge two junction read records in Bioscope output into one with the CIGAR string "##M###N##M".

    Any thoughts on this issue?
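
    For reference, a rough sketch of the two checks discussed here, assuming tab-separated SAM records: reading the strand from the bitwise flag (bit 0x10 marks a reverse-strand alignment) and testing whether a CIGAR string contains an N operation, i.e. a spliced alignment. The function names are hypothetical. Writing the read strand into XS assumes a strand-specific library in which the read strand equals the transcript strand, which may not hold for every protocol.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strand_from_flag(flag):
    """Return '-' if the reverse-strand bit (0x10) is set, else '+'."""
    return "-" if flag & 0x10 else "+"


def is_spliced(cigar):
    """Return True if the CIGAR has an N (skipped region), e.g. 10M101N30M."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def add_xs_tag(sam_line):
    """Append XS:A:<strand> to a SAM record unless one is already present.

    Assumption: the read strand equals the transcript strand.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are returned unchanged
    fields = line.split("\t")
    if any(tag.startswith("XS:A:") for tag in fields[11:]):
        return line
    fields.append("XS:A:" + strand_from_flag(int(fields[1])))
    return "\t".join(fields)
```

    With these helpers, a record with flag 16 and CIGAR 10H10M101N30M is reported as a spliced alignment on the minus strand.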
