  • Haneko
    replied
    No, actually. The breaks are there because I copied the records directly from my SSH terminal window; the SAM file itself has no line breaks inside records. Sorry for the confusion.

  • damiankao
    replied
    My sam files have H in the CIGAR string and it worked fine.

    Is what you pasted into your post exactly what is in the SAM file? I noticed some line breaks in the attribute (optional tag) columns that don't look like the natural line wrapping imposed by the forum.
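This question can also be checked mechanically rather than by eye. A minimal sketch, assuming the file is tab-separated as the SAM specification requires: every alignment line must have at least the 11 mandatory fields, so a record broken by a stray newline shows up as a line with too few fields. The function name `find_broken_lines` is hypothetical.

```python
from typing import Iterable, List

# QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL
MANDATORY_FIELDS = 11


def find_broken_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of alignment lines with fewer than 11 tab-separated fields."""
    broken = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        if len(line.split("\t")) < MANDATORY_FIELDS:
            broken.append(lineno)
    return broken
```

Passing an open file object streams the SAM file line by line, so even a very large file does not need to fit in memory.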

  • Xi Wang
    replied
    It seems that this problem is caused by Cufflinks being unable to handle hard clipping (the H operation).
    Since your CIGAR strings are otherwise valid, the error message "CIGAR op has zero length" is rather misleading.
    Last edited by Xi Wang; 03-18-2010, 09:20 PM.
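If this diagnosis is correct, one possible workaround (not tested against Cufflinks here, so treat it as an assumption) is to remove the H operations before running Cufflinks. Because a hard clip consumes neither the read sequence nor the reference, deleting it does not change POS, SEQ or QUAL. A minimal sketch; the function names are hypothetical.

```python
import re

# One CIGAR operation: a decimal length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_hard_clips(cigar: str) -> str:
    """Remove H operations from a CIGAR string.

    H consumes neither SEQ nor the reference, so dropping it leaves POS,
    SEQ and QUAL consistent with the remaining operations.
    """
    if cigar == "*":
        return cigar  # no alignment information to edit
    kept = "".join(n + op for n, op in CIGAR_OP.findall(cigar) if op != "H")
    return kept or "*"


def strip_hard_clips_in_line(sam_line: str) -> str:
    """Apply strip_hard_clips to the CIGAR column (6th field) of one SAM line."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\r\n").split("\t")
    fields[5] = strip_hard_clips(fields[5])
    return "\t".join(fields) + "\n"
```

For example, `10H10M102N26M4H` becomes `10M102N26M`, which still describes the same 36 read bases at the same reference position.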

  • Haneko
    replied
    I'm getting the following using your code:

    1206_912_423 16 chrX 148852770 255 10H10M101N30M * 0 0 CTCCCGTAGCCTTGATGGTCTGCTGCTTCCGTCTGTCACT ,GA%%:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII CS:Z:T32112112213020231231221013210203231320221310310031 XJ:Z:K CQ:Z:<<::9@9=:?==;:=>>>=:>9>695;;773:885&%*80,/&7&())6( XL:Z:39,39 XU:Z:3,1 IH:i:2 HI:i:2 MD:Z:40 XS:A:-
    922_1240_1515 16 chrX 119563391 255 10H10M1029N30M * 0 0 TGATCATGATCATTTGTCTGCAATGGTTTTGCCAGCATCT "C?H?'';?&&A?"""IIIIIIIIIIIIIIIIIIIIIIII CS:Z:T32231321031000101301312213103133211123213222112001 XJ:Z:K CQ:Z::>>:>:?<>;==::<9=;>>9><:&4,6&.2*',45+9()50)'&*&2 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    1297_662_654 0 chrX 153279920 255 10H10M102N26M4H * 0 0 CTTCGGTGTGCCACTGAAGATCCTGGTGTCGCCATG 1IIEIIIIC?III&&III&&4?I:;BDI=+.;I=3% CS:Z:T20331231203202301111301111202132021011123301313032 XJ:Z:K CQ:Z:@@96564=5/919428;7>&:78=&:585&+*66%7,98&&)38&.%8,+ XL:Z:30,35 XU:Z:2,2 IH:i:2 HI:i:2 MD:Z:36 XS:A:+
    1289_854_1683 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCGCCATTCCTGCAGCTCAGGGGAAGGGATCAAT '<A;5<IB9;@IDH((IIHIIIIGGIIIIIIIIIIIIIII CS:Z:T33012320020200021223213122203103322110313223332222 XJ:Z:K CQ:Z:AA;A>;9>>?;?6:3:.:;4872:7(=,98)3'<7&0,6')1'5/1.)4/ XL:Z:39 XU:Z:1 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    1409_132_757 16 chrX 153666617 255 10H10M1046N30M * 0 0 TGCCACTCTACATTCCTGCAGCTCAGGGGAAGGGATCAAT "9:<?;G%###"IF##GFIIIAGIECIIIIIIIIIIIIII CS:Z:T33012320020200021223213121203213222110311113332022 XJ:Z:K CQ:Z:?><@;9<>?8:>5<31553/<7526#619&#/%71+5(3'$&%:4&&-44 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:8TA30 XS:A:-
    1125_1188_1449 16 chrX 53458535 255 10H10M110N30M * 0 0 GAAGAACCTCCTACAATGACACGGGCAAAGGTACGGTCCT &-<I<?##E@)/<>"""/:?IIIIIIIIIIIIIIIIIIII CS:Z:T32021031310200130031112113113102231022021112101031 XJ:Z:K CQ:Z:;=:<=?A:?>@<=8=5:==<.2)'/*7(5/)8.#:&75(&*6#)9$8$#8 XL:Z:39 XU:Z:4 IH:i:1 HI:i:1 MD:Z:40 XS:A:-
    ...
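One quick sanity check on records like these is that the number of read bases implied by the CIGAR matches the length of SEQ and QUAL. Per the SAM specification, only M, I, S, = and X consume read bases; H and N do not, so `10H10M101N30M` implies 10 + 30 = 40 bases, matching the 40-base SEQ of the first record. A minimal sketch, assuming tab-separated input; the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # ops that consume bases of SEQ


def query_length(cigar: str) -> int:
    """Number of SEQ bases described by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def seq_matches_cigar(sam_line: str) -> bool:
    """True if SEQ (and QUAL, when present) have the length the CIGAR implies."""
    fields = sam_line.rstrip("\r\n").split("\t")
    cigar, seq, qual = fields[5], fields[9], fields[10]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    expected = query_length(cigar)
    return len(seq) == expected and (qual == "*" or len(qual) == expected)
```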

  • Xi Wang
    replied
    Originally posted by Haneko
    Unfortunately, I can't pinpoint which line it is referring to because my input file is rather big. Here is a larger chunk of the output:
    [...]
    Is there a way to find the line giving the error using this information?

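    You can use this command to find reads with an empty CIGAR string ('*'), although it will be a little slow on a big file. With LC_ALL=C, records whose CIGAR is '*' sort before those starting with a digit, so they appear near the top.

    Code:
    LC_ALL=C sort -t $'\t' -k6,6 your_file.sam | less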
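A more direct way to locate the records behind the message, assuming it literally refers to a CIGAR operation whose length is 0 (for example `0M` or `0N`), is to scan the CIGAR column and report line numbers. This is a sketch under that assumption; if the message is really triggered by hard clips, as suggested in this thread, the scan will find nothing. Unlike sorting, it reads the file once and keeps the original line numbers. The function name is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def zero_length_cigar_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, CIGAR) for alignments containing a 0-length op."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 6:
            continue  # too short to have a CIGAR column
        cigar = fields[5]
        if any(int(n) == 0 for n, _ in CIGAR_OP.findall(cigar)):
            yield lineno, cigar
```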

  • damiankao
    replied
    This doesn't seem likely to me, but do you have any alignments where the CIGAR string contains no 'M'?
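This question, and more generally which CIGAR operations the file contains at all, can be answered with a single pass that tallies op codes. A minimal sketch; the function name and the `records_without_M` counter key are hypothetical. A non-zero `H` count would also be consistent with Xi Wang's hard-clip explanation.

```python
import re
from collections import Counter
from typing import Iterable

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(lines: Iterable[str]) -> Counter:
    """Count CIGAR op codes across alignments, plus records whose CIGAR has no M."""
    summary = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # skip malformed lines and records without a CIGAR
        ops = [op for _, op in CIGAR_OP.findall(fields[5])]
        summary.update(ops)  # one count per operation occurrence
        if "M" not in ops:
            summary["records_without_M"] += 1
    return summary
```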

  • Haneko
    replied
    Yes, I'm quite certain. I reran cufflinks using the reads that map to chrX, and it still gave me the error.

  • damiankao
    replied
    Are you sure you have no alignments with an empty CIGAR string? They are usually at the end of the SAM file.

  • Haneko
    replied
    I had already removed all the unmapped reads from the SAM file when it threw the error. =(

  • damiankao
    replied
    BioScope SAM output includes unmapped reads: the lines that have no CIGAR string and no chromosome/contig ID are the unmapped reads. I have no idea why BioScope decided to include them, but you have to filter them out.
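Following this description (no CIGAR and no reference name), a minimal filter can drop records whose RNAME or CIGAR is `*`, or whose FLAG has the 0x4 "unmapped" bit set. It is a sketch, assuming BioScope marks unmapped records in at least one of these ways; the function name is hypothetical.

```python
from typing import Iterable, Iterator

FLAG_UNMAPPED = 0x4  # "segment unmapped" bit of the SAM FLAG field


def drop_unmapped(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and mapped alignments; skip unmapped records."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header
            continue
        fields = line.split("\t", 6)  # only the first six columns are needed
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        if flag & FLAG_UNMAPPED or rname == "*" or cigar == "*":
            continue  # unmapped: no reference name and/or no CIGAR
        yield line
```

With `src` and `dst` as open file objects, `dst.writelines(drop_unmapped(src))` streams the whole file. With a recent samtools, `samtools view -h -F 4 in.sam` performs the FLAG-based part of this filter and keeps the header.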

    I ran about 250 million BioScope-mapped reads a few days ago on our university's maths server. Cufflinks needed about 60-70 GB of memory for that many reads, but at least it ran and gave me results. Now I just have to wade through the 800,000 features it predicted and filter out all the crap.
    Last edited by damiankao; 03-17-2010, 01:35 AM.

  • Haneko
    replied
    Unfortunately, I can't pinpoint which line it is referring to because my input file is rather big. Here is a larger chunk of the output:

    Counting hits in map
    CIGAR op has zero length
    CIGAR op has zero length
    Total map density: 1335893.196411
    Processing bundle [ chrX:2712030-2712060 ] with 2 non-redundant alignments
    Filtering bundle introns, avg bundle doc = 1.166667, thresh = 0.058333
    Intron filtering pass finished
    Filtering forward strand
    Initial filter pass complete
    Updated avg bundle doc = nan
    threshold is = nan
    Filtering reverse strand
    Initial filter pass complete
    Updated avg bundle doc = 1.166667
    threshold is = 0.175000
    Saw reverse strand only
    No introns in bundle, collapsing all hits to single transcript
    Calculating abundances
    Calculating intial MLE
    Tossing likely garbage isoforms
    Revising MLE
    Importance sampling posterior distribution
    1 isoforms with 1 abundances
    Considering isoform with FMI 1.000000
    Processing bundle [ chrX:2767648-2767776 ] with 3 non-redundant alignments
    Filtering bundle introns, avg bundle doc = 1.489583, thresh = 0.074479
    ....

    Is there a way to find the line giving the error using this information?

  • nilshomer
    replied
    Originally posted by Haneko
    I'm using BioScope SAM output for cufflinks and getting a strange message when running:

    Counting hits in map
    CIGAR op has zero length
    CIGAR op has zero length
    ..

    There are a few such lines ('CIGAR op has zero length'), and I'm not sure what they mean. Can someone please help?

    It may mean the SAM entry is incorrect. Could you post the line in question?

  • Haneko
    replied
    I'm using BioScope SAM output for cufflinks and getting a strange message when running:

    Counting hits in map
    CIGAR op has zero length
    CIGAR op has zero length
    ..

    There are a few such lines ('CIGAR op has zero length'), and I'm not sure what they mean. Can someone please help?

  • damiankao
    replied
    I didn't change the CIGAR string at all. I assumed that Cufflinks would just ignore anything that's not M or N. The base sequence and quality scores correspond only to the M operations in the CIGAR string.
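This holds for CIGAR strings built only from M, N and H, as in the records posted in this thread. In general, the SAM specification has M, I, S, = and X consume SEQ bases, while M, D, N, = and X advance along the reference; H consumes neither. A minimal sketch of walking a CIGAR to map each read base to a 1-based reference coordinate (the function name is hypothetical):

```python
import re
from typing import Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")  # ops that use up bases of SEQ/QUAL
CONSUMES_REF = set("MDN=X")    # ops that advance along the reference


def aligned_pairs(pos: int, cigar: str) -> Iterator[Tuple[int, int]]:
    """Yield (0-based index into SEQ, 1-based reference position) for aligned bases."""
    q, r = 0, pos
    for n_str, op in CIGAR_OP.findall(cigar):
        n = int(n_str)
        if op in ("M", "=", "X"):  # bases that sit on reference positions
            for i in range(n):
                yield q + i, r + i
        if op in CONSUMES_QUERY:
            q += n
        if op in CONSUMES_REF:
            r += n
```

For the first record Haneko posted (POS 148852770, CIGAR `10H10M101N30M`), this yields 40 pairs, one per SEQ base: read bases 0 to 9 land on 148852770 to 148852779, and bases 10 to 39 resume after the 101-base intron at 148852881 to 148852910.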

    I am having memory issues with Cufflinks right now, though. I did several successful runs with about 100 million mapped reads (~30 GB .sam file), but now I am getting a memory allocation error with ~200 million mapped reads (~74 GB .sam file).

  • KevinLam
    replied
    Originally posted by lgoff
    Has anyone tried to use cufflinks to assemble isoforms from SOLiD RNA-Seq data? Now that Bowtie supports colorspace reads, I am trying to take this output and process the alignments through cufflinks with little success.

    I understand that TopHat adds the required XS:A:[+-] tag to the alignments, which I am able to add myself because this is a strand-specific library. Whether or not I add this tag myself, when I run Cufflinks nothing is written to the output files other than the headers.

    Has anyone had this issue, or is anyone dealing with transcript assembly for SOLiD data? Any help is appreciated...

    Bowtie does not yet report gapped alignments; this is future work.
    Won't this affect you if you apply it to RNA-seq data?
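On the strand tag discussed in the quoted post: the records Haneko posted in this thread carry it as `XS:A:+` / `XS:A:-`, which is the form TopHat writes. For a strand-specific library, a minimal sketch for adding it yourself might derive the strand from the FLAG 0x10 (reverse-complemented) bit. Whether a read's alignment strand equals or is opposite to the transcript strand depends on the library protocol, so the hypothetical `flip` parameter is an assumption you need to set for your own data.

```python
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10  # SEQ is reverse-complemented (read aligned to the - strand)


def add_xs_tag(sam_line: str, flip: bool = False) -> str:
    """Append an XS:A strand tag to one SAM alignment line.

    flip=False assumes the transcript is on the strand the read aligned to;
    flip=True assumes the opposite (protocol-dependent, hypothetical choice).
    """
    if sam_line.startswith("@"):
        return sam_line  # header line
    body = sam_line.rstrip("\r\n")
    fields = body.split("\t")
    flag = int(fields[1])
    if flag & FLAG_UNMAPPED or any(f.startswith("XS:A:") for f in fields[11:]):
        return sam_line  # unmapped, or already tagged
    reverse = bool(flag & FLAG_REVERSE) != flip  # XOR with the flip choice
    return body + "\tXS:A:" + ("-" if reverse else "+") + "\n"
```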
