Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using Cufflinks with AB SOLiD Data?

    Has anyone tried to use cufflinks to assemble isoforms from SOLiD RNA-Seq data? Now that Bowtie supports colorspace reads, I am trying to take this output and process the alignments through cufflinks with little success.

    I understand that TopHat adds the required XA:i:[+-] tag to the alignments, which I am able to add due to this being a strand-specific library. Whether or not I add this tag myself, when I run cufflinks, no output is reported (other than headers) into the output files.

    Anyone had this issue or dealing with transcript assembly in SOLiD? Any help is appreciated...

  • #2
    I'm actually trying to use the BioScope output for RNA-seq for cufflinks now.

    In my sam file, it doesn't contain the "XS:A:" flag that is said to be a must to have from the documentation i read for the splice reads. When I ran cufflinks, I was expecting an error, but nothing popped out. Does this mean the flag is not required?

    Comment


    • #3
      I am using Bioscope mapping output .bam files as input into cufflinks. You have to first convert to .sam file, clean it up, and added the strand information by parsing the bitwise flag.

      I was able to run this cleaned up version of .sam file through cufflinks with pretty good results. The only problem I am having is that most of the output is not showing any strand information.

      I think cufflink is only using strand information for spliced reads and ignoring unspliced read strand? So all the genes assembled with spliced read has strand information, but others don't?

      Comment


      • #4
        Hi damiankao. For SOLiD data, do you use the full SAM file including all unmapped, unique mapped and multimapped reads? Or do you only use the set that contains the SOLiD-defined unique alignments (best score, and score >4 away from second best)?

        EDIT:

        I forgot to mention. In the manual, it states that cufflinks is unable to support other operations such as clipping. SOLiD bioscope whole transcriptome analysis includes hard clipping (H) in the CIGAR string for the output. May I ask how you dealt with that?

        Thanks!
        Last edited by Haneko; 03-10-2010, 12:03 AM.

        Comment


        • #5
          Originally posted by lgoff View Post
          Has anyone tried to use cufflinks to assemble isoforms from SOLiD RNA-Seq data? Now that Bowtie supports colorspace reads, I am trying to take this output and process the alignments through cufflinks with little success.

          I understand that TopHat adds the required XA:i:[+-] tag to the alignments, which I am able to add due to this being a strand-specific library. Whether or not I add this tag myself, when I run cufflinks, no output is reported (other than headers) into the output files.

          Anyone had this issue or dealing with transcript assembly in SOLiD? Any help is appreciated...
          Bowtie does not yet report gapped alignments; this is future work.
          Won't this affect you if you apply it to RNA-seq data?
          http://kevin-gattaca.blogspot.com/

          Comment


          • #6
            I didn't change the CIGAR string at all. I assumed that cufflinks will just ignore anything that's not M or N. The basepair sequence and quality score correspond only to the Ms in the CIGAR string.

            I am having memory issues with cufflinks right now though. I did several sucessful runs with about 100 million mapped reads (~30gig .sam file). But I am getting allocating new memory error now with ~200million mapped reads (~74gig .sam file).

            Comment


            • #7
              I'm using BioScope SAM output for cufflinks and getting a strange message when running:

              Counting hits in map
              CIGAR op has zero length
              CIGAR op has zero length
              ..

              There are few such lines ('CIGAR op has zero length'), and I'm not sure what they mean. Can someone please help?

              Comment


              • #8
                Originally posted by Haneko View Post
                I'm using BioScope SAM output for cufflinks and getting a strange message when running:

                Counting hits in map
                CIGAR op has zero length
                CIGAR op has zero length
                ..

                There are few such lines ('CIGAR op has zero length'), and I'm not sure what they mean. Can someone please help?
                It may mean the SAM entry is incorrect. Could you post the line under question?

                Comment


                • #9
                  Unfortunately, I can't pinpoint which line it is referring to because my input file is rather big. Here is a larger chunk of the output:

                  Counting hits in map
                  CIGAR op has zero length
                  CIGAR op has zero length
                  Total map density: 1335893.196411
                  Processing bundle [ chrX:2712030-2712060 ] with 2 non-redundant alignments
                  Filtering bundle introns, avg bundle doc = 1.166667, thresh = 0.058333
                  Intron filtering pass finished
                  Filtering forward strand
                  Initial filter pass complete
                  Updated avg bundle doc = nan
                  threshold is = nan
                  Filtering reverse strand
                  Initial filter pass complete
                  Updated avg bundle doc = 1.166667
                  threshold is = 0.175000
                  Saw reverse strand only
                  No introns in bundle, collapsing all hits to single transcript
                  Calculating abundances
                  Calculating intial MLE
                  Tossing likely garbage isoforms
                  Revising MLE
                  Importance sampling posterior distribution
                  1 isoforms with 1 abundances
                  Considering isoform with FMI 1.000000
                  Processing bundle [ chrX:2767648-2767776 ] with 3 non-redundant alignments
                  Filtering bundle introns, avg bundle doc = 1.489583, thresh = 0.074479
                  ....

                  Is there a way to find the line giving the error using these information?

                  Comment


                  • #10
                    Bioscope sam output includes unmapped reads. Lines that have no CIGAR string and no chromosome/contig ID are the umapped reads. I have no idea why BIoscope decided to include them, but you have to filter them out.

                    I ran about 250 million Bioscope mapped reads few days ago on our university's maths server. Cufflinks needed about 60-70 gigs of memory for that amount of reads. But at least it ran and gave me results. Now I just have to wade through 800,000 features that it predicted and filter out all the crap.
                    Last edited by damiankao; 03-17-2010, 01:35 AM.

                    Comment


                    • #11
                      I've already removed all the non-mappable reads from the SAM file when it threw me the error. =(

                      Comment


                      • #12
                        Are you sure you have no alignments with empty CIGAR string? They usually are at the end of the sam file.

                        Comment


                        • #13
                          Yes, I'm quite certain. I reran cufflinks using the reads that map to chrX, and it still gave me the error.

                          Comment


                          • #14
                            This doesn't seem likely to me, but do you have any alignments where the CIGAR string contains no 'M'?

                            Comment


                            • #15
                              Originally posted by Haneko View Post
                              Unfortunately, I can't pinpoint which line it is referring to because my input file is rather big. Here is a larger chunk of the output:

                              Counting hits in map
                              CIGAR op has zero length
                              CIGAR op has zero length
                              Total map density: 1335893.196411
                              Processing bundle [ chrX:2712030-2712060 ] with 2 non-redundant alignments
                              Filtering bundle introns, avg bundle doc = 1.166667, thresh = 0.058333
                              Intron filtering pass finished
                              Filtering forward strand
                              Initial filter pass complete
                              Updated avg bundle doc = nan
                              threshold is = nan
                              Filtering reverse strand
                              Initial filter pass complete
                              Updated avg bundle doc = 1.166667
                              threshold is = 0.175000
                              Saw reverse strand only
                              No introns in bundle, collapsing all hits to single transcript
                              Calculating abundances
                              Calculating intial MLE
                              Tossing likely garbage isoforms
                              Revising MLE
                              Importance sampling posterior distribution
                              1 isoforms with 1 abundances
                              Considering isoform with FMI 1.000000
                              Processing bundle [ chrX:2767648-2767776 ] with 3 non-redundant alignments
                              Filtering bundle introns, avg bundle doc = 1.489583, thresh = 0.074479
                              ....

                              Is there a way to find the line giving the error using these information?
                              You can use this script to find the empty CIGAR reads, although it will be a little bit slow.

                              Code:
                              sort -k6,6 <your_file.sam> | more
                              Xi Wang

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Article Coming Soon......
                                Today, 02:07 PM
                              • seqadmin
                                Multiomics Techniques Advancing Disease Research
                                by seqadmin


                                New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

                                A major leap in the field has
                                ...
                                02-08-2024, 06:33 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 02-23-2024, 04:11 PM
                              0 responses
                              31 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 02-21-2024, 08:52 AM
                              0 responses
                              46 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 02-20-2024, 08:57 AM
                              0 responses
                              36 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 02-14-2024, 09:19 AM
                              0 responses
                              63 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X