
  • #16
    I know you emailed me some files already, but I just want to point out that posting SAM to a forum like this is not all that helpful due to linebreaks, etc. It's much better to host them on the web somewhere, and post a link instead.

    I'm pretty sure I know where the problem is in Cufflinks, and I believe it's a simple fix. However, I'm traveling this week and have limited access to email and time to fix bugs, so I may not get to this for a few days. Thanks for your patience.
    Last edited by Cole Trapnell; 10-20-2009, 01:43 PM.

    Comment


    • #17
      Originally posted by Cole Trapnell View Post
      [...]
      As noted above, exons need to be attached to their parent transcripts, but through the transcript_id attribute, not the ID/Parent tree.
      Thank you very much for the clarification! I'll try to make the SAM (ca. 2 GB) and GTF available to you. Meanwhile I'll experiment with sanitizing the RefSeq annotations before feeding them into cufflinks... Enjoy your travel.

      -Marvin

      Comment


      • #18
        RefSeq cleanup helped!

        Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

        The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

        If someone is interested I can clean the code up (python) and post it here.
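
For readers who want to try this before the script is posted, here is a minimal sketch of that kind of cleanup. It is not Marvin's script. It assumes a tab-delimited GTF in which all exon lines of one transcript copy are consecutive, so a transcript_id that reappears after a different one is treated as a second locus. The function name `uniquify_exons`, the `_dupN` suffix, and the optional `gene_names` mapping (standing in for RefSeq name2) are all hypothetical.

```python
import re

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def uniquify_exons(gtf_lines, gene_names=None):
    """Keep only exon lines and make repeated transcript IDs unique.

    Assumption: exon lines of one transcript copy are consecutive, so a
    transcript_id that shows up again after another ID is a new locus.
    gene_names is an optional {transcript_id: gene_name} mapping; without
    it the existing gene_id (or the transcript ID) is reused.
    """
    gene_names = gene_names or {}
    runs_seen = {}       # transcript ID -> number of separate runs so far
    previous_id = None
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue     # drop CDS, start_codon, etc.
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        if tid != previous_id:
            runs_seen[tid] = runs_seen.get(tid, 0) + 1
            previous_id = tid
        n = runs_seen[tid]
        # The first locus keeps its ID; later loci get a suffix.
        new_tid = tid if n == 1 else f"{tid}_dup{n - 1}"
        gene_match = GENE_RE.search(fields[8])
        fallback = gene_match.group(1) if gene_match else tid
        gene = gene_names.get(tid, fallback)
        fields[8] = f'gene_id "{gene}"; transcript_id "{new_tid}";'
        out.append("\t".join(fields))
    return out
```

If the input is sorted by position rather than grouped by transcript, the consecutive-run assumption breaks, and the loci would have to be separated some other way (for example by chromosome and strand).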

        Comment


        • #19
          Aborted message from cufflinks

          Originally posted by Cole Trapnell View Post
          One thing you could try is to increase the value of the collapse-rounds option from its default of one. Each additional bump should cut the memory use in bundles like this roughly in half (up to a certain point). It carries some risk that Cufflinks will misassemble things if you set it too high, but 2 or 3 should certainly be safe (at least it is in my experience).
          Ok, I removed the entire set of reads that were mapping to this particular bundle, but it still crashed. Then I used the -c 2 option as you suggest, but the same Aborted message.

          Any comments?

          $ ../cufflinks-0.7.0.Linux_x86_64/cufflinks -p 2 -c 2 berg_42R7WAAXX_300164_41.lane1/accepted_hits.sam
          ...
          Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57769933-57769983 ] with 1 non-redundant alignments
          Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57770041-57770125 ] with 3 non-redundant alignments
          terminate called after throwing an instance of 'std::bad_alloc'
          what(): St9bad_alloc
          Aborted
          --
          bioinfosm

          Comment


          • #20
            Originally posted by bioinfosm View Post
            Ok, I removed the entire set of reads that were mapping to this particular bundle, but it still crashed. Then I used the -c 2 option as you suggest, but the same Aborted message.

            Any comments?

            $ ../cufflinks-0.7.0.Linux_x86_64/cufflinks -p 2 -c 2 berg_42R7WAAXX_300164_41.lane1/accepted_hits.sam
            ...
            Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57769933-57769983 ] with 1 non-redundant alignments
            Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57770041-57770125 ] with 3 non-redundant alignments
            terminate called after throwing an instance of 'std::bad_alloc'
            what(): St9bad_alloc
            Aborted
            Well this may be good news in a way, because those bundles are tiny, so this could be a simple bug in the SAM parser or something of that ilk, rather than Cufflinks exhausting memory. Can you send me (by email at [email protected]) a snippet of your SAM file that reproduces this crash? I'll throw it in the tracker and get to it this weekend when I get back from my trip.
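
One way to cut such a snippet out of a large SAM file is sketched below. It assumes a plain-text, tab-delimited SAM where column 3 is the reference name and column 4 the 1-based leftmost position; `sam_snippet` is a hypothetical helper, not part of Cufflinks or TopHat. It keeps header lines and every alignment that starts inside the window, so reads that begin before the window but overlap it are missed.

```python
def sam_snippet(sam_lines, ref_name, start, end):
    """Yield SAM header lines plus alignments starting in [start, end].

    sam_lines: iterable of SAM text lines (e.g. an open file).
    ref_name:  reference sequence name exactly as it appears in column 3.
    start/end: 1-based inclusive window on the leftmost position (column 4).
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line            # keep the header so the snippet stays valid
            continue
        fields = line.split("\t")
        if len(fields) < 4 or not fields[3].isdigit():
            continue              # skip lines without a usable position
        if fields[2] == ref_name and start <= int(fields[3]) <= end:
            yield line


# Example use with the bundle coordinates from the log above
# (file names are placeholders):
#
# with open("accepted_hits.sam") as sam, open("snippet.sam", "w") as out:
#     out.writelines(sam_snippet(
#         sam, "gi|89161220|ref|NC_000024.8|NC_000024", 57769933, 57770125))
```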

            Comment


            • #21
              Cufflinks

              Originally posted by marvin.j View Post
              Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

              The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

              If someone is interested I can clean the code up (python) and post it here.
              I'm interested. Heck, I'll even take the verbose code (It helps me to figure out what's going on)!

              Comment


              • #22
                Ditto on the memory errors here: running without a GTF file, 16 GB of memory, ~10M pairs.

                Thanks for this awesome package ...

                Comment


                • #23
                  I managed to run cufflinks and obtain the genes.expr file. But how do I annotate it with gene IDs etc from this information? The coordinates do not match anything on UCSC

                  $ head genes.expr
                  gene_id bundle_id chr left right bundle_fraction density RPKM
                  CUFF.1 725391 gi|13626247|ref|NT_025975.2|HsY_26131 350 400 2.65651
                  CUFF.10 725573 gi|13626247|ref|NT_025975.2|HsY_26131 55703 55835 0.503126
                  CUFF.13 725579 gi|13626247|ref|NT_025975.2|HsY_26131 56414 56521 1.86204
                  CUFF.15 725581 gi|13626247|ref|NT_025975.2|HsY_26131 56698 56748 3.98476
                  --
                  bioinfosm

                  Comment


                  • #24
                    Seems I could use cuffcompare, but I am confused about the reference I am using (from the TopHat website) and which GTF file to download for use in cuffcompare
                    --
                    bioinfosm

                    Comment


                    • #25
                      Originally posted by bioinfosm View Post
                      Seems I could use cuffcompare, but I am confused about the reference I am using (from the TopHat website) and which GTF file to download for use in cuffcompare
                      We certainly intend for you to use cuffcompare - it was built to do exactly what you want. As for which reference to use, that's up to you. You could try Ensembl first to get a feel for how cuffcompare works and how to parse its output. If the manual is unclear on how to interpret cuffcompare output, please feel free to ask questions here.

                      One important thing about using cuffcompare is that the chromosome names in whatever reference GTF file you use must match the chromosome names in your Cufflinks output, which of course come from your SAM input.
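
A quick way to check this before running cuffcompare is sketched below. It assumes both the reference annotation and the Cufflinks transcripts are tab-delimited GTF files with the sequence name in the first column; `gtf_seqnames` and `compare_seqnames` are hypothetical helper names, not part of the Cufflinks package.

```python
def gtf_seqnames(lines):
    """Collect the sequence names (column 1) from GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(reference_lines, cufflinks_lines):
    """Report which sequence names are shared and which appear on one side only."""
    ref = gtf_seqnames(reference_lines)
    cuff = gtf_seqnames(cufflinks_lines)
    return {
        "shared": ref & cuff,
        "only_reference": ref - cuff,
        "only_cufflinks": cuff - ref,
    }
```

An empty "shared" set means none of the names line up. For example, a reference that names a sequence "Y" will not match output that calls it "gi|13626247|ref|NT_025975.2|HsY_26131", and one side would need to be renamed first.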

                      Comment


                      • #26
                        So I have these gene counts / exon counts results from Illumina's Genome Studio tool. They use the RefSeq annotation to obtain these read counts for the known genes.

                        I wish to compare these with the data generated by cufflinks. I was hoping to use the Homo_sapiens.NCBI36.52.gtf in cuffcompare with the cufflinks results for my fastq reads, and obtain the respective counts.

                        Could you help me with obtaining the number of reads mapping to genes/transcripts/exons using tophat-cufflinks-cuffcompare combo?

                        Thanks.
                        --
                        bioinfosm

                        Comment


                        • #27
                          Originally posted by marvin.j View Post
                          Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

                          The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

                          If someone is interested I can clean the code up (python) and post it here.
                          marvin.j - that sounds like a very helpful script for working with UCSC gtf files. If you have time to post the code I'm sure many of us would be grateful.

                          Comment


                          • #28
                            "Segmentation fault" error with cufflinks

                            Hi,
                            Has anyone figured out how to fix the segmentation fault that occurs when running cufflinks? It occurs only when I use an annotation file in GTF format. I am downloading the GTF file from UCSC, and it contains lines like the ones below. I would really appreciate any help I could get here.
                            Thanks

                            chr1 canFam2_refGene exon 16743049 16743195 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
                            chr1 canFam2_refGene start_codon 16743704 16743706 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
                            chr1 canFam2_refGene CDS 16743704 16743859 0.000000 + 0 gene_id "NM_001002949"; transcript_id "NM_001002949";
                            chr1 canFam2_refGene exon 16743422 16743859 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
                            chr1 canFam2_refGene CDS 16743943 16744269 0.000000 + 0 gene_id "NM_001002949"; transcript_id "NM_001002949";

                            Comment


                            • #29
                              Originally posted by marvin.j View Post
                              Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

                              The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

                              If someone is interested I can clean the code up (python) and post it here.
                              marvin.j, I would also be very grateful if you would post this python script somewhere!

                              Comment


                              • #30
                                script wanted

                                Originally posted by marvin.j View Post
                                Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

                                The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

                                If someone is interested I can clean the code up (python) and post it here.
                                Hi Marvin.j
                                I have the same problem when running cuffcompare with refGene data. Could you send the script to me ([email protected])? Thanks!

                                Comment
