Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • dalesan
    replied
    I can't really give you any guidance other than pointing you to a post I spotted on biostars, as I am having the same problem of adjacent loci being lumped together under the same XLOC. Actually, I would take a look at the loci with the adjacent genes in the IGV browser as a first step. In my own case, I noticed that some of the reads for the locus in question spanned two annotated genes. It could be that this overlap represents a potential anti-sense scenario, but without the stranded information we can't be sure. So perhaps it makes "sense" that cufflinks merged these reads into one giant transcriptional unit, which actually contains two distinct gene loci. If you note, there is a dotted vertical line in the following image:



    To the right of the dotted line are the reads mapping to a known Arabidopsis gene. To the left of the dotted line are reads mapping to another known Arabidopsis gene. You'll notice that some of the reads bridge the two genes causing cufflinks to merge them into a giant transcriptional unit (blue bar at the bottom).

    I think what I will end up doing is discarding these merged loci from further analysis as the data may not be reliable.

    Quoting Devon Ryan from Biostars (original post here: https://www.biostars.org/p/104551/)
    The only real answer would be to look through the cufflinks source code, since this isn't documented anywhere. I would guess that these are merged into a single locus for processing because the annotation file you gave to cufflinks, likely combined with the modifications it made to the annotated transcripts given your alignments, produced possibly overlapping features (genes in this case) that might need to be processed as a single unit. If you used an unstranded library where WASH7P was expressed, then cufflinks might have just merged that, DDX11L1, and MIR1302-10 into a single transcript, in which case treating the whole region as a single locus would make more sense. I suspect that cufflinks pre-bins the genome according to possible cases like this and then processes them separately, often producing multiple final loci. That's a slightly educated guess, at least.

    Welcome to the wonderful world of completely undocumented features :P

    Leave a comment:


  • beeman
    replied
    *bump*

    Can no-one provide some guidance on my problem

    Leave a comment:


  • beeman
    started a topic cuffmerge combining genes from adjacent loci??

    cuffmerge combining genes from adjacent loci??

    Hi,

    After running CuffDiff, I see many of the significant genes in gene_exp.diff are annotated with multiple geneids. I have looked in the merged.gft file I created from the assemblies generated by cufflinks from all of my replicates, and the test_id/gene_id (XLOC_..) is assigned to several different genes that don't appear to have any overlapping exons but are adjacent to one another. I'm really struggling to understand why this is happening and what I can do to improve the analysis.

    Am I doing something wrong in the cuffmerge step? Why would the transcripts be getting merged?

    My workflow is as follows.
    1. Run tophat (v2.0.13) on samples
    $ tophat -G models.gff --library-type=fr-firststrand bowtie_index fastq_file.fq

    2. Run cufflinks (v2.2.1) for all samples and replicates
    $ cufflinks -g models.gff --library-type=fr-firststrand -o (sample)_(replicate) accepted_hits.bam

    3. Merge all assemblies
    $ cuffmerge -g models.gff -s genome_seq.fa -p 6 assemblies.txt
    * Obviously data in assemblies.txt points to all of the transcript.gtf files from cufflinks output.

    4. Run cuffdiff
    $ cuffdiff -o comaprison_id -L cond1,cond2 --library-type=fr-firststrand -b genome_seq.fa -u merged_assemblies.gtf s1_A.bam,s1_B.bam,S1_C.bam s2_A.bam,s2_B.bam,S2_C.bam

    As an example, if I take the output from cuffdiff, I see for the gene XLOC_012766 that several gene ids have been assigned to this gene (GB51730,GB51731,GB51732)
    test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
    XLOC_012766 XLOC_012766 GB51730,GB51731,GB51732 chr8:12970857-12981659 tg_w4 tg_q4 OK 21.5873 56.0719 1.37709 2.65271 5e-05 0.00717169 yes

    And if I look at the entry for XLOC_012766 in the merged gtf file (sorry for the massive amount of output below), there don't appear to be any overlapping exons and yet these assembled transcripts are being grouped into the same gene region.

    chr8 Cufflinks exon 12970858 12970860 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046670"; exon_number "1"; gene_name "GB51730"; oId "GB51730-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51730-RA"; class_code "="; tss_id "TSS23627"; p_id "P12060";
    chr8 Cufflinks exon 12972845 12972919 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046670"; exon_number "2"; gene_name "GB51730"; oId "GB51730-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51730-RA"; class_code "="; tss_id "TSS23627"; p_id "P12060";
    chr8 Cufflinks exon 12972974 12973024 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046670"; exon_number "3"; gene_name "GB51730"; oId "GB51730-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51730-RA"; class_code "="; tss_id "TSS23627"; p_id "P12060";
    chr8 Cufflinks exon 12970858 12970860 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "1"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972845 12972919 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "2"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972974 12973290 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "3"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12973358 12974471 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "4"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12974596 12974698 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "5"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12975020 12975167 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "6"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "7"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12978130 12978150 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046671"; exon_number "8"; gene_name "GB51732"; oId "CUFF.11857.4"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12970858 12970860 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "1"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972845 12972919 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "2"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972974 12973290 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "3"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12973358 12974471 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "4"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12974596 12974698 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "5"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12975020 12975167 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "6"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "7"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12978684 12979560 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046674"; exon_number "8"; gene_name "GB51732"; oId "CUFF.11857.5"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12970858 12970860 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "1"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972845 12972919 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "2"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972974 12973290 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "3"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12973358 12974471 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "4"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12974596 12974698 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "5"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "6"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12978684 12979560 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046673"; exon_number "7"; gene_name "GB51732"; oId "CUFF.11857.3"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12970858 12970860 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "1"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972845 12972919 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "2"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12972974 12973290 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "3"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12973358 12974463 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "4"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12974596 12974698 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "5"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12975020 12975167 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "6"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "7"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12978684 12979560 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046672"; exon_number "8"; gene_name "GB51732"; oId "CUFF.11857.2"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23627";
    chr8 Cufflinks exon 12973248 12973290 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046675"; exon_number "1"; gene_name "GB51731"; oId "GB51731-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51731-RA"; class_code "="; tss_id "TSS23628"; p_id "P12061";
    chr8 Cufflinks exon 12973358 12973449 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046675"; exon_number "2"; gene_name "GB51731"; oId "GB51731-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51731-RA"; class_code "="; tss_id "TSS23628"; p_id "P12061";
    chr8 Cufflinks exon 12974356 12974471 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046676"; exon_number "1"; gene_name "GB51732"; oId "GB51732-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51732-RA"; class_code "="; tss_id "TSS23629"; p_id "P12062";
    chr8 Cufflinks exon 12974596 12974698 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046676"; exon_number "2"; gene_name "GB51732"; oId "GB51732-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51732-RA"; class_code "="; tss_id "TSS23629"; p_id "P12062";
    chr8 Cufflinks exon 12975020 12975167 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046676"; exon_number "3"; gene_name "GB51732"; oId "GB51732-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51732-RA"; class_code "="; tss_id "TSS23629"; p_id "P12062";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046676"; exon_number "4"; gene_name "GB51732"; oId "GB51732-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51732-RA"; class_code "="; tss_id "TSS23629"; p_id "P12062";
    chr8 Cufflinks exon 12978130 12978150 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046676"; exon_number "5"; gene_name "GB51732"; oId "GB51732-RA"; contained_in "TCONS_00046671"; nearest_ref "GB51732-RA"; class_code "="; tss_id "TSS23629"; p_id "P12062";
    chr8 Cufflinks exon 12976053 12976115 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046677"; exon_number "1"; gene_name "GB51732"; oId "CUFF.11857.8"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23630";
    chr8 Cufflinks exon 12976190 12976308 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046677"; exon_number "2"; gene_name "GB51732"; oId "CUFF.11857.8"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23630";
    chr8 Cufflinks exon 12978684 12979560 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046677"; exon_number "3"; gene_name "GB51732"; oId "CUFF.11857.8"; nearest_ref "GB51732-RA"; class_code "j"; tss_id "TSS23630";
    chr8 Cufflinks exon 12978234 12978452 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046678"; exon_number "1"; gene_name "GB51676"; oId "CUFF.11857.9"; nearest_ref "GB51676-RA"; class_code "x"; tss_id "TSS23631";
    chr8 Cufflinks exon 12978684 12979560 . + . gene_id "XLOC_012766"; transcript_id "TCONS_00046678"; exon_number "2"; gene_name "GB51676"; oId "CUFF.11857.9"; nearest_ref "GB51676-RA"; class_code "x"; tss_id "TSS23631";

    for completeness, here is the entry for the above genes in my model.gff file.

    $ grep GB51730 models.gff |grep exon
    chr8 amel_OGSv3.2 exon 12970858 12970860 . + . Parent=GB51730-RA
    chr8 amel_OGSv3.2 exon 12972845 12972919 . + . Parent=GB51730-RA
    chr8 amel_OGSv3.2 exon 12972974 12973024 . + . Parent=GB51730-RA
    $ grep GB51731 models.gff |grep exon
    chr8 amel_OGSv3.2 exon 12973248 12973290 . + . Parent=GB51731-RA
    chr8 amel_OGSv3.2 exon 12973358 12973449 . + . Parent=GB51731-RA
    $ grep GB51732 models.gff |grep exon
    chr8 amel_OGSv3.2 exon 12974356 12974471 . + . Parent=GB51732-RA
    chr8 amel_OGSv3.2 exon 12974596 12974698 . + . Parent=GB51732-RA
    chr8 amel_OGSv3.2 exon 12975020 12975167 . + . Parent=GB51732-RA
    chr8 amel_OGSv3.2 exon 12976190 12976308 . + . Parent=GB51732-RA
    chr8 amel_OGSv3.2 exon 12978130 12978150 . + . Parent=GB51732-RA
    grep GB51676 models.gff |grep exon
    chr8 amel_OGSv3.2 exon 12979494 12979760 . - . Parent=GB51676-RA
    chr8 amel_OGSv3.2 exon 12980861 12981025 . - . Parent=GB51676-RA
    chr8 amel_OGSv3.2 exon 12981489 12981659 . - . Parent=GB51676-RA


    Am I doing something stupid ?? Can anyone explain what is going on and how I might be able to overcome this? I'm really stumped and hoping someone with more experience with cufflinks can point out where I am going wrong or suggest some options for improving the output.

    Thanks in advance.
    Last edited by beeman; 11-26-2014, 06:00 PM. Reason: added entries from models.gff file

Latest Articles

Collapse

  • seqadmin
    Exploring the Dynamics of the Tumor Microenvironment
    by seqadmin




    The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
    07-08-2024, 03:19 PM
  • seqadmin
    Exploring Human Diversity Through Large-Scale Omics
    by seqadmin


    In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
    06-25-2024, 06:43 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 11:09 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-19-2024, 07:20 AM
0 responses
148 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-16-2024, 05:49 AM
0 responses
124 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-15-2024, 06:53 AM
0 responses
111 views
0 likes
Last Post seqadmin  
Working...
X