  • IsoEM - input/output files

    Hello,

    I am using IsoEM to estimate isoform expression levels.
    As input, I provide a SAM file and a GTF file:
    Code:
    DS571145	alternativeSplicer	exon	3484	4143	.	-	.	gene_id "EHI_151170"; transcript_id "exon_EHI_151170.ref-1";
    DS571145	alternativeSplicer	exon	3484	3943	.	-	.	gene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-1";
    DS571145	alternativeSplicer	exon	3997	4143	.	-	.	gene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-2";
    ...
    I generated this GTF file from a GFF file:
    Code:
    DS571145	alternativeSplicer	gene	3484	4143	.	-	.	ID=EHI_151170;Name=hypothetical protein;
    DS571145	alternativeSplicer	mRNA	3484	4143	.	-	.	ID=EHI_151170.ref;Name=EHI_151170.ref;Parent=EHI_151170;completeORF=yes
    DS571145	alternativeSplicer	exon	3484	4143	.	-	.	ID=exon_EHI_151170.ref-1;Name=exon;Parent=EHI_151170.ref;
    DS571145	alternativeSplicer	mRNA	3484	4143	.	-	.	ID=EHI_151170.alt1;Name=EHI_151170.alt1;Parent=EHI_151170;completeORF=no
    DS571145	alternativeSplicer	exon	3484	3943	.	-	.	ID=exon_EHI_151170.alt1-1;Name=exon;Parent=EHI_151170.alt1;
    DS571145	alternativeSplicer	exon	3997	4143	.	-	.	ID=exon_EHI_151170.alt1-2;Name=exon;Parent=EHI_151170.alt1;
    ...
    From the information I found, I concluded that I must keep only the "exon" lines. Is that right? If I do this, I don't see how IsoEM can know which exons belong to which isoform...
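A minimal sketch of an alternative conversion, assuming (as the IsoEM documentation example quoted later in this thread suggests) that IsoEM groups exon lines into one transcript by their shared `transcript_id`. It still keeps only the `exon` lines, but takes `transcript_id` from the exon's `Parent` (the mRNA ID) and `gene_id` from that mRNA's own `Parent`, instead of reusing the exon's `ID`. The helper names `parse_attributes` and `gff3_to_gtf` are hypothetical, and the parser only handles simple `key=value;` attributes like the ones above.

```python
# Sketch: rebuild the GTF so that exons of the same mRNA share one transcript_id.
# Assumption (not confirmed by IsoEM docs here): IsoEM groups exons by transcript_id.


def parse_attributes(field):
    """Parse a GFF3 attribute column such as 'ID=x;Parent=y;' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(gff_lines):
    """Hypothetical helper: turn GFF3 gene/mRNA/exon lines into GTF exon lines.

    transcript_id comes from the exon's Parent (the mRNA ID) and gene_id from
    that mRNA's Parent, instead of from the exon's own ID.
    """
    rows = [line.rstrip("\n").split("\t") for line in gff_lines
            if line.strip() and not line.startswith("#")]

    # First pass: map every mRNA ID to its parent gene ID.
    mrna_to_gene = {}
    for cols in rows:
        if cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            mrna_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: emit one GTF line per (exon, parent mRNA) pair.
    # GFF3 allows a comma-separated Parent list when an exon is shared.
    gtf_lines = []
    for cols in rows:
        if cols[2] != "exon":
            continue
        for transcript in parse_attributes(cols[8])["Parent"].split(","):
            gene = mrna_to_gene[transcript]
            attr = f'gene_id "{gene}"; transcript_id "{transcript}";'
            gtf_lines.append("\t".join(cols[:8] + [attr]))
    return gtf_lines


EXAMPLE_GFF = [
    "DS571145\talternativeSplicer\tgene\t3484\t4143\t.\t-\t.\tID=EHI_151170;Name=hypothetical protein;",
    "DS571145\talternativeSplicer\tmRNA\t3484\t4143\t.\t-\t.\tID=EHI_151170.ref;Name=EHI_151170.ref;Parent=EHI_151170;completeORF=yes",
    "DS571145\talternativeSplicer\texon\t3484\t4143\t.\t-\t.\tID=exon_EHI_151170.ref-1;Name=exon;Parent=EHI_151170.ref;",
    "DS571145\talternativeSplicer\tmRNA\t3484\t4143\t.\t-\t.\tID=EHI_151170.alt1;Name=EHI_151170.alt1;Parent=EHI_151170;completeORF=no",
    "DS571145\talternativeSplicer\texon\t3484\t3943\t.\t-\t.\tID=exon_EHI_151170.alt1-1;Name=exon;Parent=EHI_151170.alt1;",
    "DS571145\talternativeSplicer\texon\t3997\t4143\t.\t-\t.\tID=exon_EHI_151170.alt1-2;Name=exon;Parent=EHI_151170.alt1;",
]

for line in gff3_to_gtf(EXAMPLE_GFF):
    print(line)
```

With these input lines, the two alt1 exons both end up with `transcript_id "EHI_151170.alt1"`, so, under the assumption above, they would be treated as one two-exon isoform rather than as two separate single-exon "transcripts".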

    As output, I get 2 files. The one called ...iso_estimates contains:
    Code:
    ...
    exon_EHI_151170.ref-1	37.03955548991607
    ...
    exon_EHI_151170.alt1-1	0.0
    exon_EHI_151170.alt1-2	0.0
    ...
    My understanding is that these are exon expression level estimates, not isoform expression level estimates. Am I misinterpreting the results?
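One way to check why the output looks per "exon": if IsoEM treats all exon lines that share a `transcript_id` as one transcript (an assumption based on the documentation example quoted later in this thread), then the GTF lines above define three single-exon transcripts, because each line carries the exon's own ID as its `transcript_id`. This sketch groups GTF exon lines by `transcript_id` to make that visible; the helper name `exons_per_transcript` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the value of a GTF transcript_id attribute, e.g. transcript_id "abc";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_per_transcript(gtf_lines):
    """Hypothetical helper: group GTF exon lines as {transcript_id: [(start, end), ...]}."""
    groups = defaultdict(list)
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            groups[match.group(1)].append((int(cols[3]), int(cols[4])))
    return dict(groups)


# The three GTF lines from the question: each exon carries its own ID as transcript_id.
QUESTION_GTF = [
    'DS571145\talternativeSplicer\texon\t3484\t4143\t.\t-\t.\tgene_id "EHI_151170"; transcript_id "exon_EHI_151170.ref-1";',
    'DS571145\talternativeSplicer\texon\t3484\t3943\t.\t-\t.\tgene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-1";',
    'DS571145\talternativeSplicer\texon\t3997\t4143\t.\t-\t.\tgene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-2";',
]

for transcript, exons in exons_per_transcript(QUESTION_GTF).items():
    print(transcript, len(exons), exons)
```

Every "transcript" found here has exactly one exon, which lines up with the per-exon names in the iso_estimates output above.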

    Has anyone used IsoEM for this purpose? Is my input GTF file correct?
    How can I get isoform expression level estimates?

    Thanks for your help,
    Jane

  • #2
    I should add that I am confused by the meaning of "exon" here. Within one gene, exons can overlap, which I thought shouldn't be possible. At least, they do not seem to overlap within the same mRNA!

    Here is an example provided in the IsoEM documentation:

    Code:
    chr1	hg18_knownGene_GnfAtlas2	exon	1116	2090	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc001aaa.2";
    chr1	hg18_knownGene_GnfAtlas2	exon	2476	2584	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc001aaa.2";
    chr1	hg18_knownGene_GnfAtlas2	exon	3084	4121	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc001aaa.2";
    chr1	hg18_knownGene_GnfAtlas2	exon	1116	2090	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc009vip.1";
    chr1	hg18_knownGene_GnfAtlas2	exon	2476	4272	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc009vip.1";
    Within this one gene, there are overlapping exons in 2 different transcripts:

    Code:
    chr1	hg18_knownGene_GnfAtlas2	exon	2476	2584	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc001aaa.2";
    chr1	hg18_knownGene_GnfAtlas2	exon	2476	4272	0.000000	+	.	gene_id "DUMMYCLUSTER.1"; transcript_id "uc009vip.1";
    I don't know if "exon" is the right name...
    In the documentation example, there are 2 transcripts, i.e. 2 isoforms. The first isoform (uc001aaa.2) has 3 exons and the second (uc009vip.1) has 2 exons, but only 1 exon (1116-2090) is identical in both isoforms. One exon seems to be alternatively spliced (2476-2584 vs. 2476-4272), and one (3084-4121) is present in only one isoform. Am I right?
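The exon structure of the documentation example can be checked mechanically. This sketch uses the coordinates copied from the GTF lines above (treated as 1-based, inclusive) and reports, per transcript, the exon count and whether any two of its exons overlap, then lists the exons identical in both isoforms and the pairs of distinct exons that overlap across isoforms. All function names are hypothetical.

```python
# Exon coordinates copied from the IsoEM documentation example (1-based, inclusive).
DOC_EXONS = {
    "uc001aaa.2": [(1116, 2090), (2476, 2584), (3084, 4121)],
    "uc009vip.1": [(1116, 2090), (2476, 4272)],
}


def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_internal_overlap(exons):
    """True if any two exons of the same transcript overlap.

    After sorting by start, checking neighbouring pairs is enough.
    """
    ordered = sorted(exons)
    return any(overlaps(x, y) for x, y in zip(ordered, ordered[1:]))


def compare_isoforms(exons_a, exons_b):
    """Return (exons identical in both, pairs of distinct exons that overlap)."""
    set_a, set_b = set(exons_a), set(exons_b)
    shared = sorted(set_a & set_b)
    overlapping = sorted(
        (a, b) for a in set_a - set_b for b in set_b - set_a if overlaps(a, b)
    )
    return shared, overlapping


for name, exons in DOC_EXONS.items():
    print(name, len(exons), "exons; internal overlap:", has_internal_overlap(exons))

shared, overlapping = compare_isoforms(DOC_EXONS["uc001aaa.2"], DOC_EXONS["uc009vip.1"])
print("shared:", shared)
print("distinct but overlapping:", overlapping)
```

On these coordinates it reports 3 exons for uc001aaa.2 and 2 for uc009vip.1, no overlap inside either transcript, and one identical exon (1116-2090). The single exon 2476-4272 of uc009vip.1 overlaps both 2476-2584 and 3084-4121 of uc001aaa.2 and also spans the intron between them (2585-3083), so overlapping "exons" across isoforms of one gene are expected whenever the isoforms use different splice sites or one of them retains an intron.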

    Then, I don't know how we are supposed to deduce isoform expression level estimates from the result files... Do we have to sum the FPKM values of all "exons" present in one isoform?
    My main question is: what should the result be, isoform or exon expression level estimates?
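A small worked example of why summing per-exon FPKM values generally does not give an isoform FPKM, using the standard definition FPKM = fragments * 10^9 / (feature length in bp * total mapped fragments). It assumes, as the question does, that the iso_estimates values are FPKM, and it deliberately ignores fragments compatible with more than one isoform, which is the ambiguity an EM-based estimator such as IsoEM is meant to resolve. The fragment counts and library size are hypothetical; only the exon lengths come from the GTF above.

```python
def fpkm(fragments, length_bp, total_mapped):
    """FPKM: fragments per kilobase of feature length per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Exon lengths of EHI_151170.alt1 from the GTF above (1-based, inclusive coordinates).
LEN_ALT1_1 = 3943 - 3484 + 1  # 460 bp
LEN_ALT1_2 = 4143 - 3997 + 1  # 147 bp

# Hypothetical data: one fragment per bp of exon length, 1 million mapped fragments.
TOTAL_MAPPED = 1_000_000
frags_1, frags_2 = 460, 147

fpkm_1 = fpkm(frags_1, LEN_ALT1_1, TOTAL_MAPPED)
fpkm_2 = fpkm(frags_2, LEN_ALT1_2, TOTAL_MAPPED)
isoform = fpkm(frags_1 + frags_2, LEN_ALT1_1 + LEN_ALT1_2, TOTAL_MAPPED)

print(fpkm_1, fpkm_2, fpkm_1 + fpkm_2, isoform)  # → 1000.0 1000.0 2000.0 1000.0
```

Because FPKM is already normalized by length, two exons covered at the same depth each get the same FPKM as the whole isoform, and adding them doubles the value instead of reproducing it.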

    Thanks for any help you can provide!
