Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • creation of ExonCountSet in DEXSeq

    Hi,

    I am using DEXSeq for differential exon usage tests. In the vignette of producing ExonCountSet object, the author used dexseq_prepare_annotation.py to convert the GTF file to GFF file. In the GFF output, I see that (since the GTF is downloaded from Ensembl) the gene_id's start with "ENSG".

    I know that the next step is to use dexseq_count.py on the GFF and SAM files to generate counts. However, because currently we have the count data file (which we prefer to use), we are hoping to use our own counts (i.e. the treated2fb.txt as in the vignette example) for the analysis. The issue is that, our count files contain EntrezGene ID's, NOT Ensembl IDs, and the conversion between the two is not bijective (i.e. 1-1). Therefore, we I run the read.HTSeqCounts() function in R, the error message "Count files do not correspond to the flattened annotation file" appears.

    Question:

    (1) is Ensembl GTF the only input for dexseq_prepare_annotation.py? It seems the resultant GFF file contains only Ensembl gene IDs, accordingly...

    (2) in my case of non-Ensembl gene IDs, how can I instruct or manipulate the codes to generate an ExonCountSet object?

    Thank you!

  • #2
    Use the function "newExonCountDataSet" instead of "read.HTSeqCounts". See "?newExonCountDataSet" for details.

    Comment


    • #3
      Originally posted by Simon Anders View Post
      Use the function "newExonCountDataSet" instead of "read.HTSeqCounts". See "?newExonCountDataSet" for details.
      Thank you so much for the information, Simon! I think that's the function that suits my current situation ;-)

      Comment


      • #4
        So I got one more question: if I don't use read.HTSeqCounts() which takes an annotation file, then it takes extra efforts to generate a plot using plotDEXSeq(ExonCountSet, ...), right? The function read.HTSeqCounts() automatically takes care of that, but in newExonCountSet(), I need to specify the "transcripts" argument in order to plot? Thanks!

        Comment


        • #5
          Exactly. Or more specifically, you need the "transcripts" argument (and the "exonIntervals" argument) if you want to get gene models at the bottom of your plot.

          Comment


          • #6
            Originally posted by Simon Anders View Post
            Exactly. Or more specifically, you need the "transcripts" argument (and the "exonIntervals" argument) if you want to get gene models at the bottom of your plot.
            Hi Simon:

            Thanks for the information! I have two question while using DEXSeq, and hopefully you can help me clarify:

            1. in the newExonCountSet() function, you mentioned that in order to get the gene model at the bottom of plot, two more arguments need to be passed: "exonIntervals" and "transcripts" -- do you have an example of how these two inputs are formatted? Are they derived from the GFF files? Any functions in DEXSeq will help me to get the two inputs? (Sorry I didn't find any examples in the help document...)

            2. in the vignette written by Alejandro, about "data preprocessing and creation of the data objects pasillaGenes and pasillaExons", up to section 5 all the steps aim to generate the per-exon read counts for each sample (finally in the form of .txt files to be used in R). My question is, if I have those per-exon read counts files (from other sources) and would like to use them directly, can I just start from section 5: "creation of CountDataSet" using the per-exon read counts I have at hand? Actually, I have another file of junction reads, and I don't know if DEXSeq can ever take it as inputs in whatever functions? Or DEXSeq just work with per-exon read counts?

            Thank you so much!!

            Comment


            • #7
              Hi @alittleboy,

              The function newExonCountSet will allow you to generate your ExonCountSet from basic R data structures. For an example of an ExonCountSet object you could have a look at the pasillaExons object in the pasilla package. You will find how this is supposed to be formatted. Regarding junction reads, you could also input them in DEXSeq, and it will be consider as an additional exon bin!

              Alejandro

              Comment


              • #8
                Originally posted by areyes View Post
                Hi @alittleboy,

                The function newExonCountSet will allow you to generate your ExonCountSet from basic R data structures. For an example of an ExonCountSet object you could have a look at the pasillaExons object in the pasilla package. You will find how this is supposed to be formatted. Regarding junction reads, you could also input them in DEXSeq, and it will be consider as an additional exon bin!

                Alejandro
                Hi Alejandro:

                Thank you for your suggestions! Would you please be more specific on how can I input my junction_reads file to DEXSeq? I read the related vignette and online documents, but didn't find a solution... Thanks ;-)

                Comment


                • #9
                  Hi @alittleboy,

                  You probably realized that exons in DEXSeq are not "real" exons, but rather exon bins (defined as non overlapping exonic parts of transcripts, see the publication for more detail).

                  Now if you want to test your junction reads, you would have to add as a counting bin, e.g. as a row in your count data, a counting bin that reflects the junction between your exons of interest:

                  If you have this gene model with exons A, B and C:


                  ---[ A ]----[ B ]----[ C ]---

                  Your counting bins would be A, B and C, you will count reads that fall into this exons and your matrix would look like this ( I am making the numbers up):

                  A 2
                  B 4
                  C 3


                  If you want to test your exons bins you would need a matrix like this:

                  A 2
                  A-B 1
                  B 4
                  B-C 3
                  C 3
                  A-C 2

                  And input this into DEXSeq

                  Best regards,
                  Alejandro

                  Comment


                  • #10
                    Originally posted by areyes View Post
                    Hi @alittleboy,

                    You probably realized that exons in DEXSeq are not "real" exons, but rather exon bins (defined as non overlapping exonic parts of transcripts, see the publication for more detail).

                    Now if you want to test your junction reads, you would have to add as a counting bin, e.g. as a row in your count data, a counting bin that reflects the junction between your exons of interest:

                    If you have this gene model with exons A, B and C:


                    ---[ A ]----[ B ]----[ C ]---

                    Your counting bins would be A, B and C, you will count reads that fall into this exons and your matrix would look like this ( I am making the numbers up):

                    A 2
                    B 4
                    C 3


                    If you want to test your exons bins you would need a matrix like this:

                    A 2
                    A-B 1
                    B 4
                    B-C 3
                    C 3
                    A-C 2

                    And input this into DEXSeq

                    Best regards,
                    Alejandro
                    Hi @areyes:

                    That's very clear! Thanks for the examples -- I see how important to understand it is the exonic region (exon bin) instead of real exon that constitutes the building block for DEXSeq counting.

                    I remember someone in the past asked why testing these exonic regions instead of real exons is more important biologically and in terms of interpretations. That is also my concern and needs to be clarified. Would you share your thoughts on this? Sorry maybe you've already discussed before, and I appreciate if you can redirect me to the posts ;-)

                    I think maybe it's more concrete to give an example: the top gene in my test comparing two groups is ENSG00000113845. In the HTML output of DEXSeq, there are 22 exonID's from E001 to E022, and the last exon E022 is deferentially expressed. However, from the Ensembl website on this gene here, there are 9 transcripts (splice variants) for this gene. How can I know E022, the exon counting bin, corresponds to which transcript? Or the inference is only limited to the gene-level (maybe wrong), i.e. this gene has at least 1 exon that is DEU, but we don't know which exon is DEU?

                    Thank you so much for your clarifications!
                    Last edited by alittleboy; 07-01-2013, 05:21 PM.

                    Comment


                    • #11
                      DEXSeq exon bins

                      Thank you for this useful discussion.
                      If I understand correctly the "exon bins" to not correspond to actual exons?
                      I did run DEXSeq on 22 samples with condition pre- and post-treament. Inspecting one of the genes of interest ENSG00000113851, in plotDEXSeq this gene is plotted as having 37 exons (or bins) while in reality this gene has only 11 exons. Is the difference due to bin (instead of exon) counting or something went wrong in my analysis?

                      Comment


                      • #12
                        Hi @nbahlis,

                        The preprocessing scripts from DEXSeq define new exon bins based on non overlapping regions of the transcripts (http://www.ncbi.nlm.nih.gov/pmc/arti...195/figure/F1/). If you look at this gene ID in ENSEMBL, (http://www.ensembl.org/Homo_sapiens/...190676-3221394), Two isoforms contain 11 exons, but there are many other isoforms as well. The 37 exon bins defined in DEXSeq come from the preprocessing considering all these isoforms.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        31 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        33 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        28 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        53 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X