  • vincebrown
    Junior Member
    • Jul 2010
    • 6

    Finding exon-exon junction

    Hi,

    I have a list of around 1000 peptides and I want to find which of those might have come from an exon-exon junction of a gene.

    I would be thankful if somebody could help me figure this out, as I am not sure of a precise way to perform this task.

    Vince
  • liux
    Member
    • Mar 2009
    • 30

    #2
    Try wise2: http://www.ebi.ac.uk/Tools/Wise2/index.html

    • malachig
      Senior Member
      • Aug 2010
      • 117

      #3
      Another option is to compare them to a database of peptides corresponding to exon-exon junctions. For example, these are made available as part of a recent publication here:

      ALEXA-Seq downloads

      For human genes, the files are available for two versions of the genome here: hg18 and hg19

      Each known or hypothetical junction corresponds to an Ensembl exon.
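      To make the junction-database idea concrete, here is a minimal sketch of how one could build junction-spanning peptides from a gene model and check a peptide against them. It assumes you already have each transcript's coding sequence split into its exons, in order and on the coding strand; the `exons` input, `build_junction_db`, `junction_hits` and the 30-nt `flank` default are hypothetical names and values, not part of the ALEXA-Seq files. A peptide only counts as a junction hit if its codons cover nucleotides on both sides of the boundary, so a match lying entirely within one exon is not reported.

```python
from itertools import product

# Standard genetic code; codons enumerated with the first base varying slowest, in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 0; a trailing partial codon is dropped, unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def build_junction_db(exons: list[str], flank: int = 30) -> list[tuple[int, str, int, int]]:
    """Hypothetical helper: one entry per junction between consecutive coding exons.

    Each entry is (junction_index, window_protein, window_start_nt, boundary_nt),
    with 0-based coordinates on the concatenated CDS.
    """
    cds = "".join(exons)
    db = []
    boundary = 0
    for idx, exon in enumerate(exons[:-1]):
        boundary += len(exon)  # first nucleotide of the next exon
        start = (max(0, boundary - flank) // 3) * 3  # snap to a codon start to keep the frame
        end = min(len(cds), boundary + flank)
        db.append((idx, translate(cds[start:end]), start, boundary))
    return db


def junction_hits(peptide: str, db: list[tuple[int, str, int, int]]) -> list[int]:
    """Return indices of junctions whose boundary falls strictly inside the peptide's codons."""
    hits = []
    for idx, protein, start, boundary in db:
        pos = protein.find(peptide)
        while pos != -1:
            nt_start = start + 3 * pos
            nt_end = nt_start + 3 * len(peptide)
            if nt_start < boundary < nt_end:  # codons cover both sides of the junction
                hits.append(idx)
                break
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    # Toy CDS whose second codon (GCT) is split by the exon boundary.
    db = build_junction_db(["ATGGC", "TTGGAAA"])
    print(junction_hits("MA", db))  # [0]
    print(junction_hits("WK", db))  # []
```

      Snapping the window start to a multiple of three keeps the translation in the CDS reading frame, which matters when the junction splits a codon, as in the toy example.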

      • vincebrown
        Junior Member
        • Jul 2010
        • 6

        #4
        Hi,

        Thanks for the information. I forgot to mention, though,
        that these are sequences from the fungus Pichia pastoris.

        The genomic information is available at NCBI; I am planning to download
        the NCBI genome and try the algorithm.

        Do you think this is the correct way of doing it for this specific organism?

        Thanks,
        Vince

        • malachig
          Senior Member
          • Aug 2010
          • 117

          #5
          Yes, the option suggested by liux should work for you then. If you are only concerned with identifying matches to known genes, you could also compare your list of peptides to the known ORFeome of your species (say using blastp). Or perhaps a six-frame translation of the transcriptome (say using tblastn) if you do not want to figure out the actual ORFs.
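          As a rough illustration of the six-frame idea, the sketch below translates a nucleotide sequence in all six reading frames and reports exact peptide matches. This is plain substring matching, not tblastn: it does not score mismatches or gaps, and the function names are hypothetical. One consequence worth keeping in mind: searched against genomic sequence, a contiguous hit means the peptide is encoded without an intron, whereas a peptide spanning an exon-exon junction will generally not be found contiguously in the genome. Searching a transcriptome instead tells you which gene a peptide comes from, but not whether it crosses a junction.

```python
from itertools import product

# Standard genetic code; codons enumerated with the first base varying slowest, in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate in frame 0; a trailing partial codon is dropped, unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_hits(peptide: str, sequence: str) -> list[tuple[str, int, int]]:
    """Exact matches of `peptide` in all six frames as (strand, frame, residue_offset).

    For the '-' strand, frame and offset are relative to the reverse complement.
    """
    hits = []
    for strand, seq in (("+", sequence.upper()), ("-", reverse_complement(sequence))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    print(six_frame_hits("AWK", "ATGGCTTGGAAA"))  # [('+', 0, 1)]
    print(six_frame_hits("PSH", "ATGGCTTGGAAA"))  # [('-', 0, 1)]
```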

          • vincebrown
            Junior Member
            • Jul 2010
            • 6

            #6
            Hi,

            Thanks for being patient; I am a beginner in bioinformatics analysis.

            All I am concerned about is this: I have a set of peptides and I would like to know whether they came from an exon-exon junction, meaning that a splicing event took place, since the peptides' coverage came from more than one gene.

            If I use tblastn, can the results distinguish between the peptides
            that came entirely from one known gene and the ones that came from a junction?

            Liux's reply and yours seem quite promising, but if BLAST can solve the problem I would prefer that; my mentor did not indicate specific tools to do this.

            Thanks again,
            Vince

            • litc
              Member
              • Oct 2010
              • 24

              #7
              Originally posted by liux
              I agree with that; Genewise can do that job.

              • malachig
                Senior Member
                • Aug 2010
                • 117

                #8
                When one says 'splicing event' it is usually understood to mean the joining of exons of a single gene. This is how a pre-messenger RNA becomes a mature messenger RNA. There are certainly peptides that correspond to the junctions of adjacent exons in a gene.

                Based on your last post, it sounds like you are talking about something else because you refer to peptides from "more than one gene". This is an important distinction because it influences the analysis approach you would take. Splicing can occur between different genes by a process called 'trans-splicing' although this is much less understood than constitutive splicing and alternative splicing. You may also be referring to a 'fusion gene'. These occur when the genome itself has been rearranged. For example, if a rearrangement happens and the break point is within an intron you can get a fused gene (some people call them chimeras) where exons from two different genes may get spliced together into a novel fusion transcript. Detecting these is practically an entire field of next generation sequencing analysis. If that is what you are trying to detect, then the analysis approach would have to be altered.

                Perhaps we should back up slightly to understand the goal more clearly. What is the nature of your data? How was it generated?

                I would also suggest that you quickly read up on RNA splicing, trans-splicing, and fusion genes. Which of these (if any) are you interested in?

                • vincebrown
                  Junior Member
                  • Jul 2010
                  • 6

                  #9
                  @malachig

                  Yes, I should step back a little and focus on the goal. I have been asked to find out, from a list of peptides obtained by mass spec, whether these peptides span more than one exon, i.e., whether they come from exon-exon junctions. I may have confused this with alternative splicing.

                  Can you suggest a method just to check which ones are likely to span more than one exon?

                  • malachig
                    Senior Member
                    • Aug 2010
                    • 117

                    #10
                    Sounds like the original suggestion of wise2 would be the easiest. If you don't like wise2 or would like another option, any gapped aligner that accepts protein sequence should work. For example, Exonerate "will allow introns in the alignment, but also allow frameshifts, and exon phase changes when a codon is split by an intron". Instructions for using Exonerate for this purpose are here. You can obtain the genome sequence from various places, including www.pichiagenome.org and bioinformatics.psb.ugent.be

                    If the alignment is reported as a single block, then the peptide likely does not span a junction. If you get a nice gapped alignment, and the boundaries look like valid splice sites, then you probably have a junction peptide. There is a caveat though. Gapped aligners require a reasonable amount of sequence on both sides of the junction to create an accurate gapped alignment. If your peptides are very short it may make this task difficult.
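                    The decision rule above can be sketched as follows, assuming you have already parsed the aligner's output (from Exonerate or Wise2, say) into a sorted list of aligned genomic blocks on the plus strand, as 0-based half-open (start, end) intervals. The parsing step is not shown, and the function name and thresholds (`min_intron`, `min_flank_nt`) are hypothetical values to tune, not defaults of either tool. Only canonical GT...AG intron boundaries are accepted, so non-canonical splice sites get flagged for manual review, and the short-block check reflects the caveat about very short peptides.

```python
def classify_alignment(
    blocks: list[tuple[int, int]],
    genome: str,
    min_intron: int = 20,     # hypothetical minimum gap length to call an intron
    min_flank_nt: int = 6,    # hypothetical minimum aligned length per block
) -> str:
    """Classify a peptide-to-genome alignment from its aligned blocks (plus strand only)."""
    if len(blocks) == 1:
        return "single block: likely no junction"
    # Very short blocks give the aligner little evidence for where the gap belongs.
    if any(end - start < min_flank_nt for start, end in blocks):
        return "gapped, but a block is too short to trust"
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        intron = genome[prev_end:next_start].upper()
        canonical = intron.startswith("GT") and intron.endswith("AG")
        if len(intron) < min_intron or not canonical:
            return "gapped, non-canonical or too-short gap: check manually"
    return "gapped with GT...AG introns: likely junction peptide"


if __name__ == "__main__":
    # Toy genome: 6-nt exon, 24-nt GT...AG intron, 6-nt exon.
    genome = "ATGGCT" + "GT" + "A" * 20 + "AG" + "TGGAAA"
    print(classify_alignment([(0, 6), (30, 36)], genome))
    print(classify_alignment([(0, 36)], genome))
```

                    Minus-strand alignments would need their coordinates converted and the intron sequence reverse-complemented before the GT...AG check; that is left out of this sketch.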

                    Also, if P. pastoris is like other members of the Saccharomycetaceae family (such as baker's yeast), it may have a relatively simple transcriptome. Many genes may consist of only a single exon, and only a subset may actually have multiple exons. So peptides corresponding to junctions might be rare for that reason as well. I'm not familiar with this species; presumably, pertinent information is readily available in the genome paper for P. pastoris.

                    • malachig
                      Senior Member
                      • Aug 2010
                      • 117

                      #11
                      I just noticed an alternative tool, called 'ProSplign', that might serve the same function as wise2 for this problem. It has recently become available (the manuscript is still in preparation, according to the website). From the website:

                      ProSplign is a utility for computing the alignment of proteins to genomic nucleotide sequence. This alignment can include eukaryotic splicing. At the heart of the program is a global alignment algorithm that specifically accounts for introns and splice signals. It is due to this algorithm that ProSplign is accurate in determining splice sites and tolerant to sequencing errors.

                      ProSplign uses BLAST hits to identify possible locations of genes and their duplications on genomic sequences and then to speed up the core dynamic programming.

                      ProSplign was developed with the following goals in mind:

                      * Accuracy in determining splice signals
                      * Recognition of short exons and non-consensus splices where feasible
                      * Ability to identify and separate multiple compartments typically representing gene copying events

                      ProSplign is used to compute transcript alignments as a part of the NCBI Genome Annotation Pipeline.

                      Reference: ProSplign - Protein to Genomic Alignment Tool. B. Kiryutin, A. Souvorov, T. Tatusova. Manuscript in preparation
