  • Fusion gene analysis using Illumina RNA-seq data

    Dear All,

    I am learning gene fusion analysis using 100 bp paired-end RNA-seq data, and I am very new to RNA-seq analysis, so here it goes.

    I managed to do the basic analysis, e.g. finding differentially expressed genes using TopHat2 and DESeq2, and I got the results we expected.
    For the gene fusion analysis, I compared TopHat-Fusion, tophat-fusion-post and FusionCatcher. One thing I like about FusionCatcher is that it flags already known fusion genes in its output, which is reassuring since I am new to this analysis.

    Now the problem: neither program reports a fusion involving our gene of interest, say XX, yet gene XX shows a band of higher molecular weight on the western blot. So what could be the reason? Either the standard tools are not finding the fusion (or maybe there is no fusion), or the western blot result is a false positive (my collaborators are pretty sure there is a fusion, and they found it in multiple replicates).

    Has anyone faced such a problem and would be happy to share their experience? Secondly, is there any other tool I should use, or any other approach?

    Please let me know if the problem is unclear to you.

    Looking forward to your replies, and thanks a lot in advance!

    Cheers,
    Himan

  • #2
    It is hard to say what is going on there, but here are some guesses:
    a) the sample may not have been sequenced deeply enough (i.e. there are no reads in the FASTQ files that support the fusion) even though the fusion is present in the biological sample; increasing the number of reads might help here;
    b) if the gene fusion is known a priori, one may take the PCR primer sequences (if you have them or can find them) and search/grep for them in the input FASTQ files;
    c) does the XX fusion gene appear in the preliminary list of fusions from FusionCatcher?
    d) the fusion gene XX may appear in the lists reported by TopHat-Fusion and FusionCatcher under a different name, because many genes have synonyms (for example, in older papers the fusion IGH-WHSC1 is known as IGH-MMSET);
    e) are the genes involved in the fusion XX expressed in the RNA-seq data (how many reads map to each of them)?
    f) is the fusion gene XX a novel one? Has it been published before? Is it a somatic fusion gene? Has it been found in healthy individuals before (for example, the fusion TTTY15-USP9Y is known to occur in healthy individuals, and some fusion finders might skip it for this very reason)?
    g) is the fusion gene a known read-through? Some fusion finders do not consider read-through genes to be fusion genes (i.e. they will not be reported), while others consider them valid fusion genes (and they will be reported).
    ....
    ...
    ....
    Last edited by ndaniel; 06-13-2015, 12:42 AM.
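
For suggestion b), here is a minimal sketch of the primer search in Python. It assumes standard four-line FASTQ records and exact matching (no mismatches allowed); the function names `reverse_complement` and `reads_containing` are made up for illustration and are not part of any tool mentioned in this thread. The reverse complement is checked as well, because a read may come from either strand.

```python
from typing import Iterable, Iterator, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reads_containing(fastq_lines: Iterable[str], primer: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for every read that contains the primer
    or its reverse complement. Assumes four lines per FASTQ record."""
    primer = primer.upper()
    primer_rc = reverse_complement(primer)
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines, "").strip().upper()
        next(lines, None)  # '+' separator line
        next(lines, None)  # base quality line
        if primer in seq or primer_rc in seq:
            yield header.strip()[1:], seq  # drop the leading '@'
```

An open file handle can be passed directly as `fastq_lines` (for gzipped files, `gzip.open(path, "rt")`). As with plain grep, only a hit is informative: a read with a sequencing error inside the primer region will be missed.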

    • #3
      Hi ndaniel,

      Thank you for your reply and suggestions.

      Here is what I think of them, what I did, and some more questions:

      a) the sample may not have been sequenced deeply enough (i.e. there are no reads in the FASTQ files that support the fusion) even though the fusion is present in the biological sample; increasing the number of reads might help here;

      I agree, this could be one possibility, but I have two replicates and I did not find a fusion involving gene XX in either of them separately, or even after combining them.

      b) if the gene fusion is known a priori, one may take the PCR primer sequences (if you have them or can find them) and search/grep for them in the input FASTQ files;

      This was my next step after consulting with the biologist. I do have some primer sequences for my gene XX, but I am not sure what to do once I have extracted the matching reads from the input files. What will this information tell me about the fusion? Could you please elaborate on this suggestion?

      c) does the XX fusion gene appear in the preliminary list of fusions from FusionCatcher?

      It doesn't, actually. I checked the lists of candidate fusion genes from both FusionCatcher and TopHat-Fusion.

      d) the fusion gene XX may appear in the lists reported by TopHat-Fusion and FusionCatcher under a different name, because many genes have synonyms (for example, in older papers the fusion IGH-WHSC1 is known as IGH-MMSET);

      This could be the case too, but I searched first by chromosome and then by locus, and I did not find it that way either.

      e) are the genes involved in the fusion XX expressed in the RNA-seq data (how many reads map to each of them)?

      In the sample of interest it is highly expressed compared with the treated condition; the htseq-count count for gene XX was 3.5K (XX is a single gene, and we are not sure which gene it might be fused to).

      f) is the fusion gene XX a novel one? Has it been published before? Is it a somatic fusion gene? Has it been found in healthy individuals before (for example, the fusion TTTY15-USP9Y is known to occur in healthy individuals, and some fusion finders might skip it for this very reason)?

      Indeed, it is a novel one.

      g) is the fusion gene a known read-through? Some fusion finders do not consider read-through genes to be fusion genes (i.e. they will not be reported), while others consider them valid fusion genes (and they will be reported).

      I am not sure what you mean by read-through here; my knowledge is limited, so please excuse me if I am wrong. What I know from FusionCatcher is that it detects fusion genes and also gives a rough description of what kind of fusion each one might be, and this includes read-through fusions. Does that make sense?
      However, I have one more silly question. I am using IGV to view the alignments; how will changing the reference genome, e.g. from hg19 to Ensembl hg38, change the whole picture? I did the indexing and mapping with hg19. I am currently reading a review article (Wang et al 2012), hoping to find some information there.


      Thank you again
      Last edited by mhimanshu; 06-15-2015, 01:32 AM.

      • #4
        Is there a reason that you need to find evidence of the fusion directly in the same data set as your differential expression? It sounds like you're looking for a fusion to a specific gene, as opposed to looking for all fusions in general, which seems like the perfect time to do some good old-fashioned RACE.

        • #5
          Indeed, yes, I am looking for a fusion involving a specific gene, as we have already seen some evidence of it in the western blot. I am currently reading this article (Panagopoulos et al 2014, 2015), which was published recently and seems quite interesting. Maybe it will help others too:

          The “Grep” Command But Not FusionMap, FusionFinder or ChimeraScan Captures the CIC-DUX4 Fusion Gene from Whole Transcriptome Sequencing Data on a Small Round Cell Tumor with t(4;19)(q35;q13)

          • #6
            e) are the genes involved in the fusion XX expressed in the RNA-seq data (how many reads map to each of them)?

            In the sample of interest it is highly expressed compared with the treated condition; the htseq-count count for gene XX was 3.5K (XX is a single gene, and we are not sure which gene it might be fused to).
            OK. I was under the false impression that both partner genes of the novel fusion were known. Now I understand that only one gene is known, right?

            I am currently reading this article (Panagopoulos et al 2014, 2015), which was published recently and seems quite interesting. Maybe it will help others too:

            The “Grep” Command But Not FusionMap, FusionFinder or ChimeraScan Captures the CIC-DUX4 Fusion Gene from Whole Transcriptome Sequencing Data on a Small Round Cell Tumor with t(4;19)(q35;q13)
            This is very tricky to get right, so I would say that a negative answer from grep would not mean much here. Only a positive answer from grep would help!

            As we have already seen some evidence of it in the western blot.
            It would be great to have evidence at the RNA level here, because this is RNA-seq data. There is a small gap between RNA and protein (e.g. is the antibody specific enough? is the antibody too specific? etc.)

            g) is the fusion gene a known read-through? Some fusion finders do not consider read-through genes to be fusion genes (i.e. they will not be reported), while others consider them valid fusion genes (and they will be reported).

            I am not sure what you mean by read-through here; my knowledge is limited, so please excuse me if I am wrong. What I know from FusionCatcher is that it detects fusion genes and also gives a rough description of what kind of fusion each one might be, and this includes read-through fusions. Does that make sense?
            Read-through transcription occurs when the RNA polymerase continues beyond the normal termination sequence and into an adjacent gene, usually within 20 kb [from: http://journals.plos.org/plosone/art....pone.0039987]
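
To make that definition concrete, here is a rough sketch of how a candidate gene pair could be screened for a possible read-through. The `Gene` record and `is_possible_read_through` are hypothetical names, not taken from FusionCatcher or TopHat-Fusion, and the same-strand requirement and the treatment of overlapping genes as "not read-through" are simplifying assumptions; only the 20 kb figure comes from the definition quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Minimal gene record (hypothetical, for illustration only)."""
    name: str
    chrom: str
    start: int  # leftmost coordinate
    end: int    # rightmost coordinate
    strand: str  # "+" or "-"


def is_possible_read_through(upstream: Gene, downstream: Gene, max_gap: int = 20_000) -> bool:
    """True if `downstream` lies just beyond the 3' end of `upstream`:
    same chromosome, same strand, and within `max_gap` bases."""
    if upstream.chrom != downstream.chrom or upstream.strand != downstream.strand:
        return False
    if upstream.strand == "+":
        # Transcription runs left to right: the downstream gene starts after the upstream gene ends.
        gap = downstream.start - upstream.end
    else:
        # Transcription runs right to left: the downstream gene ends before the upstream gene starts.
        gap = upstream.start - downstream.end
    return 0 <= gap <= max_gap
```

A pair that passes this check is only a candidate; whether a given fusion finder reports or filters it depends on that tool's own rules.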

            However, I have one more silly question. I am using IGV to view the alignments; how will changing the reference genome, e.g. from hg19 to Ensembl hg38, change the whole picture? I did the indexing and mapping with hg19. I am currently reading a review article (Wang et al 2012), hoping to find some information there.
            It looks like you already have the BAM/SAM alignment files. That means that in IGV (or any other viewer) one can check which genes the mates of the reads mapping to the known fusion partner gene are mapping to.
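
The same check can be done outside a viewer. Below is a small sketch that reads SAM-format text lines (for example, as produced by `samtools view` on the BAM file) and counts, per chromosome, the mates of reads that start inside the known gene but whose mate maps to another chromosome or far away on the same one. The function name `tally_distant_mates`, the region arguments and the 1 Mb distance cut-off are illustrative assumptions; the column positions and FLAG bits follow the SAM format.

```python
from collections import Counter
from typing import Iterable


def tally_distant_mates(sam_lines: Iterable[str], chrom: str, start: int, end: int,
                        min_distance: int = 1_000_000) -> Counter:
    """Count mate chromosomes for reads starting in chrom:start-end whose mate
    maps to a different chromosome or at least `min_distance` bases away."""
    tally: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, pos = fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        # Keep paired reads (0x1) where both read (0x4) and mate (0x8) are mapped,
        # and skip secondary (0x100) and supplementary (0x800) alignments.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if rname != chrom or not start <= pos <= end:
            continue
        mate_chrom = rname if rnext == "=" else rnext  # "=" means same reference
        if mate_chrom != chrom or abs(pnext - pos) >= min_distance:
            tally[mate_chrom] += 1
    return tally
```

The result only says where the mates land; the mate positions still have to be looked up against a gene annotation to name the candidate partner gene.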

            • #7
              Hi ndaniel,

              OK. I was under the false impression that both partner genes of the novel fusion were known. Now I understand that only one gene is known, right?

              I think I also caused the confusion by writing a double X; yes, it is one gene.

              It would be great to have evidence at the RNA level here, because this is RNA-seq data. There is a small gap between RNA and protein (e.g. is the antibody specific enough? is the antibody too specific? etc.)

              I am not sure about this part; I have to discuss it with the biologist who performed all these experiments.

              It looks like you already have the BAM/SAM alignment files. That means that in IGV (or any other viewer) one can check which genes the mates of the reads mapping to the known fusion partner gene are mapping to.


              Yes, I have the BAM/SAM files from TopHat2, but my question was: if I change the genome from hg19 to Ensembl hg38, I see different mapping results (at least for my gene of interest XX, which I checked). Why is this happening, any clue?

              I have found fusions of my gene of interest XX with 6 other genes in the preliminary list of gene fusions; however, the “Count_paired-end_reads” value is only one for each of the 6 fusion pairs. Is there any way to check which ones are false positives, or to get a more conclusive result? As shown in Panagopoulos et al 2014, 2015, they found only a single 101 bp read that maps to two genes, which they called a fusion gene. How would I extract such a read? I am trying to grep all the reads that match the primer sequence of my gene XX and then run BLAT and BLAST on them to confirm it. Is this the right way?

              I am sorry for so many questions, but I am learning this technique and would be happy to learn more from the experts.


              Thanks in advance!
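
For the last step described above (passing the grepped reads on to BLAT or BLAST), the reads need to be in FASTA format. A minimal sketch, assuming the reads are already available as (name, sequence) pairs; `to_fasta` is a hypothetical helper, not part of BLAT, BLAST or any fusion finder:

```python
from typing import Iterable, Tuple


def to_fasta(reads: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text, wrapping sequences at `width` bases."""
    records = []
    for name, seq in reads:
        wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        records.append(f">{name}\n{wrapped}")
    # End with a newline only when there is at least one record.
    return "\n".join(records) + ("\n" if records else "")
```

The returned text can be written to a file and used as the query. A read that truly spans a fusion junction would be expected to align in two pieces, one part to gene XX and the rest to the partner gene.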
