  • TopHat questions

    Hi all,
    I am working with transcriptomic data generated by Solexa, and I am using TopHat for mapping. I am trying to run the program with a GFF file in order to obtain gene expression indices (-G option), but the program does not seem to recognize any of the GFF files I specify. Does anyone know the GFF file structure that TopHat expects for the mapping?
    Regards

  • #2
    Hi,

    I've just updated the TopHat manual with a brief explanation of TopHat's RPKM calculation and a simple example GFF file. Please see http://tophat.cbcb.umd.edu/manual.html
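While waiting to compare against the manual's example, a quick sanity check on your own file can help. The sketch below is not TopHat's parser; it only assumes the generic GFF3 layout (nine tab-separated columns, feature type in column 3, `#` for comment lines). The function name `summarize_gff3` and the example records are made up for illustration.

```python
from collections import Counter


def summarize_gff3(lines):
    """Return (feature-type counts, sequence names, malformed line numbers)."""
    feature_counts = Counter()
    seqids = set()
    malformed = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9:
            malformed.append(number)  # e.g. spaces used instead of tabs
            continue
        seqids.add(fields[0])
        feature_counts[fields[2]] += 1
    return feature_counts, seqids, malformed


# Hypothetical records, only to show the call.
example = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\texample\texon\t100\t300\t.\t+\t.\tID=ex1;Parent=tx1",
    "chr1\texample\texon\t600\t900\t.\t+\t.\tID=ex2;Parent=tx1",
    "chr1 example exon 600 900 . + . Parent=tx1",  # spaces, not tabs
]
counts, seqids, malformed = summarize_gff3(example)
print(dict(counts), sorted(seqids), malformed)
```

It is also worth checking that the sequence names printed here are spelled the same way as in the reference you map against (for example `chr1` versus `1`).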

    • #3
      GFF3 file for TopHat

      Originally posted by Eugeni:
      Hi all,
      I am working with transcriptomic data generated by Solexa, and I am using TopHat for mapping. I am trying to run the program with a GFF file in order to obtain gene expression indices (-G option), but the program does not seem to recognize any of the GFF files I specify. Does anyone know the GFF file structure that TopHat expects for the mapping?
      Regards
      Hi,
      Could you share the first few lines of a GFF file so I can see how it is structured?
      Thanks

      • #4
        Hello, I'm wondering if anybody can direct me to a GFF3 for mouse. I came across GFF3 files for many organisms besides mouse (http://www.sequenceontology.org/reso...databases.html).

        • #5
          TopHat GFF3

          Originally posted by geriatrics1200:
          Hello, I'm wondering if anybody can direct me to a GFF3 for mouse. I came across GFF3 files for many organisms besides mouse (http://www.sequenceontology.org/reso...databases.html).

          Please post the GFF3 if you ever get one!

          • #6
            I think TopHat needs GFF3-format files. You can download GTF files from Ensembl and then convert them to GFF3 format with a Perl script, GFF2gtf.pl; you can find it by searching Google.

            • #7
              I think it would be better if the TopHat website provided GFF3 files for some species, for example human. Converting the file format is tedious work :-(
              Xi Wang

              • #8
                Originally posted by Xi Wang:
                I think it would be better if the TopHat website provided GFF3 files for some species, for example human. Converting the file format is tedious work :-(
                To give more details: I downloaded the knownGene annotation in GTF format from the UCSC Table Browser and converted it with the gtf2gff3 tool. However, when I ran TopHat, I got the following warning message:
                Warning: TopHat did not find any junctions in GFF file

                I don't know how the file should be structured for TopHat. I would appreciate any help. Thanks.
                Xi Wang
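One way to narrow down a warning like this is to check whether junctions can be derived from the converted file at all. The sketch below is an assumption about how junctions might be read from an annotation, not TopHat's actual logic: it groups `exon` features by their `Parent` attribute and reports the gap between consecutive exons. The function names and the example records are hypothetical, and the coordinate convention (last base of the left exon, first base of the right exon) may differ from what TopHat reports.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse GFF3-style 'key=value;key=value' attributes into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def junctions_from_gff3(lines):
    """Return sorted (seqid, left_exon_end, right_exon_start) tuples."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        parents = parse_attributes(fields[8]).get("Parent", "")
        for parent in filter(None, parents.split(",")):
            exons[(fields[0], parent)].append((int(fields[3]), int(fields[4])))
    junctions = set()
    for (seqid, _parent), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            if right_start > left_end + 1:  # a real gap between exons
                junctions.add((seqid, left_end, right_start))
    return sorted(junctions)


# Hypothetical GFF3 records with Parent links.
gff3_lines = [
    "chr1\texample\texon\t100\t300\t.\t+\t.\tID=ex1;Parent=tx1",
    "chr1\texample\texon\t600\t900\t.\t+\t.\tID=ex2;Parent=tx1",
]
# The same exons with GTF-style attributes: no Parent key to group on.
gtf_style_lines = [
    'chr1\texample\texon\t100\t300\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\texample\texon\t600\t900\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
]
print(junctions_from_gff3(gff3_lines))
print(junctions_from_gff3(gtf_style_lines))
```

If a check like this finds no junctions in your converted file, the exon records probably lack usable `Parent` links (or are not labelled `exon`), which would be consistent with the warning, although I cannot confirm that this is the cause in your case.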

                • #9
                  I am wondering whether, when a gene annotation is given, all the junctions are based on the gene model. Can it be inferred that the more comprehensive the gene annotation (even if it includes invalid genes), the better?

                  Thanks,
                  Xi
                  Xi Wang

                  • #10
                    It is complicated, and I am not sure either. I guess TopHat checks the annotated junctions when a gene annotation is given, but the junctions are mainly built from the Bowtie mapping results. If you compare two TopHat runs, one with annotation and one without, you will find more junctions in the run with annotation, which is reasonable. But you will also find junctions that do not overlap between the two runs, that is, junctions unique to each run. You could say that this is an effect of the gene annotation. It seems that the more genes, the better. But I have not checked whether invalid genes affect the results. Maybe you can give us the answer.

                    • #11
                      Originally posted by lmf_bill:
                      It is complicated, and I am not sure either. I guess TopHat checks the annotated junctions when a gene annotation is given, but the junctions are mainly built from the Bowtie mapping results. If you compare two TopHat runs, one with annotation and one without, you will find more junctions in the run with annotation, which is reasonable. But you will also find junctions that do not overlap between the two runs, that is, junctions unique to each run. You could say that this is an effect of the gene annotation. It seems that the more genes, the better. But I have not checked whether invalid genes affect the results. Maybe you can give us the answer.
                      Yes. From my experiments, I guess that when a gene annotation is given, the results from TopHat are a mixture of junctions based on the gene annotation and junctions discovered de novo. But I can't understand why some (<1%) junction reads were not reported although there is a clear splice junction provided by the gene annotation, and even the de novo method (without gene annotations) can report these junctions. In my experiments, only uniquely mapped (or aligned) reads are reported. Could this be the reason, or is it possibly a bug in TopHat?

                      BTW, has anyone thought about the problem below?
                      Which mapping is better: (1) a read mapping to the genome as a whole with 2 mismatches, or (2) the same read mapping to a possible splice junction with only 1 mismatch?

                      Thanks.
                      Xi Wang

                      • #12
                        To Xi Wang,
                        "But I can't understand why some (<1%) junction reads were not reported although there is a clear splice junction provided by the gene annotation, and even the de novo method (without gene annotations) can report these junctions."

                        RE:
                        It is strange. Can you paste one example?
                        As for the mismatch setting, there is no single best choice. In my opinion, allow 2 mismatches when mapping a read as a whole to the genome, but only 0 or 1 mismatches in the overhang when mapping to a junction. This can improve the precision of splice junction prediction.
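To make the suggested policy concrete, here is a minimal sketch written as a post-filter. It is only an illustration of the rule of thumb above, not a TopHat option or API: the function name, its parameters, and the assumption that the whole-read mismatch limit also applies to spliced reads are all mine.

```python
def accept_alignment(is_spliced, total_mismatches, overhang_mismatches=0,
                     max_genome_mismatches=2, max_overhang_mismatches=1):
    """Apply a stricter mismatch limit to junction overhangs than to whole reads.

    Assumption: the whole-read limit applies to every alignment, and
    spliced alignments must additionally pass the overhang limit.
    """
    if total_mismatches > max_genome_mismatches:
        return False
    if is_spliced:
        return overhang_mismatches <= max_overhang_mismatches
    return True


# A contiguous genomic hit with 2 mismatches is kept ...
print(accept_alignment(False, 2))
# ... while a junction hit with 2 mismatches in the overhang is rejected.
print(accept_alignment(True, 2, overhang_mismatches=2))
```

Setting `max_overhang_mismatches=0` gives the stricter variant of the same rule.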

                        • #13
                          Originally posted by lmf_bill:
                          To Xi Wang,
                          "But I can't understand why some (<1%) junction reads were not reported although there is a clear splice junction provided by the gene annotation, and even the de novo method (without gene annotations) can report these junctions."

                          RE:
                          It is strange. Can you paste one example?
                          As for the mismatch setting, there is no single best choice. In my opinion, allow 2 mismatches when mapping a read as a whole to the genome, but only 0 or 1 mismatches in the overhang when mapping to a junction. This can improve the precision of splice junction prediction.
                          Thanks for your reply, Bill.
                          Here are a few splice junctions identified without and with gene annotation, respectively.

                          Code:
                          Chr#	left_site	right_site	#juncReadWithoutGeneAnnotation	#juncReadWithGeneAnnotation	GeneName
                          chrY	21147161	21150881	4	0	EIF1AY
                          chrY	21147409	21150881	1	1	intragenic
                          chrY	21150965	21153863	4	4	EIF1AY
                          chrY	21153967	21155747	14	14	EIF1AY
                          chrY	21155798	21159297	9	8	EIF1AY
                          chrY	21159379	21160757	17	17	EIF1AY
                          chrY	21160849	21163614	8	8	EIF1AY
                          chrY	2769668	2770205	0	28	RPS4Y1
                          chrY	2770283	2772117	0	21	RPS4Y1
                          chrY	2772298	2773686	36	17	RPS4Y1
                          chrY	2773784	2782640	58	0	RPS4Y1
                          chrY	2782812	2793128	65	65	RPS4Y1
                          chrY	2793286	2794833	43	43	RPS4Y1
                          chrY	7284271	7295396	0	1	PRKY
                          chrY	8577625	8578201	4	4	intergenic
                          You can see that about half of the rows have the same number in columns 4 and 5, while the other half do not. Zeros appear in both columns, which means that some splice junctions cannot be identified without the gene model, and some others cannot be identified even with it. There is also the question of whether all the detected splice junctions are real. I think TopHat tends to suppress false positives, and maybe that is why we can clearly see some false negatives. There is a trade-off here, and it might be better if users could specify it (however, I didn't check whether TopHat already provides such an option).
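For anyone who wants to repeat this comparison on a larger table, here is a small sketch that tallies the rows pasted above by how the two counts relate. The category names and function names are my own labels; the rows are copied from the table, with whitespace instead of tabs.

```python
from collections import Counter

# Rows from the table above: chrom, left, right, reads without
# annotation, reads with annotation, gene name.
ROWS = """
chrY 21147161 21150881 4 0 EIF1AY
chrY 21147409 21150881 1 1 intragenic
chrY 21150965 21153863 4 4 EIF1AY
chrY 21153967 21155747 14 14 EIF1AY
chrY 21155798 21159297 9 8 EIF1AY
chrY 21159379 21160757 17 17 EIF1AY
chrY 21160849 21163614 8 8 EIF1AY
chrY 2769668 2770205 0 28 RPS4Y1
chrY 2770283 2772117 0 21 RPS4Y1
chrY 2772298 2773686 36 17 RPS4Y1
chrY 2773784 2782640 58 0 RPS4Y1
chrY 2782812 2793128 65 65 RPS4Y1
chrY 2793286 2794833 43 43 RPS4Y1
chrY 7284271 7295396 0 1 PRKY
chrY 8577625 8578201 4 4 intergenic
"""


def classify(without_annotation, with_annotation):
    """Label a junction by how its two read counts compare."""
    if without_annotation == with_annotation:
        return "same"
    if with_annotation == 0:
        return "only_without_annotation"
    if without_annotation == 0:
        return "only_with_annotation"
    return "both_but_different"


def summarize(text):
    """Count junctions per category in a whitespace-separated table."""
    tally = Counter()
    for line in text.strip().splitlines():
        _chrom, _left, _right, without, with_annotation, _gene = line.split()
        tally[classify(int(without), int(with_annotation))] += 1
    return tally


print(dict(summarize(ROWS)))
```

On these 15 rows the tally is 8 identical, 2 found only without annotation, 3 found only with annotation, and 2 found in both runs with different counts.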

                          For the mismatch setting, I strongly agree with you. There is no best setting. But intuitively, there are fewer splice junction reads than exon reads, and it is riskier to call a read a splice junction read than an exon read, especially when there is no other evidence for the corresponding junction.
                          Last edited by Xi Wang; 11-29-2009, 08:22 PM.
                          Xi Wang
