  • Build expanded genome for ERANGE

    Hi all,

    We're trying to use ERANGE on our human samples. The goal is to identify any possible new transcripts. We looked into TopHat and now want to try ERANGE.
    We are confused about how to build the expanded genome that ERANGE requires. How should we do this? In particular, how and where can we obtain the spike sequences? Any suggestions or opinions would be highly appreciated!

    Many thanks in advance!

    Zhuzhu

  • #2
    I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so that I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.

    • #3
      The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

      ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

      You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
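As a rough illustration of that column layout, here is a minimal Python sketch that parses one tab-separated line of 'knownGene.txt'. The column order used (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, followed by further columns that are ignored here) is an assumption that should be checked against the "describe table schema" output for your assembly. The `KnownGene` type, the function name and the sample row are hypothetical and made up for illustration.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    """First ten columns of a knownGene-style row (assumed order)."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_known_gene_line(line: str) -> KnownGene:
    """Parse one tab-separated line; extra trailing columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    # exonStarts and exonEnds are comma-separated lists that usually end
    # with a trailing comma, so empty pieces are skipped.
    exon_starts = [int(x) for x in fields[8].split(",") if x]
    exon_ends = [int(x) for x in fields[9].split(",") if x]
    return KnownGene(
        name=fields[0],
        chrom=fields[1],
        strand=fields[2],
        tx_start=int(fields[3]),
        tx_end=int(fields[4]),
        cds_start=int(fields[5]),
        cds_end=int(fields[6]),
        exon_count=int(fields[7]),
        exon_starts=exon_starts,
        exon_ends=exon_ends,
    )


# Made-up row for illustration only; it is not taken from the real table.
SAMPLE_ROW = "\t".join([
    "exampleTx.1", "chr1", "+", "1000", "5000", "1200", "4800",
    "2", "1000,3000,", "2000,5000,", "exampleProtein", "exampleTx.1",
])

if __name__ == "__main__":
    print(parse_known_gene_line(SAMPLE_ROW))
```

Reading the real file would then be a loop over its lines, calling `parse_known_gene_line` on each one.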

      • #4
        Originally posted by kmcarr
        The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

        ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

        You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
        Thank you, kmcarr. That helps a lot. I found the table schema of this file.

        • #5
          spike sequences

          Hi,

          I am also wondering about the OP's question about spike sequences. They seem to be mentioned only in the ERANGE paper and the online help files, with no explanation of what they actually are! Any ideas?

          Thanks,

          Kasycas

          • #6
            Hi All -

            I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

            If you added spiked-in standards to your sample, then you should be able to find their sequences through any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.

            • #7
              Hi all and kmcarr,

              Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.

              • #8
                Originally posted by elisa*_*
                Hi all and kmcarr,

                Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.
                I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Whether you consider my method easy or hard depends on whether you are familiar and comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse, so I already had many of the components in place. I'll give you the outline, but if it all sounds Greek to you then I'm afraid I'll be no help at all.

                - Install the latest versions of BioPerl and MySQL.

                - Create an empty MySQL database to hold your annotation.

                - Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

                - Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

                I know this seems like the long way around to get the file you want but as I said, once you have the database created it is useful for many projects.
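For readers without a BioPerl/MySQL setup, the same kind of conversion can be sketched directly in Python. This is not the attached script and does not use Bio::DB::SeqFeature; it is a swapped-in, simplified approach under several assumptions: transcripts are `mRNA` features with an `ID` attribute, `exon` and `CDS` features point to them through `Parent`, and the desired output is the first ten knownGene-style columns with 0-based, half-open coordinates. Transcripts without CDS get cdsStart = cdsEnd = txStart, and transcripts without exon features get a single exon spanning the transcript; both are assumptions. The function names are hypothetical, and the output has not been tested against ERANGE.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn a GFF3 attribute string like 'ID=tx1;Parent=gene1' into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_known_gene_rows(gff_lines):
    """Return knownGene-style tab-separated rows, one per mRNA feature."""
    transcripts = {}            # transcript ID -> (chrom, strand, start, end)
    exons = defaultdict(list)   # transcript ID -> [(start, end), ...]
    cds = defaultdict(list)     # transcript ID -> [(start, end), ...]

    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _source, ftype, start, end, _score, strand, _phase, attr_text = cols
        attrs = parse_attributes(attr_text)
        # GFF3 is 1-based inclusive; convert to 0-based half-open.
        start0, end1 = int(start) - 1, int(end)
        if ftype == "mRNA" and "ID" in attrs:
            transcripts[attrs["ID"]] = (chrom, strand, start0, end1)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start0, end1))

    rows = []
    for tid, (chrom, strand, tx_start, tx_end) in transcripts.items():
        # Assumption: no exon features means one exon covering the transcript.
        blocks = sorted(exons[tid]) or [(tx_start, tx_end)]
        if cds[tid]:
            cds_start = min(s for s, _ in cds[tid])
            cds_end = max(e for _, e in cds[tid])
        else:
            # Assumption: non-coding transcripts get an empty CDS interval.
            cds_start = cds_end = tx_start
        exon_starts = "".join(f"{s}," for s, _ in blocks)
        exon_ends = "".join(f"{e}," for _, e in blocks)
        rows.append("\t".join([
            tid, chrom, strand, str(tx_start), str(tx_end),
            str(cds_start), str(cds_end), str(len(blocks)),
            exon_starts, exon_ends,
        ]))
    return rows
```

A typical use would be `rows = gff3_to_known_gene_rows(open("annotation.gff3"))`, where the file name is a placeholder, followed by writing the rows to a text file. The database route described above remains the more general option, since the loaded annotation can be reused for other projects.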
                Attached Files

                • #9
                  kmcarr,
                  Thank you very much for the info. I will try this.

                  • #10
                    Where to download annotation file for human genome

                    Hello,

                    I have the sequence coordinates of ChIP-seq peaks and am trying to map them to the nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and RefSeq, but I can't find .gff files. There are so many files that I don't know which to download. I downloaded some of them, but they don't have gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

                    Thanks
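
One possible way to approach this, sketched in Python under stated assumptions: suppose a gene table has already been reduced to records of (gene name, chromosome, strand, txStart, txEnd) with 0-based, half-open coordinates, as in the UCSC-style tables discussed earlier in this thread. Where the gene names come from is left open here; to my knowledge the UCSC refGene table carries a gene symbol column, but that should be checked in the Table Browser schema. The transcription start site is then txStart for "+" strand genes and the last base of the transcript (txEnd - 1) for "-" strand genes. All function names and the example genes below are hypothetical.

```python
from collections import defaultdict


def tss_position(strand, tx_start, tx_end):
    """0-based TSS: transcript start on '+', last transcript base on '-'."""
    return tx_start if strand == "+" else tx_end - 1


def build_tss_index(genes):
    """Group (tss, strand, name) tuples by chromosome."""
    index = defaultdict(list)
    for name, chrom, strand, tx_start, tx_end in genes:
        index[chrom].append((tss_position(strand, tx_start, tx_end), strand, name))
    return index


def nearest_gene(index, chrom, peak_start, peak_end):
    """Return (name, strand, tss, distance) for the TSS closest to the
    peak centre, or None if the chromosome has no genes in the index."""
    candidates = index.get(chrom, [])
    if not candidates:
        return None
    centre = (peak_start + peak_end) // 2
    tss, strand, name = min(candidates, key=lambda c: abs(c[0] - centre))
    return name, strand, tss, abs(tss - centre)


if __name__ == "__main__":
    # Made-up genes for illustration only.
    genes = [
        ("GENE_A", "chr1", "+", 1000, 5000),
        ("GENE_B", "chr1", "-", 8000, 12000),
        ("GENE_C", "chr2", "+", 100, 900),
    ]
    index = build_tss_index(genes)
    print(nearest_gene(index, "chr1", 11000, 11200))
```

The linear scan per peak is fine for a few thousand peaks; for larger sets, sorting each chromosome's TSS list once and using `bisect` would be the usual refinement.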
