
  • Build expanded genome for ERANGE

    Hi all,

    We're trying to use ERANGE for our human samples. The goal is to identify any possible new transcripts. We looked into TopHat and now want to try ERANGE.
    We are confused about how to build the expanded genome for ERANGE. How should we do this? In particular, how or where can we obtain the spike sequences? Any suggestions or opinions would be highly appreciated!

    Many thanks in advance!

    Zhuzhu

  • #2
    I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so that I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.



    • #3
      The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

      ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

      You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
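To make the column layout concrete, here is a minimal Python sketch of reading one row of such a file. It assumes the twelve tab-separated columns that the hg18 knownGene schema page lists (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID); please check this against the "describe table schema" output for your assembly. The function names and the example row are made up for illustration and are not part of ERANGE or UCSC.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    """One row of a knownGene-style table (column order is an assumption)."""
    name: str
    chrom: str
    strand: str
    tx_start: int          # UCSC tables use 0-based starts
    tx_end: int            # and end-exclusive ends
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]
    protein_id: str
    align_id: str


def _int_list(field: str) -> List[int]:
    # Exon lists are comma-separated and usually carry a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_known_gene_line(line: str) -> KnownGene:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    return KnownGene(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]), int(fields[5]), int(fields[6]),
        int(fields[7]), _int_list(fields[8]), _int_list(fields[9]),
        fields[10], fields[11],
    )


# Hypothetical row, invented for this example (not real annotation data).
example = ("exampleTx1\tchr1\t+\t1000\t5000\t1200\t4800\t3\t"
           "1000,2000,4000,\t1500,2500,5000,\t\texampleTx1")
rec = parse_known_gene_line(example)
print(rec.name, rec.chrom, rec.strand, rec.exon_starts, rec.exon_ends)
```

Reading the file this way is only meant to help when comparing your own annotation against the expected fields; it does not replace checking the schema page.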



      • #4
        Thank you, kmcarr. That helps a lot. I found the table schema of this file.



        • #5
          spike sequences

          Hi,

          I am also wondering about the OP's question on spike sequences. They seem to be mentioned only in the ERANGE paper and the online help files, with no explanation of what they actually are! Any ideas?

          Thanks,

          Kasycas



          • #6
            Hi All -

            I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

            If you added spiked-in standards to your sample, then you should be able to find their sequences in any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.



            • #7
              Hi all and kmcarr,

              Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.



              • #8
                I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Whether you consider my method easy or hard depends on whether you are familiar and comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse, so I already had many of the components in place. I'll give you the outline, but if it all sounds Greek to you then I'm afraid I'll be no help at all.

                - Install the latest versions of BioPerl and MySQL.

                - Create an empty MySQL database to hold your annotation.

                - Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

                - Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

                I know this seems like the long way around to get the file you want, but as I said, once you have the database created it is useful for many projects.
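For readers without a BioPerl/MySQL setup, a rough alternative can be sketched directly in Python. This is not the attached script and not the Bio::DB::SeqFeature route described above; it is a simplified illustration that assumes a GFF3 file with `mRNA`, `exon` and `CDS` features linked by `ID`/`Parent` attributes, and it writes the twelve-column knownGene-style layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID). Whether ERANGE accepts exactly this layout should be checked against README.build.rds. All function names and the tiny GFF3 input are hypothetical.

```python
from collections import defaultdict


def parse_attributes(col9):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for part in col9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_known_gene_rows(gff_lines):
    """Build knownGene-style rows, one per mRNA that has exons."""
    transcripts = {}                 # transcript ID -> (chrom, strand)
    exons = defaultdict(list)        # transcript ID -> [(start0, end), ...]
    cds = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        chrom, _, ftype, start, end, _, strand, _, col9 = fields
        attrs = parse_attributes(col9)
        # GFF3 is 1-based inclusive; UCSC tables are 0-based, end-exclusive.
        start0, end1 = int(start) - 1, int(end)
        if ftype == "mRNA" and "ID" in attrs:
            transcripts[attrs["ID"]] = (chrom, strand)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start0, end1))

    rows = []
    for tid, (chrom, strand) in transcripts.items():
        ex = sorted(exons.get(tid, []))
        if not ex:
            continue
        tx_start, tx_end = ex[0][0], ex[-1][1]
        coding = cds.get(tid, [])
        if coding:
            cds_start = min(s for s, _ in coding)
            cds_end = max(e for _, e in coding)
        else:
            # Assumption: mark "no CDS" by setting both ends to txStart.
            cds_start = cds_end = tx_start
        rows.append("\t".join([
            tid, chrom, strand, str(tx_start), str(tx_end),
            str(cds_start), str(cds_end), str(len(ex)),
            "".join(f"{s}," for s, _ in ex),
            "".join(f"{e}," for _, e in ex),
            "",      # proteinID left empty in this sketch
            tid,     # alignID reuses the transcript ID (assumption)
        ]))
    return rows


# Hypothetical two-exon transcript, invented for this example.
gff = [
    "##gff-version 3",
    "Chr1\texample\tmRNA\t101\t500\t.\t+\t.\tID=tx1;Parent=gene1",
    "Chr1\texample\texon\t101\t200\t.\t+\t.\tParent=tx1",
    "Chr1\texample\texon\t301\t500\t.\t+\t.\tParent=tx1",
    "Chr1\texample\tCDS\t151\t200\t.\t+\t0\tParent=tx1",
    "Chr1\texample\tCDS\t301\t400\t.\t+\t0\tParent=tx1",
]
rows = gff3_to_known_gene_rows(gff)
for row in rows:
    print(row)
```

The main thing to watch is the coordinate shift: GFF3 starts are 1-based while UCSC-style starts are 0-based, so each start is reduced by one and each end is kept as-is. GFF files that name transcripts differently (for example `transcript` instead of `mRNA`) would need the feature-type checks adjusted.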



                • #9
                  kmcarr,
                  Thank you very much for the info. I will try this.



                  • #10
                    Where to download annotation file for human genome

                    Hello,

                    I have the sequence coordinates of ChIP-seq peaks and I am trying to map them to the nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and also RefSeq, but I can't find .gff files. There are so many files that I don't know which to download. I downloaded some of them, but they don't have gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

                    Thanks

