  • Zhuzhu
    Junior Member
    • Mar 2009
    • 4

    Build expanded genome for ERANGE

    Hi all,

    We're trying to use ERANGE for our human samples. The goal is to identify any possible new transcripts. We looked into TopHat and now want to try ERANGE.
    We are confused about how to build the expanded genome for ERANGE. How should we do this? In particular, how or where can we obtain the spike sequences? Any suggestions or opinions would be highly appreciated!

    Many thanks in advance!

    Zhuzhu
  • para_seq
    Member
    • Aug 2009
    • 12

    #2
    I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.

    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

      ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

      You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
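
      For anyone formatting their own annotation to match, here is a minimal Python sketch of reading one row of such a file. It assumes the first ten tab-separated columns are name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds; please confirm that order against the schema page for your assembly. The names KnownGene and parse_known_gene_line and the sample record are made up for illustration.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    """One transcript row; these field names are illustrative, not UCSC's."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def _int_list(text: str) -> List[int]:
    # exonStarts/exonEnds are comma-separated and usually end with a comma,
    # so empty pieces are skipped.
    return [int(x) for x in text.split(",") if x]


def parse_known_gene_line(line: str) -> KnownGene:
    """Parse one tab-separated row, assuming the column order stated above."""
    f = line.rstrip("\n").split("\t")
    return KnownGene(
        name=f[0], chrom=f[1], strand=f[2],
        tx_start=int(f[3]), tx_end=int(f[4]),
        cds_start=int(f[5]), cds_end=int(f[6]),
        exon_count=int(f[7]),
        exon_starts=_int_list(f[8]), exon_ends=_int_list(f[9]),
    )


# Made-up record for illustration only (not taken from the real file).
example = "tx_demo\tchr1\t+\t100\t500\t150\t400\t2\t100,300,\t200,500,\tP_demo\ttx_demo"
gene = parse_known_gene_line(example)
print(gene.name, gene.chrom, gene.strand, gene.exon_starts, gene.exon_ends)
```

      Any columns after the tenth (for example protein or alignment IDs) are simply ignored by this sketch.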

      • para_seq
        Member
        • Aug 2009
        • 12

        #4
        Thank you, kmcarr. That helps a lot. I found the table schema of this file.

        • Kasycas
          Member
          • Sep 2009
          • 22

          #5
          spike sequences

          Hi,

          I am also wondering about the OP's question on spike sequences. They seem to be mentioned only in the ERANGE paper and the online help files, with no explanation of what they actually are! Any ideas?

          Thanks,

          Kasycas

          • manducasexta
            Member
            • Mar 2009
            • 12

            #6
            Hi All -

            I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

            If you added spiked-in standards to your sample, then you should be able to look up their sequences in any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.

            • elisa*_*
              Junior Member
              • Aug 2008
              • 8

              #7
              Hi all and kmcarr,

              Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file of gene annotations.

              • kmcarr
                Senior Member
                • May 2008
                • 1181

                #8
                I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Whether you consider my method easy or hard depends on whether you are familiar and comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse, so I already had many of the components in place. I'll give you the outline, but if it all sounds like Greek to you then I'm afraid I'll be no help at all.

                - Install the latest versions of BioPerl and MySQL.

                - Create an empty MySQL database to hold your annotation.

                - Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

                - Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

                I know this seems like the long way around to get the file you want but as I said, once you have the database created it is useful for many projects.
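
                For readers without a BioPerl/MySQL setup, the same idea can be sketched in plain Python. This is not the attached script and not the Bio::DB::SeqFeature route described above; it is a rough alternative under stated assumptions: the GFF3 file uses "mRNA" features carrying an ID attribute, with "exon" and "CDS" features pointing to them through Parent, and the target layout is the first ten knownGene-style columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). Whether ERANGE accepts exactly this layout should be checked against its README. Function names and the sample records are hypothetical. GFF3 coordinates are 1-based and inclusive while UCSC tables use 0-based starts, so starts are shifted by one.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=tx1;Name=foo' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_known_gene_rows(gff_lines):
    """Build knownGene-style rows (10 columns, assumed layout) from GFF3 lines."""
    transcripts = {}            # transcript ID -> (chrom, strand, start, end)
    exons = defaultdict(list)   # transcript ID -> [(start, end), ...]
    cds = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _src, ftype, start, end, _score, strand, _phase, attr_text = cols
        attrs = parse_attributes(attr_text)
        span = (int(start) - 1, int(end))   # 1-based inclusive -> 0-based start
        if ftype == "mRNA" and "ID" in attrs:
            transcripts[attrs["ID"]] = (chrom, strand) + span
        elif ftype in ("exon", "CDS") and "Parent" in attrs:
            target = exons if ftype == "exon" else cds
            for parent in attrs["Parent"].split(","):
                target[parent].append(span)

    rows = []
    for tx_id, (chrom, strand, tx_start, tx_end) in transcripts.items():
        # Fall back to a single exon spanning the transcript if none are listed.
        tx_exons = sorted(exons.get(tx_id, [(tx_start, tx_end)]))
        tx_cds = cds.get(tx_id)
        if tx_cds:
            cds_start = min(s for s, _ in tx_cds)
            cds_end = max(e for _, e in tx_cds)
        else:
            cds_start = cds_end = tx_start   # equal values: no coding region
        starts = "".join(f"{s}," for s, _ in tx_exons)
        ends = "".join(f"{e}," for _, e in tx_exons)
        rows.append("\t".join([tx_id, chrom, strand, str(tx_start), str(tx_end),
                               str(cds_start), str(cds_end),
                               str(len(tx_exons)), starts, ends]))
    return rows


# Made-up GFF3 records for illustration only.
sample = [
    "##gff-version 3",
    "chr1\tsrc\tmRNA\t101\t500\t.\t+\t.\tID=tx1",
    "chr1\tsrc\texon\t101\t200\t.\t+\t.\tParent=tx1",
    "chr1\tsrc\texon\t301\t500\t.\t+\t.\tParent=tx1",
    "chr1\tsrc\tCDS\t151\t200\t.\t+\t0\tParent=tx1",
    "chr1\tsrc\tCDS\t301\t400\t.\t+\t0\tParent=tx1",
]
rows = gff3_to_known_gene_rows(sample)
for row in rows:
    print(row)
```

                Real GFF files vary a lot (transcript feature types other than mRNA, multi-level parents, escaped characters in attributes), so treat this only as a starting point.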

                • elisa*_*
                  Junior Member
                  • Aug 2008
                  • 8

                  #9
                  kmcarr,
                  Thank you very much for the info. I will try this.

                  • Pankaj
                    Junior Member
                    • May 2010
                    • 1

                    #10
                    Where to download annotation file for human genome

                    Hello,

                    I have the sequence coordinates of ChIP-seq peaks and am trying to map them to their nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and also RefSeq, but I can't find .gff files. There are so many files that I don't know which to download, and the ones I did download don't have gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

                    Thanks
