  • emolinari
    Member
    • May 2013
    • 47

    get average gene length file

    Hi everyone,

    I have a very quick question, probably super easy for experienced programmers!
    I need to generate an average gene length file from the .gtf file I downloaded from Ensembl. Specifically, I need a two-column file with the ENSG ID in the first column and the gene length in the second (I need it for HTSeq normalization).

    Can anyone help me with scripting?

    Thanks!!!
    Manu
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    In R:
    Code:
    library(GenomicFeatures)
    txdb <- makeTranscriptDbFromGFF("foo.gtf", type="gtf")
    trans <- transcripts(txdb, columns=c("GENEID"))
    df <- data.frame(gene=trans$GENEID, len=width(trans))
    You probably need to specify something else for "GENEID". You can see available options with columns(txdb).
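
One caveat about the snippet above: width(trans) is the genomic span of each transcript (introns included), and a gene with several transcripts appears on several rows. If "gene length" is taken to mean the total length of a gene's merged exons (an assumption; the thread does not say which definition is wanted), a minimal sketch could look like the following. makeTxDbFromGFF() is the constructor name in later Bioconductor releases (in the newest ones it is provided by the txdbmaker package), and "foo.gtf" and "gene_lengths.txt" are placeholder file names.

```r
library(GenomicFeatures)

# Assumption: makeTxDbFromGFF() is available in your installation; older
# releases call it makeTranscriptDbFromGFF(), newer ones ship it in txdbmaker.
# "foo.gtf" is a placeholder path.
txdb <- makeTxDbFromGFF("foo.gtf", format = "gtf")

# Group exons by gene, merge overlapping exons within each gene,
# then sum the merged exon widths to get one length per gene.
exons_by_gene <- exonsBy(txdb, by = "gene")
gene_len <- sum(width(reduce(exons_by_gene)))

# Two-column, tab-separated output: gene ID, length in bp.
out <- data.frame(gene = names(gene_len), len = as.integer(gene_len))
write.table(out, "gene_lengths.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Merging exons first means bases shared by several transcripts are counted once. If an average over transcripts is wanted instead, the per-transcript lengths would have to be computed and averaged per gene.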


    • emolinari
      Member
      • May 2013
      • 47

      #3
      Thank you dpryan!

      I get this error message
      txdb <- makeTranscriptDbFromGFF("foo.gtf", type="gtf")
      Error in makeTranscriptDbFromGFF("foo.gtf", type = "gtf") :
      unused argument (type = "gtf")

      what is it?


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        It should have been format= rather than type=. Mea culpa.


        • emolinari
          Member
          • May 2013
          • 47

          #5

          Thank you!
          It's running now... I'll post an update soon!


          • emolinari
            Member
            • May 2013
            • 47

            #6
            Hi dpryan!


            After 10 minutes of running this command:
            Originally posted by dpryan:
            In R:
            Code:
            df <- data.frame(gene=trans$GENEID, len=width(trans))
            RStudio simply crashes. Any idea why?
            Running txdb <- makeTranscriptDbFromGFF("genes.gtf", format="gtf") takes a long time, but it does finish after 15 minutes or so.

            Thanks!
            Manu


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Presumably you're running out of memory and that's causing the crash. R isn't always the best when it comes to memory. You might quickly do:
              Code:
              head(trans$GENEID)
              head(width(trans))
              just to ensure that there's nothing strange that happened while making the "trans" object. The odds of an error there are crazy low, but it couldn't hurt to double check.
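
If memory is the suspect, a few extra checks can show how large the object is before the data frame is built. This is only a sketch and assumes `trans` is the object returned by transcripts(txdb, ...) earlier in the thread.

```r
# Assumes `trans` already exists in the session (see the earlier snippet).
length(trans)                            # number of transcripts
summary(width(trans))                    # spread of transcript spans in bp
print(object.size(trans), units = "Mb")  # approximate memory footprint
gc()                                     # release unused memory and report usage
```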

              BTW, the txdb object can be saved to a file, should you ever need it again. Just use something like
              Code:
              saveDb(txdb, file="myOrganism.sqlite")
              and you can simply load it again later with
              Code:
              txdb <- loadDb("myOrganism.sqlite")
              This ends up saving a lot of time if you work with the same organism often.
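
The save-and-reload idea can be wrapped in a small cache check so the slow GTF parsing only happens once. A minimal sketch, assuming makeTxDbFromGFF() is the constructor available in your Bioconductor release (older ones use makeTranscriptDbFromGFF(), the newest provide it in txdbmaker); both file names are placeholders.

```r
library(GenomicFeatures)
library(AnnotationDbi)  # saveDb() and loadDb()

gtf_file <- "genes.gtf"         # placeholder input annotation
db_file <- "myOrganism.sqlite"  # placeholder cache file

if (file.exists(db_file)) {
  # Fast path: reuse the database saved by an earlier run.
  txdb <- loadDb(db_file)
} else {
  # Slow path: parse the GTF once, then cache the result on disk.
  txdb <- makeTxDbFromGFF(gtf_file, format = "gtf")
  saveDb(txdb, file = db_file)
}
```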


              • emolinari
                Member
                • May 2013
                • 47

                #8
                Thanks dpryan!
                As a matter of fact, my computer was just being a bit cranky. It worked perfectly after I rebooted it!
                Manu
