  • get average gene length file

    Hi everyone,

    I have a very quick question, probably super easy for experienced programmers!
    I need to generate an average gene length file from the .gtf file I downloaded from Ensembl. Specifically, I need a 2-column file with the ENSG ID in the first column and the gene length in the second (I need it for HTSeq normalization).

    Can anyone help me with scripting?

    Thanks!!!
    Manu

  • #2
    In R:
    Code:
    library(GenomicFeatures)
    txdb <- makeTranscriptDbFromGFF("foo.gtf", type="gtf")
    trans <- transcripts(txdb, columns=c("GENEID"))
    df <- data.frame(gene=trans$GENEID, len=width(trans))
    You probably need to specify something else for "GENEID". You can see available options with columns(txdb).
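For anyone who wants the two-column file without building a txdb first, here is a minimal base-R sketch. It rests on an assumption not made in the thread: that "gene length" means the number of bases covered by at least one exon of the gene (overlapping exons merged), which is one common choice for normalization but not the only one. Note that the snippet above returns one row per transcript, and its widths are genomic spans that include introns. The helper names (`union_length`, `gene_lengths_from_gtf`) and the toy GTF lines are made up for illustration, and the sketch assumes all exons of a gene sit on one chromosome.

```r
# Length of the union of closed intervals [start, end] (overlaps counted once).
union_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or adjacent exon: extend the current block.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current block and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Parse GTF lines and return one row per gene_id with its merged exon length.
gene_lengths_from_gtf <- function(gtf_lines) {
  # Drop header/comment lines and empty lines.
  gtf_lines <- gtf_lines[!grepl("^#", gtf_lines) & nzchar(gtf_lines)]
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  feature <- vapply(fields, `[`, character(1), 3)
  start <- as.integer(vapply(fields, `[`, character(1), 4))
  end <- as.integer(vapply(fields, `[`, character(1), 5))
  attrs <- vapply(fields, `[`, character(1), 9)
  # Pull the gene_id value out of the attribute column.
  gene_id <- sub('.*gene_id "([^"]+)".*', "\\1", attrs)
  is_exon <- feature == "exon"
  idx <- split(which(is_exon), gene_id[is_exon])
  len <- vapply(idx, function(i) union_length(start[i], end[i]), numeric(1))
  data.frame(gene = names(len), length = unname(len), stringsAsFactors = FALSE)
}

# Toy input (hypothetical IDs); for a real file use readLines("foo.gtf").
example_gtf <- c(
  "#!genome-build toy",
  "1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id \"ENSG_TOY_A\";",
  "1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id \"ENSG_TOY_A\"; transcript_id \"T1\";",
  "1\ttoy\texon\t150\t250\t.\t+\t.\tgene_id \"ENSG_TOY_A\"; transcript_id \"T2\";",
  "1\ttoy\texon\t400\t500\t.\t+\t.\tgene_id \"ENSG_TOY_A\"; transcript_id \"T1\";",
  "2\ttoy\texon\t10\t19\t.\t-\t.\tgene_id \"ENSG_TOY_B\"; transcript_id \"T3\";"
)
lengths_df <- gene_lengths_from_gtf(example_gtf)

# Write the two-column, tab-separated file (ID, length) with no header.
out_file <- tempfile(fileext = ".txt")
write.table(lengths_df, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Reading a full Ensembl GTF with `readLines` holds every line in memory, so for very large annotations it may be worth filtering to exon lines beforehand.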



    • #3
      Thank you dpryan!

      I get this error message:
      txdb <- makeTranscriptDbFromGFF("foo.gtf", type="gtf")
      Error in makeTranscriptDbFromGFF("foo.gtf", type = "gtf") :
      unused argument (type = "gtf")

      What does it mean?



      • #4
        It should have been format= rather than type=. Mea culpa.



        • #5

          Thank you!
          It's running now. I'll post the result soon!



          • #6
            Hi dpryan!


            After 10 minutes of running this command:
            Originally posted by dpryan
            In R:
            Code:
            df <- data.frame(gene=trans$GENEID, len=width(trans))
            RStudio simply crashes. Any idea why?
            It takes forever to run txdb <- makeTranscriptDbFromGFF("genes.gtf", format="gtf"), but it does eventually finish after 15 minutes or so.

            Thanks!
            Manu



            • #7
              Presumably you're running out of memory and that's causing the crash. R isn't always the best when it comes to memory. You might quickly do:
              Code:
              head(trans$GENEID)
              head(width(trans))
              just to ensure that there's nothing strange that happened while making the "trans" object. The odds of an error there are crazy low, but it couldn't hurt to double check.

              BTW, the txdb object can be saved to a file, should you ever need it again. Just use something like
              Code:
              saveDb(txdb, file="myOrganism.sqlite")
              and you can simply load it again later with
              Code:
              loadDb("myOrganism.sqlite")
              This ends up saving a lot of time if you work with the same organism often.
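The save-once, load-later idea can be wrapped in a small "load if cached, otherwise build and save" helper. What follows is only a sketch: `load_or_build` and its arguments are hypothetical names, and the demo uses base R's `saveRDS`/`readRDS` on a toy data frame so that it runs without Bioconductor. For the txdb object from this thread, the assumption is that you would pass `saveDb` and `loadDb` instead, as shown in the trailing comment.

```r
# Return the cached object if the file exists; otherwise build it, save it,
# and return it. save_fun(obj, path) and load_fun(path) are pluggable.
load_or_build <- function(path, build, save_fun = saveRDS, load_fun = readRDS) {
  if (file.exists(path)) {
    return(load_fun(path))
  }
  obj <- build()
  save_fun(obj, path)
  obj
}

# Demo with a stand-in for the slow step (a toy data frame, not a real txdb).
cache_path <- tempfile(fileext = ".rds")
build_calls <- 0
slow_build <- function() {
  build_calls <<- build_calls + 1  # count how often the slow step really runs
  data.frame(gene = c("g1", "g2"), len = c(100, 200))
}

first <- load_or_build(cache_path, slow_build)   # builds and saves
second <- load_or_build(cache_path, slow_build)  # loads from the cache file

# With the objects from this thread, the call might look like this
# (untested here; requires GenomicFeatures/AnnotationDbi to be loaded):
# txdb <- load_or_build(
#   "myOrganism.sqlite",
#   build = function() makeTranscriptDbFromGFF("genes.gtf", format = "gtf"),
#   save_fun = function(obj, path) saveDb(obj, file = path),
#   load_fun = loadDb
# )
```

The point of the design is that the 15-minute build only happens when the cache file is missing; deleting the file forces a rebuild.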



              • #8
                Thanks dpryan!
                As a matter of fact, my computer was just being a bit cranky. It worked perfectly after I rebooted it!
                Manu
