
  • Finding the Longest ORF for all sequences in EMBOSS

    Hi,

    I have a transcriptome file containing around 39,000 DNA sequences, and I would like to find the longest ORF for each of them. I used EMBOSS getorf through its web service to find all possible ORFs. Because I kept the default minimum of 30 amino acids per peptide, I got many ORFs longer than 30 amino acids for each transcript. Now I would like to retain only the longest peptide (the one with the most amino acids) for each sequence.

    How can I achieve that? Is there an alternative way to get only the longest ORF for every transcript? Kindly guide me.
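One way to filter existing getorf output is sketched below. It is a minimal sketch, not part of EMBOSS: it assumes the getorf FASTA headers look like `<transcript>_<number> [start - end]`, so that stripping the trailing `_<number>` recovers the transcript name. The function names are hypothetical, and if your headers differ the regular expression needs adjusting.

```python
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def longest_orf_per_transcript(handle):
    """Return {transcript: (header, peptide)} keeping the longest peptide.

    Assumption: each ORF is named "<transcript>_<number>", as in typical
    getorf output. Ties keep the first ORF encountered.
    """
    best = {}
    for header, seq in read_fasta(handle):
        parts = header.split(None, 1)
        orf_id = parts[0] if parts else ""
        transcript = re.sub(r"_\d+$", "", orf_id)
        if transcript not in best or len(seq) > len(best[transcript][1]):
            best[transcript] = (header, seq)
    return best


def write_fasta(best, handle, width=60):
    """Write the selected ORFs back out as FASTA, wrapped at `width`."""
    for header, seq in best.values():
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

To use it, open the getorf output file, pass the handle to `longest_orf_per_transcript`, and write the result to a new file with `write_fasta`.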

  • #2
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    • #3
      Hi genomax,

      I downloaded the Python script and the corresponding XML file from GitHub by right-clicking each link and saving it as is. Then I installed Biopython.
      But when I ran the command, I got the following error:

      Code:
      File "get_orfs_or_cdss.py", line 4
          <!DOCTYPE html>
          ^
      SyntaxError: invalid syntax
      This is the command I ran:
      Code:
      python get_orfs_or_cdss.py $input_fasta smed_dd_v4.fasta $input_format FASTA $table 1 $ftype CDS $ends open $min_len 30 $strand both $mode top $out_nuc_file dd_nucleotide.fasta $out_prot_file dd_prot.fasta
      Kindly guide me


      • #4
        You didn't download the Python script, but an HTML file showing the Python script with nice colours etc. You need to use the "raw" link on GitHub, i.e.


        The resulting get_orfs_or_cdss.py file should be plain text and start with:

        Code:
        #!/usr/bin/env python
        """Find ORFs in a nucleotide sequence file.
        
        ...
        In case it was unclear: in place of $input_fasta you put the filename of your input FASTA file (and so on), i.e.

        Code:
        python get_orfs_or_cdss.py smed_dd_v4.fasta FASTA 1 CDS open 30 both top dd_nucleotide.fasta dd_prot.fasta
        (And yes, I know this is not a very friendly command line interface - it was written primarily for use via Galaxy and I have not yet had reason/time to go back and make this more Unix-like. Sorry)
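Because the ten arguments are positional, it is easy to put them in the wrong order. The sketch below names each value and assembles the argument list. It is an illustration only: the helper `build_command` and the `ARG_ORDER` list are hypothetical, and the order is copied from the example command in this post, so confirm it against the docstring of your copy of the script before relying on it.

```python
import subprocess
import sys

# Assumption: this order mirrors the example command shown in this thread.
# The docstring at the top of get_orfs_or_cdss.py is the authority.
ARG_ORDER = [
    "input_fasta", "input_format", "table", "ftype", "ends",
    "min_len", "strand", "mode", "out_nuc_file", "out_prot_file",
]


def build_command(script, **options):
    """Return the argv list for the script, checking every option is given."""
    missing = [name for name in ARG_ORDER if name not in options]
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    return [sys.executable, script] + [str(options[name]) for name in ARG_ORDER]


if __name__ == "__main__":
    cmd = build_command(
        "get_orfs_or_cdss.py",
        input_fasta="smed_dd_v4.fasta", input_format="FASTA", table=1,
        ftype="CDS", ends="open", min_len=30, strand="both", mode="top",
        out_nuc_file="dd_nucleotide.fasta", out_prot_file="dd_prot.fasta",
    )
    subprocess.run(cmd, check=True)
```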
        Last edited by maubp; 02-25-2015, 12:19 PM. Reason: Adding usage example


        • #5
          Hi Maubp,

          Do you mean that I have to copy the code from the plain-text page into an editor, save it as a Python script, and then run it with Python? Am I right?


          • #6
            Right click on the link Peter provided and then choose "save as" (or "save link as"). That will save the script file locally. You can then run it.


            • #7
              Originally posted by GenoMax
              Right click on the link Peter provided and then choose "save as" (or "save link as"). That will save the script file locally. You can then run it.
              Hi Genomax,

              I tried exactly what you said, but it threw the same error that I described above.


              • #8
                Did you modify/try the command as Peter showed?


                • #9
                  Originally posted by GenoMax
                  Did you modify/try the command as Peter showed?
                  No, I didn't modify the command. I just ran it after saving the link. What has to be modified?


                  • #10
                    Code:
                    $ python get_orfs_or_cdss.py smed_dd_v4.fasta FASTA 1 CDS open 30 both top dd_nucleotide.fasta dd_prot.fasta


                    • #11
                      Originally posted by GenoMax
                      Code:
                      $ python get_orfs_or_cdss.py smed_dd_v4.fasta FASTA 1 CDS open 30 both top dd_nucleotide.fasta dd_prot.fasta
                      I tried the above command, but it still shows the syntax error.


                      • #12
                        We will have to wait for Peter to chime in then.


                        • #13
                          Originally posted by dena.dinesh
                          Hi Maubp,

                          Do you mean that I have to copy the code from the plain-text page into an editor, save it as a Python script, and then run it with Python? Am I right?
                          That should work but is unnecessarily complicated. As GenoMax suggested, right clicking on the link https://raw.githubusercontent.com/pe...rfs_or_cdss.py in your browser should give you a save option. I'm puzzled about what went wrong; perhaps this depends on your web browser?

                          The simplest approach would be to download it at the command line with:
                          Code:
                          $ wget https://raw.githubusercontent.com/peterjc/pico_galaxy/master/tools/get_orfs_or_cdss/get_orfs_or_cdss.py
                          Check this worked with:

                          Code:
                          $ head get_orfs_or_cdss.py 
                          #!/usr/bin/env python
                          """Find ORFs in a nucleotide sequence file.
                          
                          get_orfs_or_cdss.py $input_fasta $input_format $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file
                          
                          Takes ten command line options, input sequence filename, format, genetic
                          code, CDS vs ORF, end type (open, closed), selection mode (all, top, one),
                          minimum length (in amino acids), strand (both, forward, reverse), output
                          nucleotide filename, and output protein filename.
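The same check can be scripted if `head` is not available. The following is a minimal sketch: the helper name `is_html_page` and the 2048-byte probe size are assumptions made for illustration. It only looks for HTML markers near the top of the file, so it is a heuristic rather than a real validator.

```python
def is_html_page(path, probe_bytes=2048):
    """Return True if the start of the file looks like an HTML page.

    A GitHub page saved by mistake contains "<!DOCTYPE html>" within its
    first few lines, whereas the raw script starts with a "#!" line and a
    docstring. Matching is case-insensitive.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes).lower()
    return b"<!doctype html" in head or b"<html" in head


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "get_orfs_or_cdss.py"
    if is_html_page(target):
        print(target + " looks like a saved web page; download the raw file instead.")
    else:
        print(target + " does not look like HTML.")
```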
