  • mirTools with 454 data for non-coding RNA analysis

    Hi everybody,
    this is my first post. I want to thank you, on behalf of the Italian scientific community, for your wonderful work on this forum.
    I have a question. I'm a 454 user and I have a file of singleton sequences generated with 'sfffile' and converted to "singleton.fna". I tried to load this file into "mirTools", but it is not accepted. So I modified the file like this:
    >sample_1_x58
    ACAGGCGGACACACACACACACACACACACACACACACACACACACTACAACACAGTA
    >sample_2_x160
    CTCACACAGTTATACACAGATTTACACACACAATACACCTACACACACATGCTTATACAC
    ACCACTCAACACAAGTTACCTAACTACAATAATTATC
    >GIK1EHM01D82G8_x145
    GTTTGAGAGGGTGATGATAAGAAGCTTGCGCAGTGGCTCACGCCTGTAATCCCAGCACTT
    TGGGAGGCCAAGGCAGGTGGATTGCCTGAGCTCAGGAGTTTGAGACCAGCCTGAGCAACA
    TGGAAAATCCCATCTCTAAAAATAC
    >GIK1EHM01A3PID_x74
    GGTAACTTTGTGTTGATGTGGGCGAGTGTGGGCAGATGGGAAGCTGTGGTGTGGGGCGAG
    TGTGGGCAGATGGG......etc.etc.

    and it seems to accept this format. The problem is that the original file contains more than 400,000 sequences, so editing it by hand is impossible. Is there a command-line tool such as sed or grep that can delete everything after ">name" on each header line without deleting the whole line? For example:

    from this unaccepted format:
    >GIK1EHM01A0TLT length=84 xy=0302_1103 region=1 run=R_2010_06_08_08_48_07_
    GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC
    TCTGTCTGTGGTGTGTGTGTCTGT

    to this accepted format
    >GIK1EHM01A0TLT_x84
    GTGTTTCTGTGTGGAGGTGTGTCTCT...................

    I hope you can help me; I'm willing to try any approach. Waiting for your answer.
    Thank you very much

    Giorgio

  • #2
    You have this:

    Code:
    >GIK1EHM01A0TLT length=84 xy=0302_1103 region=1 run=R_2010_06_08_08_48_07_
    GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC
    TCTGTCTGTGGTGTGTGTGTCTGT
    According to http://222.73.178.238/mirtools/help.php, if you use this you are telling mirTools that this read occurred 84 times:

    Code:
    >GIK1EHM01A0TLT_x84
    GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC
    TCTGTCTGTGGTGTGTGTGTCTGT
    You need to use the read count, which is probably one, when renaming the read:

    Code:
    >GIK1EHM01A0TLT_x1
    GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC
    TCTGTCTGTGGTGTGTGTGTCTGT
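
To make the naming convention concrete, here is a minimal Python sketch that splits a header of the form name_xN into the read name and the read count. The function name parse_mirtools_header and the regular expression are illustrative assumptions based only on the examples in this thread, not on mirTools' own parser.

```python
import re

# Assumed header shape, inferred from the examples in this thread:
# ">" + read name + "_x" + integer read count, with no description after it.
HEADER_PATTERN = re.compile(r"^>(?P<name>\S+)_x(?P<count>\d+)$")


def parse_mirtools_header(line):
    """Return (name, count) for a header such as '>GIK1EHM01A0TLT_x84'."""
    match = HEADER_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a name_xN header: %r" % line)
    return match.group("name"), int(match.group("count"))


print(parse_mirtools_header(">GIK1EHM01A0TLT_x84"))  # ('GIK1EHM01A0TLT', 84)
print(parse_mirtools_header(">GIK1EHM01A0TLT_x1"))   # ('GIK1EHM01A0TLT', 1)
```

Read this way, a suffix of _x84 claims 84 copies of the read, while _x1 claims a single copy, which is the likely case for singletons.
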
    Do you know any scripting languages? e.g. Perl or Python

    This Biopython script will probably do what you want...

    Code:
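    from Bio import SeqIO

    input_fasta = "original.fasta"
    output_fasta = "fixed.fasta"

    def fix_for_mirtools(records):
        """Strip each description and tag the ID with a read count of one."""
        for record in records:
            record.description = ""  # drop "length=... xy=..." from the header
            record.id += "_x1"       # mirTools reads the suffix as the read count
            yield record

    records = SeqIO.parse(input_fasta, "fasta")
    count = SeqIO.write(fix_for_mirtools(records), output_fasta, "fasta")
    print("Saved %i records" % count)  # parenthesised form runs on Python 2 and 3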
    Last edited by maubp; 10-11-2010, 05:23 AM.



    • #3
      Thank you for your answer.
      I know a little Python; I'll try your suggestion. I was hoping something similar exists for Linux, because Python is difficult for me to use.



      • #4
        Originally posted by Giorgio C:
        Thank you for your answer.
        I know a little Python; I'll try your suggestion. I was hoping something similar exists for Linux, because Python is difficult for me to use.
        Python works fine on Linux - or did you mean you would like a command line based solution?

        You can turn it into a simple command line script that takes piped input if you want:
        Code:
        #!/usr/bin/env python
        """Quick script to read a FASTA file from stdin and write it to stdout,
        formatting identifiers for mirTools assuming single read coverage."""
        import sys
        from Bio import SeqIO

        def fix_for_mirtools(records):
            for record in records:
                record.description = ""
                record.id += "_x1"
                yield record

        records = SeqIO.parse(sys.stdin, "fasta")
        count = SeqIO.write(fix_for_mirtools(records), sys.stdout, "fasta")
        # Report on stderr so the message is not mixed into the FASTA on stdout
        sys.stderr.write("Saved %i records\n" % count)
        Then save that script (e.g. as fix_for_mirtools) and mark it as executable with chmod, then call it at the command line:
        Code:
        ./fix_for_mirtools < original.fasta > fixed.fasta
        or:
        Code:
        python fix_for_mirtools < original.fasta > fixed.fasta
        Alternatively someone might suggest a one line trick using sed



        • #5
          I tried it as you told me to:


          from Bio import SeqIO
          >>> input_fasta = "C:\Users\Giorgio Casaburi\Desktop\singleton.fna"
          >>> output_fasta = "C:\Users\Giorgio Casaburi\Desktop\singletonfixed.fna"
          >>> def fix_for_mirtools (records) :
          for record in records:
          record.description=""
          record.id += "_x1"
          yield record
          records = SeqIO.parse(singleton.fna, "fasta")
          count = SeqIO.write(fix_for_mirtools(records), singletonfixed.fna, "fasta")
          print "Saved %i records" % count
          >>>


          But nothing happens. Is there something else I need to do?



          • #6
            From your filenames you are using Windows - not Linux.

            It looks like you are trying to cut and paste directly at the Python prompt, but the indentation is all wrong. Save the example as a python script file (a plain text file, usually with the extension .py) and run that. You can do this from within the IDLE GUI that comes with Python.
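
As an illustration of what such a script file might look like, here is a minimal sketch that uses only the Python standard library instead of Biopython, so it is a simplified stand-in rather than the script posted earlier in the thread. The function names, the temporary demo file and the example path in the comment are all hypothetical. The indented lines belong to the def, for and if statements above them; that indentation is what was missing in the pasted attempt.

```python
import os
import tempfile


def fix_header(line):
    """Keep only the read name from a FASTA header and append '_x1'."""
    parts = line[1:].split(None, 1)  # text before the first whitespace
    name = parts[0] if parts else ""
    return ">" + name + "_x1\n"


def fix_for_mirtools(input_path, output_path):
    """Copy a FASTA file, rewriting header lines; return the record count."""
    count = 0
    with open(input_path) as source, open(output_path, "w") as target:
        for line in source:
            if line.startswith(">"):
                target.write(fix_header(line))
                count += 1
            else:
                target.write(line)  # sequence lines are copied unchanged
    return count


if __name__ == "__main__":
    # Demo on a throwaway file. For real data, pass your own paths instead,
    # e.g. a raw string such as r"C:\Users\you\Desktop\singleton.fna"
    # (hypothetical path).
    folder = tempfile.mkdtemp()
    input_fasta = os.path.join(folder, "singleton.fna")
    output_fasta = os.path.join(folder, "singletonfixed.fna")
    with open(input_fasta, "w") as handle:
        handle.write(">GIK1EHM01A0TLT length=84 xy=0302_1103 region=1\n")
        handle.write("GTGTTTCTGTGTGGAGGTGTGTCTCTGTGG\n")
    print("Saved %i records" % fix_for_mirtools(input_fasta, output_fasta))
```

Saved under a name such as fix_singletons.py (again hypothetical), this could be run from IDLE with Run Module or from a command prompt with python fix_singletons.py.
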



            • #7
              Yes, I tried it on Windows because that is where I have Python installed. Through VNC I work on a remote PC at the centre, which runs Linux, but I don't know whether Python is installed there, and in any case I don't have administrator privileges to install it. I'm a complete beginner with Python, so it is difficult for me to understand what you say. I'll try. Thank you very much for your invaluable help.



              • #8
                This may help: http://hkn.eecs.berkeley.edu/~dyoo/p...tro/index.html



                • #9
                  Sorry,
                  I've read everything and I've tried, but there is always something wrong: syntax errors, etc. I really don't know what to do. (Myfile.fna is on the desktop.)



                  • #10
                    from Bio import SeqIO
                    input_fasta = "C:\Users\Giorgio Casaburi\Desktop\singleton.fna"
                    output_fasta = "C:\Users\Giorgio Casaburi\Desktop\singletonfixed.fna"
                    def fix_for_mirtools(records):
                    record.description=""
                    record.id += "_x1"
                    yield record
                    records = SeqIO.parse(input_fasta, "fasta")
                    count = SeqIO.write(fix_for_mirtools(records), output_fasta, "fasta")
                    print "Saved %i records" % count

                    (run module).....save....

                    and then:

                    IDLE 2.6.5
                    >>> ================================ RESTART ================================
                    >>>

                    Traceback (most recent call last):
                    File "C:/Python26/singleton", line 9, in <module>
                    count = SeqIO.write(fix_for_mirtools(records), output_fasta, "fasta")
                    File "C:\Python26\lib\site-packages\Bio\SeqIO\__init__.py", line 398, in write
                    count = writer_class(handle).write_file(sequences)
                    File "C:\Python26\lib\site-packages\Bio\SeqIO\Interfaces.py", line 271, in write_file
                    count = self.write_records(records)
                    File "C:\Python26\lib\site-packages\Bio\SeqIO\Interfaces.py", line 255, in write_records
                    for record in records:
                    File "C:/Python26/singleton", line 5, in fix_for_mirtools
                    record.description=""
                    NameError: global name 'record' is not defined
                    >>>

                    I don't know where I went wrong. Can you see my error? Please.



                    • #11
                      Do you have any programmers in your group/department? That would be the easiest way to get help. Once you have the basic skills it will be easier to get help online.

                      So you have this - I have added the [ code ] and [ /code ] tags for display:
                      Originally posted by Giorgio C:
                      Code:
                      from Bio import SeqIO
                      input_fasta = "C:\Users\Giorgio Casaburi\Desktop\singleton.fna"
                      output_fasta = "C:\Users\Giorgio Casaburi\Desktop\singletonfixed.fna"
                      def fix_for_mirtools(records):
                          record.description=""
                          record.id += "_x1"
                          yield record
                      records = SeqIO.parse(input_fasta, "fasta")
                      count = SeqIO.write(fix_for_mirtools(records), output_fasta, "fasta")
                      print "Saved %i records" % count
                      You are missing the line 'for record in records', hence the error.
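
The error can be reproduced without Biopython. In this sketch, Record is a hypothetical stand-in for the objects SeqIO.parse yields, and broken_fix and fixed_fix are illustrative names. Because a function containing yield is a generator, its body only starts running when something iterates over it, which is presumably why the traceback shows the NameError surfacing inside SeqIO.write rather than at the def.

```python
class Record(object):
    """Hypothetical stand-in for a Biopython SeqRecord (id and description only)."""

    def __init__(self, id, description):
        self.id = id
        self.description = description


def broken_fix(records):
    # No loop: the name 'record' is never assigned, so the first
    # statement raises NameError as soon as the generator is consumed.
    record.description = ""
    record.id += "_x1"
    yield record


def fixed_fix(records):
    for record in records:       # binds 'record' once per input record
        record.description = ""  # the body must be indented under the loop
        record.id += "_x1"
        yield record


reads = [Record("GIK1EHM01A0TLT", "GIK1EHM01A0TLT length=84")]

try:
    next(broken_fix(reads))
except NameError as error:
    print("broken version: %s" % error)

print([r.id for r in fixed_fix(reads)])  # ['GIK1EHM01A0TLT_x1']
```
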
                      Last edited by maubp; 10-11-2010, 06:50 AM. Reason: Updated after seeing Giorgio's second post with error message



                      • #12
                        Yes, we have a bioinformatics group, but it's not very friendly. I am a first-year PhD student, and I wanted to try to do this alone or with online help. However, I understand how difficult it is for you to explain this kind of thing. Thank you very much for all your help.



                        • #13
                          one line trick:

                          sed 's/ length=.*$/_x1/g' your.fna



                          • #14
                            Originally posted by maubp:
                            Alternatively someone might suggest a one line trick using sed
                            Originally posted by dschika:
                            one line trick:

                            sed 's/ length=.*$/_x1/g' your.fna

                            I wondered how long it would take

                            Giorgio - sed is a command line tool which will probably be available on the Unix/Linux machine you have access to. Getting sed on Windows is more complicated.
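
For a Windows machine without sed, the same substitution can be sketched in Python with the standard re module. The names fix_line and fix_lines are illustrative, and the pattern is copied from the sed expression quoted above. As written, the sed command prints its result to the terminal, so on Linux you would normally redirect it into a new file; the sketch below likewise only prints.

```python
import re

# Same pattern as the sed expression: a space, "length=", then the rest of the line.
TRAILING_DESCRIPTION = re.compile(r" length=.*$")


def fix_line(line):
    """Replace ' length=...' up to the end of the line with '_x1'."""
    return TRAILING_DESCRIPTION.sub("_x1", line.rstrip("\r\n"), count=1)


def fix_lines(lines):
    """Apply fix_line to every line, as sed does; sequence lines do not match."""
    for line in lines:
        yield fix_line(line)


sample = [
    ">GIK1EHM01A0TLT length=84 xy=0302_1103 region=1 run=R_2010_06_08_08_48_07_\n",
    "GTGTTTCTGTGTGGAGGTGTGTCTCTGTGGTGTGTGTGTCTGTGTGGGTGTACGTGTGTC\n",
    "TCTGTCTGTGGTGTGTGTGTCTGT\n",
]
for fixed in fix_lines(sample):
    print(fixed)
```

Like the sed version, this relies on every header containing " length=" right after the read name, which holds for the sfffile output shown in this thread but is an assumption for other FASTA files.
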



                            • #15
                              one line trick:

                              sed 's/ length=.*$/_x1/g' your.fna


                              Wonderful trick!!! Thank you very much
