  • Converting GEO database TXT format to fasta

    Hello!

    I'm new to bioinformatics, but I need to perform an analysis of some sort.

    I've downloaded data from the GEO database. It's a large TXT file consisting of many lines that look like this: SCS_0004:2:1:1053:18066#0/1 AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC [YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W` 5 29 29 chr17:68048647-68172163_36129 3979 + 1 1

    I need to align those short sequences to a mouse chromosome, and I'm using Bowtie under Windows.

    The problem is that Bowtie doesn't accept this format. Can you recommend an easy-to-use Windows tool to convert it into FASTA or just raw sequences?
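For reference, here is a minimal Python sketch of the FASTA conversion asked about here. It assumes the file is tab-separated with the read ID in column 1 and the sequence in column 2 (the layout described later in this thread). It keeps only the first line for each read ID, since the same read is listed once per alignment. The function name `fasta_lines` is hypothetical.

```python
def fasta_lines(lines):
    """Yield FASTA text, one record per distinct read ID.

    Assumes tab-separated input: column 1 = read ID, column 2 = sequence.
    Any further columns (alignment details) are ignored.
    """
    seen = set()  # read IDs already written
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read_id, seq = fields[0], fields[1]
        if read_id in seen:
            continue  # same read listed again for another alignment
        seen.add(read_id)
        yield ">%s\n%s\n" % (read_id, seq)
```

To write a file, something like `with open("INPUT_FILENAME") as src, open("reads.fa", "w") as dst: dst.writelines(fasta_lines(src))` should do (file names are placeholders). Bowtie 1 reads FASTA input when given `-f`. The `seen` set keeps every read ID in memory, which is simple but can be heavy for very large files. If repeats of an ID are always on consecutive lines, comparing against the previous ID is enough.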

  • #2
    This could be done with the Unix command line, but it would help if you could post a few lines enclosed in code tags, like
    Code:
    paste here
    so we can get a precise idea of the file format.



    • #3
      Originally posted by vivek_
      This could be done with the unix command line but it would be helpful if you can post a few lines enclosed within code brackets like
      Code:
      paste here
      to get a precise idea of the file format.
      Code:
      SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68048647-68172163_36129	3979	+	1	1
      SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68048647-68172163_36130	3979	+	1	1
      SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	uc008dkh.1	4033	+	1	1
      SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68046720-68172163_36128	3943	+	1	1
      SCS_0004:2:1:1053:18066#0/1	AGCAATATTGACTACANCCTCATCAAAGCCTGTAGGCACC	[YITQR]MST\WN\\TEQU[`]WU]]WPYXXXOXU]`\W`	5	29	29	chr17:68046720-68172163_36127	3943	+	1	1
      SCS_0004:2:1:1054:5070#0/1	TTTCTCTGTCTTGTCCNCCTAGTTTCCCTCCTGTAGGCAC	aaaaaaaaaaaaaaa]EaaaW]]]Yaa\a`[aa]Pa^]VT	2	30	30	chr2:40378133-40378584_42654	395	-	1	1
      SCS_0004:2:1:1054:5070#0/1	TTTCTCTGTCTTGTCCNCCTAGTTTCCCTCCTGTAGGCAC	aaaaaaaaaaaaaaa]EaaaW]]]Yaa\a`[aa]Pa^]VT	2	30	30	chr1:8926487-8927380_99	16	+	1	1
      Something like this. By the way, I don't have Unix installed, only Mac OS and Windows, but as far as I understand, Mac OS is a Unix-based system, right?
      Last edited by Etherella; 08-30-2012, 03:03 AM.
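In this sample each read ID is repeated on consecutive lines, apparently once per reported alignment. A converter that only compares each ID with the previous one relies on that pattern holding for the whole file. This sketch checks it, assuming tab-separated lines with the read ID in column 1. The function name `ids_are_contiguous` is hypothetical.

```python
def ids_are_contiguous(lines):
    """Return True if all lines for each read ID form one unbroken run.

    Assumes tab-separated lines with the read ID in column 1.
    """
    finished = set()  # IDs whose run of lines has already ended
    previous = None
    for line in lines:
        read_id = line.rstrip("\r\n").split("\t")[0]
        if not read_id:
            continue  # skip blank lines
        if read_id != previous:
            if read_id in finished:
                return False  # ID reappears after a different ID
            if previous is not None:
                finished.add(previous)
            previous = read_id
    return True
```

If this returns False for a file, a set-based de-duplication (or sorting by read ID first) is safer than tracking only the last ID.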



      • #4
        The easiest way would be to write a small script (in Python, Perl, whatever) to read that in and write out the same data (minus the alignment information) in FASTQ format. Column 1 is the read ID, column 2 is the sequence, and column 3 is the quality string. If you have Python 3 installed on your Mac, the following would probably work (change INPUT_FILENAME to the name of the file you got from GEO and SOME_OUTPUT_FILE to whatever you want the output to be called):
        Code:
        #!/usr/bin/env python3
        import csv
        
        # Column 1 = read ID, column 2 = sequence, column 3 = quality string;
        # the remaining columns describe alignments and are dropped.
        with open("INPUT_FILENAME", "r", newline="") as infile, \
                open("SOME_OUTPUT_FILE", "w") as output:
            reader = csv.reader(infile, dialect="excel-tab", quoting=csv.QUOTE_NONE)
            last = ""
            for line in reader:
                if not line:
                    continue  # skip blank lines
                # A read is listed once per alignment on consecutive lines,
                # so write it only when the read ID changes.
                if line[0] != last:
                    output.write("@%s\n" % line[0])  # FASTQ headers start with '@'
                    output.write("%s\n" % line[1])
                    output.write("+\n")
                    output.write("%s\n" % line[2])
                    last = line[0]
        Something like that would probably work.
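One caveat when feeding the FASTQ written by the script above to Bowtie: the quality strings in the sample contain only characters from `E` (ASCII 69) upward. That suggests the older Illumina phred+64 encoding rather than the phred+33 encoding aligners usually assume by default. Bowtie 1 accepts phred+64 qualities with `--phred64-quals` (Bowtie 2 uses `--phred64`). Alternatively, the qualities can be re-encoded. The sketch below is a heuristic under those assumptions, and the function names are hypothetical.

```python
def guess_offset(qualities):
    """Guess the ASCII offset (33 or 64) of a collection of quality strings.

    Heuristic only: phred+33 data usually contains characters below ';'
    (ASCII 59), while phred+64 data never does. Raises ValueError if the
    input contains no characters at all.
    """
    lowest = min(ord(c) for q in qualities for c in q)
    return 33 if lowest < 59 else 64


def phred64_to_phred33(qual):
    """Re-encode a phred+64 quality string as phred+33.

    Assumes true phred+64 input (characters '@' and above); each
    character is shifted down by 31 (= 64 - 33).
    """
    return "".join(chr(ord(c) - 31) for c in qual)
```

For example, `a` (Q33 in phred+64) becomes `B`, which is Q33 in phred+33.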



        • #5
          Thanks for the reply. I managed to get it working through Galaxy.
