  • pchiang
    Junior Member
    • Jun 2011
    • 3

    Align multiple sequences in tabular or fasta format

    Hi Folks,

    I have ~100,000 short sequences (~25 bp long) in FASTA format. They are the oligo probes used on the Affymetrix Mouse 430-2 chip. I want to align all of the sequences against the mm9 genome to get either GFF or BED output. Can anyone suggest a good web- or Windows-based tool for this purpose?

    The following is an example of the first probe, thanks!

    >probe:Mouse430_2:1415670_at:269:753; Interrogation_Position=2436; Antisense;
    GGCTGATCACATCCAAAAAGTCATG
  • NicoBxl
    not just another member
    • Aug 2010
    • 264

    #2
    There are several short-read aligners for this purpose:

    - Bowtie
    - Soap2
    - BWA
    - Novoalign
    - ...


    • husamia
      Member
      • Apr 2010
      • 66

      #3
      For an online option, I have seen Galaxy, which I think would be a good choice since your dataset is small.


      • pchiang
        Junior Member
        • Jun 2011
        • 3

        #4
        Thanks to NicoBxl and husamia.

        I am still trying to understand how to install Bowtie on Windows...

        I did try Galaxy with my FASTA files. It failed with the error "reads file does not look like a FASTQ file." Galaxy seems to require two more fields (strand and quality score) to run the alignment. However, it still did not work even after I added two dummy columns and changed the file type from FASTA to FASTQ.

        Does anybody know how to run an alignment on Galaxy without going through the FASTQ requirement? Thanks a million!


        • kwatts59
          Member
          • Apr 2011
          • 46

          #5
          Write a simple Perl script to convert your FASTA file into FASTQ format.
          Then run bowtie to do the alignment.
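
The conversion suggested above could look roughly like the sketch below. It is written in Python rather than Perl, and it rests on assumptions that are not from this thread: the function names are made up for illustration, every base gets the same dummy quality character ("I", which is Phred 40 in the Phred+33 encoding), and the full FASTA header is reused as the read name.

```python
def format_fastq_record(name, seq, quality_char="I"):
    """Build one four-line FASTQ record with a constant dummy quality string."""
    return f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n"


def fasta_to_fastq(fasta_lines, quality_char="I"):
    """Yield FASTQ records (as strings) from an iterable of FASTA lines.

    Multi-line sequences are joined. The qualities are placeholders,
    not real measurements (assumption: the aligner only needs them to exist).
    """
    name, seq_parts = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield format_fastq_record(name, "".join(seq_parts), quality_char)
            name, seq_parts = line[1:], []
        elif name is not None:
            seq_parts.append(line)
    if name is not None:
        yield format_fastq_record(name, "".join(seq_parts), quality_char)


# Demo with the example probe from the first post.
example = [
    ">probe:Mouse430_2:1415670_at:269:753; Interrogation_Position=2436; Antisense;",
    "GGCTGATCACATCCAAAAAGTCATG",
]
for record in fasta_to_fastq(example):
    print(record, end="")
```

For a real file, the same generator can be fed an open file handle and the records written to a second file, since it only ever holds one record in memory.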


          • Kennels
            Senior Member
            • Feb 2011
            • 149

            #6
            Galaxy should auto-detect your format, and it should be able to accept FASTA input. If it is giving a FASTQ-related error, make sure you are uploading with the correct options.
            Otherwise, the headers in your FASTA file may be causing problems. If you aren't familiar with the command line, you may be able to use WordPad or some other Windows program to change the headers to something simpler.
            There are Windows editors for large text files, such as 'gVim', or you can search Google for one.
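
As an alternative to editing headers by hand, a short script could simplify them. The sketch below is only an illustration: the function names are hypothetical, and the rule it applies (keep the text before the first ";" and drop anything after the first space) is an assumption about what "simpler" means here, not something Galaxy is known to require.

```python
def simplify_header(line):
    """Shorten one FASTA header line; sequence lines are returned unchanged.

    Assumed rule: keep the text before the first ';', then keep only the
    first whitespace-separated token of that.
    """
    if not line.startswith(">"):
        return line
    tokens = line[1:].split(";", 1)[0].split()
    return ">" + (tokens[0] if tokens else "")


def simplify_fasta_headers(lines):
    """Yield FASTA lines (without trailing newlines) with simplified headers."""
    for line in lines:
        yield simplify_header(line.rstrip("\r\n"))


# Demo with the example probe from the first post.
example = [
    ">probe:Mouse430_2:1415670_at:269:753; Interrogation_Position=2436; Antisense;\n",
    "GGCTGATCACATCCAAAAAGTCATG\n",
]
for out_line in simplify_fasta_headers(example):
    print(out_line)
```

If several probes share the same shortened name, the rule would need to keep more of the header so that read names stay unique.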


            • husamia
              Member
              • Apr 2010
              • 66

              #7
              Originally posted by Kennels:
              There are windows large text file editor programs such as 'gVim', or google for one.
              Does anybody have experience opening large text files, such as FASTA files, in Windows? I use the search-and-replace function a lot. What are some good editors for large files (~12 GB)?
              I know this is a huge file, but I wonder if anybody knows of an editor that handles such files responsively, without hogging memory or crashing.
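
For plain search and replace, one way to avoid loading a multi-gigabyte file into an editor at all is to stream it line by line through a small script. The sketch below is a minimal illustration under stated assumptions: `stream_replace` is a made-up name, the search string is a literal (not a regular expression), and matches that span a line break are not found.

```python
import io


def stream_replace(src, dst, old, new):
    """Copy text from src to dst one line at a time, replacing old with new.

    Only one line is held in memory at once, so memory use does not grow
    with file size. Returns the number of replacements made.
    """
    if not old:
        raise ValueError("search string must not be empty")
    total = 0
    for line in src:
        total += line.count(old)
        dst.write(line.replace(old, new))
    return total


# Demo on an in-memory stand-in for a large FASTA file.
source = io.StringIO(">chr1 test\nACGTACGT\n>chr2 test\nTTTT\n")
target = io.StringIO()
count = stream_replace(source, target, " test", "")
print(count, "replacements")
print(target.getvalue(), end="")
```

With real files, `src` and `dst` would be two handles from `open(...)`, reading the original and writing a new copy, so the original stays untouched until the result has been checked.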


              • pchiang
                Junior Member
                • Jun 2011
                • 3

                #8
                Aligning with Bowtie worked! Thank you everyone for your suggestions.
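
Since the original goal was BED output, the sketch below shows one way Bowtie hits might be turned into BED lines. It assumes Bowtie 1 was run with its default tab-separated output (read name, strand, reference name, 0-based leftmost offset, read sequence, ...) rather than SAM; the thread does not say which output format was used. The function name is hypothetical, and the BED score column is filled with a placeholder 0.

```python
def bowtie_line_to_bed(line):
    """Convert one line of Bowtie 1 default output into a 6-column BED line.

    Assumed input columns: read name, strand, reference name,
    0-based leftmost offset, read sequence, then further fields that
    are ignored here.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated fields")
    read_name, strand, chrom, offset, seq = fields[:5]
    start = int(offset)      # BED starts are 0-based, like Bowtie offsets
    end = start + len(seq)   # BED ends are exclusive
    return "\t".join([chrom, str(start), str(end), read_name, "0", strand])


# Demo with a made-up alignment position (not a real result).
hit = "1415670_at:269:753\t+\tchr1\t100\tGGCTGATCACATCCAAAAAGTCATG\t" + "I" * 25 + "\t0\t"
print(bowtie_line_to_bed(hit))
```

The end coordinate here assumes an ungapped alignment, which holds for Bowtie 1 because it does not report gapped alignments.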
