  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #16
    Originally posted by mghita View Post
    I have added the program to my path and set the permissions correctly, but now I have another issue:
    "You need the Rosetta software to run faSomeRecords. The Rosetta installer is in Optional Installs on your Mac OS X installation disc."

    and I don't have Rosetta installed, or the CD for installation, so I don't know how to handle this problem. Any suggestions?


    Thanks,
    Madalina
    Originally posted by GenoMax View Post
    Madalina,

    If you are connected to the internet, you should automatically be offered the option to download Rosetta and install it.

    Do you have a PowerPC- or an Intel-based Mac? Which OS are you running?
    Originally posted by mghita View Post
    I have Mac OS X 10.6.8, 3.06 GHz. I just get that message in bash; I don't get any install option. I tried to download it, but it doesn't work.
    Madalina,

    Your Mac has an Intel CPU, but the version of faSomeRecords you are trying to run was compiled for PowerPC-based Macs. You could try to install Rosetta (a compatibility layer that allows PPC code to run on Intel Macs), but the easier course of action is to install the correct version of the binary for your computer.

    If you go back to the download site (http://hgdownload.cse.ucsc.edu/admin/exe/) you will see that there are two directories for macOSX software, one for PowerPC (macOSX.ppc) and one for Intel (macOSX.i386). Make sure to download and install the program from the macOSX.i386 directory.
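
If it helps, here is a rough sketch of picking the directory from the machine type. It assumes `uname -m` prints `i386` or `x86_64` on an Intel Mac and `Power Macintosh` on a PowerPC Mac; the function name `ucsc_dir_for` is made up for illustration and is not part of the UCSC tools.

```shell
# ucsc_dir_for MACHINE: map a machine name (as printed by `uname -m`)
# to one of the two download directories named above.
# The function name is hypothetical and the mapping is an assumption.
ucsc_dir_for() {
    case "$1" in
        i386|x86_64)            echo "macOSX.i386" ;;
        "Power Macintosh"|ppc*) echo "macOSX.ppc" ;;
        *)                      echo "unknown" ;;
    esac
}

# Typical use on the Mac itself:
#   ucsc_dir_for "$(uname -m)"
# To inspect a binary you have already downloaded:
#   file "$(command -v faSomeRecords)"
```

If the result is "unknown", the two directories above probably do not cover your machine.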

    • mghita
      Member
      • Aug 2011
      • 10

      #17
      Hi,

      Yes, that seems to work, but the command itself doesn't. The reads in my fasta file (file.fas) are named @Frag_1, @Frag_2, ..., @Frag_20000. I want to extract some of them; I have their names in a text file (diff.txt), saved like this:

      @Frag_93
      @Frag_530
      @Frag_2183
      @Frag_3988
      @Frag_7733

      I used:

      faSomeRecord file.fas diff.txt output.fas

      and output.fas is empty. Any idea why this happens?


      Thanks
      Madalina

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #18
        Originally posted by mghita View Post
        Hi,

        Yes, that seems to work, but the command itself doesn't. The reads in my fasta file (file.fas) are named @Frag_1, @Frag_2, ..., @Frag_20000. I want to extract some of them; I have their names in a text file (diff.txt), saved like this:

        @Frag_93
        @Frag_530
        @Frag_2183
        @Frag_3988
        @Frag_7733

        I used:

        faSomeRecord file.fas diff.txt output.fas

        and output.fas is empty. Any idea why this happens?


        Thanks
        Madalina
        NOTE: Please use new file names, as shown in the command lines below, so that your original files are preserved as they are.

        Madalina,

        The program expects the FASTA identifiers to start with ">" rather than "@". You can do the replacement with a program called "sed", which should be available on Mac OS X (I do not have a Mac handy to check).

        Do this on the command line (note the single quotes):

        sed 's/@/>/g' original_fasta_file > new_file.fas

        The file "new_file.fas" should now have every "@" replaced by ">".

        Remember that the file you supply for extraction needs the FASTA IDs without the ">". You can use the same "sed" program to strip the "@" signs from your identifiers like this:

        sed 's/@//g' diff.txt > new_diff.txt

        Now you can use the two new files you created to get the output:

        faSomeRecords new_file.fas new_diff.txt output.fas
        Last edited by GenoMax; 08-09-2011, 04:36 AM. Reason: adding_info_to_clarify
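
A slightly more defensive version of the two clean-up steps, as a sketch only. It anchors the substitution to the start of the line so that an "@" elsewhere is left alone, and it also strips trailing spaces and Windows line endings from the name list, since either would stop the names from matching. The function names `prep_fasta` and `prep_ids` are made up for illustration. Names starting with "@" are typical of FASTQ rather than FASTA, so it is worth checking the file first; this sketch only suits a file that is plain FASTA apart from the "@".

```shell
# prep_fasta IN OUT: turn a leading "@" on header lines into ">".
# Anchoring with ^ leaves any "@" elsewhere on a line untouched.
prep_fasta() {
    sed 's/^@/>/' "$1" > "$2"
}

# prep_ids IN OUT: drop a leading "@" or ">" from each name and
# remove carriage returns and trailing whitespace.
prep_ids() {
    tr -d '\r' < "$1" | sed 's/^[@>]//; s/[[:space:]]*$//' > "$2"
}

# Example, using the file names from this thread:
#   prep_fasta file.fas new_file.fas
#   prep_ids   diff.txt new_diff.txt
```

Both functions write to a new file, so the originals stay untouched.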

        • mghita
          Member
          • Aug 2011
          • 10

          #19
          I have given up. I replaced the @ with > and it still didn't work. I have combined a little awk and R, and that does the job just fine. Thanks a lot for the effort!

          Madalina
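
For anyone landing here later, an awk-only extraction could look roughly like the sketch below. This is not the awk and R combination mentioned above (that code was not posted); it is a minimal illustration that assumes the record name is the first word after ">" and that the name list has one name per line without ">" or "@". The function name `extract_by_ids` is made up.

```shell
# extract_by_ids IDS FASTA: print the FASTA records whose name (the
# first word after ">") appears in IDS, one name per line.
extract_by_ids() {
    awk '
        # First file: remember every non-empty name.
        NR == FNR { if ($1 != "") want[$1] = 1; next }
        # Header line in the FASTA: decide whether this record is wanted.
        /^>/      { name = substr($1, 2); keep = (name in want) }
        # Print header and sequence lines of wanted records.
        keep
    ' "$1" "$2"
}

# Example, using the file names from this thread:
#   extract_by_ids new_diff.txt new_file.fas > output.fas
```

Names are compared exactly, so Frag_2 does not also pull in Frag_20.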

          • scopak
            Junior Member
            • Dec 2009
            • 1

            #20
            krobison, I too like Perl one-liners.

            In the example below, sed is used at both ends of the pipeline to add and then remove blank lines for the regex search.

            sed 's/^>.*/\n&/' <in.fasta | perl -e ' while(<>){ print if(/^>chr1/.../^\n/); }' | sed '/^$/d' >patterns.fasta

            The first sed adds a blank line above each FASTA header (a line matching '^>.*') in the file in.fasta. Its output is piped to a Perl range match that prints each line beginning with >chr1 and all sequence lines up to the next blank line (^\n).
            Finally, the second sed removes the blank lines, and the matching records are saved to the output file, patterns.fasta.

            Hope that helps
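
The same selection can be sketched in a single awk call, without adding and removing blank lines. This is an alternative to the one-liner above rather than a rewrite of it; the function name `extract_prefix` is made up, and the prefix is compared as a literal string rather than as a regex.

```shell
# extract_prefix PREFIX FASTA: print every record whose header line
# starts with ">PREFIX". PREFIX is compared literally, not as a regex.
extract_prefix() {
    awk -v p=">$1" '
        # New record: is the header prefixed by p?
        /^>/ { keep = (index($0, p) == 1) }
        # Print header and sequence lines of wanted records.
        keep
    ' "$2"
}

# Example:
#   extract_prefix chr1 in.fasta > patterns.fasta
```

Like the /^>chr1/ pattern above, the prefix chr1 also matches chr10, chr11 and so on. As far as I know, the `\n` in the replacement part of the first sed is a GNU extension, and the sed shipped with Mac OS X may insert a literal "n" instead, which is one reason to prefer a single awk call there.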

            • julianeishida
              Junior Member
              • Mar 2012
              • 1

              #21
              Thanks.

              I didn't know about Biopieces. It is really useful. Highly recommended for those with little programming experience.

              • swaraj
                Member
                • Feb 2012
                • 50

                #22
                A quick way to do it in BioPerl

                • pjyoti
                  Junior Member
                  • Nov 2011
                  • 2

                  #23
                  Hello everyone,

                  I am using the following Perl script to retrieve sequences in FASTA format:


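                  use strict;
                  use warnings;
                  use Bio::Perl;

                  my $database = "genbank";
                  my $format   = "fasta";

                  open(my $in,  "<",  "1.txt")        or die "Cannot open 1.txt: $!";
                  open(my $chk, ">>", "checking.txt") or die "Cannot open checking.txt: $!";

                  while (my $line = <$in>) {
                      chomp $line;
                      next unless length $line;    # skip blank lines
                      # Same ID clean-up as before: first space -> ":", first "|" -> " ",
                      # then upper-case the first "g" and the first "i".
                      $line =~ s/ /:/;
                      $line =~ s/\|/ /;
                      $line =~ s/g/G/;
                      $line =~ s/i/I/;
                      my $id = $line;

                      # get_sequence() throws an exception on an HTTP error such as
                      # "502 Bad Gateway". Trap it so one failed request does not
                      # stop the whole run, and log the ID so it can be retried.
                      my $sequence = eval { get_sequence($database, $id) };
                      unless (defined $sequence) {
                          warn "Could not fetch $id: $@";
                          print {$chk} "FAILED $id\n";
                          next;
                      }
                      my $test = write_sequence(">>sequences_1.txt", $format, $sequence);
                      print {$chk} "$test\n";
                      sleep 1;    # pause between requests to go easy on the remote server
                  }
                  close $chk;
                  close $in;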



                  After retrieving some sequences, I get this error message:

                  -----------Exception-------------
                  MSG: WebDBSeqI Request Error:
                  HTTP/1.1 502 Bad Gateway
                  connection: close
                  Date:
                  .
                  .
                  .
                  .
                  .
                  .
                  <?xml version="1.0" encoding="ISO-8859-1"?




                  The proxy server received an invalid response from an upstream server.


                  Please help me out.

                    • vivek
                      Junior Member
                      • Sep 2010
                      • 2

                      #25
                      Dear all,

                      I followed the same steps, but it is not working.

                      Vivek

                      Originally posted by apc2010 View Post
                      If you need sequences extracted from a multi-FASTA and are open to using a pre-existing tool, I would also suggest either the faSomeRecords or faOneRecord command line utilities from UCSC.

                      They have versions of this tool for OSX and Linux. Here is a link to the executable downloads:



                      The difference between the two: faOneRecord takes the sequence name to extract from the command line, faSomeRecords reads in a file of 1 or more sequence names to extract from the multi-FASTA.

                      Usage:
                      Code:
                      ================================================================
                      ========   faOneRecord   ====================================
                      ================================================================
                      faOneRecord - Extract a single record from a .FA file
                      usage:
                         faOneRecord in.fa recordName
                      
                      ================================================================
                      ========   faSomeRecords   ====================================
                      ================================================================
                      faSomeRecords - Extract multiple fa records
                      usage:
                         faSomeRecords in.fa listFile out.fa
                      options:
                         -exclude - output sequences not in the list file.
                      Vivek Keshri

                      • yzzhang
                        Member
                        • Jan 2013
                        • 67

                        #26
                        Don't include the ">" in the list file; then faSomeRecords works well.
                        Originally posted by mghita View Post
                        I have given up. I replaced the @ with > and it still didn't work. I have combined a little awk and R, and that does the job just fine. Thanks a lot for the effort!

                        Madalina
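
To see what the names in the list file should look like, one can print the names straight from the FASTA file. A minimal sketch, assuming the name is the first word of each header line; the function name `fasta_names` and the file name `all_names.txt` are made up for illustration.

```shell
# fasta_names FASTA: list record names, i.e. the first word of each
# header line with the leading ">" removed, one per line.
fasta_names() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"
}

# Example: write all names to a file, then trim it down to the
# records you want and pass it as the list file.
#   fasta_names in.fa > all_names.txt
```

If a name in your list does not appear in this output exactly as written, that record will not be found.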

                        • ML1975
                          Junior Member
                          • Dec 2017
                          • 3

                          #27
                          Originally posted by boetsie View Post
                          Hi,

                          I've attached a script which can do this. If i understand it correctly you have a file like;

                          >chr1
                          AGCTGATGATAGT...
                          >chr2
                          ACAAAATAGTCGAT....
                          >chr3
                          ....

                          And your perl script would be something like;

                          perl extractSequence.pl genomefile.fa chr1

                          where 'chr1' corresponds to a sequence named chr1 (indicated by >chr1)?

                          Say you have a more complicated file like;

                          >chr1_coverage1000_length100
                          AGATGTATGTTAGA

                          You can do something like;

                          perl extractSequence.pl genomefile.fa chr1_.

                          which will extract all the sequences containing the header chr1_

                          To store the results, do;

                          perl extractSequence.pl genomefile.fa chr1 > filename.txt

                          If this is what you want, you can use my script.

                          Boetsie
                          Seven years later and I have used your script. Thanks for sharing, it works a treat!

                          • kausikmhg
                            Junior Member
                            • Jul 2012
                            • 3

                            #28
                            Originally posted by boetsie View Post
                            Hi,

                            I've attached a script which can do this. If i understand it correctly you have a file like;

                            >chr1
                            AGCTGATGATAGT...
                            >chr2
                            ACAAAATAGTCGAT....
                            >chr3
                            ....

                            And your perl script would be something like;

                            perl extractSequence.pl genomefile.fa chr1

                            where 'chr1' corresponds to a sequence named chr1 (indicated by >chr1)?

                            Say you have a more complicated file like;

                            >chr1_coverage1000_length100
                            AGATGTATGTTAGA

                            You can do something like;

                            perl extractSequence.pl genomefile.fa chr1_.

                            which will extract all the sequences containing the header chr1_

                            To store the results, do;

                            perl extractSequence.pl genomefile.fa chr1 > filename.txt

                            If this is what you want, you can use my script.

                            Boetsie

                            Hello,

                            Can you please tell me how I can fetch multiple identifiers (chr1, chr2, chr3, chr5, etc.) into a single file using your script? I believe the script doesn't take a file with several identifiers; when I tried, it gave me a blank output file instead.

                            Thanks a lot if you can help.
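
One possible way to cover several identifiers, sketched under assumptions: since the quoted usage takes one name per call, a shell loop can call the script once per line of an ID file and collect everything in one output file. The function names `extract_one` and `extract_many` and the file names `ids.txt` and `selected.fa` are made up here, and `extractSequence.pl` is assumed to behave as described in the quoted post.

```shell
# extract_one FASTA ID: the single-name call from the quoted post.
# Stdin is redirected so the script cannot swallow the ID list that
# the loop below is reading.
extract_one() {
    perl extractSequence.pl "$1" "$2" < /dev/null
}

# extract_many FASTA ID_FILE: call extract_one for every non-empty
# line of ID_FILE and write all results to standard output.
extract_many() {
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue
        extract_one "$1" "$id"
    done < "$2"
}

# Example (hypothetical file names):
#   extract_many genomefile.fa ids.txt > selected.fa
```

Because the quoted post says the script extracts sequences "containing" the given name, chr1 may also return chr10 and similar, so the combined file is worth checking for extra records.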
