  • Help with While []... done in bash

    Hello all!! I need help converting a tab-delimited file to FASTA... I am a newbie and know only a little bash scripting. I need to convert a tab-delimited SNP file into either a single FASTA file or multiple FASTA files, one per column, using the first line as the identifier.

    The closest I got was a script that generates the required .fasta files but enters an endless loop and can only be stopped with Ctrl-C.


    #!/bin/bash


    echo ">" > carat.txt

    counter=1
    #My tab file has 64 columns

    while : [ $counter -lt 64]
    do

    less <SNP.txt |awk "{print$"$counter"}"| cat carat.txt - >$counter.fa
    counter=$(($counter +1))

    done

    exit



    SNP_001 SNP_002 SNP_003....
    T T T T
    C C C C
    C C C C
    C C C C
    A A A A
    A A A A
    T T T T
    T T T T
    C C C C
    G G G G
    G G G G
    C C C C
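
A note on why the loop above never finishes: in bash, `:` is the null command, which ignores its arguments and always succeeds, so `while : [ $counter -lt 64]` behaves like `while true` and the test is never evaluated. The closing bracket also needs a space before it (`64 ]`), and with 64 columns `-lt 64` would stop at column 63. What follows is only a minimal sketch of a corrected loop. It assumes the first line of SNP.txt holds the identifiers, and it writes a tiny made-up SNP.txt so that it runs on its own; the sample data and the `cols` variable are illustrative and not from the original post.

```shell
#!/bin/bash
# Illustrative three-column sample; replace with the real tab-delimited SNP.txt.
printf 'SNP_001\tSNP_002\tSNP_003\nT\tA\tG\nC\tA\tG\nC\tT\tC\n' > SNP.txt

# Count the columns from the header line instead of hard-coding 64.
cols=$(head -n 1 SNP.txt | awk -F'\t' '{print NF}')

counter=1
while [ "$counter" -le "$cols" ]; do
    # The header cell becomes the ">" line; the remaining cells are joined into one sequence.
    awk -F'\t' -v c="$counter" '
        NR == 1 { printf ">%s\n", $c; next }
                { printf "%s", $c }
        END     { printf "\n" }
    ' SNP.txt > "$counter.fa"
    counter=$((counter + 1))
done
```

This writes one file per column (1.fa, 2.fa, ...), each with the SNP identifier on the `>` line and the column's nucleotides on a single line below it.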

  • #2
    See if this thread helps: http://stackoverflow.com/questions/1...a-file-in-bash



    • #3
      Is your input the following:

      T T T T
      C C C C
      C C C C
      C C C C
      A A A A
      A A A A
      T T T T
      T T T T
      C C C C
      G G G G
      G G G G
      C C C C

      and do you want to get

      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC

      back as a result?
      Last edited by blakeoft; 03-31-2014, 05:00 AM. Reason: nitpicky spacing



      • #4
        I think @musta1234 wants the matrix transposed and then converted to a multi-fasta file.

        >SNP_001
        TCCCAATTCGGC
        >SNP_002
        TCCCAATTCGGC
        >SNP_003
        TCCCAATTCGGC



        • #5
          That's right

          Sorry for the sloppy explanation, but all the nucleotides are from a tab-delimited file, and GenoMax described exactly what I want.


          >SNP_001
          TCCCAATTCGGC
          >SNP_002
          TCCCAATTCGGC
          >SNP_003
          TCCCAATTCGGC

          ......

          >SNP_XXX
          ATGCATGCATGC

          Thanks



          • #6
            This is a bash shell script based on a solution in the Stack Overflow thread I posted above.

            Save the code in a file (script.sh in example below) and then run as follows:

            Code:
            $ bash script.sh your_data_file
            Code:
            #!/bin/bash
            # Transpose a whitespace-delimited table and print each column as a FASTA record.
            declare -a array=( )                      # flat 1-D array holding every cell, row by row

            read -r -a line < "$1"                    # read the header line

            COLS=${#line[@]}                          # save number of columns

            index=0
            while read -r -a line; do                 # re-read the whole file, header included
                for (( COUNTER = 0; COUNTER < ${#line[@]}; COUNTER++ )); do
                    array[index]=${line[COUNTER]}
                    ((index++))
                done
            done < "$1"

            for (( ROW = 0; ROW < COLS; ROW++ )); do
                printf ">"
                for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
                    printf "%s" "${array[COUNTER]}"
                    if [ "$COUNTER" -eq "$ROW" ]; then    # first cell of the column is the identifier
                        printf "\n"
                    fi
                done
                printf "\n"
            done
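
For comparison, the same transposition can be sketched in awk, which keeps one growing string per column instead of a flat array of cells. This is an alternative technique, not the script above. It assumes tab-delimited input whose first line holds the identifiers, and the file names `sample.tsv` and `out.fa` are made up for the example.

```shell
#!/bin/bash
# Made-up sample input; substitute the real tab-delimited file.
printf 'SNP_001\tSNP_002\tSNP_003\nT\tA\tG\nC\tA\tG\nC\tT\tC\n' > sample.tsv

awk -F'\t' '
    # Header row: remember the column count and the identifier of each column.
    NR == 1 { ncols = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    # Data rows: append each base to the sequence of its column.
            { for (i = 1; i <= ncols; i++) seq[i] = seq[i] $i }
    # After the last row: print one FASTA record per column.
    END     { for (i = 1; i <= ncols; i++) printf ">%s\n%s\n", name[i], seq[i] }
' sample.tsv > out.fa

cat out.fa
```

Because awk only holds one string per column in memory, this version may be easier to adapt if the separator or header layout differs from the example.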



            • #7
              Thanks

              I will definitely give it a try...



              • #8
                Works GREAT!!!

                Hey GenoMax and all!!

                The code works great... it handles a file with 160 columns and 128,000 lines very well.

                Thanks

