  • musta1234
    Member
    • Jun 2013
    • 10

    Help with a while [ ... ]; do ... done loop in bash

    Hello all! I need help converting a tab-delimited file to FASTA. I am a newbie and know only a little bash scripting. I need to convert a tab-delimited SNP file into either a single FASTA file or multiple FASTA files (one per column), using the first line as the identifier.

    The closest I got was a script that generates the required .fa files, but it gets stuck in an endless loop and can only be stopped with Ctrl-C.


    #!/bin/bash


    echo ">" > carat.txt

    counter=1
    #My tab file has 64 columns

    while : [ $counter -lt 64]
    do

    less <SNP.txt |awk "{print$"$counter"}"| cat carat.txt - >$counter.fa
    counter=$(($counter +1))

    done

    exit



    SNP_001 SNP_002 SNP_003....
    T T T T
    C C C C
    C C C C
    C C C C
    A A A A
    A A A A
    T T T T
    T T T T
    C C C C
    G G G G
    G G G G
    C C C C
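For anyone landing here with the same problem: the loop never stops because `:` is a shell no-op that always returns success, and `while : [ $counter -lt 64]` only checks the exit status of `:` (the `[ ... ]` words are passed to `:` as ignored arguments; `[` would also need a space before the closing `]`). The loop above also stops at column 63, writes the ID on a separate line after a bare `>`, and puts one base per line. A minimal sketch of a fixed version, assuming whitespace-delimited input with the SNP IDs on the first line; `split_columns` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: corrected per-column loop, wrapped in a hypothetical function
# split_columns. Writes 1.fa, 2.fa, ... to the current directory.
split_columns() {
    local infile=$1 ncols counter=1
    # Take the column count from the header line instead of hard-coding 64
    ncols=$(awk 'NR == 1 { print NF; exit }' "$infile")
    # The test itself is the loop condition; -le so the last column is included
    while [ "$counter" -le "$ncols" ]; do
        # Row 1 gives the FASTA ID; every later row adds one base to the sequence
        awk -v c="$counter" '
            NR == 1 { id = $c; next }
            { seq = seq $c }
            END { print ">" id; print seq }' "$infile" > "$counter.fa"
        counter=$((counter + 1))
    done
}

# Example: split_columns SNP.txt
```

This rereads the whole file once per column, which is fine for a few dozen columns but gets slow for very wide tables.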
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    See if this thread helps: http://stackoverflow.com/questions/1...a-file-in-bash


    • blakeoft
      Member
      • Oct 2013
      • 79

      #3
      Is your input the following:

      T T T T
      C C C C
      C C C C
      C C C C
      A A A A
      A A A A
      T T T T
      T T T T
      C C C C
      G G G G
      G G G G
      C C C C

      and do you want to get

      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC

      back as a result?
      Last edited by blakeoft; 03-31-2014, 05:00 AM. Reason: nitpicky spacing


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        I think @musta1234 wants the matrix transposed and then converted to a multi-fasta file.

        >SNP_001
        TCCCAATTCGGC
        >SNP_002
        TCCCAATTCGGC
        >SNP_003
        TCCCAATTCGGC
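A minimal sketch of that transpose-then-FASTA step in a single awk pass, assuming whitespace-delimited input with the SNP IDs on the first line. `snp_to_fasta` is a hypothetical name, and this is an awk alternative, not the solution from the StackOverflow thread:

```shell
#!/bin/bash
# Sketch: transpose a SNP table (IDs on the first line) into multi-FASTA.
# snp_to_fasta is a hypothetical helper; reads the file in $1, or stdin if none.
snp_to_fasta() {
    awk '
        # Header line: remember one ID per column
        NR == 1 { n = NF; for (i = 1; i <= n; i++) id[i] = $i; next }
        # Data lines: append each base to the sequence of its column
        { for (i = 1; i <= n; i++) seq[i] = seq[i] $i }
        # After the last line: one ">ID" line plus one sequence line per column
        END { for (i = 1; i <= n; i++) { print ">" id[i]; print seq[i] } }
    ' "${1:--}"
}

# Example: snp_to_fasta SNP.txt > SNP.fasta
```

Because awk only keeps one growing string per column, memory use stays roughly proportional to the input file size.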


        • musta1234
          Member
          • Jun 2013
          • 10

          #5
          That's right.

          Sorry for the sloppy explanation. All the nucleotides come from a tab-delimited file, and GenoMax described the output I want perfectly:


          >SNP_001
          TCCCAATTCGGC
          >SNP_002
          TCCCAATTCGGC
          >SNP_003
          TCCCAATTCGGC

          ......

          >SNP_XXX
          ATGCATGCATGC

          Thanks


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            This is a bash shell script based on a solution in the StackOverflow thread I posted above.

            Save the code in a file (script.sh in the example below) and run it with bash (not sh, since it uses bash arrays), redirecting the output to a file:

            Code:
            $ bash script.sh your_data_file > output.fasta
            Code:
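            #!/bin/bash
            # Transpose a whitespace-delimited SNP table (IDs on the first line)
            # into multi-FASTA on stdout: one record per column.
            declare -a array=( )                      # all cells, row by row, in one 1-D array

            read -r -a line < "$1"                    # read the header line
            COLS=${#line[@]}                          # save number of columns

            index=0
            while read -r -a line; do                 # re-reads the header too, so IDs are row 0
                for (( COUNTER = 0; COUNTER < ${#line[@]}; COUNTER++ )); do
                    array[index]=${line[COUNTER]}
                    (( index++ ))
                done
            done < "$1"

            for (( ROW = 0; ROW < COLS; ROW++ )); do
                printf ">"
                # Column ROW = elements ROW, ROW+COLS, ROW+2*COLS, ... (the first one is the ID)
                for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
                    printf "%s" "${array[COUNTER]}"
                    if (( COUNTER == ROW )); then
                        printf "\n"                   # end the ">ID" line
                    fi
                done
                printf "\n"
            done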
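The script stores every cell, header row included, in one flat array, so row r, column c sits at index r * COLS + c, and the inner loop walks one column by starting at ROW and stepping by COLS. A small sketch of just that indexing; `column_from_flat` is a hypothetical helper, not part of the script above:

```shell
#!/bin/bash
# Sketch: read one column back out of a row-major flat array.
# Usage: column_from_flat COLS COLUMN cell0 cell1 cell2 ...
column_from_flat() {
    local cols=$1 col=$2
    shift 2
    local -a cells=("$@")
    local out="" i
    # Same stride walk as the script: col, col+cols, col+2*cols, ...
    for (( i = col; i < ${#cells[@]}; i += cols )); do
        out+=${cells[i]}
    done
    printf '%s\n' "$out"
}
```

For example, `column_from_flat 2 1 SNP_001 SNP_002 T A C G` prints `SNP_002AG`; the real script additionally prints a newline right after the ID.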


            • musta1234
              Member
              • Jun 2013
              • 10

              #7
              Thanks

              I will definitely give it a try...


              • musta1234
                Member
                • Jun 2013
                • 10

                #8
                Works GREAT!!!

                Hey GenoMax and all!

                The code works great: it handles a file with 160 columns and 128,000 lines very well.

                Thanks
