**Heisman** · 07-08-2013, 08:54 PM

I am on my phone and can't type anything elegant (and I don't know Perl), but you can get the job done with basic Linux tools. Look up how to print every other line of a file with sed (google "sed one-liners" if you can't find it easily) and write those lines to separate files. Then use the paste command to join the files side by side, followed by the tr command to convert the tabs to newline characters, and you get what you want. It is ugly, but you should be able to figure it out quickly. Use the lines you posted above as test files so you don't waste time practicing with large files.

**Heisman** · 07-08-2013, 09:41 PM

Here's what I had in mind. Save this as a script, make it executable (chmod +x script), and then run it as: ./script file1 file2 output

Code:

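#!/bin/bash

# Arguments are quoted throughout so file names with spaces still work.
file_1=$1
file_2=$2
output=$3

# file_1 has 2 lines per record: headers go to temp1, sequences to temp2
sed -n '1,${p;n}' "$file_1" > temp1
sed -n '1,${n;p}' "$file_1" > temp2
# file_2 has 4 lines per sequence (two probes): one temp file per line position
sed -n '1,${p;n;n;n}' "$file_2" > temp3
sed -n '1,${n;p;n;n}' "$file_2" > temp4
sed -n '1,${n;n;p;n}' "$file_2" > temp5
sed -n '1,${n;n;n;p}' "$file_2" > temp6
# Join the columns in the desired order, then turn each tab into a newline
paste temp1 temp2 temp3 temp4 temp1 temp2 temp5 temp6 | tr '\t' '\n' > "$output"

rm temp1 temp2 temp3 temp4 temp5 temp6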

This is quite inefficient with large files, since each input is read several times, but it should introduce some basic commands. You can make it a lot faster by starting all of the sed commands in the background at once and waiting for them to finish before pasting the results together:

Code:

#!/bin/bash

# Arguments are quoted throughout so file names with spaces still work.
file_1=$1
file_2=$2
output=$3

# Start every sed in the background (&) and remember its process ID ($!)
sed -n '1,${p;n}' "$file_1" > temp1 &
pid1=$!
sed -n '1,${n;p}' "$file_1" > temp2 &
pid2=$!
sed -n '1,${p;n;n;n}' "$file_2" > temp3 &
pid3=$!
sed -n '1,${n;p;n;n}' "$file_2" > temp4 &
pid4=$!
sed -n '1,${n;n;p;n}' "$file_2" > temp5 &
pid5=$!
sed -n '1,${n;n;n;p}' "$file_2" > temp6 &
pid6=$!

# Block until all six background jobs have finished
wait $pid1 $pid2 $pid3 $pid4 $pid5 $pid6

paste temp1 temp2 temp3 temp4 temp1 temp2 temp5 temp6 | tr '\t' '\n' > "$output"

rm temp1 temp2 temp3 temp4 temp5 temp6

Of course, with Perl you can read in both files and simply print the lines in whatever order you want, so definitely figure that out too. Still, it is nice to be able to get things done with Linux commands while you learn to do them in a much better fashion with a scripting language, so understanding how this script works is useful as well.

**martinghunt** · 07-09-2013, 12:21 PM

Assuming your files are called 1.fa and 2.fa, this hack will work:

Code:

samtools faidx 2.fa
awk '{id=substr($1,2); getline; for (i=1;i<3;i++){print ">"id; print; system("samtools faidx 2.fa "id"_probe"i)}}' 1.fa

awk is pretty powerful for this kind of thing.
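For comparison, here is a sketch of the same ID-based lookup done entirely in awk, for cases where samtools is not available. It is not the command above, just an illustration: it loads the probe file into an array keyed by sequence ID, then walks the sequence file. It assumes each sequence in 1.fa sits on a single line and that probes are named `<id>_probe1` and `<id>_probe2`, as in the example data; the function name `merge_by_id` is hypothetical. Unlike the samtools version, a probe that is missing from the probe file is skipped silently.

```bash
#!/bin/bash
# Sketch only. Assumes single-line sequences in the first file and
# probe IDs of the form <id>_probe1 / <id>_probe2 in the second.

merge_by_id() {
    # usage: merge_by_id 1.fa 2.fa > output
    awk '
        # First file given to awk (the probes): store each probe by its ID.
        NR == FNR {
            if (/^>/) id = substr($1, 2)       # drop the leading ">"
            else probe[id] = probe[id] $0      # join wrapped sequence lines
            next
        }
        # Second file given to awk (the sequences): header, then sequence.
        /^>/ {
            id = substr($1, 2)
            getline seq
            for (i = 1; i <= 2; i++) {
                key = id "_probe" i
                if (key in probe) {
                    print ">" id
                    print seq
                    print ">" key
                    print probe[key]
                }
            }
        }
    ' "$2" "$1"
}
```

Because the probes are looked up by name, the two files do not need to be in the same order, at the cost of holding the whole probe file in memory.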

**HMorrison** · 07-10-2013, 06:15 AM

As a one-off solution:

Code:

sed -e '$!N;s/\n/\t/' file1 > col1
sed -e '$!N;s/\n/\t/' file2 | sed -e '$!N;s/\n/\t/' > col2
paste col1 col2 | fmt -5

Output:

>seq1
TTTGGATTACAAAGTTATTTAAATCACATGT....
>seq1_probe1
CTTTGTCCTTGTCCTTGGTGGCGG....
>seq1_probe2
ATTTCTTCTCATCCTCCTCCTCCTA....
>seq2
GCCGTGCCATTTCAATTACAAATACATAATA....
>seq2_probe1
ACTAAAAACTCGTTGAAGAAATCC....
>seq2_probe2
AGGATATAACACACAGCCATCACC....

**huma Asif** · 09-08-2014, 10:45 AM

How can I convert
>1...>2....>3...>10000 to >1

and
>1..>2..>3....>10000 for b.fasta to >2, and
the same for all my 5 samples?

**HMorrison** · 09-08-2014, 10:52 AM

Originally posted by huma Asif:

How can I convert
>1...>2....>3...>10000 to >1

and
>1..>2..>3....>10000 for b.fasta to >2, and
the same for all my 5 samples?

I do not understand the question. Can you explain further?

**westerman** · 09-08-2014, 11:05 AM

I agree with @HMorrison: the question needs to be stated better. That said, 'fastx_renamer' (part of the FASTX-Toolkit) will rename the sequence identifiers in FASTA files.

**GenoMax** · 09-08-2014, 11:07 AM

This is the parent thread with "some" additional information: http://seqanswers.com/forums/showthread.php?t=46474

**huma Asif** · 09-08-2014, 11:19 AM

I created a FASTA file from a VCF file using target intervals, so the FASTA file now has the same number of headers as there are coordinates in the BED file.

What I want to do is concatenate all of these sequences.
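If the goal is what it appears to be, namely collapsing every record of one FASTA file into a single sequence under one new header (>1 for the first sample, >2 for b.fasta, and so on), a sketch could look like the following. This is a guess at the intent, since the question is not fully specified, and the function name `concat_records` and the output file names are made up for illustration.

```bash
#!/bin/bash
# Sketch only. Assumes the aim is one header followed by all sequence
# lines of the input joined together, in their original order.

concat_records() {
    # usage: concat_records new_header input.fasta > merged.fasta
    awk -v name="$1" '
        BEGIN { print ">" name }        # the single new header
        !/^>/ { printf "%s", $0 }       # append sequence lines, skip old headers
        END   { printf "\n" }           # finish the one long sequence line
    ' "$2"
}
```

For example, `concat_records 2 b.fasta > b.merged.fasta` would write a file with the header `>2` followed by all of the sequences from b.fasta joined on one line.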

Combine FASTA files in a specific order based on sequence ID
