**dschika** · 03-15-2015, 06:11 AM

This command should do the job:

Code:

awk '{if ($1 ~ /SRR/) {split($0, T, /\.1 /); print T[1], T[2]} else print $0}' YOURINPUT.fastq

Since you mentioned sed and awk, I assume you know that $1 is the first field, $2 is the second field (fields are separated by whitespace unless FS is set otherwise), and $0 is the whole line.

The command checks whether SRR is present in the first field ("if ($1 ~ /SRR/)"). If so, it splits the whole line ($0) at ".1 " and stores the pieces in the array T. In this case ".1 " occurs only once in each line with SRR, so T has two elements. Arrays filled by split start at index 1, so printing T[1] and T[2] joins the two pieces with a single space, which gives the line with ".1 " replaced by " ".

If the first field does not contain SRR (i.e., we are at a line with sequence or quality values), the whole line is printed unchanged.
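To see the effect on a small input, here is a minimal sketch. The file name sample.fastq and the read name in it are made up for illustration, and the separator is written as the regex literal /\.1 / so that the dot is matched literally rather than as "any character".

```shell
# Hypothetical 4-line FASTQ record; the read name is invented for illustration.
printf '%s\n' \
  '@SRR000001.1.1 HWI-ST100 length=8' \
  'ACGTACGT' \
  '+' \
  'IIIIIIII' > sample.fastq

# Header lines (first field contains SRR) are split at a literal ".1 "
# and the two pieces are printed with a single space between them;
# all other lines pass through unchanged.
awk_out=$(awk '{ if ($1 ~ /SRR/) { split($0, T, /\.1 /); print T[1], T[2] } else print $0 }' sample.fastq)
printf '%s\n' "$awk_out"
```

With this input, only the header line changes: "@SRR000001.1.1 HWI-ST100 length=8" becomes "@SRR000001.1 HWI-ST100 length=8".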

**yao_licr** · 03-15-2015, 07:36 AM

Thanks for the detailed explanation. It works well.

**dschika** · 03-15-2015, 08:02 AM

No problem. I just realized that I was thinking way too complicated:

sed 's/\.1 / /g' YOURINPUT.fastq > out.fastq

Just replace ".1 " with " " and escape the "." with a backslash, so that it matches a literal dot instead of any character.

**yao_licr** · 03-15-2015, 08:11 AM

Sorry, ".1" is the pattern to be replaced by " ", so why do you put a space after the 1? I mean the space in "/\.1 /".

Thanks!

**yao_licr** · 03-15-2015, 08:17 AM

I got it now: I have .1.1 in the first read, so if there were no space after "\.1", both of them would be replaced. In your script, the pattern is ".1 ", not ".1".

Thanks!

**dschika** · 03-15-2015, 08:20 AM

Correct, you're welcome!
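The difference the trailing space makes can be checked on a single header line. This is a small sketch only; the header string is invented for illustration and is not taken from the original data.

```shell
# Hypothetical header line with ".1" twice in the read name (invented for illustration).
header='@SRR000001.1.1 HWI-ST100 length=8'

# Pattern with the trailing space: only the ".1" directly before the space matches.
with_space=$(printf '%s\n' "$header" | sed 's/\.1 / /g')

# Pattern without the trailing space: both ".1" occurrences are replaced by a space.
without_space=$(printf '%s\n' "$header" | sed 's/\.1/ /g')

printf '%s\n' "$with_space"
printf '%s\n' "$without_space"
```

The first command yields "@SRR000001.1 HWI-ST100 length=8". The second strips both ".1" parts and leaves three spaces between the read name and the rest of the header.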

substitution using sed or awk