**dschika** · 10-18-2015, 10:50 PM

You could try:

sed 's/^\([ATGC]*\)N\([ATGCN]*\)/\1/g' YOURFILE > NEWFILE

at least if you have a *nix operating system.

**Pol8** · 10-19-2015, 06:37 AM

Thanks, that works only for the first hit of my fasta file. How can I do that for all the sequences?

**blancha** · 10-19-2015, 06:46 AM

This command is cute and simple.
However, if there are any Ns in the sequence identifiers of your FASTA file, those lines will also be cut at the first N.

Code:

cut -d N -f 1 test.fa

Code:

[blancha@lg-1r17-n04 ~]$ more test.fa 
>R1
GGGGGGGTTTTTTTTTTTTTTTNT
>R2
GGGGGGGGGGTTTTTTTTTTNTTNT
[blancha@lg-1r17-n04 ~]$ cut -d N -f 1 test.fa 
>R1
GGGGGGGTTTTTTTTTTTTTTT
>R2
GGGGGGGGGGTTTTTTTTTT

**GenoMax** · 10-19-2015, 06:58 AM

Originally posted by Pol8:

Thanks, that works only for the first hit of my fasta file. How can I do that for all the sequences?

Do you have a DOS- or OS X-formatted file? You may need to pass it through the dos2unix/mac2unix utility before using @dschika's code.

@dschika's code works for me.
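
If dos2unix/mac2unix are not installed, the line endings can be checked and normalised with standard tools. The following is only a sketch: `count_cr` and `normalize_newlines` are hypothetical helper names, and it assumes that blank lines carry no meaning in the FASTA file, since they are dropped.

```shell
# count_cr and normalize_newlines are hypothetical helper names, not existing tools.

# Print the number of carriage-return bytes in a file
# (0 means the file already has Unix line endings).
count_cr() {
  tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Turn every CR into LF, then drop the empty lines that CRLF pairs leave behind.
# Assumption: blank lines carry no meaning in the FASTA file.
normalize_newlines() {
  tr '\r' '\n' < "$1" | awk 'length($0) > 0'
}

# Usage:
#   count_cr YOURFILE
#   normalize_newlines YOURFILE > YOURFILE.unix
```

A non-zero count from `count_cr` suggests the file came from DOS/Windows or classic Mac tooling, in which case the normalised copy is the one to run the sed command on.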

**dschika** · 10-19-2015, 07:01 AM

Is GenoMax's assumption right? That may cause problems...

Just in case:

Code:

sed 's/^\([ATGC]*\)N\([ATGCN]*\)/\1/g' test.fa
>R1
GGGGGGGTTTTTTTTTTTTTTT
>R2
GGGGGGGGGGTTTTTTTTTT

... and this solution does not alter the sequence identifiers
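
One caveat with the expression above: it only matches when everything before the first N is an uppercase A, T, G or C, so a line containing lowercase bases or other IUPAC codes before the N is left untouched. A looser variant, sketched here under the assumption that every header line starts with `>` and that only uppercase N should trigger trimming, skips headers explicitly and deletes from the first N onward. `trim_at_first_n` is a hypothetical helper name.

```shell
# trim_at_first_n is a hypothetical helper name.
# On every line that does not start with ">", delete everything from the
# first uppercase N to the end of the line. Header lines pass through unchanged.
trim_at_first_n() {
  sed '/^>/!s/N.*//' "$1"
}

# Usage:
#   trim_at_first_n YOURFILE > NEWFILE
```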

**blancha** · 10-19-2015, 07:07 AM

I hadn't realised the sed command worked.
@dschika, it is better than my cut example too.

I'm posting my awk command anyway, since I spent a bit of time researching it.
Unlike my simpler cut command, it will not cut at an N in the sequence identifier.

Code:

awk -F "N" '{if (NR % 2==0) {print $1} else {print}}' test2.fa

Code:

[blancha@lg-1r17-n04 ~]$ more test2.fa 
>R1 N in the sequence identifier.
GGGGGGGTTTTTTTTTTTTTTTNT
>R2
GGGGGGGGGGTTTTTTTTTTNTTNT
[blancha@lg-1r17-n04 ~]$ awk -F "N" '{if (NR % 2==0) {print $1} else {print}}' test2.fa 
>R1 N in the sequence identifier.
GGGGGGGTTTTTTTTTTTTTTT
>R2
GGGGGGGGGGTTTTTTTTTT
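
The awk command above relies on line parity, so it assumes each record is exactly two lines (identifier, then sequence). For FASTA files whose sequences wrap over several lines, a header-aware variant could look like the sketch below. `trim_wrapped_fasta` is a hypothetical helper name, and the behaviour is an assumption about what "cut before N" should mean for wrapped records: once an N is seen, the rest of that record is dropped, including its later lines.

```shell
# trim_wrapped_fasta is a hypothetical helper name.
# Headers (lines starting with ">") are printed as-is and reset the state.
# Within a record, output stops at the first uppercase N, even if the
# sequence continues on later lines.
trim_wrapped_fasta() {
  awk '
    /^>/    { print; trimmed = 0; next }   # new record: keep header, reset flag
    trimmed { next }                       # an N was already seen in this record
    {
      i = index($0, "N")                   # position of first N, 0 if none
      if (i == 0) { print; next }
      if (i > 1) print substr($0, 1, i - 1)
      trimmed = 1
    }
  ' "$1"
}

# Usage:
#   trim_wrapped_fasta YOURFILE > NEWFILE
```

Note that a record whose sequence starts with N ends up as a header with no sequence line, so a downstream filter may be needed if empty records are a problem.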

**dschika** · 10-19-2015, 07:28 AM

Also quite nice!

cut reads before N