
**pmiguel** · 09-04-2012, 11:32 AM

On the sorted names?
Try:

grep ">" file.fasta | sort | uniq -d

If you get nothing, try:

grep ">" file.fasta | perl -lane '$a{$F[0]}++;END{for (keys %a){print if $a{$_}>1;}}'

If still nothing, something else is probably going on. It's not clear what you mean by:

"Picard tells me I have duplicate sam sequences"

Is "sam" a typo, or are you working with a SAM file?

--
Phillip

**sklages** · 09-05-2012, 06:48 AM

Originally posted by ercfrtz

I have a fasta file I created from the bovine gene information, and I ran a uniq -d command on the names to make sure I didn't have any name duplicated. But when I use it as a reference, align reads to it, and then try to run those reads through picard, Picard tells me I have duplicate sam sequences.

Does anyone know of a simple solution to this or have a way to identify those troubling sequences that appear to have the same name, even though the uniq command won't identify them?

How have you applied your 'uniq' command? Keep in mind that the fasta name is everything up to the first whitespace in the definition line. So something like

>bumblebee is great

and

>bumblebee is yellow

are, for most programs, the same name. Uniq'ing the full definition lines is not enough.
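As a rough sketch of that point (not necessarily how anyone in this thread did it), the check can be run on only the first whitespace-delimited token of each header line. The function name dup_fasta_names and the demo sequences are made up for illustration:

```shell
# Hypothetical helper: list FASTA names that occur more than once,
# where "name" means the first whitespace-delimited token of a header line.
dup_fasta_names() {
    # awk prints field 1 of each header (e.g. ">bumblebee"),
    # sort groups identical names together, uniq -d keeps one copy of each repeat.
    awk '/^>/ { print $1 }' "$1" | sort | uniq -d
}

# Small demo file with made-up sequences.
demo=$(mktemp)
printf '>bumblebee is great\nACGT\n>bumblebee is yellow\nTTGA\n>wasp\nGGCC\n' > "$demo"

dup_fasta_names "$demo"
```

On the demo file this reports >bumblebee, even though the two full definition lines differ.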

Sven

**ercfrtz** · 09-05-2012, 06:51 AM

It looks like my file wasn't sorted, though I thought it was, so uniq wasn't catching certain duplicates. Piping sort into uniq fixed the issue. Thanks for the replies.
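For anyone hitting the same thing: uniq only compares adjacent lines, so repeats that are not next to each other go unnoticed unless the input is sorted first. A minimal illustration, with made-up names:

```shell
# Three header names; the two copies of >geneA are not adjacent.
names=$(printf '>geneA\n>geneB\n>geneA\n')

# Without sort, uniq -d sees no adjacent repeats and prints nothing.
unsorted=$(printf '%s\n' "$names" | uniq -d)

# With sort, the duplicates become adjacent and uniq -d reports them.
sorted=$(printf '%s\n' "$names" | sort | uniq -d)

echo "without sort: '$unsorted'"
echo "with sort:    '$sorted'"
```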

Non-uniq names in FASTA
