**dpryan** · 07-25-2013, 01:33 AM

Edit: I should have refreshed before posting; maubp's suggestion is probably easier!

You might be able to just use grep (grep -v -f ids.file ...), though you might have to collapse the read name and sequence onto one line (with awk) first, then pipe that into grep and pipe the output back to awk to split things back again.
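The reason for collapsing first is that a plain line-by-line `grep -v` would drop only the header line and leave its sequence behind. The same collapse, filter, split idea can be sketched in Python. This is only a minimal sketch, assuming the read ID is the first whitespace-delimited token after `>`; the function names `read_records` and `drop_listed` are made up for illustration.

```python
def read_records(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def drop_listed(lines, ids_to_remove):
    """Yield only the records whose ID is NOT in ids_to_remove."""
    for header, seq in read_records(lines):
        # Assumption: the ID is everything after '>' up to the first whitespace.
        parts = header[1:].split()
        read_id = parts[0] if parts else ""
        if read_id not in ids_to_remove:
            yield header, seq
```

Holding the IDs in a `set` keeps each lookup cheap, which matters more as the ID list grows.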

**flacchy** · 07-25-2013, 01:36 AM

Thank you,
but could you please be a little more specific? I am very new to this field and still learning. I also have to decide whether to use the split files or whether I can use my big file (which contains ~20 million reads).

Thanks in advance for any advice and suggestions.

**flacchy** · 07-25-2013, 03:11 AM

So grep is not working, or at least not in this case, so I'll try to explain better.

1. I have a FASTA file split into pieces (each file contains 3 million reads) with lines like this:
>D3P26HQ1:180:c0yj8acxx:4:2208:3279:56003 1:N:0:TGTCAA
CCTCACCAGCCGCACGAACACGCCCCCGCTGAGCAAGCATCCCGTGGCGTCAGCGGATGAGCGACGCGGAGACAGCACCTGACCCATGTTGATGTAGTGT
>D3P26HQ1:180:c0yj8acxx:4:1108:21179:84973 1:N:0:AGACCA
CACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAGAAAGAAGAAACAGTGGGAGAGTGGGGGGGACGGAG
>D3P26HQ1:180:c0yj8acxx:4:2103:16692:4396 1:N:0:TGTCAA
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACACCAAAAAGGTGAAGAGATCGATA
>D3P26HQ1:180:c0yj8acxx:4:2307:9878:25361 1:N:0:GGCTAA
ACAGCAATAACTGTGCCGCCATCGTCAGAATATTGGCGGGCGATTTTCATGATTTGAATTTTGTGACGAATATCTAAGCTTGAGATTGGCTAGATCTGAA

2. A text file with the IDs of the reads I want to remove:
D3P26HQ1:180:c0yj8acxx:4:2316:9843:31035
D3P26HQ1:180:c0yj8acxx:4:2316:9844:63006
D3P26HQ1:180:c0yj8acxx:4:2316:9885:5144
D3P26HQ1:180:c0yj8acxx:4:2316:9888:45894
D3P26HQ1:180:c0yj8acxx:4:2316:9914:29032

What I want to do is remove all the reads whose IDs are present in the text file.

Can anyone help me sort this out???

thanks!!!!!

**dpryan** · 07-25-2013, 03:58 AM

It'll be easier for you to use the Python script that maubp linked to.

**rhinoceros** · 07-25-2013, 05:57 AM

I usually use filter_fasta.py from QIIME for this purpose.

**flacchy** · 07-25-2013, 06:02 AM

Thanks rhinoceros, that sounds nice. But instead of keeping the sequences in the list, can I just discard them? Or create two files, one with the kept reads and one with the discarded ones?

filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

**rhinoceros** · 07-25-2013, 06:04 AM

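To keep the listed sequences:
filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

To discard them instead, add -n:
filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n

If you want the kept and the discarded reads in separate files, run both with the same ID list and a different -o name for each.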

Note that it's not a standalone script; it has dependencies, so you need to have QIIME installed (which is highly recommended because there's a ton of other useful stuff in it too).
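If installing QIIME isn't an option, the kept/discarded split can also be sketched in plain Python. To be clear, this is not how filter_fasta.py is implemented; it's a minimal sketch that assumes the read ID is the first whitespace-delimited token after `>`, and `load_ids` and `split_fasta` are names made up for illustration. It streams the input one line at a time, so only the ID list has to fit in memory.

```python
import io


def load_ids(handle):
    """Read one ID per line into a set, skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def split_fasta(fasta_in, ids_to_remove, kept_out, discarded_out):
    """Route each FASTA record to kept_out or discarded_out.

    Returns a (kept_count, discarded_count) tuple.
    """
    kept = discarded = 0
    dest = None  # stays None until the first header line is seen
    for line in fasta_in:
        if line.startswith(">"):
            # Assumption: the ID is the text after '>' up to the first whitespace.
            parts = line[1:].split()
            read_id = parts[0] if parts else ""
            if read_id in ids_to_remove:
                dest = discarded_out
                discarded += 1
            else:
                dest = kept_out
                kept += 1
        if dest is not None:
            dest.write(line)
    return kept, discarded


# Tiny in-memory demo with made-up reads "a" and "b".
demo = io.StringIO(">a 1:N:0:TGTCAA\nACGT\n>b 1:N:0:GGCTAA\nTTTT\n")
kept_buf, dropped_buf = io.StringIO(), io.StringIO()
counts = split_fasta(demo, load_ids(io.StringIO("b\n\n")), kept_buf, dropped_buf)
```

With real files you would pass handles from `open(...)` in place of the `io.StringIO` objects, one opened for reading and two for writing.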

**flacchy** · 07-25-2013, 06:07 AM

Thanks so, so much, I'll try it. Just one more thing: I know I am a bit of a pain, but I only started 3 months ago and there are tons of things I need to learn.

Could I use only this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n
and get my clean file with only the reads that are not present in the ID list?

**rhinoceros** · 07-25-2013, 06:09 AM

Yes, but like I said, the script is not standalone; it relies on other QIIME components, so you need to have QIIME installed. If you happen to be on Mac OS X, I highly recommend MacQIIME, which is painless to install.

**flacchy** · 07-25-2013, 06:10 AM

We do have QIIME installed on the Bio-Linux platform.

**flacchy** · 07-25-2013, 06:16 AM

Oh my, THANK YOU so, so much rhinoceros, it did work! ^_^
