**dpryan** · 07-25-2013, 01:33 AM

Edit: I should have refreshed before posting, Maubp's suggestion is probably easier!

You might be able to just use grep (grep -v -f ids.file ...), though you'd probably have to collapse each read name and its sequence onto one line first (with awk), pipe that into grep, and then pipe the output back to awk to split each record into two lines again.
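For anyone who would rather see the collapse / filter / split idea spelled out, here is a rough Python sketch of the same three steps. It is only an illustration, not the script maubp linked to; the function names are made up, and it assumes the read ID is the text between ">" and the first whitespace in the header.

```python
def collapse_fasta(lines):
    """Yield one 'header<TAB>sequence' string per FASTA record."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header + "\t" + "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line)  # wrapped sequence lines are joined together
    if header is not None:
        yield header + "\t" + "".join(seq)


def drop_listed(collapsed, ids_to_remove):
    """Keep only the collapsed records whose read ID is NOT in ids_to_remove."""
    for record in collapsed:
        header = record.rsplit("\t", 1)[0]
        fields = header[1:].split()
        # Assumption: the ID is everything after '>' up to the first whitespace.
        read_id = fields[0] if fields else ""
        if read_id not in ids_to_remove:  # exact match, not a substring match
            yield record


def expand_fasta(collapsed):
    """Split each collapsed record back into a header line and a sequence line."""
    for record in collapsed:
        header, seq = record.rsplit("\t", 1)
        yield header
        yield seq
```

One difference from plain grep -f is that the IDs are compared exactly, so an ID ending in :9843 cannot accidentally match a read whose ID ends in :98431.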

**flacchy** · 07-25-2013, 01:36 AM

thank you ...
but could you please be a little more specific ... I am very new to this field and I am still learning... Also, I have to decide whether to use the split files or my big file (which contains ~20 million reads)...

Thanks in advance for any advice and suggestions

**flacchy** · 07-25-2013, 03:11 AM

so grep is not working..or at least not in this case... so I'll try to explain better...

1. I have a FASTA file split into pieces (each file contains 3 million reads) with lines like this:
>D3P26HQ1:180:c0yj8acxx:4:2208:3279:56003 1:N:0:TGTCAA
CCTCACCAGCCGCACGAACACGCCCCCGCTGAGCAAGCATCCCGTGGCGTCAGCGGATGAGCGACGCGGAGACAGCACCTGACCCATGTTGATGTAGTGT
>D3P26HQ1:180:c0yj8acxx:4:1108:21179:84973 1:N:0:AGACCA
CACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAGAAAGAAGAAACAGTGGGAGAGTGGGGGGGACGGAG
>D3P26HQ1:180:c0yj8acxx:4:2103:16692:4396 1:N:0:TGTCAA
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACACCAAAAAGGTGAAGAGATCGATA
>D3P26HQ1:180:c0yj8acxx:4:2307:9878:25361 1:N:0:GGCTAA
ACAGCAATAACTGTGCCGCCATCGTCAGAATATTGGCGGGCGATTTTCATGATTTGAATTTTGTGACGAATATCTAAGCTTGAGATTGGCTAGATCTGAA

2. A txt file with the IDs of the reads I want to remove:
D3P26HQ1:180:c0yj8acxx:4:2316:9843:31035
D3P26HQ1:180:c0yj8acxx:4:2316:9844:63006
D3P26HQ1:180:c0yj8acxx:4:2316:9885:5144
D3P26HQ1:180:c0yj8acxx:4:2316:9888:45894
D3P26HQ1:180:c0yj8acxx:4:2316:9914:29032

What I want to do is remove all the reads whose IDs are present in the txt file.

Can anyone help me sort this out???

thanks!!!!!

**dpryan** · 07-25-2013, 03:58 AM

It'll be easier for you to use the python script that maubp linked to.

**rhinoceros** · 07-25-2013, 05:57 AM

I usually use filter_fasta.py from QIIME for this purpose..

**flacchy** · 07-25-2013, 06:02 AM

Thanks rhinoceros... that sounds nice.. but instead of keeping the sequences in the list, can I just discard them? Or create two files, one with the kept reads and one with the discarded ones?

filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

**rhinoceros** · 07-25-2013, 06:04 AM

To keep the sequences in the list:
filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

To discard the sequences in the list instead, add -n:
filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n

Note, it's not a standalone solution; it has dependencies, so you need to have QIIME installed (which is highly recommended because there's a ton of other useful stuff in it too)..
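On the question of getting two files (kept and discarded) in one go: as a rough sketch only, and not part of QIIME, something like the following plain Python would do it without any dependencies. The function names and file paths are hypothetical, and it assumes the read ID is the first whitespace-delimited word after ">" and that the ID file has one ID per line.

```python
def read_ids(path):
    """Load one read ID per line into a set, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def split_fasta(fasta_path, ids_path, kept_path, discarded_path):
    """Write listed reads to discarded_path and all other reads to kept_path.

    Returns a (kept_count, discarded_count) tuple.
    """
    ids = read_ids(ids_path)
    n_kept = n_discarded = 0
    with open(fasta_path) as src, \
            open(kept_path, "w") as kept, \
            open(discarded_path, "w") as discarded:
        out = kept
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                # Assumption: ID is the text after '>' up to the first whitespace.
                read_id = fields[0] if fields else ""
                if read_id in ids:
                    out = discarded
                    n_discarded += 1
                else:
                    out = kept
                    n_kept += 1
            # Sequence lines follow whichever file their header went to.
            out.write(line)
    return n_kept, n_discarded
```

It reads the FASTA one line at a time and keeps only the ID list in memory, so it should cope with the single ~20 million read file as well as with the split files, e.g. split_fasta("inseqs.fasta", "seqs_to_remove.txt", "kept.fasta", "discarded.fasta").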

**flacchy** · 07-25-2013, 06:07 AM

Thanks so so much... I'll try ... just one more thing... I know I am a bit of a pain, but I seriously only started 3 months ago and there are tons of things I need to learn...

Could I use only this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n
and get my clean file with only the reads that are not present in the ID list?

**rhinoceros** · 07-25-2013, 06:09 AM

Yes, but like I said, the script is not standalone; it relies on other QIIME components, so you need to have QIIME installed. If you happen to be on Mac OS X, I highly recommend MacQIIME, which is very painless to install..

**flacchy** · 07-25-2013, 06:10 AM

We do have QIIME installed on the Bio-Linux platform

**flacchy** · 07-25-2013, 06:16 AM

Oh my, THANK YOU so so much rhinoceros, it did work!!! ^_^
