  • #16
    Edit: I should have refreshed before posting; maubp's suggestion is probably easier!

    You might be able to just use grep (grep -v -f ids.file ...), though you might have to collapse the read name and sequence onto one line (with awk) first, then pipe that into grep and pipe the output back to awk to split things back again.
    Last edited by dpryan; 07-25-2013, 01:34 AM. Reason: Too slow
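The collapse, filter, and split-back idea described in this post can be sketched in Python rather than awk and grep. This is only an illustration under assumptions: the function names (`linearize`, `drop_listed`, `delinearize`) are made up for this example, headers are assumed to be non-empty and to contain no tab character, and the read ID is taken to be the first whitespace-delimited token of the header. IDs are compared exactly, which avoids one listed ID accidentally matching a longer ID that merely starts with it.

```python
def linearize(fasta_text):
    """Collapse each FASTA record onto one line: 'header<TAB>sequence'."""
    lines = []
    # Anything before the first '>' is ignored.
    for record in fasta_text.split(">")[1:]:
        header, _, seq = record.partition("\n")
        # Join wrapped sequence lines by removing all whitespace.
        lines.append(header.rstrip("\r") + "\t" + "".join(seq.split()))
    return lines


def drop_listed(linear_lines, ids_to_remove):
    """Keep lines whose read ID (first token of the header) is not listed."""
    return [ln for ln in linear_lines if ln.split(None, 1)[0] not in ids_to_remove]


def delinearize(linear_lines):
    """Split each 'header<TAB>sequence' line back into a two-line record."""
    return "".join(">{}\n{}\n".format(*ln.split("\t", 1)) for ln in linear_lines)


# Toy data, not taken from the thread.
toy = ">readA 1:N:0:TGTCAA\nACGT\nTTGA\n>readB 1:N:0:AGACCA\nGGCC\n"
print(delinearize(drop_listed(linearize(toy), {"readA"})), end="")
```

Note that a sequence wrapped over several lines comes back as a single line, just as it would after a collapse step in awk.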



    • #17
      Thank you...
      but could you please be a little more specific? I am very new to this field and still learning. I also have to decide whether to use the split files or whether I can use my big file (which contains ~20 million reads).

      Thanks in advance for any advice and suggestions.



      • #18
        So grep is not working, or at least not in this case, so I'll try to explain better.

        1. I have a FASTA file split into pieces (each file contains 3 million reads) with lines like this:
        >D3P26HQ1:180:c0yj8acxx:4:2208:3279:56003 1:N:0:TGTCAA
        CCTCACCAGCCGCACGAACACGCCCCCGCTGAGCAAGCATCCCGTGGCGTCAGCGGATGAGCGACGCGGAGACAGCACCTGACCCATGTTGATGTAGTGT
        >D3P26HQ1:180:c0yj8acxx:4:1108:21179:84973 1:N:0:AGACCA
        CACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAGAAAGAAGAAACAGTGGGAGAGTGGGGGGGACGGAG
        >D3P26HQ1:180:c0yj8acxx:4:2103:16692:4396 1:N:0:TGTCAA
        GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACACCAAAAAGGTGAAGAGATCGATA
        >D3P26HQ1:180:c0yj8acxx:4:2307:9878:25361 1:N:0:GGCTAA
        ACAGCAATAACTGTGCCGCCATCGTCAGAATATTGGCGGGCGATTTTCATGATTTGAATTTTGTGACGAATATCTAAGCTTGAGATTGGCTAGATCTGAA

        2. A text file with the IDs of the reads I want to remove:
        D3P26HQ1:180:c0yj8acxx:4:2316:9843:31035
        D3P26HQ1:180:c0yj8acxx:4:2316:9844:63006
        D3P26HQ1:180:c0yj8acxx:4:2316:9885:5144
        D3P26HQ1:180:c0yj8acxx:4:2316:9888:45894
        D3P26HQ1:180:c0yj8acxx:4:2316:9914:29032

        What I want to do is remove all the reads whose IDs are present in the text file.

        Can anyone help me sort this out???

        thanks!!!!!



        • #19
          It'll be easier for you to use the python script that maubp linked to.
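The script maubp linked to is not reproduced in this part of the thread, so the following is an independent minimal sketch of what such a filter might look like, not that script. The names `read_fasta` and `remove_listed_reads` are hypothetical. It assumes the read ID is the first whitespace-delimited token of the header (so `D3P26HQ1:...:56003 1:N:0:TGTCAA` is matched by `D3P26HQ1:...:56003`) and that the ID file has one ID per line. Only the ID list is held in memory; the FASTA is read one record at a time, so a single large file should not need to be split first.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence_lines) for one record at a time."""
    header, seq_lines = None, []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq_lines
            header, seq_lines = line[1:], []
        elif header is not None and line:
            seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def remove_listed_reads(fasta_in, ids_in, fasta_out):
    """Write every record whose ID is not listed; return (kept, removed)."""
    ids_to_remove = {ln.strip() for ln in ids_in if ln.strip()}
    kept = removed = 0
    for header, seq_lines in read_fasta(fasta_in):
        fields = header.split()
        read_id = fields[0] if fields else ""
        if read_id in ids_to_remove:
            removed += 1
            continue
        fasta_out.write(">" + header + "\n")
        for seq in seq_lines:
            fasta_out.write(seq + "\n")
        kept += 1
    return kept, removed


# Toy demo with in-memory handles; real use would pass open() file objects.
fasta = io.StringIO(">readA 1:N:0:TGTCAA\nACGT\n>readB 1:N:0:AGACCA\nGGCC\n")
ids = io.StringIO("readA\n")
out = io.StringIO()
print(remove_listed_reads(fasta, ids, out))
print(out.getvalue(), end="")
```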



          • #20
            I usually use filter_fasta.py from QIIME for this purpose.
            savetherhino.org



            • #21
              Thanks rhinoceros, that sounds nice. But instead of keeping the sequences in the list, can I just discard them? Or create two files, one with the kept reads and one with the discarded ones?

              filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt



              • #22
                Originally posted by flacchy
                Thanks rhinoceros, that sounds nice. But instead of keeping the sequences in the list, can I just discard them? Or create two files, one with the kept reads and one with the discarded ones?

                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

                To keep the listed reads:
                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt
                To discard the listed reads instead (note the -n flag):
                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n


                Note that it's not a standalone solution: it has dependencies, so you need to have QIIME installed (which I highly recommend, because it includes a ton of other useful stuff too).
                Last edited by rhinoceros; 07-25-2013, 06:06 AM.
                savetherhino.org
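On the second part of the question (one file with the kept reads and one with the discarded reads), a single pass can route each record to one of two outputs. The sketch below is not QIIME code and makes no claim about how filter_fasta.py works internally; `split_by_id_list` and its parameters are hypothetical names, and the read ID is again assumed to be the first whitespace-delimited token of the header. Each header line picks the destination, and the sequence lines that follow go to the same place.

```python
import io


def split_by_id_list(fasta_in, ids_to_remove, kept_out, discarded_out):
    """Route each FASTA record to kept_out or discarded_out; return counts."""
    current = None  # output handle for the record currently being read
    counts = {"kept": 0, "discarded": 0}
    for line in fasta_in:
        if line.startswith(">"):
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            if read_id in ids_to_remove:
                current = discarded_out
                counts["discarded"] += 1
            else:
                current = kept_out
                counts["kept"] += 1
        # Lines before the first header have no destination and are skipped.
        if current is not None:
            current.write(line)
    return counts


# Toy demo with in-memory handles; real use would pass open() file objects.
fasta = io.StringIO(">readA 1:N:0:TGTCAA\nACGT\n>readB 1:N:0:AGACCA\nGGCC\n")
kept, discarded = io.StringIO(), io.StringIO()
print(split_by_id_list(fasta, {"readA"}, kept, discarded))
print(kept.getvalue(), end="")
```

Because lines are copied through unchanged, any line wrapping in the input is preserved in both outputs.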



                • #23
                  Thanks so much, I'll try it. Just one more thing: I know I'm a bit of a pain, but I only started 3 months ago and there are tons of things I need to learn.

                  Could I use just this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n
                  and get a clean file containing only the reads that are not present in the ID list?



                  • #24
                    Originally posted by flacchy
                    Thanks so much, I'll try it. Just one more thing: I know I'm a bit of a pain, but I only started 3 months ago and there are tons of things I need to learn.

                    Could I use just this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n
                    and get a clean file containing only the reads that are not present in the ID list?

                    Yes, but like I said, the script is not standalone; it relies on other QIIME components, so you need to have QIIME installed. If you happen to be on Mac OS X, I highly recommend MacQIIME, which is painless to install.
                    savetherhino.org



                    • #25
                      We do have QIIME installed on the Bio-Linux platform.



                      • #26
                        Oh my, THANK YOU so much rhinoceros, it worked!!! ^_^
