
  • #16
    Edit: I should have refreshed before posting, Maubp's suggestion is probably easier!

    You might be able to just use grep (grep -v -f ids.file ...), though you might have to collapse the read name and sequence onto one line (with awk) first, then pipe that into grep and pipe the output back to awk to split things back again.
    Last edited by dpryan; 07-25-2013, 01:34 AM. Reason: Too slow



    • #17
      thank you ...
      but could you please be a little more specific ... I am very new to this field and I am still learning... also I have to decide whether to use the split files or whether I can use my big file (which contains ~20 million reads)...

      Thanks in advance for every advice and suggestion



      • #18
        so grep is not working..or at least not in this case... so I'll try to explain better...

        1. I have a FASTA file that has been split (each file contains 3 million reads), with lines like this:
        >D3P26HQ1:180:c0yj8acxx:4:2208:3279:56003 1:N:0:TGTCAA
        CCTCACCAGCCGCACGAACACGCCCCCGCTGAGCAAGCATCCCGTGGCGTCAGCGGATGAGCGACGCGGAGACAGCACCTGACCCATGTTGATGTAGTGT
        >D3P26HQ1:180:c0yj8acxx:4:1108:21179:84973 1:N:0:AGACCA
        CACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAGAAAGAAGAAACAGTGGGAGAGTGGGGGGGACGGAG
        >D3P26HQ1:180:c0yj8acxx:4:2103:16692:4396 1:N:0:TGTCAA
        GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACACCAAAAAGGTGAAGAGATCGATA
        >D3P26HQ1:180:c0yj8acxx:4:2307:9878:25361 1:N:0:GGCTAA
        ACAGCAATAACTGTGCCGCCATCGTCAGAATATTGGCGGGCGATTTTCATGATTTGAATTTTGTGACGAATATCTAAGCTTGAGATTGGCTAGATCTGAA

        2. A txt file with the IDs of the reads I want to remove:
        D3P26HQ1:180:c0yj8acxx:4:2316:9843:31035
        D3P26HQ1:180:c0yj8acxx:4:2316:9844:63006
        D3P26HQ1:180:c0yj8acxx:4:2316:9885:5144
        D3P26HQ1:180:c0yj8acxx:4:2316:9888:45894
        D3P26HQ1:180:c0yj8acxx:4:2316:9914:29032

        What I want to do is remove all the reads whose IDs are present in the txt file.

        Can anyone help me sort this out???

        thanks!!!!!



        • #19
          It'll be easier for you to use the python script that maubp linked to.
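
The script maubp linked to is not reproduced in this part of the thread. As a rough standalone sketch of the same idea (plain Python, no Biopython or QIIME), the code below reads the ID list into a set and writes out only the FASTA records whose ID is not in it. The names `read_fasta` and `remove_reads` are made up for illustration. It assumes the read ID is the first whitespace-delimited token of the header, which matters here because the headers in #18 carry an extra ` 1:N:0:TGTCAA` part that the ID file does not contain.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def remove_reads(fasta_in, ids_in, fasta_out):
    """Write records whose ID is NOT listed in ids_in; return how many were kept."""
    # A set gives constant-time lookups, so a large read file is not a problem.
    unwanted = {line.strip() for line in ids_in if line.strip()}
    kept = 0
    for header, seq in read_fasta(fasta_in):
        tokens = header.split(None, 1)
        read_id = tokens[0] if tokens else ""  # assumption: ID = first token
        if read_id not in unwanted:
            fasta_out.write(f">{header}\n{seq}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demo; the sequences are placeholders, not real reads.
    fasta = io.StringIO(">read1 1:N:0:TGTCAA\nACGT\n>read2 1:N:0:AGACCA\nTTTT\n")
    ids = io.StringIO("read2\n")
    out = io.StringIO()
    print(remove_reads(fasta, ids, out), "record(s) kept")
    print(out.getvalue(), end="")
```

On real data you would pass file objects from `open(...)` instead of `io.StringIO`. Because records are streamed one at a time, this should work on either the split files or the single large file; note that each sequence is written back on a single line.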



          • #20
            I usually use filter_fasta.py from QIIME for this purpose..
            savetherhino.org



            • #21
              Thanks rhinoceros... that sounds nice.. but instead of keeping the sequences in the list, can I just discard them?? Or create two files, one with the kept reads and one with the discarded ones???

              filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt



              • #22
                Originally posted by flacchy
                Thanks rhinoceros... that sounds nice.. but instead of keeping the sequences in the list, can I just discard them?? Or create two files, one with the kept reads and one with the discarded ones???

                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt

                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt
                filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n


                Note, it's not a standalone solution; it has dependencies, so you need to have QIIME installed (which is highly recommended because there's a ton of other useful stuff too)..
                Last edited by rhinoceros; 07-25-2013, 06:06 AM.
                savetherhino.org
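
For the "two files" part of the question in #21, here is a minimal standalone sketch that writes the kept reads to one output and the discarded reads to another in a single pass. This is not how `filter_fasta.py` is implemented; the function name `split_fasta_by_ids` is hypothetical, and the read ID is again assumed to be the first whitespace-delimited token after `>`.

```python
import io


def split_fasta_by_ids(fasta_in, ids, kept_out, discarded_out):
    """Route each FASTA record to kept_out or discarded_out based on its ID.

    `ids` is a set of read IDs to discard. Lines are copied through unchanged,
    so multi-line sequences keep their original wrapping.
    """
    counts = {"kept": 0, "discarded": 0}
    out = None  # no record seen yet; lines before the first header are skipped
    for line in fasta_in:
        if line.startswith(">"):
            tokens = line[1:].split()
            read_id = tokens[0] if tokens else ""  # assumption: ID = first token
            if read_id in ids:
                out = discarded_out
                counts["discarded"] += 1
            else:
                out = kept_out
                counts["kept"] += 1
        if out is not None:
            out.write(line)
    return counts


if __name__ == "__main__":
    # In-memory demo with placeholder sequences.
    fasta = io.StringIO(">read1 1:N:0:TGTCAA\nACGT\n>read2 1:N:0:AGACCA\nTTTT\n")
    kept, discarded = io.StringIO(), io.StringIO()
    print(split_fasta_by_ids(fasta, {"read2"}, kept, discarded))
```

With real files, the set of IDs can be built from `seqs_to_remove.txt` by stripping each line, and the three handles come from `open(...)`.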



                • #23
                  Thanks so so much... I'll try ... just one more thing... I know I am a bit of a pain, but I only started seriously 3 months ago and there are tons of things I need to learn...

                  could I just use this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n???
                  and get my clean file with only the reads that are not present in the ID list???



                  • #24
                    Originally posted by flacchy
                    Thanks so so much... I'll try ... just one more thing... I know I am a bit of a pain, but I only started seriously 3 months ago and there are tons of things I need to learn...

                    could I just use this: filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_remove.txt -n???
                    and get my clean file with only the reads that are not present in the ID list???

                    Yes, but like I said, the script is not standalone; it relies on other QIIME components, so you need to have QIIME installed. If you happen to be on Mac OS X, I highly recommend MacQIIME, which is very painless to install..
                    savetherhino.org



                    • #25
                      we do have QIIME installed on the Bio-Linux platform



                      • #26
                        Oh my THANK YOU so so much rhinoceros, it did work!!!!!! ^_^
