  • zillur
    Senior Member
    • Sep 2014
    • 106

    Sorting fasta file according to header

    Hi there,
    I have a fasta file like this:
    Code:
    [zillur@genomics filter]$ head new_12.fasta 
    >000000M00365:7:000000000-A48JK:1:1110:10044:9619
    TACGGAGGGTGCAAGCGTTATCCGGAATCACTGGGTTTAAAGGGTGCGTAGGCGGATATATAAGTCAGAGGTGAAAGCTCGCAGCTTAACTGCGGAATTGCCTTTGATACTGTTTATCTTGAATTATGTTGAGGTTAGCGGAATGAGTCAT
    >000000M00365:7:000000000-A48JK:1:2105:14983:8496
    TACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGATTGGAAAGTATGGGGTGAAATCCCAGGGCTCAACCCTGGAACTGCCCTGTAAACTATCAGTCTAGAGTTCTGGAGAGGTGAGTGGAATTGCTAGG
    >000000M00365:7:000000000-A48JK:1:2113:12381:28279
    TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGATAAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGATACTGTCAGACTAGAGTATGTTAGAGGAATGCGGAATTCCGGGT
    >000001M00365:7:000000000-A48JK:1:1110:15899:9619
    TACGAACTGTGCAAACGTTATTCGGAATCACTGGGCTTAAAGGGTGCGTAGGCGGGTTTGTAAGTCAGAGGTGAAAGTTTGCAGCTTAACTGTAAAATTGCCTTTGAAACTGTAGAACTTGAGTAGCGTTGAGGTCAGCGGAATGTGACAT
    >000001M00365:7:000000000-A48JK:1:2105:15157:8497
    TACGAAGGTCCCAAGCGTTATTCGGAATCACTGGGCGTAAAGGGAGCGTAGGCGGCGTGGAAAGTCAGATGTGAAATCTCAAGGCTCAACCTTGAAACTGCATCCGATACTTCCATGCTAGAGGACTGGAGAGGTGTTTGGAATTATCGGT
    I want to sort this file according to the header information. How can I do this?

    Best Regards
    Zillur
  • wdecoster
    Member
    • Oct 2015
    • 97

    #2
    Can you be more specific about which header information? Alphabetical sorting?


    • zillur
      Senior Member
      • Sep 2014
      • 106

      #3
      Thank you very much. Alphabetically or numerically, whichever is more convenient.

      Best Regards
      Zillur


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        And the reason you want to do this, if I may ask?


        • zillur
          Senior Member
          • Sep 2014
          • 106

          #5
          Thanks.
          Originally posted by GenoMax
          And the reason you want to do this, if I may ask?
          Yeah, sure. I wanted to create a fastq file from my .qual and fasta files using qiime, but it gave me:
          Code:
          KeyError: 'QUAL header (M00365:7:000000000-A48JK:1:1101:14885:1320) does not match FASTA header (M00365:7:000000000-A48JK:1:1101:16466:1388)
          In my qual file I have many other sequences including my fasta. So, I think sorting may resolve the issue. I appreciate your suggestions.

          Best Regards
          Zillur
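The KeyError above suggests that the two files do not list the same records in the same order. Before sorting anything, it may help to check which FASTA records actually have a QUAL entry. This is a minimal sketch, assuming the record ID is the header text after `>` up to the first whitespace. `header_ids` and `ids_missing_from_qual` are hypothetical helper names, not QIIME or BBMap commands.

```shell
# header_ids FILE: print the sorted, unique record IDs of a FASTA or QUAL file.
# Assumption: the ID is the header text after '>' up to the first whitespace.
header_ids() {
    awk '/^>/ { print substr($1, 2) }' "$1" | LC_ALL=C sort -u
}

# ids_missing_from_qual FASTA QUAL: print IDs found in FASTA but not in QUAL.
ids_missing_from_qual() {
    fa_ids=$(mktemp) && qual_ids=$(mktemp) || return 1
    header_ids "$1" > "$fa_ids"
    header_ids "$2" > "$qual_ids"
    # comm -23 keeps only the lines unique to the first (FASTA) list.
    LC_ALL=C comm -23 "$fa_ids" "$qual_ids"
    rm -f "$fa_ids" "$qual_ids"
}
```

Called as `ids_missing_from_qual new_12.fasta your_qual_file.qual`, an empty result would mean every FASTA record has a QUAL counterpart, so only the order (or extra QUAL records) remains to be fixed.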


          • Persistent LABS
            Member
            • Apr 2016
            • 21

            #6
            I guess sort on Linux will work.
            Code:
            cat file.fasta | paste - - | sort | sed 's/\t/\n/g'
            Try this.
            Persistent LABS
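The `paste - -` step in the pipeline above pairs lines two at a time, so it only works when every record is exactly one header line followed by one sequence line. A hedged sketch of the same idea that first joins wrapped sequence lines is shown here. `linearize_fasta` and `sort_fasta` are hypothetical helper names, and the sketch assumes headers contain no tab characters.

```shell
# linearize_fasta FILE: print every record as exactly two lines
# (header, then the sequence with any wrapped lines joined).
linearize_fasta() {
    awk '
        /^>/ { if (seen) print seq; print; seq = ""; seen = 1; next }
             { seq = seq $0 }
        END  { if (seen) print seq }
    ' "$1"
}

# sort_fasta FILE: sort records by header in plain byte order.
# LC_ALL=C makes the order independent of the user's locale.
sort_fasta() {
    linearize_fasta "$1" | paste - - | LC_ALL=C sort | tr '\t' '\n'
}
```

For example, `sort_fasta file.fasta > file.sorted.fasta`. `tr` is used instead of `sed 's/\t/\n/g'` only because `\t` and `\n` in a `sed` replacement are not handled the same way by every `sed` implementation.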


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              The following is untested, but you could give it a try and see if it works. It may avoid the need to sort. You will find reformat.sh in the BBMap suite.

              Code:
              reformat.sh in=your_fasta_file.fa qfin=your_qual_file.qual out=fastq_format_file.fq


              • zillur
                Senior Member
                • Sep 2014
                • 106

                #8
                Thank you very much. I have tried this:
                Code:
                cat file.fasta|paste - -|sort|sed 's/\t/\n/g'
                But it doesn't resolve everything:
                Code:
                (qiime191) [zillur@genomics final]$ head new_sorted_1.fasta 
                >M00365:7:000000000-A48JK:1:1101:10000:14343
                TACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTGCGTAGGCGGATTATTAAGTTAGGGGTGAAATCCCGAGGCTCAACCTCGGAACTGCCCTTAAAACTGTTGGTCTTGAGTTCTGGAGAGGTGAGTGGAATTGCTAGT
                >M00365:7:000000000-A48JK:1:1101:10000:18084
                TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGCTAGGTCAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGATACTGCCTAGCTAGAGTATGTTAGAGGAATGCGGAATTCCAGGT
                >M00365:7:000000000-A48JK:1:1101:10000:25105
                TACGAAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGAGTTCGTAGGCGGGTTATTAAGTCAGATGTGAAATCCCAGGGCTCAACCTTGGAACTGCATTTGAAACTGGTAACCTAGAGACTAGGAGAGGTCAGTGGAATACCGAGT
                >M00365:7:000000000-A48JK:1:1101:10000:5055
                CACGTAGGGGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCTGTTCAGTAAGTCAGGTGTGAAAATCCAAGGCTCAACCTTGGGACGCCACCTGATACCGCTGTGACTAGAGTCCGGTAGAGGAGATTGGAATTCCTGG
                >M00365:7:000000000-A48JK:1:1101:10001:16084
                TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGCTAGGTCAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGATACTGCCTAGCTAGAGTATGTTAGAGGATTGCGGAATTCCAGGT
                reformat.sh gives me:
                Code:
                [zillur@genomics final]$ ./bbmap/reformat.sh in=new_15.fasta qfin=qual_.1.qual out=f_nw_15_ql_.1.fq
                java -ea -Xmx111g -cp /home/zillur/Desktop/zillur/yadira/study_1799_split_library_seqs_and_mapping/filter/final/bbmap/current/ jgi.ReformatReads in=new_15.fasta qfin=qual_.1.qual out=f_nw_15_ql_.1.fq
                Executing jgi.ReformatReads [in=new_15.fasta, qfin=qual_.1.qual, out=f_nw_15_ql_.1.fq]
                
                Input is being processed as unpaired
                Exception in thread "Thread-1" java.lang.AssertionError: Quality and Base headers differ for read 0
                	at stream.FastaQualReadInputStream.toReadList(FastaQualReadInputStream.java:128)
                	at stream.FastaQualReadInputStream.toReads(FastaQualReadInputStream.java:110)
                	at stream.FastaQualReadInputStream.fillBuffer(FastaQualReadInputStream.java:94)
                	at stream.FastaQualReadInputStream.hasMore(FastaQualReadInputStream.java:54)
                	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
                	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
                What should I do now?

                Best Regards
                Zillur


                • khericlim
                  Junior Member
                  • Oct 2016
                  • 2

                  #9
                  When you sorted the fasta file, did you also sort the qual file?

                  Originally posted by zillur
                  In my qual file I have many other sequences including my fasta.
                  What do you mean by having other sequences in your qual file?


                  • dgscofield
                    Member
                    • Nov 2010
                    • 28

                    #10
                    If you have BioPerl ≥ 1.6.922 and Sort::Naturally, then the douglasgscofield/bioinfo repository on GitHub (various bioinformatics tools) shows how to sort on sequence name, using natural sort as it seems you require.
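The sorted output in post #8 lists `10000:25105` before `10000:5055`, which is what a plain character-by-character sort produces. Where Perl is not available, a numeric order on the trailing header fields can be approximated with `sort` keys. This is only a sketch: it assumes two-line records and seven colon-separated header fields, with fields 4 to 7 being lane, tile, x and y. `sort_fasta_numeric` is a hypothetical helper name.

```shell
# sort_fasta_numeric FILE: sort two-line FASTA records numerically by the
# 4th to 7th colon-separated header fields (assumed: lane, tile, x, y).
sort_fasta_numeric() {
    # paste joins header and sequence with a tab; the numeric key on field 7
    # reads the leading digits and ignores the tab and sequence after them.
    paste - - < "$1" |
        LC_ALL=C sort -t: -k4,4n -k5,5n -k6,6n -k7,7n |
        tr '\t' '\n'
}
```

With this ordering, a header ending in `10000:5055` sorts before one ending in `10000:14343`. For pairing a fasta file with a qual file the exact order matters less than applying the same order to both files.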


                    • Persistent LABS
                      Member
                      • Apr 2016
                      • 21

                      #11
                      Originally posted by zillur
                      Thank you very much. I have tried this: But it doesn't resolve everything:
                      What should I do now?

                      Best Regards
                      Zillur
                      The sort example has sorted your data alphabetically. If you try to sort your qual file, I think you will get the same order of headers.
                      Persistent LABS
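Whether the two files really end up with the same header order can be checked directly before re-running the conversion. The sketch below reports the first position at which the IDs differ, which is roughly the condition the "headers differ for read 0" assertion complains about. `first_header_mismatch` is a hypothetical helper name, and the ID is again assumed to be the header text up to the first whitespace.

```shell
# first_header_mismatch FASTA QUAL: print the first record position where the
# IDs differ and return 1; print nothing and return 0 if the orders match.
first_header_mismatch() {
    fa_ids=$(mktemp) && qual_ids=$(mktemp) || return 1
    awk '/^>/ { print substr($1, 2) }' "$1" > "$fa_ids"
    awk '/^>/ { print substr($1, 2) }' "$2" > "$qual_ids"
    # paste pairs the Nth FASTA ID with the Nth QUAL ID; appending "" forces
    # a string comparison so numeric-looking IDs are not compared as numbers.
    paste "$fa_ids" "$qual_ids" |
        awk -F '\t' '($1 "") != ($2 "") {
                print "record " NR ": " $1 " vs " $2; found = 1; exit
            }
            END { exit found + 0 }'
    status=$?
    rm -f "$fa_ids" "$qual_ids"
    return "$status"
}
```

If the files have different numbers of records, the shorter side shows up as an empty ID in the reported pair.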
