  • Seta
    Member
    • Mar 2011
    • 14

    ABySS input

    Hello,
    I'm new to the bioinformatics side of life and have started using ABySS for alignment. However, my input file is not in FASTA format.
    For example the lines look like:

    @HWI-EAS313_0005:1:1:1158:9100#0/1
    TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
    +HWI-EAS313_0005:1:1:1158:9100#0/1
    cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


    with the sequence on the second line and the quality scores on the fourth.
    I wondered whether there is a command-line option that lets ABySS read this file directly (and so include the quality scores in the analysis, as they are quite important), or whether it is easier to make a FASTA file from this txt file. In the latter case, does anyone know a quick way to do that?

    Thanks
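
If a FASTA copy is still wanted, the conversion can be sketched in a few lines of awk. This is only a minimal sketch: it assumes strictly four-line FASTQ records (no wrapped sequence lines), and the file names and reads below are made up for illustration. Note that the quality scores are simply dropped.

```shell
# Work in a scratch directory with a tiny made-up FASTQ file.
workdir=$(mktemp -d)
cat > "$workdir/demo.fastq" <<'EOF'
@read1/1
ACGTN
+read1/1
cbddd
@read2/1
TTGCA
+read2/1
BBBBB
EOF

# Each record is 4 lines: keep line 1 (swap the leading '@' for '>')
# and line 2 (the sequence); skip the '+' line and the quality line.
awk 'NR % 4 == 1 { print ">" substr($0, 2) }
     NR % 4 == 2 { print }' "$workdir/demo.fastq" > "$workdir/demo.fasta"

fasta_out=$(cat "$workdir/demo.fasta")
echo "$fasta_out"
```

For a gzipped input, the same awk program can read from `gzip -cd file.gz |` instead of a file name.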
  • ffinkernagel
    Senior Member
    • Oct 2009
    • 110

    #2
    My ABySS README states:
    "in specifies the input files to read, which may be in FASTA, FASTQ, qseq or export format and compressed with gz, bz2 or xz."

    Since your data is FASTQ, you can pass it directly to ABySS.

    So long

    Comment

    • Seta
      Member
      • Mar 2011
      • 14

      #3
      When I try to run ABYSS I get the following error:

      error: Expected either `>' or `@' or 11 fields
      and saw `' and 1 fields near
      ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????

      What should I do if my data are already in FASTQ?

      Thanks

      Comment

      • ffinkernagel
        Senior Member
        • Oct 2009
        • 110

        #4
        '??|Ms_7_sequence.txt?' isn't FASTQ. It looks more like a tarred and gzipped file; try unpacking it with tar -xf myfastq.tgz

        Comment

        • Seta
          Member
          • Mar 2011
          • 14

          #5
          Hi,

          The file is a .txt.gz file. ABySS doesn't read the unzipped txt file, and it gives the error above if I use the .txt.gz file. I also thought that the file extension might be the problem... is there a way to change the .txt file to a .fasta extension?

          Thanks

          Comment

          • ffinkernagel
            Senior Member
            • Oct 2009
            • 110

            #6
            There's a good chance the file extension is the whole problem. I don't know whether ABySS only does file-extension detection or actually analyzes the first few bytes.

            You can simply rename the file "mv name.txt.gz name.fastq.gz" in a standard shell...
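
Whatever ABySS does internally, you can check for yourself whether a file is really gzip-compressed, independent of its name, by looking at its first two bytes. A minimal sketch, using a made-up one-read file in a scratch directory (the names are placeholders, not your data):

```shell
# Create a small made-up FASTQ file and a gzipped copy of it.
workdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\ncbdd\n' > "$workdir/name.txt"
gzip -c "$workdir/name.txt" > "$workdir/name.txt.gz"

# A gzip stream always begins with the bytes 1f 8b, whatever the file is called.
magic=$(head -c 2 "$workdir/name.txt.gz" | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $magic"

if [ "$magic" = "1f8b" ]; then
    echo "gzip-compressed"
else
    echo "not gzip (plain text or another format)"
fi

# For comparison, an uncompressed FASTQ file starts with '@'.
first_char=$(head -c 1 "$workdir/name.txt")
echo "plain file starts with: $first_char"
```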

            Comment

            • Seta
              Member
              • Mar 2011
              • 14

              #7
              I keep getting the same error, although I have changed the extension the way you said (mv name.txt.gz name.fastq.gz).
              The error is still:
              error: Expected either `>' or `@' or 11 fields
              and saw `' and 1 fields near
              ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????
              Any other ideas? Has anybody else encountered this problem?
              Cheers

              Comment

              • ffinkernagel
                Senior Member
                • Oct 2009
                • 110

                #8
                When you said your files look like
                "@HWI-EAS313_0005:1:1:1158:9100#0/1
                TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
                +HWI-EAS313_0005:1:1:1158:9100#0/1
                cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

                how did you view them?

                Comment

                • Seta
                  Member
                  • Mar 2011
                  • 14

                  #9
                  As it was a txt file, I opened it as such and viewed it.

                  Comment

                  • ffinkernagel
                    Senior Member
                    • Oct 2009
                    • 110

                    #10
                    Many editors silently decompress gzipped data, but ABySS should be able to read it; at least that's what it says in its documentation.

                    "??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????"
                    Doesn't look like
                    "@HWI-EAS313_0005:1:1:1158:9100#0/1" at all though.

                    ok, how about this:
                    do
                    "gzip -cd name.fastq.gz | head"
                    and show me the output.

                    Comment

                    • Seta
                      Member
                      • Mar 2011
                      • 14

                      #11
                      The output is:

                      @HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      AGNTCACCAATCTCAACGTGGAGTTCTCCGCTAAGGACCCTTTCTNNCGTCAGTCAACTGTGTGGAAACTTGATGGATCGAGGAAGGAGGGAATTGTCAC
                      +HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      X\B\\cccccggggggggggggdggfffffgfgfggfgggbbbbcBB_][][_]_fcgbgbfafeVbadd\ebeeeeeffdffeegfbggfXfdc^Xddd
                      @HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      CTNTAAGCAGTGGTATCAACGCAGAGTACGGGGGGGTTCCTCACANNGTTGACGCTCTTTCGTCTACGGGAGAACGCTATAGCTCTGGGGAACATCTAAA
                      +HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      XVBWU]^^]Zd`ddb^eeeddeeeeddebebddddc^ZcBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                      @HWI-ST538_0098:7:1:19073:1999#NNNNNN/1
                      AGNTATTTGCAAAATCTGAAAGAGTTCAAAGGAAACGCTTCTCATNNAGAGAAGAGGAAAGCCATATAAAGATACAACCACGCTCTATATGTCTCCTTTA

                      What does it mean? It is just a part of the file, right?

                      Comment

                      • ffinkernagel
                        Senior Member
                        • Oct 2009
                        • 110

                        #12
                        Yeah, gzip -d decompresses, and the -c says: write to standard output, which | passes into head, which gives you the first few lines of a file.

                        So, you have a FASTQ file, at least at the beginning, but what ABySS reads doesn't look like FASTQ.

                        You can try to uncompress the file first (gzip -d name.fastq.gz will give you name.fastq while making name.fastq.gz disappear; gzip name.fastq will do the reverse), but I suspect the problem is somewhere else.

                        How are you calling ABySS (the exact command line)?
                        Is this a paired end run?

                        Comment

                        • Seta
                          Member
                          • Mar 2011
                          • 14

                          #13
                          It is a single-end run, and the command line I use (exactly as the README says) is:
                          ABYSS -k15 name.fastq.gz -o name_contig.fastq

                          Comment

                          • ffinkernagel
                            Senior Member
                            • Oct 2009
                            • 110

                            #14
                            Stubborn problem, ain't it?

                            Your call looks fine, the beginning of your fastq file looks fine.
                            Let's try to invalidate the hypothesis that there's something wrong with it.
                            What do
                            gzip -cd name.fastq.gz | grep "Ms_7_sequence.txt"
                            and gzip -cd name.fastq.gz | tail
                            output?
                            (if you have the file currently uncompressed, you can
                            do 'tail name.fastq' and 'grep "Ms_7_sequence.txt" name.fastq'
                            instead).
                            Last edited by ffinkernagel; 04-01-2011, 04:58 AM. Reason: Apparently pushed submit before finishing my last sentence
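
Since head and tail only show the two ends of the file, the middle can be checked as well by walking the whole file in four-line records. A rough sketch, assuming unwrapped four-line FASTQ records; the sample data is made up and deliberately contains one broken record:

```shell
# Made-up sample: the second record's quality string is shorter than its sequence.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+r1\ncbdd\n@r2\nACGT\n+r2\ncb\n' | gzip -c > "$workdir/name.fastq.gz"

# Check each record by line position and report the first line that breaks the pattern.
report=$(gzip -cd "$workdir/name.fastq.gz" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
    END {
        if (!bad) {
            if (NR % 4 != 0) print "truncated record at end of file"
            else print "OK: " NR / 4 " records"
        }
    }')
echo "$report"
```

The check goes by line position rather than by searching for '@', because a quality string may itself begin with '@'.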

                            Comment

                            • Seta
                              Member
                              • Mar 2011
                              • 14

                              #15
                              Yeah, it is quite something, but it's a good learning opportunity. I'm glad that you want to help me.

                              If I use the line
                              gzip -cd name.fastq.gz | tail

                              it gives me:
                              +HWI-ST538_0098:7:66:21272:200778#NNNNNN/1
                              _BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              TGNTGGCGGTGGTTTTTGGGGGGGGTGGGGGGTGTTTGGTGGGGGGTTGGGGGGGTGTTTTTTGTGGTTGTTTTGGTTTGGGTGTGGGGTTGGTTGTTGT
                              +HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              GTNTGGGTGGGTTGGTTGGGGTGTTGGGGTGGGGGTGGCGTTTTCTGGGGAGGGTTGGGGGTTTTGGGTTGTAGGGTGTTGGTTTGGGGTGGAGGGGGTG
                              +HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                              This looks similar to the beginning of the file... so there must be something wrong in the middle. Maybe the number of 'BBBBB's? This is a measure of low quality, right?
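
To get a feel for how common those runs of 'B' are, the quality lines can be tallied. A minimal sketch, assuming four-line records and Illumina 1.5-style Phred+64 quality encoding (where 'B' flags a low-quality read segment); the reads below are made up:

```shell
# Made-up sample: one normal read, one all-B read, one read with a B tail.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+r1\ncbdd\n@r2\nACGT\n+r2\nBBBB\n@r3\nACGT\n+r3\ncbBB\n' > "$workdir/name.fastq"

# Quality strings are every 4th line; count those that are entirely B
# and, separately, those that merely end in B.
summary=$(awk 'NR % 4 == 0 {
        total++
        if ($0 ~ /^B+$/) all_b++
        else if ($0 ~ /B$/) tail_b++
    }
    END { printf "reads=%d all_B=%d B_tail=%d\n", total, all_b, tail_b }' "$workdir/name.fastq")
echo "$summary"
```

Many B-tailed reads would point to poor-quality sequence, but not necessarily to a malformed file.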

                              Comment
