ABySS input

  • ABySS input

    Hello,
    I'm new to the bioinformatics side of life and have started to use ABySS for alignment. However, I have an input file which is not in FASTA format.
    For example the lines look like:

    @HWI-EAS313_0005:1:1:1158:9100#0/1
    TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
    +HWI-EAS313_0005:1:1:1158:9100#0/1
    cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


    with the sequence on the second line and the quality scores on the fourth line.
    I wondered whether there is a command line with which ABySS could read this file (and therefore include the quality scores in the analysis, as they are quite important), or whether it is easier to make a FASTA file from this txt file. In the latter case, does anyone know a quick way to do that?

    Thanks

  • #2
    My ABySS readme states:
    "in specifies the input files to read, which may be in FASTA, FASTQ, qseq or export format and compressed with gz, bz2 or xz."

    and since your data is FASTQ, you can pass it directly into ABySS.

    So long
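
For the second half of the original question (making a FASTA file from FASTQ), here is a minimal sketch. It assumes every record is exactly four lines, as in the example above, and it discards the quality scores. The function name fastq_to_fasta is hypothetical and not part of ABySS.

```shell
# fastq_to_fasta (hypothetical helper): read 4-line FASTQ records on stdin,
# write FASTA on stdout. Assumes sequences are not wrapped over several lines.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading @ for >
         NR % 4 == 2 { print }                     # sequence line, unchanged
    '
}
```

For a compressed file this could be run as: gzip -cd name.fastq.gz | fastq_to_fasta > name.fasta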

    • #3
      If I try to run ABYSS I get the following error:

      error: Expected either `>' or `@' or 11 fields
      and saw `' and 1 fields near
      ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????

      What should I do if my data are already in FASTQ?

      Thanks

      • #4
        '??|MS_7_sequencetxt?' isn't FASTQ - that looks more like a tarred & gzipped file - try uncompressing it with tar -xf myfastq.tgz

        • #5
          Hi,

          the file is a .txt.gz file. ABySS doesn't read the unzipped txt file, and it gives the above-mentioned error if I use the .txt.gz file. I also thought that the extension of the file might be a problem... is there a way to convert the .txt file to a .fasta extension?

          Thanks

          • #6
            There's a good chance the file extension is the whole problem - I don't know if ABySS just does file-extension detection or if it actually analyzes the first few bytes.

            You can simply rename the file "mv name.txt.gz name.fastq.gz" in a standard shell...
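
Whatever ABySS does internally, you can check the file's real type yourself from its first two bytes, since gzip data always begins with the bytes 0x1f 0x8b. The sketch below does that check. The name is_gzip is a hypothetical helper, not an ABySS or gzip command. The original file name showing up inside the error text fits a gzip header, which can store that name, so ABySS was probably handed the raw compressed bytes.

```shell
# is_gzip FILE (hypothetical helper): succeed if FILE starts with the
# gzip magic bytes 0x1f 0x8b, regardless of its extension.
is_gzip() {
    # od prints the first two bytes as hex; tr strips the padding whitespace
    magic=$(od -An -tx1 -N 2 "$1" | tr -d ' \t\n')
    [ "$magic" = "1f8b" ]
}
```

Usage: is_gzip name.txt.gz && echo "gzip data" || echo "not gzip"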

            • #7
              I keep getting the same error, although I have changed the extension the way you said (mv name.txt.gz name.fastq.gz).
              The error is still:
              error: Expected either `>' or `@' or 11 fields
              and saw `' and 1 fields near
              ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????
              Any other ideas? Has anybody else encountered this problem?
              Cheers

              • #8
                When you said your files look like
                "@HWI-EAS313_0005:1:1:1158:9100#0/1
                TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
                +HWI-EAS313_0005:1:1:1158:9100#0/1
                cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

                how did you view them?

                • #9
                  As it was a txt file, I opened it as such and viewed it.

                  • #10
                    Many editors silently decompress gzipped data, but ABySS should be able to read it; at least that's what it says in its documentation.

                    "??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????"
                    Doesn't look like
                    "@HWI-EAS313_0005:1:1:1158:9100#0/1" at all though.

                    ok, how about this:
                    do
                    "gzip -cd name.fastq.gz | head"
                    and show me the output.

                    • #11
                      The output is:

                      @HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      AGNTCACCAATCTCAACGTGGAGTTCTCCGCTAAGGACCCTTTCTNNCGTCAGTCAACTGTGTGGAAACTTGATGGATCGAGGAAGGAGGGAATTGTCAC
                      +HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      X\B\\cccccggggggggggggdggfffffgfgfggfgggbbbbcBB_][][_]_fcgbgbfafeVbadd\ebeeeeeffdffeegfbggfXfdc^Xddd
                      @HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      CTNTAAGCAGTGGTATCAACGCAGAGTACGGGGGGGTTCCTCACANNGTTGACGCTCTTTCGTCTACGGGAGAACGCTATAGCTCTGGGGAACATCTAAA
                      +HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      XVBWU]^^]Zd`ddb^eeeddeeeeddebebddddc^ZcBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                      @HWI-ST538_0098:7:1:19073:1999#NNNNNN/1
                      AGNTATTTGCAAAATCTGAAAGAGTTCAAAGGAAACGCTTCTCATNNAGAGAAGAGGAAAGCCATATAAAGATACAACCACGCTCTATATGTCTCCTTTA

                      What does it mean? It is just a part of the file, right?

                      • #12
                        Yeah, gzip -d decompresses, and the -c says: write to standard output (the console), which | passes into head, which gives you the first few lines of a file (ten by default).

                        So. You have a FASTQ file - at least in the beginning - but what ABySS reads doesn't look like FASTQ.

                        You can try to uncompress the file first (gzip -d name.fastq.gz will give you name.fastq while making name.fastq.gz disappear; gzip name.fastq will do the reverse), but I suspect the problem is somewhere else.

                        How are you calling ABySS (the exact command line)?
                        Is this a paired end run?
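
The gzip -cd ... | head pipeline described above can be wrapped so that it works whether or not the file is compressed. This is only a sketch. The function name peek is hypothetical, and it decides by the gzip magic bytes (0x1f 0x8b) rather than by the file extension.

```shell
# peek FILE [N] (hypothetical helper): print the first N lines (default 10)
# of FILE, decompressing on the fly if FILE is gzip data.
peek() {
    if [ "$(od -An -tx1 -N 2 "$1" | tr -d ' \t\n')" = "1f8b" ]; then
        gzip -cd "$1" | head -n "${2:-10}"    # compressed: decompress to stdout
    else
        head -n "${2:-10}" "$1"               # plain text: read directly
    fi
}
```

Usage: peek name.fastq.gz 4 prints the first FASTQ record, and the original file on disk is left untouched.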

                        • #13
                          It is a single-end run, and the command line I use (exactly as the readme says) is:
                          ABYSS -k15 name.fastq.gz -o name_contig.fastq

                          • #14
                            Stubborn problem, ain't it?

                            Your call looks fine, the beginning of your fastq file looks fine.
                            Let's try to invalidate the hypothesis that there's something wrong with it.
                            What do
                            gzip -cd name.fastq.gz | grep "Ms_7_sequence.txt"
                            and gzip -cd name.fastq.gz | tail
                            output?
                            (if you have the file currently uncompressed, you can
                            do 'tail name.fastq' and 'grep "Ms_7_sequence.txt" name.fastq'
                            instead).
                            Last edited by ffinkernagel; 04-01-2011, 04:58 AM. Reason: Apperantly pushed submit before finishing my last sentence
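
head and tail only show the two ends of the file. To test the hypothesis for the middle as well, every record can be checked. The sketch below assumes strict four-line records and reports the first line where that structure breaks. The name fastq_check is hypothetical, and the check is deliberately simple: it makes sure the @ and + markers are present and that the sequence and quality lengths match.

```shell
# fastq_check (hypothetical helper): read FASTQ on stdin; print "ok" if every
# 4-line record is well formed, otherwise report the first bad line and
# return a non-zero exit status.
fastq_check() {
    awk '
        NR % 4 == 1 { if (substr($0, 1, 1) != "@") { bad = NR; exit } }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 { if (substr($0, 1, 1) != "+") { bad = NR; exit } }
        NR % 4 == 0 { if (length($0) != seqlen) { bad = NR; exit } }
        END {
            if (!bad && NR % 4 != 0) bad = NR      # truncated final record
            if (bad) { print "malformed record near line " bad; exit 1 }
            print "ok"
        }
    '
}
```

Usage: gzip -cd name.fastq.gz | fastq_check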

                            • #15
                              Yeah, it's quite something, but it's a good learning opportunity. I'm glad that you want to help me.

                              If I use the line
                              gzip -cd name.fastq.gz | tail

                              it gives me:
                              +HWI-ST538_0098:7:66:21272:200778#NNNNNN/1
                              _BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              TGNTGGCGGTGGTTTTTGGGGGGGGTGGGGGGTGTTTGGTGGGGGGTTGGGGGGGTGTTTTTTGTGGTTGTTTTGGTTTGGGTGTGGGGTTGGTTGTTGT
                              +HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              GTNTGGGTGGGTTGGTTGGGGTGTTGGGGTGGGGGTGGCGTTTTCTGGGGAGGGTTGGGGGTTTTGGGTTGTAGGGTGTTGGTTTGGGGTGGAGGGGGTG
                              +HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                              This looks similar to the beginning of the file... so there must be something wrong in the middle, maybe the number of 'BBBBB's? That is a measure of low quality, right?
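
On the 'B' question: the quality strings shown here look like the Phred+64 encoding of older Illumina pipelines, although that is an assumption based only on the characters that appear. In that encoding a trailing run of 'B' marks the end of the read as unreliable. To see how common this is in the file, here is a sketch that counts the reads whose quality string ends in 'B'. The name b_tail_stats is hypothetical.

```shell
# b_tail_stats (hypothetical helper): read 4-line FASTQ records on stdin and
# count reads whose quality line ends in a run of "B" or consists only of "B".
b_tail_stats() {
    awk '
        NR % 4 == 0 {
            reads++
            if ($0 ~ /B+$/)  flagged++     # quality string ends in one or more B
            if ($0 ~ /^B+$/) all_b++       # entire quality string is B
        }
        END { printf "reads=%d b_tail=%d all_b=%d\n", reads, flagged, all_b }
    '
}
```

Usage: gzip -cd name.fastq.gz | b_tail_stats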
