  • ABySS input

    Hello,
    I'm new to the bioinformatics side of life and have started to use ABySS for alignment. However, my input file is not in FASTA format.
    For example, the lines look like:

    @HWI-EAS313_0005:1:1:1158:9100#0/1
    TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
    +HWI-EAS313_0005:1:1:1158:9100#0/1
    cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


    with the sequence on the second line and the quality scores on the fourth line.
    I wondered whether there is a command-line option with which ABySS can read this file (and therefore include the quality scores in the analysis, as they are quite important), or whether it is easier to make a FASTA file from this txt file. In the latter case, does anyone know a quick way to do that?

    Thanks

  • #2
    My ABySS readme states:
    "in specifies the input files to read, which may be in FASTA, FASTQ, qseq or export format and compressed with gz, bz2 or xz."

    Since your data is FASTQ, you can pass it directly to ABySS.

    So long
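
On the second half of the question (a quick way to make a FASTA file): a minimal sketch, assuming strict four-line FASTQ records with no wrapped sequence lines. The function name fastq_to_fasta is made up for illustration, and the conversion throws the quality scores away.

```shell
# fastq_to_fasta: read 4-line FASTQ records on stdin, write FASTA on stdout.
# Assumes each record is exactly four lines (header, sequence, '+', quality).
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@id" becomes ">id"
         NR % 4 == 2 { print }                      # sequence line, unchanged'
}
```

Usage would look like: gzip -cd name.fastq.gz | fastq_to_fasta > name.fasta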

    • #3
      When I try to run ABYSS, I get the following error:

      error: Expected either `>' or `@' or 11 fields
      and saw `' and 1 fields near
      ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????

      What should I do if my data are already in FASTQ?

      Thanks

      • #4
        '??|Ms_7_sequence.txt?' isn't FASTQ. It looks more like a tarred and gzipped file; try uncompressing it with tar -xf myfastq.tgz

        • #5
          Hi,

          The file is a .txt.gz file. ABySS doesn't read the unzipped txt file, and it gives the above-mentioned error if I use the .txt.gz file. I also thought that the extension of the file might be a problem. Is there a way to change the .txt file to a .fasta extension?

          Thanks

          • #6
            There's a good chance the file extension is the whole problem. I don't know whether ABySS only does file-extension detection or whether it actually analyses the first few bytes.

            You can simply rename the file "mv name.txt.gz name.fastq.gz" in a standard shell...
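
Whether a file really is gzip-compressed can be checked from its content rather than its name, because a gzip stream always starts with the two bytes 1f 8b. A rough sketch (the function name is_gzip is made up here, and this says nothing about how ABySS itself detects formats):

```shell
# is_gzip FILE: succeed (exit status 0) if FILE starts with the gzip magic
# bytes 0x1f 0x8b, regardless of the file name or extension.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example: is_gzip name.txt.gz && echo "gzip data". The standard file utility (file name.txt.gz) reports the same kind of information.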

            • #7
              I keep getting the same error, although I have changed the extension the way you said (mv name.txt.gz name.fastq.gz).
              The error still is:
              error: Expected either `>' or `@' or 11 fields
              and saw `' and 1 fields near
              ??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????
              Any other ideas? Has anybody else encountered this problem?
              Cheers

              • #8
                When you said your files look like
                "@HWI-EAS313_0005:1:1:1158:9100#0/1
                TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
                +HWI-EAS313_0005:1:1:1158:9100#0/1
                cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

                how did you view them?

                • #9
                  As it was a txt file, I opened it as such and viewed it.

                  • #10
                    Many editors silently decompress gzipped data, but ABySS should be able to read it; at least that's what it says in its documentation.

                    "??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????"
                    Doesn't look like
                    "@HWI-EAS313_0005:1:1:1158:9100#0/1" at all though.

                    ok, how about this:
                    do
                    "gzip -cd name.fastq.gz | head"
                    and show me the output.

                    • #11
                      The output is:

                      @HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      AGNTCACCAATCTCAACGTGGAGTTCTCCGCTAAGGACCCTTTCTNNCGTCAGTCAACTGTGTGGAAACTTGATGGATCGAGGAAGGAGGGAATTGTCAC
                      +HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
                      X\B\\cccccggggggggggggdggfffffgfgfggfgggbbbbcBB_][][_]_fcgbgbfafeVbadd\ebeeeeeffdffeegfbggfXfdc^Xddd
                      @HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      CTNTAAGCAGTGGTATCAACGCAGAGTACGGGGGGGTTCCTCACANNGTTGACGCTCTTTCGTCTACGGGAGAACGCTATAGCTCTGGGGAACATCTAAA
                      +HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
                      XVBWU]^^]Zd`ddb^eeeddeeeeddebebddddc^ZcBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                      @HWI-ST538_0098:7:1:19073:1999#NNNNNN/1
                      AGNTATTTGCAAAATCTGAAAGAGTTCAAAGGAAACGCTTCTCATNNAGAGAAGAGGAAAGCCATATAAAGATACAACCACGCTCTATATGTCTCCTTTA

                      What does it mean? It is just a part of the file, right?

                      • #12
                        Yeah: gzip -d decompresses, -c writes the result to standard output instead of a file, and | passes that output into head, which prints the first ten lines.

                        So you have a FASTQ file, at least at the beginning, but what ABySS reads doesn't look like FASTQ.

                        You can try to uncompress the file first (gzip -d name.fastq.gz will give you name.fastq while making name.fastq.gz disappear; gzip name.fastq will do the reverse), but I suspect the problem is somewhere else.

                        How are you calling ABySS (the exact command line)?
                        Is this a paired end run?

                        • #13
                          It is a single-end run, and the command line I use (exactly as the readme says) is:
                          ABYSS -k15 name.fastq.gz -o name_contig.fastq

                          • #14
                            Stubborn problem, ain't it?

                            Your call looks fine, the beginning of your fastq file looks fine.
                            Let's try to invalidate the hypothesis that there's something wrong with it.
                            What do
                            gzip -cd name.fastq.gz | grep "Ms_7_sequence.txt"
                            and gzip -cd name.fastq.gz | tail
                            output?
                            (if you have the file currently uncompressed, you can
                            do 'tail name.fastq' and 'grep "Ms_7_sequence.txt" name.fastq'
                            instead).
                            Last edited by ffinkernagel; 04-01-2011, 04:58 AM. Reason: Apparently pushed submit before finishing my last sentence
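
Since head and tail only show the two ends of the file, a scan of every record can rule out damage in the middle. A minimal sketch, assuming strict four-line records; the function name check_fastq is made up for illustration:

```shell
# check_fastq: read FASTQ on stdin, report the first malformed record (if any)
# and return non-zero. Assumes strict four-line records.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen    { print "line " NR ": quality and sequence lengths differ"; bad = 1; exit }
        END {
            # A complete file has a line count that is a multiple of four.
            if (!bad && NR % 4 != 0) { print "file ends mid-record after line " NR; bad = 1 }
            exit (bad ? 1 : 0)
        }'
}
```

Usage would look like: gzip -cd name.fastq.gz | check_fastq && echo "all records look well-formed"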

                            • #15
                              Yeah, it's quite something, but it's a good learning opportunity. I'm glad that you want to help me.

                              If I use the line
                              gzip -cd name.fastq.gz | tail

                              it gives me:
                              +HWI-ST538_0098:7:66:21272:200778#NNNNNN/1
                              _BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              TGNTGGCGGTGGTTTTTGGGGGGGGTGGGGGGTGTTTGGTGGGGGGTTGGGGGGGTGTTTTTTGTGGTTGTTTTGGTTTGGGTGTGGGGTTGGTTGTTGT
                              +HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                              @HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              GTNTGGGTGGGTTGGTTGGGGTGTTGGGGTGGGGGTGGCGTTTTCTGGGGAGGGTTGGGGGTTTTGGGTTGTAGGGTGTTGGTTTGGGGTGGAGGGGGTG
                              +HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
                              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                              This looks similar to the beginning of the file, so there must be something wrong in the middle. Maybe the number of 'B's? This is a measure of low quality, right?
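
On the 'B' question: quality characters encode a Phred score as an ASCII offset. Assuming this file uses the Phred+64 offset of Illumina pipeline 1.5 to 1.7 output (an assumption, though characters such as 'g' would be implausibly high under Phred+33), 'B' decodes to Q2, which that pipeline uses to flag a read segment that should not be trusted. A small sketch with a made-up helper name, phred64:

```shell
# phred64 CHAR: print the Phred quality score encoded by a single quality
# character, assuming the Phred+64 offset (an assumption about this data).
phred64() {
    code=$(printf '%d' "'$1")   # ASCII code of the character
    echo $((code - 64))
}
```

With this, phred64 B prints 2 and phred64 g prints 39. Low-quality tails are valid FASTQ, so the 'B' runs by themselves would not explain a parse error.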
