
  • Any idea what sequence format this is?

    Hi Everyone,

    Apologies for the total newbie question.

    I am a biologist (trying to learn some bioinformatics) and have been given some sequencing files by a collaborator, but I cannot work out what format they are in.

    I have pasted a small part of the file below - does this format look familiar to anyone?

    Thanks in advance
    Liz


    D44TDFP1 1 8 2315 8836 55092 CGATGT 1 GGGGAAAACAACATTAAAACACGCATCTTCAGCTCCTCAAATTTTCTGGTTGCATAGAGAACCAAAAAAAAAAGA ^__c\cR`abcZc]`df]cedRb[eccbe[XIIY^feZX^^^ffhaIXaecLMHN\\bH\bedeZU\dBBBBBBB 2 20978416 R 39C33A1 285 601 -168 F Y
    D44TDFP1 1 8 2315 8772 55102 CGATGT 1 GGCCAGCTTGCTTTGGCGTTTTGTTTCTGGTTGAACGCTATCAGACTGCAGCAGACCATCCATATAGTATCCATA ^JYa^aJ``cWb`dYZbdH[[bRb_bSP^]IYHOY[^^^PGINac[\HM\H\_Q_QGHH\\```\ZHHMZZ^]]` 13 23066392 F 27C47 324 526 5200 R N
    D44TDFP1 1 8 2315 8827 55104 CGATGT 1 GTGGCATTCCTGCTCAAGGTCTGTGATCTGCTTGTCTTTTTGTTTGGCTTCAGCAAAGCAACACTCCAGCTGACC _^_c`cccge]eegfhhhhJ``e^^dgfgffhhdgffffagfhhhfVadffhhhfhhbcgffgb]Zefdfhdggd 17 23312957 R 56G18 311 585 -61 F Y
    D44TDFP1 1 8 2315 8996 55131 CGATGT 1 CCCGTCACAGTCCCCGCACGACATCTGTCCTCGGCTCCTCCTTTAATCTTCTTCCAGGGCTCGTTCAGCTGGATG ___eeeeegfegegghiihhhifcghfghfhiihagfhffgiiighifhbggedgdgcddecc]\^bbcb_bXbb 6 58837122 F GTGTGTGGGTGGGTGTGTATT2C41T9 0 174 62 R Y
    D44TDFP1 1 8 2315 8834 55153 CGATGT 1 GCAGTTTCAGCACCGATTAGTAAAGAGTAAAAAACAGGTCTGAGAAACACTGAAGCAAGCTTAAATGTTCTATCG bbbeeeeegggfgiiiiiiibggiiihbfhiiiiiiihghiiiiiiiiiiiiiiiiiihiiggggffgeeeeeec 15 5795216 F 4C23^A$13AG32 223 488 14 R Y
    D44TDFP1 1 8 2315 8986 55172 CGATGT 1 CTCAAATGGTATTGCTGAACTATACTCTAGCTTAGTGCCTCATGGGAAAACTTAATGAAGACCCCAAATAGCAGC ___eeeeegaegchfhhiiihfggghiiihh[gghhhiihhhiidhfggighhcfhhihihfdghiihhfdhihi 11 18655029 F A2C17^ATA$6C25G4^AT$4AT11 109 109 94 R Y
    D44TDFP1 1 8 2315 8771 55174 CGATGT 1 GGAAGATGAAGTACCTGATGATGAAACTGTTAATCAGATGATCGCCAGAAGCGAGGAAGAGTTCGACCACTTTAT aa_ceceeggfbeghiighhfhihiihifhihiiidggi_eghgiihidfhiiihcWbghhbfhigggeeb`bbd 3 19093693 F 75 356 356 2026 R Y
    D44TDFP1 1 8 2315 8872 55193 CGATGT 1 GGTCCAGTGTCCTGCCTATTGATACCATGACGAGTGATAGCCATCAAGTGACCCCTGCAGGTCATTGTGTCACAC bbbeeeedggggfhiiiiiihiiiiiiiiiiiiighiiiiiiiihiiieghihighiiiiiffhiiihhhihfeg 12 23924740 R 75 356 712 -92 F Y
    D44TDFP1 1 8 2315 8792 55236 CGATGT 1 GGATGTTGATGTTGCCCACCAGCACAGACTTGCCCATGTCCTGAACCAGACTGCACAGAGACGCATACAGTGTCT _b_e_ccegg`egggiiiihiiihihfgbghhiiihihiiigiiffhhhhiihhhiifffdhiicggggdZ_Z_b 16 27507286 R A74 321 321 -7454 F Y
    D44TDFP1 1 8 2315 9213 55002 CGATGT 1 CCATAACCTCCATAGCCTTGCTGGCCACCGTATCCGTAGCCCTGACCCCCATAACCCTGATTCCAGTAGTTGTTA ^__eeeeeggggfihfgfhihiihiiiiiiihighifgiefhihhihhfhhihii`fhihggeggggecacecbd 21 3194293 F 17C57 314 314 1364 R Y
    D44TDFP1 1 8 2315 9090 55010 CGATGT 1 GCTGGCAAAAACAAAACAAACAAAAACAAACATTCCCTGCACATCAACTTCTTTTGAACAAACATCCTACGAAGA bbbeeeeegggggiiiiiiiiiihiiiihigiiiihiiiiiihihhhiiihihhihiiiihhiigdgbeeeeccc 7 23987474 R 75 356 712 -198 F Y
    D44TDFP1 1 8 2315 9226 55031 CGATGT 1 GGCAGGGAATGCTACTGGTTGGTGGGTCATGTTTGCTAGTGATTCCTGTTCCATGGAAATAAGGTCTAGACCAGG J\\^cU_J[bccSK[`dJJ[`P[JPHPYccXOIYIIOXIO^XXa^II^IW\aaSHH\HHHN_`b_BBBBBBBBBB 1:0:0 N
    D44TDFP1 1 8 2315 9154 55068 CGATGT 1 GCTCCTTCCCATCCTTCTCTGCACTGCCGTCCAGCAATCTGCCACCCACCCGCGTCCAGGTGTACAGAGGCTCAG bbbeeeeefggggiiiiiiihiiiiiiiiiiiiihhiiiiiiiiihiiiiiihiggeeeebbbdcdbbccacccc 11 29537802 F 36G29A8 272 0 0 N Y
    D44TDFP1 1 8 2315 9019 55092 CGATGT 1 GGCTCTTTCAGCTTAGTTTGCAGTGAATGTGCCATGCTCTCATTTCTCTTCACCACGGTCCACAACTTGATATAT ^\^caacc^KKQ[b^[KK[bddbY`e_YRJJ[[^`d_XbSYbc_c_c[[ccaaRac`^cGVab[`bR\S`ch_BB 6 21628279 R 75 344 571 -4355 F N
    D44TDFP1 1 8 2315 9056 55123 CGATGT 1 TGTGTGTGGGAGCCGGCGCCGGCTGACTCTTCACTGGTGTTTTTAAGTGTTGCGCTGTGGCTTGAGAACAGGATG ^_^caacecegeghc_`edga_aa[\egbdgdhge`g`a^^adbadd]`beababaa_b^aaaTW`ab_^baabb 19 7652288 F 17A57 315 315 114 R Y

  • #2
    Hi ej_duncan,
    The best option would be to ask your collaborator. On second thought, this contains the components of a FASTQ file that have been transformed into some kind of tab-delimited format.

    HTH


    • #3
      Hi Apexy,

      My collaborator has no idea! She got the sequences through a collaborator of a collaborator (weird I know), and they are not responding to our e-mails. Some of the files they have sent through are clearly fastq and others are in bam format. We are going to hope that we get more information from them, but thought in the meantime someone might be able to shed some light on the format for me.

      I have no idea why someone might convert fastq to tab format, but I will have a go at reformatting the file and seeing if it works.

      Thanks
      -Liz
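
For the reformatting step mentioned above, here is a minimal Python sketch, not a definitive converter. It assumes the column order that the pasted sample appears to follow (machine, run, lane, tile, x, y, index, read number, sequence, quality, then alignment columns), which the data provider has not confirmed. It also assumes the quality strings use the older Illumina offset of 64, and the read-name layout is just one common convention. The function names and the `phred64_to_phred33` flag are made up for illustration.

```python
def export_line_to_fastq(line, phred64_to_phred33=True):
    """Build one FASTQ record from one line of the tabular file.

    Assumed column order (inferred from the sample in this thread, not
    confirmed by the data provider): machine, run, lane, tile, x, y,
    index, read number, sequence, quality, then alignment columns.
    """
    # Prefer tabs when present so empty columns keep their position;
    # fall back to any whitespace for text pasted from a forum post.
    if "\t" in line:
        fields = line.rstrip("\r\n").split("\t")
    else:
        fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, found {len(fields)}")

    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    if phred64_to_phred33:
        # Assumption: qualities use the older Illumina offset of 64.
        # Subtracting 31 from each character code re-bases them to 33.
        qual = "".join(chr(ord(c) - 31) for c in qual)

    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"@{name}\n{seq}\n+\n{qual}\n"


def convert_file(in_path, out_path):
    """Convert every non-empty line of in_path and write FASTQ to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(export_line_to_fastq(line))
```

A call such as `convert_file("reads_export.txt", "reads.fastq")` (both file names hypothetical) would write one FASTQ record per input line. The sketch drops the alignment columns and ignores the trailing Y/N column, which looks like a pass/fail filter flag, so any filtering would need to be added separately.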


      • #4
        I think that is the old Illumina 'qseq' format:
        Most Illumina NGS data files we face are FASTQ/FASTA formats, which include the read sequence and (possible) quality scores. If reads are m...


        • #5
          Originally posted by nickloman
          I think that is the old Illumina 'qseq' format:
          http://allaboutbioinfo.blogspot.co.u...-illumina.html
          I think that it is actually the 'Export' format as mentioned on the same page of that link. The data posted above actually has many more fields than qseq.
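
The field-count argument can be sketched in a few lines of Python. This is a rough heuristic under stated assumptions, not a validator: it takes qseq to have 11 tab-separated columns and Export to have 22, and it treats a whitespace-separated line with between 12 and 22 tokens as Export-like, because empty columns collapse when a record is pasted as plain text. The function name and constants are hypothetical.

```python
QSEQ_COLUMNS = 11    # assumed: machine ... sequence, quality, filter
EXPORT_COLUMNS = 22  # assumed: same first ten, alignment columns, filter


def guess_illumina_table_format(line):
    """Return 'qseq', 'export' or 'unknown' based only on column count."""
    if "\t" in line:
        # Tab-delimited input keeps empty columns, so counts are exact.
        n = len(line.rstrip("\r\n").split("\t"))
        if n == QSEQ_COLUMNS:
            return "qseq"
        if n == EXPORT_COLUMNS:
            return "export"
        return "unknown"

    # Pasted text: empty columns have collapsed, so an Export record can
    # show fewer than 22 tokens but still more than a qseq record has.
    n = len(line.split())
    if n == QSEQ_COLUMNS:
        return "qseq"
    if QSEQ_COLUMNS < n <= EXPORT_COLUMNS:
        return "export"
    return "unknown"
```

Applied to the records pasted in the first post, which have more than 11 whitespace-separated tokens each, this heuristic would report Export rather than qseq. A column count alone cannot prove the format, so the result is only a hint.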


          • #6
            maybe I'm mistaken, but isn't this just a SAM file?


            • #7
              Originally posted by RickBioinf
              maybe I'm mistaken, but isn't this just a SAM file?
              It's not, though it's not completely dissimilar.
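
One way to see the difference is to test a record against a few of the column rules in the SAM specification. The Python sketch below checks only a subset of those rules (integer FLAG, POS and MAPQ, a CIGAR-shaped column 6, and SEQ and QUAL of equal length in columns 10 and 11). It is an illustration rather than a full SAM validator, and the function name is hypothetical.

```python
import re

# CIGAR is either "*" or one or more <length><operation> pairs.
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")


def sam_problems(line):
    """List the ways an alignment line deviates from basic SAM column rules."""
    # SAM is tab-delimited; fall back to whitespace for pasted text.
    if "\t" in line:
        fields = line.rstrip("\r\n").split("\t")
    else:
        fields = line.split()
    if len(fields) < 11:
        return [f"only {len(fields)} columns; SAM needs at least 11"]

    problems = []
    for pos, label in ((1, "FLAG"), (3, "POS"), (4, "MAPQ")):
        if not fields[pos].isdigit():
            problems.append(f"column {pos + 1} ({label}) is not an integer")
    if not CIGAR_RE.fullmatch(fields[5]):
        problems.append("column 6 (CIGAR) is not a CIGAR string")
    seq, qual = fields[9], fields[10]
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append("columns 10 and 11 (SEQ, QUAL) differ in length")
    return problems
```

An empty list means none of these particular checks failed. In the records from the first post, column 6 holds a coordinate rather than a CIGAR string, and the read and its qualities sit in columns 9 and 10 instead of 10 and 11, so both of those checks would be expected to fail.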
