  • Any idea what sequence format this is?

    Hi Everyone,

    Apologies for the total newbie question.

    I am a biologist (trying to learn some bioinformatics) and have been given some sequencing files by a collaborator, but I cannot work out what format they are in.

    I have pasted a small part of the file below - does this format look familiar to anyone?

    Thanks in advance
    Liz


    D44TDFP1 1 8 2315 8836 55092 CGATGT 1 GGGGAAAACAACATTAAAACACGCATCTTCAGCTCCTCAAATTTTCTGGTTGCATAGAGAACCAAAAAAAAAAGA ^__c\cR`abcZc]`df]cedRb[eccbe[XIIY^feZX^^^ffhaIXaecLMHN\\bH\bedeZU\dBBBBBBB 2 20978416 R 39C33A1 285 601 -168 F Y
    D44TDFP1 1 8 2315 8772 55102 CGATGT 1 GGCCAGCTTGCTTTGGCGTTTTGTTTCTGGTTGAACGCTATCAGACTGCAGCAGACCATCCATATAGTATCCATA ^JYa^aJ``cWb`dYZbdH[[bRb_bSP^]IYHOY[^^^PGINac[\HM\H\_Q_QGHH\\```\ZHHMZZ^]]` 13 23066392 F 27C47 324 526 5200 R N
    D44TDFP1 1 8 2315 8827 55104 CGATGT 1 GTGGCATTCCTGCTCAAGGTCTGTGATCTGCTTGTCTTTTTGTTTGGCTTCAGCAAAGCAACACTCCAGCTGACC _^_c`cccge]eegfhhhhJ``e^^dgfgffhhdgffffagfhhhfVadffhhhfhhbcgffgb]Zefdfhdggd 17 23312957 R 56G18 311 585 -61 F Y
    D44TDFP1 1 8 2315 8996 55131 CGATGT 1 CCCGTCACAGTCCCCGCACGACATCTGTCCTCGGCTCCTCCTTTAATCTTCTTCCAGGGCTCGTTCAGCTGGATG ___eeeeegfegegghiihhhifcghfghfhiihagfhffgiiighifhbggedgdgcddecc]\^bbcb_bXbb 6 58837122 F GTGTGTGGGTGGGTGTGTATT2C41T9 0 174 62 R Y
    D44TDFP1 1 8 2315 8834 55153 CGATGT 1 GCAGTTTCAGCACCGATTAGTAAAGAGTAAAAAACAGGTCTGAGAAACACTGAAGCAAGCTTAAATGTTCTATCG bbbeeeeegggfgiiiiiiibggiiihbfhiiiiiiihghiiiiiiiiiiiiiiiiiihiiggggffgeeeeeec 15 5795216 F 4C23^A$13AG32 223 488 14 R Y
    D44TDFP1 1 8 2315 8986 55172 CGATGT 1 CTCAAATGGTATTGCTGAACTATACTCTAGCTTAGTGCCTCATGGGAAAACTTAATGAAGACCCCAAATAGCAGC ___eeeeegaegchfhhiiihfggghiiihh[gghhhiihhhiidhfggighhcfhhihihfdghiihhfdhihi 11 18655029 F A2C17^ATA$6C25G4^AT$4AT11 109 109 94 R Y
    D44TDFP1 1 8 2315 8771 55174 CGATGT 1 GGAAGATGAAGTACCTGATGATGAAACTGTTAATCAGATGATCGCCAGAAGCGAGGAAGAGTTCGACCACTTTAT aa_ceceeggfbeghiighhfhihiihifhihiiidggi_eghgiihidfhiiihcWbghhbfhigggeeb`bbd 3 19093693 F 75 356 356 2026 R Y
    D44TDFP1 1 8 2315 8872 55193 CGATGT 1 GGTCCAGTGTCCTGCCTATTGATACCATGACGAGTGATAGCCATCAAGTGACCCCTGCAGGTCATTGTGTCACAC bbbeeeedggggfhiiiiiihiiiiiiiiiiiiighiiiiiiiihiiieghihighiiiiiffhiiihhhihfeg 12 23924740 R 75 356 712 -92 F Y
    D44TDFP1 1 8 2315 8792 55236 CGATGT 1 GGATGTTGATGTTGCCCACCAGCACAGACTTGCCCATGTCCTGAACCAGACTGCACAGAGACGCATACAGTGTCT _b_e_ccegg`egggiiiihiiihihfgbghhiiihihiiigiiffhhhhiihhhiifffdhiicggggdZ_Z_b 16 27507286 R A74 321 321 -7454 F Y
    D44TDFP1 1 8 2315 9213 55002 CGATGT 1 CCATAACCTCCATAGCCTTGCTGGCCACCGTATCCGTAGCCCTGACCCCCATAACCCTGATTCCAGTAGTTGTTA ^__eeeeeggggfihfgfhihiihiiiiiiihighifgiefhihhihhfhhihii`fhihggeggggecacecbd 21 3194293 F 17C57 314 314 1364 R Y
    D44TDFP1 1 8 2315 9090 55010 CGATGT 1 GCTGGCAAAAACAAAACAAACAAAAACAAACATTCCCTGCACATCAACTTCTTTTGAACAAACATCCTACGAAGA bbbeeeeegggggiiiiiiiiiihiiiihigiiiihiiiiiihihhhiiihihhihiiiihhiigdgbeeeeccc 7 23987474 R 75 356 712 -198 F Y
    D44TDFP1 1 8 2315 9226 55031 CGATGT 1 GGCAGGGAATGCTACTGGTTGGTGGGTCATGTTTGCTAGTGATTCCTGTTCCATGGAAATAAGGTCTAGACCAGG J\\^cU_J[bccSK[`dJJ[`P[JPHPYccXOIYIIOXIO^XXa^II^IW\aaSHH\HHHN_`b_BBBBBBBBBB 1:0:0 N
    D44TDFP1 1 8 2315 9154 55068 CGATGT 1 GCTCCTTCCCATCCTTCTCTGCACTGCCGTCCAGCAATCTGCCACCCACCCGCGTCCAGGTGTACAGAGGCTCAG bbbeeeeefggggiiiiiiihiiiiiiiiiiiiihhiiiiiiiiihiiiiiihiggeeeebbbdcdbbccacccc 11 29537802 F 36G29A8 272 0 0 N Y
    D44TDFP1 1 8 2315 9019 55092 CGATGT 1 GGCTCTTTCAGCTTAGTTTGCAGTGAATGTGCCATGCTCTCATTTCTCTTCACCACGGTCCACAACTTGATATAT ^\^caacc^KKQ[b^[KK[bddbY`e_YRJJ[[^`d_XbSYbc_c_c[[ccaaRac`^cGVab[`bR\S`ch_BB 6 21628279 R 75 344 571 -4355 F N
    D44TDFP1 1 8 2315 9056 55123 CGATGT 1 TGTGTGTGGGAGCCGGCGCCGGCTGACTCTTCACTGGTGTTTTTAAGTGTTGCGCTGTGGCTTGAGAACAGGATG ^_^caacecegeghc_`edga_aa[\egbdgdhge`g`a^^adbadd]`beababaa_b^aaaTW`ab_^baabb 19 7652288 F 17A57 315 315 114 R Y

  • #2
    Hi ej_duncan,
    The best thing would be to ask your collaborator. On second thought, this contains the components of a FASTQ file that has been transformed into some kind of tab-delimited format.

    HTH


    • #3
      Hi Apexy,

      My collaborator has no idea! She got the sequences through a collaborator of a collaborator (weird I know), and they are not responding to our e-mails. Some of the files they have sent through are clearly fastq and others are in bam format. We are going to hope that we get more information from them, but thought in the meantime someone might be able to shed some light on the format for me.

      I have no idea why someone might convert fastq to tab format, but I will have a go at reformatting the file and seeing if it works.

      Thanks
      -Liz
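
For the reformatting idea above, here is a minimal Python sketch rather than a tested converter. It assumes the real file is tab-separated (the paste seems to have lost the tabs), that the first ten columns are machine, run, lane, tile, x, y, index, read number, sequence and quality, and that the qualities are Phred+64 (the 'h' and 'i' characters suggest this, but please check). The function name and the read-name layout are made up for illustration.

```python
def export_line_to_fastq(line, phred_offset_in=64):
    """Turn one tab-separated record into a four-line FASTQ entry.

    Assumption: the first ten columns are
    machine, run, lane, tile, x, y, index, read number, sequence, quality.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")

    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]

    # Hypothetical read name built from the coordinates; any unique name works.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"

    # Some Illumina text formats (e.g. qseq) write '.' for an uncalled base.
    seq = seq.replace(".", "N")

    # Re-encode qualities from the assumed input offset to Phred+33.
    shift = phred_offset_in - 33
    qual33 = "".join(chr(ord(c) - shift) for c in qual)

    return f"@{name}\n{seq}\n+\n{qual33}\n"


def convert_file(in_path, out_path):
    """Write every record of in_path to out_path as FASTQ."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(export_line_to_fastq(line))
```

Calling `convert_file("reads_export.txt", "reads.fastq")` (both file names are placeholders) would write one FASTQ record per input line. If the qualities turn out to be Phred+33 already, pass `phred_offset_in=33` so they are left unchanged.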


      • #4
        I think that is the old Illumina 'qseq' format:
        Most Illumina NGS data files we face are FASTQ/FASTA formats, which include the read sequence and (possible) quality scores. If reads are m...


        • #5
          Originally posted by nickloman
          I think that is the old Illumina 'qseq' format:
          http://allaboutbioinfo.blogspot.co.u...-illumina.html
          I think that it is actually the 'Export' format as mentioned on the same page of that link. The data posted above actually has many more fields than qseq.
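
To make the "many more fields" point concrete, here is a small Python sketch that labels the columns of one record. The 22 field names are my recollection of the Illumina export layout, not something confirmed in this thread, so treat them as an assumption and check them against Illumina's own documentation. The sketch also assumes the original file is tab-separated with empty columns preserved; the pasted rows look shorter because empty columns vanished when the tabs became spaces.

```python
# Assumed column order for an Illumina "export" record (from memory, unverified).
EXPORT_FIELDS = [
    "machine", "run", "lane", "tile", "x", "y", "index", "read_number",
    "sequence", "quality",
    "match_chromosome", "match_contig", "match_position", "match_strand",
    "match_descriptor", "single_read_score", "paired_read_score",
    "partner_chromosome", "partner_contig", "partner_offset",
    "partner_strand", "filter",
]


def parse_export_line(line):
    """Return a dict mapping the assumed field names to the column values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(EXPORT_FIELDS):
        raise ValueError(
            f"expected {len(EXPORT_FIELDS)} columns, got {len(values)}"
        )
    return dict(zip(EXPORT_FIELDS, values))


def looks_like_qseq(line):
    """A qseq record is usually described as having 11 tab-separated columns."""
    return len(line.rstrip("\n").split("\t")) == 11
```

Under these assumptions the first ten columns carry the same information as a FASTQ record, and the remaining ones describe the alignment (chromosome, position, strand, mismatch descriptor, scores, mate information) plus the pass-filter flag. That would explain why the data looks like both FASTQ and SAM without being either.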


          • #6
            maybe I'm mistaken, but isn't this just a SAM file?


            • #7
              Originally posted by RickBioinf
              maybe I'm mistaken, but isn't this just a SAM file?
              It's not, though it's not completely dissimilar.
