  • Any idea what sequence format this is?

    Hi Everyone,

    Apologies for the total newbie question.

    I am a biologist (trying to learn some bioinformatics) and have been given some sequencing files by a collaborator, but I cannot work out what format they are in.

    I have pasted a small part of the file below - does this format look familiar to anyone?

    Thanks in advance
    Liz


    D44TDFP1 1 8 2315 8836 55092 CGATGT 1 GGGGAAAACAACATTAAAACACGCATCTTCAGCTCCTCAAATTTTCTGGTTGCATAGAGAACCAAAAAAAAAAGA ^__c\cR`abcZc]`df]cedRb[eccbe[XIIY^feZX^^^ffhaIXaecLMHN\\bH\bedeZU\dBBBBBBB 2 20978416 R 39C33A1 285 601 -168 F Y
    D44TDFP1 1 8 2315 8772 55102 CGATGT 1 GGCCAGCTTGCTTTGGCGTTTTGTTTCTGGTTGAACGCTATCAGACTGCAGCAGACCATCCATATAGTATCCATA ^JYa^aJ``cWb`dYZbdH[[bRb_bSP^]IYHOY[^^^PGINac[\HM\H\_Q_QGHH\\```\ZHHMZZ^]]` 13 23066392 F 27C47 324 526 5200 R N
    D44TDFP1 1 8 2315 8827 55104 CGATGT 1 GTGGCATTCCTGCTCAAGGTCTGTGATCTGCTTGTCTTTTTGTTTGGCTTCAGCAAAGCAACACTCCAGCTGACC _^_c`cccge]eegfhhhhJ``e^^dgfgffhhdgffffagfhhhfVadffhhhfhhbcgffgb]Zefdfhdggd 17 23312957 R 56G18 311 585 -61 F Y
    D44TDFP1 1 8 2315 8996 55131 CGATGT 1 CCCGTCACAGTCCCCGCACGACATCTGTCCTCGGCTCCTCCTTTAATCTTCTTCCAGGGCTCGTTCAGCTGGATG ___eeeeegfegegghiihhhifcghfghfhiihagfhffgiiighifhbggedgdgcddecc]\^bbcb_bXbb 6 58837122 F GTGTGTGGGTGGGTGTGTATT2C41T9 0 174 62 R Y
    D44TDFP1 1 8 2315 8834 55153 CGATGT 1 GCAGTTTCAGCACCGATTAGTAAAGAGTAAAAAACAGGTCTGAGAAACACTGAAGCAAGCTTAAATGTTCTATCG bbbeeeeegggfgiiiiiiibggiiihbfhiiiiiiihghiiiiiiiiiiiiiiiiiihiiggggffgeeeeeec 15 5795216 F 4C23^A$13AG32 223 488 14 R Y
    D44TDFP1 1 8 2315 8986 55172 CGATGT 1 CTCAAATGGTATTGCTGAACTATACTCTAGCTTAGTGCCTCATGGGAAAACTTAATGAAGACCCCAAATAGCAGC ___eeeeegaegchfhhiiihfggghiiihh[gghhhiihhhiidhfggighhcfhhihihfdghiihhfdhihi 11 18655029 F A2C17^ATA$6C25G4^AT$4AT11 109 109 94 R Y
    D44TDFP1 1 8 2315 8771 55174 CGATGT 1 GGAAGATGAAGTACCTGATGATGAAACTGTTAATCAGATGATCGCCAGAAGCGAGGAAGAGTTCGACCACTTTAT aa_ceceeggfbeghiighhfhihiihifhihiiidggi_eghgiihidfhiiihcWbghhbfhigggeeb`bbd 3 19093693 F 75 356 356 2026 R Y
    D44TDFP1 1 8 2315 8872 55193 CGATGT 1 GGTCCAGTGTCCTGCCTATTGATACCATGACGAGTGATAGCCATCAAGTGACCCCTGCAGGTCATTGTGTCACAC bbbeeeedggggfhiiiiiihiiiiiiiiiiiiighiiiiiiiihiiieghihighiiiiiffhiiihhhihfeg 12 23924740 R 75 356 712 -92 F Y
    D44TDFP1 1 8 2315 8792 55236 CGATGT 1 GGATGTTGATGTTGCCCACCAGCACAGACTTGCCCATGTCCTGAACCAGACTGCACAGAGACGCATACAGTGTCT _b_e_ccegg`egggiiiihiiihihfgbghhiiihihiiigiiffhhhhiihhhiifffdhiicggggdZ_Z_b 16 27507286 R A74 321 321 -7454 F Y
    D44TDFP1 1 8 2315 9213 55002 CGATGT 1 CCATAACCTCCATAGCCTTGCTGGCCACCGTATCCGTAGCCCTGACCCCCATAACCCTGATTCCAGTAGTTGTTA ^__eeeeeggggfihfgfhihiihiiiiiiihighifgiefhihhihhfhhihii`fhihggeggggecacecbd 21 3194293 F 17C57 314 314 1364 R Y
    D44TDFP1 1 8 2315 9090 55010 CGATGT 1 GCTGGCAAAAACAAAACAAACAAAAACAAACATTCCCTGCACATCAACTTCTTTTGAACAAACATCCTACGAAGA bbbeeeeegggggiiiiiiiiiihiiiihigiiiihiiiiiihihhhiiihihhihiiiihhiigdgbeeeeccc 7 23987474 R 75 356 712 -198 F Y
    D44TDFP1 1 8 2315 9226 55031 CGATGT 1 GGCAGGGAATGCTACTGGTTGGTGGGTCATGTTTGCTAGTGATTCCTGTTCCATGGAAATAAGGTCTAGACCAGG J\\^cU_J[bccSK[`dJJ[`P[JPHPYccXOIYIIOXIO^XXa^II^IW\aaSHH\HHHN_`b_BBBBBBBBBB 1:0:0 N
    D44TDFP1 1 8 2315 9154 55068 CGATGT 1 GCTCCTTCCCATCCTTCTCTGCACTGCCGTCCAGCAATCTGCCACCCACCCGCGTCCAGGTGTACAGAGGCTCAG bbbeeeeefggggiiiiiiihiiiiiiiiiiiiihhiiiiiiiiihiiiiiihiggeeeebbbdcdbbccacccc 11 29537802 F 36G29A8 272 0 0 N Y
    D44TDFP1 1 8 2315 9019 55092 CGATGT 1 GGCTCTTTCAGCTTAGTTTGCAGTGAATGTGCCATGCTCTCATTTCTCTTCACCACGGTCCACAACTTGATATAT ^\^caacc^KKQ[b^[KK[bddbY`e_YRJJ[[^`d_XbSYbc_c_c[[ccaaRac`^cGVab[`bR\S`ch_BB 6 21628279 R 75 344 571 -4355 F N
    D44TDFP1 1 8 2315 9056 55123 CGATGT 1 TGTGTGTGGGAGCCGGCGCCGGCTGACTCTTCACTGGTGTTTTTAAGTGTTGCGCTGTGGCTTGAGAACAGGATG ^_^caacecegeghc_`edga_aa[\egbdgdhge`g`a^^adbadd]`beababaa_b^aaaTW`ab_^baabb 19 7652288 F 17A57 315 315 114 R Y

  • #2
    Hi ej_duncan,
    The best option would be to ask your collaborator. On second thought, this contains the components of a FASTQ file that has been transformed into some kind of tab-delimited format.

    HTH


    • #3
      Hi Apexy,

      My collaborator has no idea! She got the sequences through a collaborator of a collaborator (weird I know), and they are not responding to our e-mails. Some of the files they have sent through are clearly fastq and others are in bam format. We are going to hope that we get more information from them, but thought in the meantime someone might be able to shed some light on the format for me.

      I have no idea why someone might convert fastq to tab format, but I will have a go at reformatting the file and seeing if it works.

      Thanks
      -Liz
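A minimal sketch of the reformatting mentioned above, assuming the first ten columns are machine, run, lane, tile, x, y, index, read number, sequence and quality, as they appear to be in the pasted lines. The function name `export_line_to_fastq`, the read-name layout and the Phred+64 input offset are assumptions for illustration (the quality strings run up to 'i', which is consistent with the older Illumina offset of 64), so check them against the real files before relying on the output.

```python
def export_line_to_fastq(line, phred_offset_in=64):
    """Convert one tabular read record into a 4-line FASTQ record (Phred+33).

    Assumes the first 10 columns are: machine, run, lane, tile, x, y,
    index, read number, sequence, quality. This layout is an assumption
    based on the pasted excerpt, not a confirmed specification.
    """
    # The original files are probably tab-delimited; the pasted excerpt
    # has had its tabs collapsed, so fall back to any whitespace.
    if "\t" in line:
        fields = line.rstrip("\r\n").split("\t")
    else:
        fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")

    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    # Hypothetical read-name layout; any unique name would do.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"

    # Re-encode qualities from the assumed input offset to Phred+33.
    shift = phred_offset_in - 33
    if any(ord(c) - shift < 33 for c in qual):
        raise ValueError("quality character below the assumed input offset")
    qual33 = "".join(chr(ord(c) - shift) for c in qual)

    return f"@{name}\n{seq}\n+\n{qual33}\n"
```

To convert a whole file, call the function on each line and write the returned records to a new `.fastq` file. Note that this keeps every read, including those that failed the filter flag in the last column.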


      • #4
        I think that is the old Illumina 'qseq' format:
        Most Illumina NGS data files we face are FASTQ/FASTA formats, which include the read sequence and (possible) quality scores. If reads are m...


        • #5
          Originally posted by nickloman
          I think that is the old Illumina 'qseq' format:
          http://allaboutbioinfo.blogspot.co.u...-illumina.html
          I think that it is actually the 'Export' format as mentioned on the same page of that link. The data posted above actually has many more fields than qseq.
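The field-count argument can be checked with a short sketch. It assumes the commonly documented layouts of 11 tab-separated columns for qseq and 22 for export; the function name `guess_illumina_text_format` is hypothetical. It has to be run on the original tab-delimited file, because the excerpt pasted in the first post has had its tabs (and any empty columns) collapsed into single spaces.

```python
QSEQ_FIELDS = 11    # assumed column count of a qseq record
EXPORT_FIELDS = 22  # assumed column count of an export record


def guess_illumina_text_format(line):
    """Guess the format of one tab-delimited line from its column count.

    Returns "qseq", "export" or "unknown". Empty columns are counted,
    which is why the line must still contain its original tabs.
    """
    n_fields = len(line.rstrip("\r\n").split("\t"))
    if n_fields == QSEQ_FIELDS:
        return "qseq"
    if n_fields == EXPORT_FIELDS:
        return "export"
    return "unknown"
```

Checking only the first line of a file is usually enough, since every record in these formats should have the same number of columns.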


          • #6
            maybe I'm mistaken, but isn't this just a SAM file?


            • #7
              Originally posted by RickBioinf
              maybe I'm mistaken, but isn't this just a SAM file?
              It's not, though it's not completely dissimilar.
