  • SRA Toolkit and Conversion to Illumina Fastq Format

    Hi Seqers,

    I am trying to convert an SRA ChIP-Seq file (SRA archive) to Illumina FASTQ format. I ran illumina-dump -A <Accession Number> <filename> and got more than 100 qcal and seq files. Now I would like to know what my input file for the ELAND_standalone.pl aligner should be.

    Do I have to concatenate all 100 seq files into one file and then run ELAND_standalone.pl?

    Any help, hints, suggestions or advice would be highly appreciated.

    Thanks.

  • #2
    You could use EMBOSS seqret (or BioPerl or Biopython or ...) to convert from Sanger FASTQ encoding to the old Illumina encoding - but you might also need to massage the record names to suit ELAND.



    • #3
      EMBOSS seqret command for converting Sanger FASTQ to old Illumina FASTQ

      I am trying to find a Unix command I can use to convert FASTQ from the new QC format (Sanger) to the old Illumina QC format. I have PE data. Thanks
      Last edited by mathew; 05-17-2012, 05:55 AM.



      • #4
        Originally posted by mathew
        I am trying to find a Unix command I can use to convert FASTQ from the new QC format (Sanger) to the old Illumina QC format. I have PE data. Thanks
        What do you mean by QC?

        EMBOSS seqret (mentioned earlier) can interconvert Sanger FASTQ (used by Illumina 1.8+), the original Solexa FASTQ used up to Illumina 1.2 (which did not use PHRED scores), and the Illumina 1.3 to 1.7 FASTQ variant.
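To make the differences between these variants concrete, here is a minimal Python sketch that decodes a single quality character. The ASCII offsets (33 and 64) and the Solexa-to-PHRED formula are the standard published ones rather than anything stated in this thread; the variant keys follow the EMBOSS/Biopython format names, and the function name is hypothetical.

```python
import math

# Standard ASCII offsets for the three FASTQ variants (keys use EMBOSS/Biopython format names).
OFFSETS = {
    "fastq-sanger": 33,    # PHRED scores: Sanger, Illumina 1.8+
    "fastq-solexa": 64,    # Solexa log-odds scores: Solexa up to Illumina 1.2
    "fastq-illumina": 64,  # PHRED scores: Illumina 1.3 to 1.7
}


def phred_from_char(char: str, variant: str) -> float:
    """Return the PHRED quality encoded by one FASTQ quality character."""
    score = ord(char) - OFFSETS[variant]
    if variant == "fastq-solexa":
        # Solexa scores are log-odds, not PHRED; map them onto the PHRED scale.
        return 10 * math.log10(10 ** (score / 10) + 1)
    return float(score)


# The same character decodes to different qualities depending on the variant.
for variant in OFFSETS:
    print(variant, round(phred_from_char("I", variant), 2))
```

Because one character can mean very different qualities, a converter has to be told which variant the input uses; it cannot reliably guess from a single record.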



        • #5
          For example, if your input Sanger-encoded FASTQ file is called input.fastq and you want to turn it into Illumina 1.3+ encoded FASTQ, try:

          seqret -sequence=input.fastq -sformat=fastq-sanger -osformat=fastq-illumina -outseq=output.fastq
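For PHRED-scaled input, the conversion only rewrites the quality line: each character moves from offset 33 to offset 64, i.e. up by 31. A minimal stdlib Python sketch of that idea follows; it assumes plain, unwrapped four-line records, the function names are hypothetical, and Biopython's SeqIO.convert() is the more robust route if Biopython is available.

```python
from itertools import islice
from typing import Iterable, Iterator


def sanger_to_illumina_qual(qual: str) -> str:
    """Re-encode a Sanger (offset 33) quality string with the Illumina 1.3+ offset (64)."""
    out = []
    for char in qual:
        phred = ord(char) - 33
        if phred < 0:
            raise ValueError(f"not a Sanger quality character: {char!r}")
        # Illumina 1.3+ tops out at PHRED 62 (chr(64 + 62) == "~"), so cap higher scores.
        out.append(chr(64 + min(phred, 62)))
    return "".join(out)


def convert_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield Illumina 1.3+ FASTQ lines from unwrapped four-line Sanger FASTQ records."""
    it = iter(lines)
    for header in it:
        rest = list(islice(it, 3))
        if len(rest) < 3:
            raise ValueError("truncated FASTQ record at end of input")
        header = header.rstrip("\n")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Only the quality line changes; header, sequence and "+" line are copied as-is.
        yield from (header, seq, plus, sanger_to_illumina_qual(qual))


def convert_fastq(in_path: str, out_path: str) -> None:
    """File-level wrapper, roughly equivalent to the seqret command above."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in convert_records(src):
            dst.write(line + "\n")
```

For example, convert_fastq("input.fastq", "output.fastq") would write the converted records; keeping the record logic in a generator means large files are streamed rather than loaded into memory.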



          • #6
            Conversion to Illumina 1.3 FASTQ

            Hi maubp,

            Thanks for your help. I ran the command and it did not give me any error. Here is part of the file before running the command (before, Sanger) and after running it (after, Illumina). I don't see a difference. Did I miss something or do something wrong?
            I just inserted the input and output file names. Any advice, please?
            (Before, Sanger)


            @HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
            NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
            +
            #1=DDDFFHGHHGHAHGIIIIEIGHGHGIIGDIACDHIIIIHBDHHGIEHIIFHIGCHG@@FGIEHI=CHGEFCB?DCDFAECCECCDACCCC>>A@BBCC
            @HWI-ST413:193:D092FACXX:1:1101:1225:1915 1:N:0:
            NAACAGAATAAAGATTATAATTACATTTGATTTAGTTCCAAAAACGGAGTCAAAAATCTTAACCTTTGACAAGACCTGTGTAAAGAAGCTGAGGTAAGCAT
            +
            #1:BDAAB:DFDB9E:E:<?IECFEABA<EEF@@C?B?<FFGII<0:)?09BGFEC>8=)=B8B4=)7.)7=2=D))7)=@B@BBB96>6;ABB5;>@BBB
            @HWI-ST413:193:D092FACXX:1:1101:1201:1926 1:N:0:
            NAGGCTTCCTTCATCTCTCCTCTACACAATCTCTTCCTAGTCTTGCTATAGCCAAATTTGTCTCCTTGCTGTTTGTGAAGAAGCCAAACATATTTCTACCT
            +
            #1=DADDFHHHHDGIIFCHIGGHGIDHEIJIGEIGIIIFGGHIJIJIJJJDIIEFGIIJJCHIIIIJGGHCHJJIGGIIGHGFEE>CEFECCEEEEECCC>
            @HWI-ST413:193:D092FACXX:1:1101:1176:1929 1:N:0:
            NCATCTCCAAGTTGCTAAAGCCTAATGAGAAAAAAAAATGGTAAATATCCATATCATCTCTTATGATGAAAAGCTATTATGTTTTCAAAACTTAACTAAAC
            +
            #11AB;BDDDBDBEEEBEAEEBEFIIIEEIIIIIIDIIIEIEEEEEEIEEECCEEII;7?;;ACCDDD;?D@A96(;>D>A>AD?A>A>:9AAAAAAAAA9
            @HWI-ST413:193:D092FACXX:1:1101:1249:1946 1:N:0:
            NAATTTAACCAACAAGGTGAAATATCTGTTATACCAAAAATTATAAAACATTGAGGAAATTGCCGATGACACAAATAAGTGGAAAGGTATCCCATGTTCAT
            +
            #11ADDDDHDHDHDAFA2<?EDFFF<BHHE@EFEEFEDEHHFBFFCFCGIIIFII9BHCGHGECGGIEDHCCEHDBEEEEDAC@CAB>@@CCCAAC>CD@>
            @HWI-ST413:193:D092FACXX:1:1101:1227:1952 1:N:0:
            NTCTGCCTTTACCTTCAAAGTCTGAGCAAATATGATTTTATATCTTTTTAATTAGAGATTCTTTTAAAGACCAAGTTACTGCAGTCCTGTCTTGTTCTTCT
            +
            #1=DDDFFHHGGHJJGIJGHEHHHDFHFHFHGIIFFIIIJJCHEIIJIGICHEIIFGIJJJIGGGIGCHCGIIJIHEHCHIGEHHCEHFBE>@DEECDAC@
            @HWI-ST413:193:D092FACXX:1:1101:1157:1988 1:N:0:
            CNAGAAGCGCTAACAATTATTTTGTATGATCAATAGAGAATTGCAACAGTTTTTGTTGTGTTGATACTCAATGACTTATGATGCTGAAAAACTAGTGAGGA
            +
            @#1ADDDDGFFHHJJJJJIJIIJICGIIHIGIEGCGGHHEHGIDHIEHI@FIHJIIIJGHGGIEGICHEHEEHCB;CFEF@CEECCACDCDDCDDCCCCAA
            @HWI-ST413:193:D092FACXX:1:1101:1225:2000 1:N:0:
            TTTGTTTACATTCTATTCGATTCCATTCCATTTGAATCAATTATATTGCAATTTATTGCATTGGAGTCCGTTCAAATGCACTCCATACCGTTCCATTCCAT
            +

            ###########################################
            (After, Illumina)

            @HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
            NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
            +
            BP\ccceegfggfg`gfhhhhdhfgfgfhhfch`bcghhhhgacggfhdghheghfbgf__efhdgh\bgfdeba^cbce`dbbdbbc`bbbb]]`_aabb
            @HWI-ST413:193:D092FACXX:1:1101:1225:1915 1:N:0:
            NAACAGAATAAAGATTATAATTACATTTGATTTAGTTCCAAAAACGGAGTCAAAAATCTTAACCTTTGACAAGACCTGTGTAAAGAAGCTGAGGTAAGCAT
            +
            BPYac``aYcecaXdYdY[^hdbed`a`[dde__b^a^[eefhh[OYH^OXafedb]W\H\aWaS\HVMHV\Q\cHHVH\_a_aaaXU]UZ`aaTZ]_aaa
            @HWI-ST413:193:D092FACXX:1:1101:1201:1926 1:N:0:
            NAGGCTTCCTTCATCTCTCCTCTACACAATCTCTTCCTAGTCTTGCTATAGCCAAATTTGTCTCCTTGCTGTTTGTGAAGAAGCCAAACATATTTCTACCT
            +
            BP\c`cceggggcfhhebghffgfhcgdhihfdhfhhheffghihihiiichhdefhhiibghhhhiffgbgiihffhhfgfedd]bdedbbdddddbbb]
            @HWI-ST413:193:D092FACXX:1:1101:1176:1929 1:N:0:
            NCATCTCCAAGTTGCTAAAGCCTAATGAGAAAAAAAAATGGTAAATATCCATATCATCTCTTATGATGAAAAGCTATTATGTTTTCAAAACTTAACTAAAC
            +
            BPP`aZacccacadddad`ddadehhhddhhhhhhchhhdhddddddhdddbbddhhZV^ZZ`bbcccZ^c_`XUGZ]c]`]`c^`]`]YX`````````X
            @HWI-ST413:193:D092FACXX:1:1101:1249:1946 1:N:0:
            NAATTTAACCAACAAGGTGAAATATCTGTTATACCAAAAATTATAAAACATTGAGGAAATTGCCGATGACACAAATAAGTGGAAAGGTATCCCATGTTCAT
            +
            BPP`ccccgcgcgc`e`Q[^dceee[aggd_deddedcdggeaeebebfhhhehhXagbfgfdbffhdcgbbdgcaddddc`b_b`a]__bbb``b]bc_]
            @HWI-ST413:193:D092FACXX:1:1101:1227:1952 1:N:0:
            NTCTGCCTTTACCTTCAAAGTCTGAGCAAATATGATTTTATATCTTTTTAATTAGAGATTCTTTTAAAGACCAAGTTACTGCAGTCCTGTCTTGTTCTTCT
            +
            BP\ccceeggffgiifhifgdgggcegegegfhheehhhiibgdhhihfhbgdhhefhiiihfffhfbgbfhhihgdgbghfdggbdgead]_cddbc`b_
            @HWI-ST413:193:D092FACXX:1:1101:1157:1988 1:N:0:
            CNAGAAGCGCTAACAATTATTTTGTATGATCAATAGAGAATTGCAACAGTTTTTGTTGTGTTGATACTCAATGACTTATGATGCTGAAAAACTAGTGAGGA
            +
            _BP`ccccfeeggiiiiihihhihbfhhghfhdfbffggdgfhcghdgh_ehgihhhifgffhdfhbgdgddgbaZbede_bddbb`bcbccbccbbbb``
            @HWI-ST413:193:D092FACXX:1:1101:1225:2000 1:N:0:
            TTTGTTTACATTCTATTCGATTCCATTCCATTTGAATCAATTATATTGCAATTTATTGCATTGGAGTCCGTTCAAATGCACTCCATACCGTTCCATTCCAT
            +
            bbbeceeegggggiiiiiiiiiiiiiihiiiiiiifiiihiifhiiiihiiiiihiiihiiiiiiiiiighhiiiiiiiiiiiihgggggeeeedceeddd
            @HWI-ST413:193:D092FACXX:1:1101:1361:1913 1:N:0:
            NTCACAGTCCCAGTGGGCCTTGTCTGTCACTGAGTTACAAGCCACACTCAATCCCTGGAGATGCTGAGTGCTGTTAATGGACACGTGATGCCGGCTAAACA
            +
            BP\accdeac`gagfafff`fdg`eadghhh]df^b[cffh`g_efdfhbcgfhffbgggefffgghfbbgggeg_db]ababdb_aabbbb__aY[G]]b
            @HWI-ST413:193:D092FACXX:1:1101:1439:1915 1:N:0:
            NCATGTCAACTACTTGTGATGAGTTTCTGAGTCTAGCAAAGTCCGTAAACCCTAGTATTTCTCTCCTTTTTTCCCTGCAGAAAGGATCTTGCTCTGTGGCC
            +
            BPYc^caccaea^ef``ba[ba[[^ecgbagd`eadc[ee_fdfdedeccede_^e^_ecaaeffedfdefede_V^^_a_cR]`a`aaa``aaa`]`[_^
            @HWI-ST413:193:D092FACXX:1:1101:1383:1918 1:N:0:
            NAGTGATCCTCTTAACTAATGCTTAAGCTCCAATTTCTTGCCATAGTGCTTATCACAGATTGTACTCCTAAGACTGACCTCCAGATTTATCTCCTGAAGCA
            +



            • #7
              Originally posted by mathew
              Hi maubp,

              Thanks for your help. I ran the command and it did not give me any error. Here is part of the file before running the command (before, Sanger) and after running it (after, Illumina). I don't see a difference. Did I miss something or do something wrong?
              I just inserted the input and output file names. Any advice, please?
              That has changed the data. Look at the first record, for instance:
              (Before, Sanger)
              Code:
              @HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
              NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
              +
              #1=DDDFFHGHHGHAHGIIIIEIGHGHGIIGDIACDHIIIIHBDHHGIEHIIFHIGCHG@@FGIEHI=CHGEFCB?DCDFAECCECCDACCCC>>A@BBCC
              (After, Illumina)
              Code:
              @HWI-ST413:193:D092FACXX:1:1101:1180:1912 1:N:0:
              NATGTACCTGACGAAGCAGCTACCATCTCAGCAGTTGCTGGTCACTGTGCAGTGGAAAAGAGAGAAGTGCATGAAGTCAGCAATTATACTTGGCCTGGAAG
              +
              BP\ccceegfggfg`gfhhhhdhfgfgfhhfch`bcghhhhgacggfhdghheghfbgf__efhdgh\bgfdeba^cbce`dbbdbbc`bbbb]]`_aabb
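              The fourth line, which holds the qualities, has changed: every quality character has moved up by 31 ASCII positions (for example "#" became "B" and "1" became "P"), which is the difference between the Sanger offset of 33 and the Illumina 1.3+ offset of 64. I've not double-checked, but it looks OK.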
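The double-check can be done mechanically. Below is a minimal Python sketch (the helper name is hypothetical) that takes one record before and after conversion, each as a list of four lines without newlines, and confirms that only the quality line changed and that every quality character moved up by exactly 31. It assumes no score was high enough to be capped at PHRED 62 during conversion.

```python
from typing import Sequence


def shifted_by_31(before: Sequence[str], after: Sequence[str]) -> bool:
    """Check one FASTQ record (four lines, no newlines) before and after conversion.

    Expects identical header, sequence and "+" lines, and every quality
    character moved up by exactly 31 (Illumina offset 64 minus Sanger offset 33).
    """
    if len(before) != 4 or len(after) != 4:
        return False
    if list(before[:3]) != list(after[:3]):
        return False
    q_before, q_after = before[3], after[3]
    return len(q_before) == len(q_after) and all(
        ord(new) - ord(old) == 31 for old, new in zip(q_before, q_after)
    )


before = ["@r1", "NATG", "+", "#1=D"]
print(shifted_by_31(before, ["@r1", "NATG", "+", "BP\\c"]))  # True: converted
print(shifted_by_31(before, before))                         # False: unchanged
```

An unchanged file fails the check, so this distinguishes "the conversion did nothing" from "the conversion worked but the difference is hard to spot by eye".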



              • #8
                SRA database FASTQ format

                Hello, I want to ask a question: when I download FASTQ format directly from the SRA database, it looks like the example below. How can I convert it into data I can analyse directly? I have no idea how to deal with it; can anybody help me? Thank you!

                @SRR031126.1.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:41.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.1.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:41.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.2.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:69.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.2.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:69.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.3.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:129.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.3.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:129.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.4.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:154.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.4.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:154.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.5.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:171.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.5.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:171.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.6.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:273.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.6.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:273.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.7.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:374.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +SRR031126.7.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:374.1 length=76
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                @SRR031126.8.1 SOLEXA-GA02_SRi_AK_BN_test:1:1:0:404.1 length=76
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN



                • #9
                  Something is very wrong with that data: all the bases are N and all the qualities are zero (the "!" is ASCII 33, which encodes PHRED zero). Perhaps this is just an edge effect; the first and last reads on a Solexa/Illumina run are not as good as those in the middle of the slide.
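One way to tell whether the whole file looks like this, or only the first few reads (as an edge effect would suggest), is to count such records. A minimal Python sketch, assuming Sanger encoding (offset 33) and unwrapped four-line records; the function names are hypothetical. The "+" line is ignored, so the SRA style that repeats the header after "+" is handled.

```python
from itertools import islice
from typing import Iterable, Tuple


def is_blank_read(seq: str, qual: str, offset: int = 33) -> bool:
    """True if every base is N and every quality decodes to PHRED 0 (all '!' in Sanger)."""
    return set(seq.upper()) <= {"N"} and all(ord(c) - offset == 0 for c in qual)


def count_blank_reads(lines: Iterable[str], offset: int = 33) -> Tuple[int, int]:
    """Return (total_reads, blank_reads) for unwrapped four-line FASTQ records."""
    it = iter(lines)
    total = blank = 0
    for _header in it:
        rest = [line.rstrip("\n") for line in islice(it, 3)]
        if len(rest) < 3:
            raise ValueError("truncated FASTQ record at end of input")
        seq, _plus, qual = rest
        total += 1
        blank += is_blank_read(seq, qual, offset)
    return total, blank
```

Passing an open file handle to count_blank_reads gives both numbers in one pass; if the blank reads are only a small fraction of the total, the rest of the file may still be usable.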

                  How exactly did you get this data from the SRA?



                  • #10
                    Originally posted by maubp
                    Something is very wrong with that data: all the bases are N and all the qualities are zero (the "!" is ASCII 33, which encodes PHRED zero). Perhaps this is just an edge effect; the first and last reads on a Solexa/Illumina run are not as good as those in the middle of the slide.

                    How exactly did you get this data from the SRA?
                    I downloaded the data by selecting the filtered download and then the FASTQ format. By the way, there are two ways to get FASTQ: one is to download FASTQ directly from the SRA, like the data above; the other is to download the .sra files first and then convert them to FASTQ. Does anyone know the difference between the FASTQ files produced by the two ways? Thank you!



                    • #11
                      Can anything go wrong if I run the following?

                      fastq-dump --split-3 --gzip SRR012345.sra
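One thing worth checking with paired data: with --split-3, mates should end up in SRR012345_1.fastq.gz and SRR012345_2.fastq.gz, and reads whose mate is missing in a third file, SRR012345.fastq.gz. Aligners generally expect the two mate files to list the same spots in the same order. Below is a minimal stdlib Python sketch (function names hypothetical) that streams both gzipped files and compares spot names record by record. It assumes the default fastq-dump header, where both mates share the same spot name, and unwrapped four-line records; options that append a read number to the name would need the comparison adjusted.

```python
import gzip
from itertools import islice, zip_longest
from typing import Iterable, Iterator


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first word of each FASTQ header (the spot name), minus the '@'."""
    it = iter(lines)
    for header in it:
        if not header.startswith("@") or len(list(islice(it, 3))) < 3:
            raise ValueError(f"malformed or truncated FASTQ record: {header.strip()!r}")
        yield header[1:].split()[0]


def names_in_sync(names_1: Iterable[str], names_2: Iterable[str]) -> bool:
    """True if both iterables hold the same names in the same order and count."""
    # zip_longest pads the shorter side with None, so unequal counts also fail.
    return all(a == b for a, b in zip_longest(names_1, names_2))


def mates_in_sync(path_1: str, path_2: str) -> bool:
    """Compare the two gzipped mate files written by fastq-dump --split-3 --gzip."""
    with gzip.open(path_1, "rt") as h1, gzip.open(path_2, "rt") as h2:
        return names_in_sync(read_names(h1), read_names(h2))


# Hypothetical usage, assuming the default --split-3 output names:
# mates_in_sync("SRR012345_1.fastq.gz", "SRR012345_2.fastq.gz")
```

Streaming both files in lockstep keeps memory use flat, which matters for typical run sizes; the check stops at the first mismatch.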
