  • sklages
    Senior Member
    • May 2008
    • 628

    #76
    Originally posted by skruglyak View Post
    Yes, there were strong opinions on both sides of the read naming issue. At the time, unaligned BAM was not supported input to the popular aligners. The format has been getting wider acceptance and I see the value of providing it as an option in the future.
    What is the "other side" of "both sides"?

    We are running three HiSeqs and a few GAs; reading and rewriting a few hundred gigabytes of compressed sequence data just to fix a deficient header is quite annoying IMHO.

    I do agree SAM would be a nice option for data storage (it should probably not replace fastq yet, many people do still use fastq as input for their programs).
    Is it very wise to use a binary (sequencing-specific) storage format like BAM? I don't know, just a bad feeling :-)

    Strangely enough (never mentioned) ... lots of IT folks would appreciate it if the "we create many, many files" madness were limited to some reasonable number.
    1,629,325 files for a 2x120 run is far too many ...

    just my 2p,
    Sven
    Last edited by sklages; 11-04-2011, 05:18 AM. Reason: typos

    Comment

    • afaghalavi
      Junior Member
      • Sep 2011
      • 6

      #77
      Hello Dear Sir/Madam

      We received our exome data and now I have 2 files (SNPs and indels) in text format.
      I have copied and pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for its analysis and annotation?

      #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
      chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
      chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
      chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
      chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


      Best

      Comment

      • Orr Shomroni
        Member
        • Oct 2011
        • 26

        #78
        Thanks for the tip on the filtering, dawe. Our previous filtering resulted in only headers for 'Y' reads and -- as body, and apparently that wasn't much of an issue. Still, the new command makes it look cleaner.

        One thing troubles me, though. I am trying to run the filtered files on FastQC, but I'm getting an error that the filtered fastq files are not in gz format. When I try to compress them, it says it cannot, because they are already in .gz format; when I try to decompress them, I get an error because the files are not GZIP files.

        I imagine there should be an easy way to modify the extension for the filtered fastq file, but I am not sure how to do that within the "for" loop
        "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature

        Comment

        • Orr Shomroni
          Member
          • Oct 2011
          • 26

          #79
          Ok, I solved the problem. Maybe I missed it, but this situation only applies if you are dealing with uncompressed fastq files to begin with. The filtering process necessarily returns an unzipped file, so the filename has to be adjusted and the file has to be compressed
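The symptom described above (a file named .gz that is actually plain text) can be checked for directly. The following is only a minimal sketch, not the command used in this thread: it tests for the two gzip magic bytes and, if they are absent, compresses the file so that name and content agree. The function names `is_gzipped` and `ensure_gzipped` are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file content (not its name) looks like gzip."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Compress `path` if it is plain text; return the path of the .gz file.

    A plain-text file that already carries a .gz suffix is replaced in place;
    a plain-text file without the suffix gets a new sibling with .gz appended
    and the uncompressed original is left untouched.
    """
    path = Path(path)
    if is_gzipped(path):
        return path  # content is already gzip, nothing to do
    plain = path.with_suffix("") if path.suffix == ".gz" else path
    target = plain.with_name(plain.name + ".gz")
    tmp = target.with_name(target.name + ".tmp")
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(target)  # overwrites the mislabelled file if names coincide
    return target
```

Calling `ensure_gzipped` on each filtered file at the end of the loop body would leave files that FastQC can open as gzip.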
          "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature

          Comment

          • olus
            Member
            • Aug 2008
            • 22

            #80
            Originally posted by sparks View Post
            Hi,
            V1.8 has some extra fields:
            <is filtered> is Y if the read is filtered, N otherwise.
            <control number> is 0 when none of the control bits are on, otherwise it is an even number.
            Does anyone know what these are for?
            Is is_filtered reminiscent of QSEQ quality flag and if so does 'Y' mean high or low quality?

            Colin
            Hi Colin.
            Did you find out what <control number> in the '@' FASTQ line is used for?

            Apart from the brief definition in the official PDF, I couldn't find any explanation.

            If anybody could give me some hints it would be really appreciated!

            Gabriele
            gabriele bucci

            Comment

            • sparks
              Senior Member
              • Mar 2008
              • 126

              #81
              Hi Gabriele,
              I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
              Colin
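For readers who want to inspect these fields themselves, here is a minimal sketch of a header parser. It assumes the CASAVA 1.8 layout `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>`; the class and function names are hypothetical, and the example header in the usage note is made up for illustration.

```python
from typing import NamedTuple


class Casava18Header(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool    # True when the field is 'Y' (read is filtered)
    control_number: int  # 0 when no control bits are on
    index_sequence: str


def parse_casava18_header(line):
    """Split a CASAVA 1.8 style '@' line into its named fields."""
    if not line.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % line)
    location, _, description = line[1:].strip().partition(" ")
    loc = location.split(":")
    desc = description.split(":")
    if len(loc) != 7 or len(desc) != 4:
        raise ValueError("unexpected number of fields: %r" % line)
    if desc[1] not in ("Y", "N"):
        raise ValueError("is_filtered must be Y or N: %r" % line)
    return Casava18Header(
        instrument=loc[0],
        run_number=int(loc[1]),
        flowcell_id=loc[2],
        lane=int(loc[3]),
        tile=int(loc[4]),
        x_pos=int(loc[5]),
        y_pos=int(loc[6]),
        read=int(desc[0]),
        is_filtered=(desc[1] == "Y"),
        control_number=int(desc[2]),
        index_sequence=desc[3],
    )
```

For example, `parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")` gives a record whose `is_filtered` is True and whose `control_number` is 18, so keeping only reads with `is_filtered` False drops the flagged ones.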
              Originally posted by olus View Post
              Hi Colin.
              Did you find out what <control number> in the '@' FASTQ line is used for?

              Apart from the brief definition in the official PDF, I couldn't find any explanation.

              If anybody could give me some hints it would be really appreciated!

              Gabriele

              Comment

              • olus
                Member
                • Aug 2008
                • 22

                #82
                Originally posted by sparks View Post
                Hi Gabriele,
                I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
                Colin
                Thank you for your reply.
                In the end I found some clues as to what it could be.
                It seems that the bit value is inherited from the .control files and stores information about any PhiX spike-in, barcode mismatches, etc.:

                Cheers

                Gabriele



                (look at OLB_UG_15009920C.pdf from illumina)
                gabriele bucci

                Comment

                • boetsie
                  Senior Member
                  • Feb 2010
                  • 245

                  #83
                  Hi all,

                  For our Illumina HiSeq2000 we use the phiX spike-in. However, we see after demultiplexing that around 0.05% of the produced reads can align to the phiX genome. We now have a script that filters out the reads/pairs that can align to the phiX genome (with Bowtie). This works OK, but we are wondering whether there is an automated way to do this within CASAVA, or whether there is some flag within the FASTQ header that indicates that a read comes from the phiX genome?
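The filtering step described in this post can be sketched roughly as follows. This is not Boetsie's script: it assumes the IDs of reads that aligned to phiX have already been collected into a set (for example from the aligner's output), and that both mates share the first whitespace-delimited token of the header. All function names are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ lines."""
    it = (line for line in lines if line.strip())  # ignore blank lines
    while True:
        block = [line.rstrip("\n") for line in islice(it, 4)]
        if not block:
            return
        if len(block) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(block)


def read_id(header):
    """Read name without the leading '@' and without the description."""
    return header[1:].split()[0]


def drop_phix_pairs(records_1, records_2, phix_ids):
    """Yield only those pairs whose read ID is not in `phix_ids`.

    A pair is dropped as a whole if its ID is listed, so the two output
    files stay in sync.
    """
    for rec1, rec2 in zip(records_1, records_2):
        id1, id2 = read_id(rec1[0]), read_id(rec2[0])
        if id1 != id2:
            raise ValueError("mates out of order: %s vs %s" % (id1, id2))
        if id1 not in phix_ids:
            yield rec1, rec2
```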

                  Regards,
                  Boetsie

                  Comment

                  • tahamasoodi
                    Success
                    • May 2012
                    • 130

                    #84
                    I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by a HiSeq 2000 to fastq, but I am getting an error message that the config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
                    Can anyone please help me with this issue?
                    Thanks,

                    Comment

                    • sklages
                      Senior Member
                      • May 2008
                      • 628

                      #85
                      Originally posted by tahamasoodi View Post
                      I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by a HiSeq 2000 to fastq, but I am getting an error message that the config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
                      Can anyone please help me with this issue?
                      Thanks
                      Hmm, it tells you that there is no config.xml file found within the run directory you have supplied. What is the command line you used for bcl conversion? Do you have access to the whole run and all of its files?

                      Sven

                      Comment

                      • tahamasoodi
                        Success
                        • May 2012
                        • 130

                        #86
                        Hi Sklages,
                        Thanks a lot. I have used the following command
                        configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

                        No, I don't have access to the whole run and its files. Actually, I have copied only some bcl files from the instrument-connected machine and am now trying to convert those files to fastq on a separate machine.
                        Thanks,

                        Comment

                          • sklages
                            Senior Member
                            • May 2008
                            • 628

                            #88
                            Originally posted by tahamasoodi View Post
                            Hi Sklages,
                            Thanks a lot. I have used the following command
                            configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

                            No, I don't have access to the whole run and its files. Actually, I have copied only some bcl files from the instrument-connected machine and am now trying to convert those files to fastq on a separate machine.
                            Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files at the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).
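A quick pre-flight check can catch this before the conversion is launched. The sketch below only looks for config.xml, because that is the one file named in this thread; the "among others" would have to be added to `REQUIRED_FILES` from the CASAVA documentation. The function name `missing_run_files` is hypothetical.

```python
from pathlib import Path

# Only config.xml is named in this thread; extend this tuple with the other
# files your CASAVA version expects (an assumption to verify in its docs).
REQUIRED_FILES = ("config.xml",)


def missing_run_files(input_dir, required=REQUIRED_FILES):
    """Return the names from `required` that are absent in `input_dir`."""
    base = Path(input_dir)
    return [name for name in required if not (base / name).is_file()]
```

An empty list from `missing_run_files(...)` on the directory passed as --input-dir means the check found nothing missing; a non-empty list names what still has to be copied from the instrument machine.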


                            hth, Sven

                            Comment

                            • tahamasoodi
                              Success
                              • May 2012
                              • 130

                              #89
                              Originally posted by sklages View Post
                              Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files at the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).


                              hth, Sven

                              Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine where I have the whole flowcell data? Copying from one machine to the other takes a long time.
                              Thanks,

                              Comment

                              • sklages
                                Senior Member
                                • May 2008
                                • 628

                                #90
                                Originally posted by tahamasoodi View Post
                                Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine where I have the whole flowcell data? Copying from one machine to the other takes a long time.
                                Yes, that's possible. At least with NFS. Keep in mind that this will be slower than working from local storage, as all of the data still has to be read.

                                Let us know if it worked for you.

                                Sven

                                Comment
