  • base composition and base calling

    I've read that the Illumina base-calling software has problems calibrating itself if the base composition in the first few bases of the reads isn't a roughly equal mix of nucleotides. We're thinking of sequencing constructs that begin with our own barcode and were wondering what the parameters are for correct base calling:

    - how many positions are used to calibrate?
    - what are the bounds on acceptable nucleotide mixture? e.g., how far off from 25% each can you be?
    - I believe you can calibrate on something other than the first four bases. How far into the read can you wait to calibrate?
    - can different lanes in a run be calibrated differently? e.g., if our sample is one lane of a run, does that make this easier or harder for the sequencing facility?
    - does any of this vary between the GAII and HiSeq?

    Thanks!
    Alex

  • #2
    Hi Alex,

    To my knowledge, the Illumina pipeline performs its crosstalk matrix and phasing/prephasing calibration during the first 4 cycles by default, and this can be altered with --matrix-cycles=n. As with using many cycles for cluster detection, this will probably mean that the workstation PC needs to store more images until the initial calibration calculations are done, at which point the real-time data analysis will start. Using lots of cycles will cause a backlog on the workstation, but this should be manageable for at least 10 or so cycles, I would think (at least on a GA; I'm not so sure about the HiSeq, as it generates so much more data).

    You can avoid these problems by specifying a control lane with a relatively normal base composition (--control-lane=..), such as a lane of PhiX or whole-genome shotgun sequencing. Alternatively, it is also possible not to perform calibration on the sample and to use a pre-formatted calibration table instead (probably slightly different ones for GA and HiSeq).

    Something else you should consider is that you might lose a certain amount of data because cluster detection does not work normally if you have low diversity at the start of the sequences, and this is completely independent of a skewed base composition. It depends mainly on the number of barcodes you have in your sample and on the cluster density. In summary, the fewer barcodes you have and the higher your cluster density, the more data you are likely to lose. Please refer to this post for more information (http://seqanswers.com/forums/showthr...light=bareback), or send me an email if you have any further questions.
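
One way to get a feel for how balanced a planned barcode set is before sequencing is to tabulate the base composition at each barcode position. The snippet below is only a minimal sketch: the barcode list is made up for illustration, the function names are hypothetical, and it assumes every barcode is equally represented in the pool. It reports how far each position is from an even 25% mix, but it does not apply any pass/fail threshold, since no specific bound is given in this thread.

```python
from collections import Counter


def per_cycle_composition(barcodes):
    """Return one dict per cycle giving the fraction of A, C, G and T.

    Assumes all barcodes have the same length and are pooled in equal amounts.
    """
    if not barcodes:
        raise ValueError("need at least one barcode")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")

    fractions = []
    for cycle in range(length):
        counts = Counter(b[cycle].upper() for b in barcodes)
        fractions.append(
            {base: counts.get(base, 0) / len(barcodes) for base in "ACGT"}
        )
    return fractions


def max_deviation_from_even(fractions):
    """Largest absolute deviation from 25% among the four bases, per cycle."""
    return [max(abs(f - 0.25) for f in cycle.values()) for cycle in fractions]


# Hypothetical barcode set, for illustration only.
barcodes = ["ACGT", "CGTA", "GTAC", "TACG"]
composition = per_cycle_composition(barcodes)
deviation = max_deviation_from_even(composition)

for cycle, (comp, dev) in enumerate(zip(composition, deviation), start=1):
    print(f"cycle {cycle}: {comp} max deviation from 25%: {dev:.2f}")
```

A set with only one or two barcodes will show large deviations at every position, which is the low-diversity situation described above.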


    • #3
      Thank you! This plus your paper is very helpful.
