
  • Assessing a 454 run?

    Hi all,

    We recently got data from one eighth of a 454 run. The read lengths
    show the typical distribution that I have seen at various meetings
    (see attached image). However, how should I go about assessing the
    'overall quality' of the run (if such a clear-cut concept exists)?

    So far I have plotted the distribution of quality per base and the
    distribution of mean quality per read. Of course the qualities will
    never be 'perfect', but without any experience or other reference,
    I don't know what kind of distributions I should be looking for. For
    example, we see about 20% of all bases with a quality score below
    20. Is that a) as good as we are likely to get, b) not bad, or
    c) woah! ask for 20% of your money back ;-)
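For anyone wanting to reproduce the two summaries mentioned above (fraction of bases below a quality cutoff, and mean quality per read), here is a minimal sketch. It assumes the scores come as a FASTA-style .qual file (a `>name` header followed by space-separated integers) and that they can be read on the standard Phred scale, where Q20 corresponds to an error probability of 1%. The function names and the tiny example input are made up for illustration.

```python
from statistics import mean


def parse_qual(lines):
    """Yield (read_name, [scores]) from FASTA-style .qual text lines."""
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            # Keep only the read identifier, drop the rest of the header.
            name, scores = line[1:].split()[0], []
        else:
            scores.extend(int(tok) for tok in line.split())
    if name is not None:
        yield name, scores


def phred_to_error(q):
    """Error probability implied by a Phred-scaled score (assumption)."""
    return 10 ** (-q / 10)


def summarize(reads, threshold=20):
    """Fraction of bases below threshold, expected error rate, mean per read."""
    total = low = 0
    expected_errors = 0.0
    mean_per_read = {}
    for name, scores in reads:
        if not scores:
            continue
        total += len(scores)
        low += sum(1 for q in scores if q < threshold)
        expected_errors += sum(phred_to_error(q) for q in scores)
        mean_per_read[name] = mean(scores)
    return {
        "fraction_below": low / total if total else 0.0,
        "expected_error_rate": expected_errors / total if total else 0.0,
        "mean_per_read": mean_per_read,
    }


# Hypothetical two-read input, not real 454 data.
example = """>read1 length=5
40 40 30 10 10
>read2 length=5
25 25 25 19 35
""".splitlines()

stats = summarize(parse_qual(example))
print(stats["fraction_below"], stats["mean_per_read"])
```

The expected error rate is only as good as the calibration of the scores, so it is a rough guide for comparing runs rather than a measured error rate.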

    It would be great to get any feedback from experienced forum members.


    Note that we do not have a reference genome to align the reads to, but
    we do have reasonable coverage of the chloroplast DNA, and a
    reference for that (an estimated 2-4% of reads are chloroplast,
    giving approximately 10x coverage). What is a good tool to identify
    SNPs between our read data and that reference? (If I can first
    identify the SNPs, I can then estimate the per-base error rate using
    the reference.)

    (Actually, I found I can do this with MAQ, but I'll leave the question here in case there are alternative suggestions.)
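The error-rate idea above (call SNPs first, then count the remaining disagreements with the reference as sequencing errors) could be sketched roughly as follows. This is an illustration under strong assumptions, not what MAQ or any other tool does: reads are given as ungapped placements `(start, sequence)` on the reference, and the SNP positions are already known. It counts substitutions only, so it ignores insertions and deletions, which matter a lot for 454 data because of homopolymer errors. All names and sequences are hypothetical.

```python
def per_base_error_rate(alignments, reference, snp_positions):
    """Mismatch rate over aligned bases, skipping known SNP positions.

    alignments: iterable of (start, read_seq), 0-based ungapped placements.
    reference: reference sequence as a string.
    snp_positions: set of 0-based reference positions to exclude.
    """
    compared = mismatches = 0
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            # Skip bases past the reference end and known SNP sites.
            if pos >= len(reference) or pos in snp_positions:
                continue
            compared += 1
            if base != reference[pos]:
                mismatches += 1
    return mismatches / compared if compared else 0.0


# Toy data: position 7 is treated as a true SNP, so the difference
# there is not counted as a sequencing error.
reference = "ACGTACGTAC"
alignments = [(0, "ACGTAC"), (2, "GTTCGT"), (4, "ACGAAC")]
snps = {7}

rate = per_base_error_rate(alignments, reference, snps)
print(rate)
```

With around 10x coverage, a position where most reads disagree with the reference is more plausibly a SNP than an error, which is why the SNP set needs to be fixed before counting.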


    Thanks very much for any information,
    Dan.

    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    You might try the tools from Gabor Marth's lab (http://bioinformatics.bc.edu/marthlab/Main_Page): use Mosaik to align the reads to the reference, and GigaBayes (an evolution of their polyBayes tool) to call SNPs from that alignment. As I recall, it gives you better control over whether you're looking for SNPs between homozygous or heterozygous individuals, many individuals, etc., and its algorithms have sound statistical underpinnings.

    ~Joe
