  • Position of errors in SOLiD reads

    Does anybody know how to determine the positions of errors (in alignments, after mapping reads to a reference) in SOLiD color-space reads?

    I couldn't figure out the "position_error tool" mentioned in the BioScope manual...

    Many thanks for help!

  • #2
    I use the dqc tool in the DNAA package (on SourceForge). Pass it your BAM and the reference genome, and it will dump the mismatches at each position along the read length. You can then plot the
    results with plot_dqc_postalignqc.R (I use gnuplot).
    -drd



    • #3
      Thanks for your response.

      Is it right that the dqc tool requires BFAST?
      I had a quick look at the code in the package, and it seems to need the BFAST libraries, which I don't have and haven't used so far.

      Or is there an independent version of dqc?

      I refer to
      http://dnaa.git.sourceforge.net/git/...148da9;hb=HEAD



      • #4
        Originally posted by DNAjunk
        Thanks for your response.

        Is it right that the dqc tool requires BFAST?
        I had a quick look at the code in the package, and it seems to need the BFAST libraries, which I don't have and haven't used so far.

        Or is there an independent version of dqc?

        I refer to
        http://dnaa.git.sourceforge.net/git/...148da9;hb=HEAD

        Yes, it does require BFAST, which it uses to read in the reference, among other things.



        • #5
          Oh, interesting!
          I wasn't aware of a third-party tool that looks at position errors.

          Is the output like this, i.e. the same as BioScope's?
          Code:
          # Generated by: CountSamPositionErrors(7_F3.csfasta.ma.bam)
          # Version: Bioscope version: bioscope-v1.2-rBS120SRN_47044_20100429153132  
          # Date: 07/15/2010 10:45 AM
          #? Sampling period = 1
          #? Total read positions = 617183400
          #? Missing color calls = 56487
          ##position_[refCall][readCall]	nErrors	nReadOccurrences	Error frequency
          1_AC	9643	3521908	0.0027
          1_AG	9838	3966595	0.0025
          1_AT	5180	2258191	0.0023
          1_CA	79142	2556366	0.0310
          1_CG	15241	3966595	0.0038
          1_CT	11537	2258191	0.0051
          1_GA	107425	2556366	0.0420
          1_GC	56802	3521908	0.0161
          1_GT	14756	2258191	0.0065
          1_TA	54086	2556366	0.0212
          1_TC	27256	3521908	0.0077
          1_TG	22387	3966595	0.0056
          2_01	12029	2902352	0.0041
          http://kevin-gattaca.blogspot.com/
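
          For anyone who wants to work with a table like the one above outside BioScope, here is a minimal Python sketch that groups the rows by read position and sums the error counts. The function names are made up for illustration, and the column layout (key, nErrors, nReadOccurrences, frequency, separated by whitespace) is assumed from the header line in the posted output rather than taken from any BioScope documentation.

          ```python
          from collections import defaultdict


          def parse_position_errors(lines):
              """Group rows of a position-errors table by read position.

              Returns {position: [(call_pair, n_errors, n_occurrences), ...]}.
              Assumes keys look like '1_AC' or '2_01' (position, underscore, call pair).
              """
              by_position = defaultdict(list)
              for raw in lines:
                  line = raw.strip()
                  # Skip blank lines and the '#', '#?' and '##' header lines.
                  if not line or line.startswith("#"):
                      continue
                  fields = line.split()  # works for tab- or space-separated columns
                  if len(fields) < 3:
                      continue  # not a data row
                  key, n_errors, n_occurrences = fields[0], int(fields[1]), int(fields[2])
                  position, _, call_pair = key.partition("_")
                  by_position[int(position)].append((call_pair, n_errors, n_occurrences))
              return dict(by_position)


          def total_errors_per_position(by_position):
              """Sum nErrors over all call pairs at each read position."""
              return {
                  pos: sum(n_errors for _, n_errors, _ in rows)
                  for pos, rows in sorted(by_position.items())
              }


          # A few rows copied from the output posted in this thread.
          sample = [
              "# Generated by: CountSamPositionErrors(7_F3.csfasta.ma.bam)",
              "#? Sampling period = 1",
              "1_AC\t9643\t3521908\t0.0027",
              "1_AG\t9838\t3966595\t0.0025",
              "2_01\t12029\t2902352\t0.0041",
          ]
          table = parse_position_errors(sample)
          totals = total_errors_per_position(table)
          ```

          The per-position totals can then be plotted against read position with whatever tool you prefer (R, gnuplot, etc.).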



          • #6
            Yes,

            -drd



            • #7
              KevinLam:
              How can I generate the file you posted previously?
              I am using the BioScope command line, version 1.2.

              # Generated by: CountSamPositionErrors(7_F3.csfasta.ma.bam)
              # Version: Bioscope version: bioscope-v1.2-rBS120SRN_47044_20100429153132
              # Date: 07/15/2010 10:45 AM
              #? Sampling period = 1
              #? Total read positions = 617183400
              #? Missing color calls = 56487
              ##position_[refCall][readCall] nErrors nReadOccurrences Error frequency
              1_AC 9643 3521908 0.0027
              1_AG 9838 3966595 0.0025



              • #8
                Originally posted by DNAjunk
                KevinLam:
                How can one generate the file you have posted previously?
                I am using BioScope command line vs 1.2

                The file is generated when you run the 'posErrors.ini' part of the pipeline.



                • #9
                  Originally posted by DNAjunk
                  KevinLam:
                  How can one generate the file you have posted previously?
                  I am using BioScope command line vs 1.2

                  In the GUI, you need to enable it in the advanced settings on the left.

                  In the examples/demo directory, the config looks like this:

                  Code:
                  ####################################
                  ####################################
                  ##
                  ##  global parameters
                  ##
                  import ../globals/global.ini
                  primer.set = F3,R3
                  reference=${reference.dir}/DH10B_WithDup_FinalEdit_validated.fasta
                  
                  
                  ##********************************************
                  ##	position errors pipeline
                  ##********************************************
                  # Parameter specifies whether to run or not position errors pipeline. [1: to run, 0:to not run]
                  position.errors.run = 1
                  
                  # Location of BAM file from pairing plugin
                  pairing.bam.dir=${output.dir}/pairing/
                  
                  # Name of the BAM file for which position errors are to be calculated. If this key is missing a wild card search will be used. 
                  # If there is more than one .gff3 file in the input directory a position errors file will be generated for each one. (Optional)
                  position.errors.input.bam.file = ${output.dir}/pairing/R3-F3-Paired.bam
                  
                  # Position error output directory
                  position.errors.output.dir=${output.dir}/position-errors
                  
                  # Position error output file name
                  #position.errors.output.file=positionErrors.txt
                  http://kevin-gattaca.blogspot.com/
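
                  If you want to check what a config like the one above resolves to before launching a run, a rough Python sketch follows. It is not BioScope's own parser: the function names are hypothetical, the `import` line is simply skipped rather than followed, and the `${...}` substitution rule is assumed from how the keys are used in the example. The value given for `output.dir` is a placeholder, since that key would normally come from global.ini.

                  ```python
                  import re

                  # Matches ${some.key} references inside a value.
                  _VAR = re.compile(r"\$\{([^}]+)\}")


                  def parse_ini(text, defaults=None):
                      """Read 'key = value' lines; skip comments, blanks and 'import' lines."""
                      values = dict(defaults or {})
                      for raw in text.splitlines():
                          line = raw.strip()
                          if not line or line.startswith("#") or line.startswith("import "):
                              continue
                          key, sep, value = line.partition("=")
                          if not sep:
                              continue  # not a key/value line
                          values[key.strip()] = value.strip()
                      return values


                  def expand(value, values, max_depth=10):
                      """Replace ${key} references with known values; leave unknown ones as-is."""
                      for _ in range(max_depth):
                          new = _VAR.sub(lambda m: values.get(m.group(1), m.group(0)), value)
                          if new == value:
                              break
                          value = new
                      return value


                  config_text = """
                  import ../globals/global.ini
                  primer.set = F3,R3
                  # [1: to run, 0: to not run]
                  position.errors.run = 1
                  position.errors.input.bam.file = ${output.dir}/pairing/R3-F3-Paired.bam
                  position.errors.output.dir=${output.dir}/position-errors
                  """

                  # 'output.dir' is a placeholder value for illustration only.
                  cfg = parse_ini(config_text, defaults={"output.dir": "/tmp/run1"})
                  run_enabled = cfg.get("position.errors.run") == "1"
                  out_dir = expand(cfg["position.errors.output.dir"], cfg)
                  bam_path = expand(cfg["position.errors.input.bam.file"], cfg)
                  ```

                  Here `run_enabled` tells you whether the position errors step is switched on, and `out_dir` is where the output file should appear under the assumed `output.dir`.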
