Seqanswers
  • BWA inferred external isize?

    Hello everyone. I am new to ChIP-seq analysis and have a question about BWA. I aligned my paired-end data with BWA and want to use MACS for peak calling. I asked the authors of MACS about the input format for paired-end reads, and one of them answered as follows:
    "The implementation of SAM/BAM support follows the
    previous ELANDMULTIPET implementation, hence fragments are extended
    according to the MACS model (and not to the PE distance). We can think
    about including a correct PE estimate, but I guess you can properly
    deal with PE data just feeding macs with half of the average PE
    distance in the shiftsize, i.e. using the options:
    --nomodel, --shiftsize=PE/2
    You should already have the PE distance estimate from your aligner."

    I didn't know the PE distance, and he told me that BWA prints this information to the screen while it runs. I repeated the BWA step and captured the on-screen output. But because BWA processes 256k read pairs per batch, there is a lot of output, and I still can't tell which value is the PE distance. Please help me, thanks very much.
    Is the PE distance the average of the "inferred external isize"?


    The BWA stderr:
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (350, 367, 383)
    [infer_isize] low and high boundaries: 284 and 449 for estimating avg and std
    [infer_isize] inferred external isize from 11191 pairs: 371.316 +/- 20.340
    [infer_isize] skewness: -0.213; kurtosis: 0.617; ap_prior: 3.49e-05
    [infer_isize] inferred maximum insert size: 507 (6.67 sigma)
    [bwa_sai2sam_pe_core] time elapses: 1.93 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1273 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 1823 out of 3250 Q17 singletons are mated.
    [bwa_paired_sw] 40 out of 354 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 0.79 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.21 sec
    [bwa_sai2sam_pe_core] print alignments... 1.67 sec
    [bwa_sai2sam_pe_core] 262144 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (349, 366, 383)
    [infer_isize] low and high boundaries: 281 and 451 for estimating avg and std
    [infer_isize] inferred external isize from 10673 pairs: 370.950 +/- 20.597
    [infer_isize] skewness: -0.258; kurtosis: 0.748; ap_prior: 2.58e-05
    [infer_isize] inferred maximum insert size: 508 (6.67 sigma)
    [bwa_sai2sam_pe_core] time elapses: 1.78 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1206 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 1989 out of 3409 Q17 singletons are mated.
    [bwa_paired_sw] 45 out of 297 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 0.83 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.20 sec
    [bwa_sai2sam_pe_core] print alignments... 1.40 sec
    [bwa_sai2sam_pe_core] 524288 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (351, 367, 383)
    [infer_isize] low and high boundaries: 287 and 447 for estimating avg and std
    [infer_isize] inferred external isize from 10731 pairs: 371.640 +/- 20.047
    [infer_isize] skewness: -0.169; kurtosis: 0.458; ap_prior: 2.90e-05
    [infer_isize] inferred maximum insert size: 505 (6.67 sigma)
    [bwa_sai2sam_pe_core] time elapses: 1.79 sec
    [bwa_sai2sam_pe_core] changing coordinates of 1230 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 1980 out of 3295 Q17 singletons are mated.
    [bwa_paired_sw] 40 out of 323 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 0.80 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.22 sec
    [bwa_sai2sam_pe_core] print alignments... 1.47 sec
    [bwa_sai2sam_pe_core] 786432 sequences have been processed.
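As the log shows, BWA re-estimates the insert size for every batch and reports it on the "inferred external isize from N pairs: mean +/- sd" line; across the three batches above the mean only moves between 370.950 and 371.640. The Python sketch below pulls those lines out of the stderr text and combines them into one pair-weighted mean. Pooling the batches this way is an assumption of this sketch, not something BWA does itself; the function names are hypothetical.

```python
import re

# Matches lines such as:
# [infer_isize] inferred external isize from 11191 pairs: 371.316 +/- 20.340
ISIZE_RE = re.compile(
    r"\[infer_isize\] inferred external isize from (\d+) pairs: "
    r"([\d.]+) \+/- ([\d.]+)"
)


def batch_isizes(log_text):
    """Return (pairs, mean, sd) for every batch reported in the bwa log text."""
    return [(int(n), float(m), float(s)) for n, m, s in ISIZE_RE.findall(log_text)]


def pooled_mean(batches):
    """Mean insert size over all batches, weighted by the pairs used per batch."""
    total_pairs = sum(n for n, _, _ in batches)
    return sum(n * m for n, m, _ in batches) / total_pairs


log = """
[infer_isize] inferred external isize from 11191 pairs: 371.316 +/- 20.340
[infer_isize] inferred external isize from 10673 pairs: 370.950 +/- 20.597
[infer_isize] inferred external isize from 10731 pairs: 371.640 +/- 20.047
"""

batches = batch_isizes(log)
print(len(batches), round(pooled_mean(batches), 1))  # prints: 3 371.3
```

Because the per-batch means differ by less than 1 bp while the standard deviation is about 20 bp, taking any single batch's value gives practically the same answer.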

  • #2
    It is here:

    Code:
    ..
    [infer_isize] (25, 50, 75) percentile: (351, 367, 383)
    [infer_isize] low and high boundaries: 287 and 447 for estimating avg and std
    [infer_isize] inferred external isize from 10731 pairs: 371.640 +/- 20.047
    [infer_isize] skewness: -0.169; kurtosis: 0.458; ap_prior: 2.90e-05
    [infer_isize] inferred maximum insert size: 505 (6.67 sigma)
    ..
    If you prefer, you can redirect the output of your processes to a file and
    grab whatever info you need.
    -drd
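A minimal sketch of the "redirect to a file and grab the info" suggestion, assuming the log was saved with a shell redirect such as `bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq > aln.sam 2> sampe.log` (bwa sampe writes SAM to stdout and its progress messages to stderr; all file names here are hypothetical). The helpers `grab` and `grab_file` are illustrative names that simply keep lines with a given prefix, like `grep`.

```python
def grab(lines, tag="[infer_isize]"):
    """Keep only the lines that start with `tag` (a tiny grep for bwa logs)."""
    return [line.rstrip("\n") for line in lines if line.startswith(tag)]


def grab_file(log_path, tag="[infer_isize]"):
    """Same as grab(), but reads a stderr log saved to disk."""
    with open(log_path) as fh:
        return grab(fh, tag)


example = [
    "[bwa_sai2sam_pe_core] convert to sequence coordinate...\n",
    "[infer_isize] inferred external isize from 10731 pairs: 371.640 +/- 20.047\n",
    "[bwa_paired_sw] 1980 out of 3295 Q17 singletons are mated.\n",
]
print(grab(example))
```

Calling `grab_file("sampe.log")` on a real log would return one group of `[infer_isize]` lines per batch.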



    • #3
      Thanks very much, drio. Following your answer, should I use roughly 186 (371.64 / 2 = 185.82) as the --shiftsize value, since the average inferred external isize is 371.64?
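The MACS author's suggestion (--nomodel, --shiftsize=PE/2) amounts to one division and a rounding step. The sketch below assumes the MACS 1.x option names quoted in the thread and uses a hypothetical helper name; it only builds the argument list and does not run MACS.

```python
def macs_pe_options(pe_distance):
    """Build the MACS 1.x options quoted in the thread: --nomodel and --shiftsize=PE/2."""
    shift = round(pe_distance / 2)  # half the average fragment (insert) size, in bp
    return ["--nomodel", f"--shiftsize={shift}"]


print(macs_pe_options(371.64))  # ['--nomodel', '--shiftsize=186']
```

With a standard deviation of about 20 bp on the insert size, choosing 185 or 186 is unlikely to make a practical difference.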


