
  • Human whole-genome sequencing data analysis with low mapping rate

    Hi, everyone
    A few days ago I received human WGS data for four samples, which I want to use to identify variants as well as CNVs. After the QC and mapping steps of my analysis workflow, I found that the mapping rate of every sample is very low, as listed below:

    samples mapping rate

    sample1_H04C3ALXX_L4 57.62%
    sample1_H04C3ALXX_L5 8.67%
    sample1_H04C3ALXX_L6 13.68%
    sample1_H04C3ALXX_L7 26.78%
    sample1_H04C3ALXX_L8 28.19%

    sample2_H04C3ALXX_L1 2.49%
    sample2_H04C3ALXX_L2 2.17%
    sample2_H04C3ALXX_L3 31.80%
    sample2_H04C3ALXX_L4 32.57%
    sample2_H04C3ALXX_L5 31.81%
    sample2_H04C3ALXX_L6 31.63%
    sample2_H04C3ALXX_L7 31.87%
    sample2_H04C3ALXX_L8 31.81%

    sample3_H04B1ALXX_L3 4.36%
    sample3_H04B1ALXX_L4 59.36%
    sample3_H04B1ALXX_L5 2.49%
    sample3_H04B1ALXX_L6 3.21%

    sample4_H04C3ALXX_L5 27.06%
    sample4_H04C3ALXX_L6 26.67%
    sample4_H04C3ALXX_L7 27.52%
    sample4_H04C3ALXX_L8 27.79%
    sample4_H04C3ALXX_L1 14.82%
    sample4_H04C3ALXX_L2 13.96%
    sample4_H04C3ALXX_L3 24.75%
    sample4_H04C3ALXX_L4 24.75%

    The mapping software was BWA, version 0.7.10-r789.
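
The post does not say how the mapping rates were computed. One common definition, sketched below in Python, is the fraction of primary SAM records whose "unmapped" flag (0x4) is not set. The function name `mapping_rate` and the demo records are made up for illustration.

```python
def mapping_rate(sam_lines):
    """Fraction of primary SAM records whose 'unmapped' flag (0x4) is unset."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x900:              # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: one mapped pair, one unmapped pair, one secondary hit.
demo = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t10\t60\t5M\t=\t50\t45\tACGTA\tIIIII",
    "r1\t147\tchr1\t50\t60\t5M\t=\t10\t-45\tACGTA\tIIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r1\t355\tchr1\t200\t0\t5M\t=\t50\t45\tACGTA\tIIIII",
]
print(f"{mapping_rate(demo):.2%}")
```

On a real file the same function can be fed an open file handle, for example `mapping_rate(open("file.sam"))`, so the SAM is read line by line rather than loaded into memory.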

    To figure out why the mapping rate was so low, I randomly picked 1000 unmapped reads and ran a BLAST search against the nt database. I kept the best hit for each read, and most of the aligned sequences are human clone fragments such as:

    Human DNA sequence from clone RP3-376K6, complete sequence
    Homo sapiens Chromosome 16 BAC clone CIT987SK-A-926E7, complete sequence
    Homo sapiens chromosome 18, clone RP11-529J17, complete sequence
    Homo sapiens chromosome 18, clone CTD-2504O24, complete sequence
    ...
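
The random pick of unmapped reads described above could be done along these lines. This is only a sketch, assuming the reads are taken from the SAM output; it uses reservoir sampling so the file is read once, and the names `sample_unmapped` and `to_fasta` are hypothetical, not the poster's actual script.

```python
import random


def sample_unmapped(sam_lines, k=1000, seed=0):
    """Reservoir-sample up to k (name, sequence) pairs from unmapped SAM records."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):              # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        if not int(fields[1]) & 0x4:          # keep only unmapped reads
            continue
        seen += 1
        record = (fields[0], fields[9])
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)           # replace with probability k / seen
            if j < k:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Format records as FASTA; the index keeps both mates' names unique."""
    return "".join(f">{name}_{i}\n{seq}\n" for i, (name, seq) in enumerate(records))


# Hypothetical records: one mapped read and two unmapped mates.
demo = [
    "r1\t99\tchr1\t10\t60\t5M\t=\t50\t45\tACGTA\tIIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
]
print(to_fasta(sample_unmapped(demo, k=1000)), end="")
```

The resulting FASTA text can be written to a file and used as the BLAST query.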

    So my questions are:
    What are these sequences (CDS or genomic sequence)?
    Are my samples contaminated?

    What causes the extremely low mapping rate of these lanes:
    sample2_H04C3ALXX_L1 2.49%
    sample2_H04C3ALXX_L2 2.17%
    sample3_H04B1ALXX_L5 2.49%
    sample3_H04B1ALXX_L6 3.21%
    Is it the samples or the software?

    Any comment will be greatly appreciated, thank you very much!
    Last edited by zinky; 11-05-2014, 05:43 AM.

  • #2
    It would help if you run FastQC and post the output, as well as your QC steps, and mapping command line. As it stands, the reason could be anything.



    • #3
      I used NGS QC Toolkit for QC, and the results show that more than 80% of the reads are high-quality filtered reads, so I went on to the mapping step. My mapping command lines are:
      bwa aln -t 5 genome.fa file_1.fastq > file_1.fastq.sai
      bwa aln -t 5 genome.fa file_2.fastq > file_2.fastq.sai
      bwa sampe -A -a 600 -r '@RG\tID:noID\tPL:ILLUMINA\tLB:noLB\tSM:"file"' genome file_1.fastq.sai file_2.fastq.sai file_1.fastq file_2.fastq > file.sam



      • #4
        You may have short inserts and thus high adapter contamination. You can get an insert size distribution with BBMerge, like this:

        bbmerge.sh in1=file_1.fastq in2=file_2.fastq ihist=ihist.txt

        If a lot of reads have insert sizes shorter than read length, that will indicate adapter contamination which needs to be removed (e.g. with BBDuk).
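
To put a number on "a lot of reads", the histogram can be summarised with a few lines of Python. This is a rough sketch that assumes `ihist.txt` has `#`-prefixed header lines followed by two whitespace-separated columns (insert size, count); the function name and the `read_length` value are placeholders, since the thread does not state the read length. Note that the histogram only covers pairs that BBMerge could overlap, so the fraction is relative to merged pairs, not to all reads.

```python
def short_insert_fraction(ihist_lines, read_length):
    """Fraction of histogram counts with insert size below read_length."""
    short = 0
    total = 0
    for line in ihist_lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank and header lines
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        size, count = int(parts[0]), int(parts[1])
        total += count
        if size < read_length:
            short += count
    return short / total if total else 0.0


# Hypothetical histogram contents, for illustration only.
demo = [
    "#InsertSize\tCount",
    "100\t30",
    "140\t20",
    "150\t10",
    "250\t40",
]
print(short_insert_fraction(demo, read_length=150))
```

With a real run, pass the open file instead: `short_insert_fraction(open("ihist.txt"), read_length=...)`.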

        Also, I don't recommend bwa aln, particularly in recent versions of bwa. You will achieve higher speed and accuracy with bwa mem or BBMap, which can also generate some useful diagnostic plots (such as mhist).
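
For reference, a paired-end `bwa mem` run replaces the two `bwa aln` calls and the `bwa sampe` call from post #3 with a single command. The Python sketch below only assembles that command line, it does not run it. `-t` (threads) and `-R` (read group header line) are `bwa mem` options; the helper name `bwa_mem_command` and the file names are placeholders carried over from post #3.

```python
import shlex


def bwa_mem_command(index_prefix, fastq1, fastq2, sample,
                    threads=5, rg_id="noID", library="noLB"):
    """Build the argument list for a paired-end 'bwa mem' run."""
    # bwa expects a literal backslash-t between read group fields.
    read_group = "\\t".join(
        ["@RG", f"ID:{rg_id}", "PL:ILLUMINA", f"LB:{library}", f"SM:{sample}"]
    )
    return ["bwa", "mem", "-t", str(threads), "-R", read_group,
            index_prefix, fastq1, fastq2]


cmd = bwa_mem_command("genome.fa", "file_1.fastq", "file_2.fastq", sample="file")
# Print a shell-quoted version; SAM output goes to stdout, so redirect it.
print(shlex.join(cmd) + " > file.sam")
```

The list form can be passed directly to `subprocess.run(cmd, stdout=...)` if the alignment is driven from Python, which avoids shell quoting problems with the read group string.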

        But I still recommend you post FastQC results.



        • #5
          Thanks for your suggestion. I asked the sequencing staff and got the insert size information: 350 bp. That is why I set the -a parameter to 600, to tolerate larger insert sizes and so improve the mapping rate. Before that, I also used FastQC to assess read quality. The QC report was good, which suggested no index contamination (green k-mer distribution and green overrepresented sequences) and high sequencing quality.
          PS: I don't know why my pictures cannot be uploaded here.

          So I wonder whether the sample was mixed with DNA of non-human origin, as I mentioned above (actually, I don't know what these sequences are).
          Also, I will try the tools you suggested. Thanks, Brian.
