Header Leaderboard Ad

Collapse

Combine 1000genomes bams to get better coverage?

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combine 1000genomes bams to get better coverage?

    Hi all,

    I downloaded the bams from this 1000genomes ftp site:

    ftp://ftp.1000genomes.ebi.ac.uk/vol1...878/alignment/

    I only used the illumina data for my application. I found that the illumina data was about 20x which was not good enough for my application. I noticed that there are also bams from 454 and SoLid. Can I use samtools merge to get a combined bam such that I can get better overall coverage???

    Thanks!

    PS I am not sure if doing this will give me enough coverage even if successful. Does anyone know other places I can download high coverage human fastqs or bams?

  • #2
    It seems like Broad Institute has bams for NA12878 at 40x internally. Is this data available to outsiders?

    Comment


    • #3
      What are you trying to achieve. For variant calling many callers can consider more than one bam at once ?

      Comment


      • #4
        Originally posted by laura View Post
        What are you trying to achieve. For variant calling many callers can consider more than one bam at once ?
        I am trying the now unsupported HLA Caller form the GATK package.

        Supposedly you should get the following HLA calls if you use NA12878.bam from Broad and human_b36_both.fasta:
        ===============================================
        Locus A1 A2 Geno Phase Frq1 Frq2 L Prob Reads1 Reads2 Locus EXP White Black Asian
        A 0101 1101 -1229.5 -15.2 -0.82 -0.73 -1244.7 1.00 180 191 229 1.62 -1.99 -3.13 -2.07
        B 0801 5601 -832.3 -37.3 -1.01 -2.15 -872.1 1.00 58 59 100 1.17 -3.31 -4.10 -3.95
        C 0102 0701 -1344.8 -37.5 -0.87 -0.86 -1384.2 1.00 91 139 228 1.01 -2.35 -2.95 -2.31
        DPA1 0103 0201 -842.1 -1.8 -0.12 -0.79 -846.7 1.00 72 48 120 1.00 -0.90 -INF -1.27
        DPB1 0401 1401 -991.5 -18.4 -0.45 -1.55 -1010.7 1.00 64 48 113 0.99 -2.24 -3.14 -2.64
        DQA1 0101 0501 -1077.5 -15.9 -0.90 -0.62 -1095.4 1.00 160 77 247 0.96 -1.53 -1.60 -1.87
        DQB1 0201 0501 -709.6 -18.6 -0.77 -0.76 -729.7 0.95 50 87 137 1.00 -1.76 -1.54 -2.23
        DRB1 0101 0301 -1513.8 -317.3 -1.06 -0.94 -1832.6 1.00 52 32 101 0.83 -1.99 -2.83 -2.34
        ==============================================

        But if I use the aforementioned three bams and human_g1k_v37.fasta with updated HLA_EXONS.intervals, HLA_DICTIONARY.txt and HLA_POLYMORPHIC_SITES.txt, I got

        =============================================
        Locus A1 A2 Geno Phase Frq1 Frq2 L Prob Reads1 Reads2 Locus EXP White Black Asian
        A 0101 1104 -1133.2 -40.7 -0.82 -6.00 -1173.9 1.00 133 138 177 1.53 -6.82 -7.31 -7.34
        B 0820 5601 -1156.2 -43.5 -6.00 -2.15 -1201.4 1.00 62 71 111 1.20 -8.30 -8.70 -8.15
        C 0102 0701 -1718.5 -150.9 -0.87 -0.86 -1871.5 1.00 46 106 155 0.98 -2.35 -2.95 -2.31
        DPA1 0103 0201 -1443.8 -4.8 -0.12 -0.79 -1451.4 1.00 43 19 62 1.00 -0.90 -INF -1.27
        DPB1 0401 1401 -1102.9 -35.2 -0.45 -1.55 -1139.0 1.00 41 9 52 0.96 -2.24 -3.14 -2.64
        DQA1 0105 0501 -1549.3 -26.2 -1.24 -0.62 -1582.4 1.00 145 57 202 1.00 -2.62 -1.94 -2.72
        DQB1 0203 0501 -1266.4 -145.1 -2.05 -0.76 -1413.4 1.00 33 73 127 0.83 -3.68 -2.80 -3.82
        DRB1 0101 0301 -1683.0 -279.3 -1.06 -0.94 -1965.9 0.83 20 41 96 0.64 -1.99 -2.83 -2.34
        DRB1 0120 0301 -1678.8 -279.3 -6.00 -0.94 -1963.3 0.17 20 41 96 0.64 -6.94 -7.15 -7.00
        ========================================

        The result is close but not exactly. I suspect the reason might be the Broad NA12878.bam is 40x but the combined bam I used is about 35x
        Last edited by ymc; 04-22-2012, 10:38 PM.

        Comment


        • #5
          hi, ymc

          I also try sth. about HLA caller. I want to ask you a question. You say you have updated the file HLA_DICTIONARY.txt. How to get an updated HLA_DICTIONARY.txt? I find all the alleles sequences in the primary HLA_DICTIONARY.txt have the same length, but in the IGMT/HLA database the alleles' lengths are actually different. How to do that?

          Thanks.

          Comment


          • #6
            Originally posted by glede View Post
            hi, ymc

            I also try sth. about HLA caller. I want to ask you a question. You say you have updated the file HLA_DICTIONARY.txt. How to get an updated HLA_DICTIONARY.txt? I find all the alleles sequences in the primary HLA_DICTIONARY.txt have the same length, but in the IGMT/HLA database the alleles' lengths are actually different. How to do that?

            Thanks.
            I only updated the positions. I don't know if the allele sequences also need to be updated.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              How RNA-Seq is Transforming Cancer Studies
              by seqadmin



              Cancer research has been transformed through numerous molecular techniques, with RNA sequencing (RNA-seq) playing a crucial role in understanding the complexity of the disease. Maša Ivin, Ph.D., Scientific Writer at Lexogen, and Yvonne Goepel Ph.D., Product Manager at Lexogen, remarked that “The high-throughput nature of RNA-seq allows for rapid profiling and deep exploration of the transcriptome.” They emphasized its indispensable role in cancer research, aiding in biomarker...
              09-07-2023, 11:15 PM
            • seqadmin
              Methods for Investigating the Transcriptome
              by seqadmin




              Ribonucleic acid (RNA) represents a range of diverse molecules that play a crucial role in many cellular processes. From serving as a protein template to regulating genes, the complex processes involving RNA make it a focal point of study for many scientists. This article will spotlight various methods scientists have developed to investigate different RNA subtypes and the broader transcriptome.

              Whole Transcriptome RNA-seq
              Whole transcriptome sequencing...
              08-31-2023, 11:07 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 09-22-2023, 09:05 AM
            0 responses
            20 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 09-21-2023, 06:18 AM
            0 responses
            14 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 09-20-2023, 09:17 AM
            0 responses
            14 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 09-19-2023, 09:23 AM
            0 responses
            29 views
            0 likes
            Last Post seqadmin  
            Working...
            X