  • #16
    @choishingwan ,

    Thanks for your prompt reply. I am using GATK version 2.3, not 2.7, so is it wrong to use the DBSNP_137.vcf file with it? I hope GATK has no problem working with the latest dbSNP file, since it is just an updated data set.

    Also, in one step I see you used the PrintReads command. I thought that is usually done when you merge all your samples into a single BAM file and then call SNPs and indels. I am developing a new pipeline with a test sample from the 1000 Genomes Project, so I have only one paired-end sample, and I am trying to understand each step so that I can run the pipeline on my own samples once I have them. Could you clarify the PrintReads step? Or, since I am not merging samples, can I use the script below with TableRecalibration?

    java -Xmx14g -jar /data/PGP/gmelloni/GenomeAnalysisTK-2.3-4-g57ea19f/GenomeAnalysisTK.jar -T TableRecalibration -R /scratch/GT/vdas/test_exome/exome/hg19.fa -I SRR062634.realigned.bam -o SRR062634.realigned.bam.recal.bam -BQSR SRR062634.realigned.recal.csv -S LENIENT



    • #17
      There's definitely no problem using the latest dbSNP file with the older GATK pipeline. As for PrintReads, I think that is where the version difference comes in. In my case, with the latest GATK, the recalibration step does not produce a new BAM file as with 2.3; instead it produces a file that PrintReads then uses to write the recalibrated BAM. As for merging the files: from my experience it is unnecessary, since you can pass multiple files to UnifiedGenotyper with multiple -I options. I do think it is advised to call all samples together, though (I'm not sure whether that still holds if the capture kits or the sequencing platforms differ between samples; I hope someone can answer that for me).
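
      To make the two-step flow concrete, here is a minimal dry-run sketch of what I mean. It assumes the table-producing tool is BaseRecalibrator and that PrintReads takes the table through -BQSR; the exact flags can differ between GATK 2.x releases, so check them against your version. All file names are placeholders of mine, not paths from this thread.

```shell
# Dry-run sketch: prints the two GATK commands instead of running them.
# Every file name here is a placeholder, not a path taken from this thread.
GATK_JAR="GenomeAnalysisTK.jar"
REF="hg19.fa"
DBSNP="dbsnp_137.vcf"
IN_BAM="sample.realigned.bam"
RECAL_TABLE="sample.recal.grp"
OUT_BAM="sample.realigned.recal.bam"

# Step 1: BaseRecalibrator writes only a recalibration table, no BAM.
bqsr_table_cmd() {
    echo java -Xmx4g -jar "$GATK_JAR" -T BaseRecalibrator \
        -R "$REF" -I "$IN_BAM" -knownSites "$DBSNP" -o "$RECAL_TABLE"
}

# Step 2: PrintReads applies that table via -BQSR and writes the new BAM.
bqsr_apply_cmd() {
    echo java -Xmx4g -jar "$GATK_JAR" -T PrintReads \
        -R "$REF" -I "$IN_BAM" -BQSR "$RECAL_TABLE" -o "$OUT_BAM"
}

bqsr_table_cmd
bqsr_apply_cmd
```

      Once the printed commands look right for your setup, you can paste them into the terminal one after the other; the second step needs the table written by the first.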


      Side note: if you are doing exome sequencing, you should get the BED file describing the capture regions from the company, so you won't need to make one yourself.



      • #18
        @choishingwan

        I guess then I can use the TableRecalibration option. I have yet to run it; if it produces a BAM file, good, otherwise I will use the PrintReads option, which will surely produce one.

        As for multiple samples and a merged BAM file: my samples will be sequenced, say, four to a lane, so I guess I could either merge the BAM files or simply process each sample individually and pass them all to UnifiedGenotyper together (what do you think?). My project samples will be 2 tumors and 2 iPSC lines derived from those same 2 tumors, i.e. 2 tumors from 2 different patients and their respective iPSC lines. I want to find the SNPs, indels, and SNVs and confirm that the mutations in each tumor and its iPSC line are the same, so that they share the same genetic background. In that case, is it advisable to run UnifiedGenotyper on the recalibrated BAM files from all 4 samples together to call the SNPs and indels? Any thoughts on this?

        Bedfile query:
        This is what I was worried about regarding the BED file. For the test analysis I am doing, can I create a BED file myself from the genome browser? Let me know your thoughts on this.



        • #19
          From memory, you should be able to get a BAM file from TableRecalibration. From what I understand, you only have to merge samples (as in, really merge them into one single BAM file) when you sequence the same sample on multiple lanes. You can refer to the following:
          http://gatkforums.broadinstitute.org...he-same-sample

          See also the best practices link that I attached previously. According to my labmates, merging the BAM files makes downstream analysis difficult because the result is a huge file (e.g. four 6 GB BAMs merged become ~24 GB). All in all, I'd really recommend reading the GATK best practices guide; it covers most of what you need to know.
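
          For calling several samples together without merging them, a rough dry-run sketch would be something like the one below. The function name and all file names are made up for illustration, and I am assuming each BAM carries its own sample name in the read group, since that is how GATK tells samples apart.

```shell
# Dry-run sketch: builds one UnifiedGenotyper command with one -I per BAM.
# The jar, reference, and BAM names are placeholders.
GATK_JAR="GenomeAnalysisTK.jar"
REF="hg19.fa"

joint_call_cmd() {
    out_vcf="$1"; shift
    inputs=""
    # Append one "-I file.bam" per remaining argument.
    for bam in "$@"; do
        inputs="$inputs -I $bam"
    done
    echo "java -Xmx4g -jar $GATK_JAR -T UnifiedGenotyper -R $REF$inputs -glm BOTH -o $out_vcf"
}

joint_call_cmd all_samples.vcf tumor1.recal.bam ipsc1.recal.bam tumor2.recal.bam ipsc2.recal.bam
```

          This only prints the command. The point is that the per-sample BAMs stay separate on disk and are combined only at calling time, so there is no ~24 GB merged file to carry around.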

          As for the BED file, it is just for removing SNPs in regions that shouldn't have been captured by the kit, so most of the time you should focus only on the capture regions. However, if you don't have that information and can't get it from the people who did the capture, you can consider using only the exome regions from UCSC. I have never tried that, but I guess you can. My recommendation is to always get the capture information from them and make sure you know what you are working with.
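
          If you do have a BED file, one way to use it is to pass it to GATK with -L so that calling is restricted to those regions. A small dry-run sketch, where with_intervals is a helper I made up and capture_targets.bed stands in for whatever file you end up with:

```shell
# Dry-run sketch: add -L <bed> to a GATK command only when a BED is given.
# capture_targets.bed is a placeholder for the kit vendor's target file.
with_intervals() {
    bed="$1"; shift
    if [ -n "$bed" ]; then
        echo "$* -L $bed"
    else
        echo "$*"
    fi
}

BASE="java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg19.fa -I sample.recal.bam -o sample.vcf"
with_intervals capture_targets.bed $BASE   # restricted to the capture regions
with_intervals "" $BASE                    # no BED available: whole genome
```

          The same -L option should work whether the BED comes from the company or is exported from UCSC; only the file changes.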
