
  • Problem in exome data analysis

    Hi,
    I'm a biology student doing my master's thesis in NGS. I have whole-exome data from FIMM and have been filtering it. I have an Excel table with the SNPs, and .bam files for visualization.
    The case is a rare cartilage syndrome (only one patient), and I have exome data from the patient, his mother, and three controls. I have filtered the patient's exome data against the other samples and restricted the SNPs to ones not found in public databases. To narrow the search further, I also excluded SNPs not located in known genes... I still have ~400 SNPs left. I have already manually picked out and looked at SNPs in genes I know to be important in cartilage (collagens etc.). However, I would like to continue the analysis with the rest of the 400 SNPs...

    I would like to do some functional annotation so that I can look only at the non-synonymous mutations. How can I do this when I don't have VCF files?
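
The filtering described in this post (removing variants also seen in other samples or in public databases) can be sketched as a set subtraction. This is only an illustration, not the poster's actual pipeline: the `Variant` tuple and the `filter_candidates` function are hypothetical names, and it assumes every variant can be identified by chromosome, position, reference base, and variant base.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical variant key: one SNP identified by site and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_candidates(
    patient: Iterable[Variant],
    exclude_sets: Iterable[Set[Variant]],
) -> List[Variant]:
    """Keep patient variants that appear in none of the exclusion sets.

    Each exclusion set could hold the variants of one control sample or
    the contents of a public database (assumption for this sketch).
    """
    excluded = set().union(*exclude_sets)
    # Preserve the patient's original order while dropping excluded variants.
    return [v for v in patient if v not in excluded]


if __name__ == "__main__":
    patient = [Variant("chr1", 1234, "A", "G"), Variant("chr2", 5678, "C", "T")]
    control = {Variant("chr1", 1234, "A", "G")}
    for v in filter_candidates(patient, [control]):
        print(v)
```

Which samples belong in the exclusion sets depends on the assumed inheritance model, so that choice is left to the caller.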

  • #2
    You can use ANNOVAR (http://www.openbioinformatics.org/annovar/); you don't need a VCF file to do so.

    Have you filtered on read counts?

    • #3
      With just 400 candidates to go, you could also use the Ensembl Variant Effect Predictor (VEP).

      Maybe you could contact the person who did the SNP calling and ask for the VCFs? Or head over to the GATK web site, read their best-practice recommendations for exomes, and call the variants yourself, since you have the BAM files (careful, this might be a painful process).

      • #4
        Jane M and Baseless, thank you for your answers!
        The data has been filtered on read counts.
        I didn't know that ANNOVAR doesn't require VCF files, so this is useful information and I'll have to find out more about ANNOVAR. The only problem with the Ensembl Variant Effect Predictor seems to be that I have to check the +/- (forward/reverse) strand information from Ensembl individually for each gene (?).

        I did ask the company we bought the exome sequencing from for the VCF file, but they only offered to do the bioinformatics for me, for the right price of course. So it seems the only way I could get a VCF file is with GATK. I suppose I'll try ANNOVAR or the Variant Effect Predictor first.
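
On the strand question: a minimal sketch of writing SNPs in the Variant Effect Predictor's default whitespace-separated input format (chromosome, start, end, ref/alt alleles, strand). It assumes the alleles in the table are given relative to the forward strand of the reference genome, as variant callers conventionally report them; if that holds, `+` can be used for every line regardless of which strand the gene is on. The function name `to_vep_line` is hypothetical.

```python
def to_vep_line(chrom: str, pos: int, ref: str, alt: str, strand: str = "+") -> str:
    """Format one SNP as a line of VEP's default input format.

    For a single-base substitution, start and end are the same position.
    The default strand "+" assumes forward-strand alleles (see lead-in).
    """
    return f"{chrom} {pos} {pos} {ref}/{alt} {strand}"


if __name__ == "__main__":
    # Hypothetical example variant, not taken from the thread's data.
    print(to_vep_line("1", 881907, "A", "G"))
```

If the table does report some alleles on the reverse strand, this shortcut does not apply and the strand column would have to be set per variant.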

        • #5
          Sini,

          if you don't want to pay in money, look at this thread:
          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


          I went through what Ulz_Peter summarizes there on my own last year, and I wish he had published it at the time; it would have saved a lot of time and nerves.

          It's a good explanation of how to set up GATK and call your own variants, but be warned: I am also only a biologist, and setting these things up was not exactly a trivial task. You need access to a computer with decent RAM and disk space, and a bit of skill for setting the program up and fixing some prerequisites first.

          • #6
            You can BLAST your exome data and get GO annotations. BLAST2GO is easy-to-use software for this.
            Considering you are working on humans, it should not be a problem finding out where your SNPs lie. Check out the Galaxy portal as well. They have some video tutorials that use human exome data: https://main.g2.bx.psu.edu/root
            In BLAST2GO you can use tblastx against the nr database, which does a six-frame translation in order to identify the reading frame. You can also use OrfPredictor to identify the open reading frames in your exome data: http://www.ncbi.nlm.nih.gov/pubmed/15980561
            Last edited by JackieBadger; 06-14-2012, 05:05 AM.

            • #7
              ANNOVAR is very easy to use. For SNPs, your input file format is:

              chr1 1234 1234 A G

              chromosome start end reference_base variant_base

              (for a SNP, start and end are the same position), all tab-delimited.

              Then look into using VAAST for further analysis (I love VAAST).
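
The ANNOVAR input format described in this post can be produced from a spreadsheet export with a few lines of Python. This is a sketch under stated assumptions: the Excel table has been saved as CSV, and the column names `chrom`, `pos`, `ref`, and `alt` are hypothetical and would need to be adapted to the real table.

```python
import csv
import io
from typing import Dict, Iterable, List


def table_to_annovar(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Turn table rows into tab-delimited ANNOVAR SNP input lines.

    Each output line is: chromosome, start, end, reference base, variant base.
    For a SNP, start and end are the same position.
    """
    lines = []
    for row in rows:
        pos = int(row["pos"])  # fails early if the position is not a number
        fields = [row["chrom"], str(pos), str(pos), row["ref"], row["alt"]]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Stand-in for a CSV file exported from Excel (hypothetical column names).
    example = io.StringIO("chrom,pos,ref,alt\nchr1,1234,A,G\n")
    for line in table_to_annovar(csv.DictReader(example)):
        print(line)
```

With a real file, `io.StringIO(...)` would be replaced by `open("variants.csv", newline="")`, where the file name is again only an example.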

              • #8
                If you have your data in VCF format, you could also give GeneTalk a try. It basically uses ANNOVAR for functional annotation. You can also filter for pathogenic variants annotated by GeneTalk's expert community (www.gene-talk.de). http://www.youtube.com/watch?v=z1TqiXP-gEo

                • #9
                  I think I'll install ANNOVAR on my computer and learn to use it. It'll probably come in handy in the future as well. VAAST is a new program to me, but it seems interesting and I'll definitely look into it.
                  Thank you all again for your answers! I have learnt a lot.
