  • tahamasoodi
    Success
    • May 2012
    • 130

    Illumina final result analysis

    Hi all,

    I received the final Illumina results as an xlsx file containing around 3,768,494 SNPs, 10,557 nsSNPs, 535,826 indels, 474 coding indels and much more. How can I find which SNPs are significant, given that their number is enormous? Is there any software for analysing this?

    Thanks.
    Thanks,
  • swNGS
    Member
    • Nov 2011
    • 83

    #2
    That's a really unhelpful sequencing core that you have there....
    You might find you get a better response if you are more specific about what you are trying to do...


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      There are programs that take SNP data as input and will at least tell you what amino acid changes they make.

      Off the top of my head, there's the Ensembl Variant Effect Predictor, a program called SnpEff, and a program called ANNOVAR. I use ANNOVAR on mouse SNPs; it seems to work fine.


      • tahamasoodi
        Success
        • May 2012
        • 130

        #4
        Hi swbarnes2,
        Thanks for your response. I tried SnpEff, but it accepts VCF format input files while my data is in an xlsx file. When I tried annovar, any command starting with annovar.pl gave me the error message "command not found". I tried many things but failed.
        Last edited by tahamasoodi; 11-03-2012, 04:56 AM.
        Thanks,


        • ulz_peter
          Senior Member
          • Feb 2010
          • 219

          #5
          Are these SNPs annotated in any way (e.g. allele frequencies in the 1000 Genomes Project or the Exome Sequencing Project, SIFT prediction values, conservation score, amino acid change, gene affected)?
          If yes, then that's something to start with:
          Filter out all common variants.
          If there's a particular region you are interested in, take out only those SNPs.

          If not, get an annotation program running. I recommend annovar as well; it needs a certain format for your input file, but since that format is text-based you should be able to create it from the Excel file.

          If you can't get it done, you also might have a look here:


          Hope that helps


          • tahamasoodi
            Success
            • May 2012
            • 130

            #6
            Thanks Peter,

            The Excel file contains a number of fields, as given below. I want to know the significant SNPs in the whole genome. Can I do it in Excel itself, or do I have to use a tool for it? I tried to use annovar but I am getting an error.

            Regards,

            #chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth
            chr10 61373 61373 A - hom 189 28
            chr10 62082 62082 G T het 52 33
            chr10 65878 65878 C G hom 31 3


            alt_depth region gene
            28 intergenic NONE(dist=NONE),TUBB8(dist=31455)
            11 intergenic NONE(dist=NONE),TUBB8(dist=30746)
            3 intergenic NONE(dist=NONE),TUBB8(dist=26950)

            dbSNP135_full dbSNP135_common 1000G_2011Oct_allele_freq
            rs9329307 . .
            rs2271275 rs2271275 0.55
            rs6901 rs6901 0.73

            annotation
            TUBB8:NM_177987:exon4:c.A314G.H105R,
            ADARB2:NM_018702:exon9:c.G1876A.A626T,
            PITRM1:NM_001242307:exon27:c.A3113G.Q1038R,PITRM1:NM_014889:exon27:c.A3110G.Q1037R,PITRM1:NM_001242309:exon24:c.A2816G.Q939R,
            Thanks,


            • ulz_peter
              Senior Member
              • Feb 2010
              • 219

              #7
              What do you mean by significant SNPs?

              It seems that your SNPs are already annotated.
              So, in case you are searching for the cause of a rare disease, you could limit yourself to SNPs that have an allele frequency < 0.01 in 1000G_2011Oct_allele_freq, have no entry in the dbSNP135_common field, and are possibly deleterious (in your case the amino acid change is stated in the annotation field, e.g. H105R).

              You could do that in Excel, but again,
              if you do not specify your problem we cannot specify the solution
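For anyone who would rather script that filter than click through Excel, here is a minimal sketch in Python, assuming each row has already been read into a dict keyed by the column names posted above. Three things are assumptions rather than anything the sequencing core documented: a "." is treated as "no entry", a variant missing from 1000 Genomes is counted as rare, and the pairing of annotation strings with the example rows is made up for illustration.

```python
def parse_freq(value):
    """Return the allele frequency as a float, or None for '.' or an empty cell."""
    value = value.strip()
    if value in ("", "."):
        return None
    return float(value)


def is_candidate(row, max_freq=0.01):
    """Apply the three criteria: rare, not in dbSNP135_common, protein change present."""
    freq = parse_freq(row["1000G_2011Oct_allele_freq"])
    rare = freq is None or freq < max_freq  # assumption: absent from 1000G counts as rare
    not_common = row["dbSNP135_common"].strip() in ("", ".")
    has_protein_change = row["annotation"].strip() not in ("", ".")
    return rare and not_common and has_protein_change


# rs IDs and frequencies are copied from the thread; the annotation pairing is hypothetical.
rows = [
    {"dbSNP135_full": "rs9329307", "dbSNP135_common": ".",
     "1000G_2011Oct_allele_freq": ".",
     "annotation": "TUBB8:NM_177987:exon4:c.A314G.H105R,"},
    {"dbSNP135_full": "rs2271275", "dbSNP135_common": "rs2271275",
     "1000G_2011Oct_allele_freq": "0.55",
     "annotation": "ADARB2:NM_018702:exon9:c.G1876A.A626T,"},
    {"dbSNP135_full": "rs6901", "dbSNP135_common": "rs6901",
     "1000G_2011Oct_allele_freq": "0.73",
     "annotation": "."},
]

candidates = [row for row in rows if is_candidate(row)]
for row in candidates:
    print(row["dbSNP135_full"], row["annotation"])
```

The 0.01 cut-off is the one suggested in the post; it is exposed as `max_freq` so it can be changed for a less rare phenotype.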


              • tahamasoodi
                Success
                • May 2012
                • 130

                #8
                Actually, I have whole-genome data for around 80 CRC patient samples and an equal number of controls. I got around 3,768,494 SNPs, 10,557 nsSNPs, 535,826 indels and 474 coding indels for one case sample, and almost similar figures for the controls. Now I want to find which SNPs/indels are responsible for the disease by filtering this huge number of variants. How can I set the filtering criteria? Can you give a full description of the annotation fields?
                Last edited by tahamasoodi; 09-13-2012, 03:13 AM.
                Thanks,


                • xied75
                  Senior Member
                  • Feb 2012
                  • 129

                  #9
                  I was just guessing that he might be feeding the Excel file directly to whatever programs you have mentioned, rather than creating new text files in a format that these programs can read. (But if I'm wrong, then ignore this.)

                  Best,

                  dong
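The conversion described here could look roughly like the sketch below. It assumes the sheet has first been saved from Excel as tab-delimited text and that the header uses the column names posted earlier in the thread. ANNOVAR's simple input format expects chromosome, start, end, reference allele and alternative allele as the first five columns, so the sketch just pulls those out. The function name `to_annovar_lines` is made up for illustration.

```python
import csv
import io

# Column names as posted in this thread; adjust if the real header differs.
ANNOVAR_COLUMNS = ["#chr_name", "chr_start", "chr_end", "ref_base", "alt_base"]


def to_annovar_lines(tsv_text):
    """Return one tab-separated 'chr start end ref alt' line per input row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return ["\t".join(row[col] for col in ANNOVAR_COLUMNS) for row in reader]


# The three example rows from the thread, as they would look in a tab-delimited export.
example_export = (
    "#chr_name\tchr_start\tchr_end\tref_base\talt_base\thom_het\tsnp_quality\ttot_depth\n"
    "chr10\t61373\t61373\tA\t-\thom\t189\t28\n"
    "chr10\t62082\t62082\tG\tT\thet\t52\t33\n"
    "chr10\t65878\t65878\tC\tG\thom\t31\t3\n"
)

annovar_lines = to_annovar_lines(example_export)
print("\n".join(annovar_lines))
```

For a real file, the same function can be fed the contents of the exported text file instead of the inline string, and the result written out line by line.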


                  • ulz_peter
                    Senior Member
                    • Feb 2010
                    • 219

                    #10
                    So you've got 160 Excel files, each having about 4 million entries?

                    I guess you'll need some programming here...
                    I don't know of any program which could compute significance of certain SNPs when they show up in a significant portion of samples. Maybe someone else can help here...

                    What you might do is filter out the synonymous SNPs and the SNPs with higher allele frequencies just by using an Excel filter, but for 160 huge Excel files that may not be what you want.
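As a rough idea of the kind of programming meant here, the sketch below counts how many case and control samples carry a given variant and runs a two-sided Fisher exact test on the resulting 2x2 table. It assumes each sample has already been reduced to a set of (chromosome, position, ref, alt) keys. The sample sets in the example are invented toy data, and the helper names are hypothetical. With millions of variants tested, the p-values would also need a multiple-testing correction, which is not shown.

```python
from math import comb


def carrier_table(variant, case_samples, control_samples):
    """Return (cases with, cases without, controls with, controls without) the variant."""
    a = sum(variant in sample for sample in case_samples)
    c = sum(variant in sample for sample in control_samples)
    return a, len(case_samples) - a, c, len(control_samples) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy data: 4 cases and 4 controls, each sample a set of variant keys.
snp = ("chr10", 62082, "G", "T")
cases = [{snp}, {snp}, {snp}, set()]
controls = [{snp}, set(), set(), set()]

table = carrier_table(snp, cases, controls)
p_value = fisher_exact_two_sided(*table)
print(table, f"p = {p_value:.4f}")
```

This treats a sample as a carrier whether the call is homozygous or heterozygous; splitting by the hom_het column would need a different table.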

                    Since I am in a good mood today, I'm going to explain the fields to you:

                    chr_name: name of the chromosome
                    chr_start: SNP position (starting point for indels)
                    chr_end: SNP position (end point for indels)
                    ref_base: human reference base at that exact position
                    alt_base: base detected in your sample at that position
                    hom_het: whether the mutation showed up homozygous or heterozygous
                    snp_quality: a quality value for how likely it is that your SNP is real rather than a sequencing artifact (no idea which scale they use for assigning the SNP quality value)
                    tot_depth: sequencing depth at that position (i.e. how many reads cover this position)
                    alt_depth: number of sequencing reads at that position that show the mutated allele
                    region: whether the mutation lies within a gene/exon/intron or elsewhere
                    gene: gene affected
                    dbSNP135_full: dbSNP version 135 reference
                    dbSNP135_common: dbSNP version 135 reference in case that SNP has an allele frequency >1%
                    1000G_2011Oct_allele_freq: allele frequency determined by the 1000 Genomes Project (October 2011 version)
                    annotation: nomenclature for the mutation: c.XXX is the cDNA position in the NM_xxx isoform and p.xxx is the protein substitution for that mutation

                    Since I did not create the files I cannot guarantee that this is absolutely true, but these are the most likely explanations.

                    Best regards,
                    Peter
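To make the annotation and depth fields a bit more concrete, here is a small sketch that splits an annotation string into its parts and computes the fraction of reads supporting the alternate allele. It assumes the gene:transcript:exon:change layout seen in the posted rows. The strings in the thread show the change as c.A314G.H105R, whereas the explanation above mentions a separate p. part, so the parser accepts both spellings. The function names are made up for illustration.

```python
import re


def parse_annotation(annotation):
    """Split 'GENE:NM_x:exonN:c.A314G.H105R,...' into one dict per transcript."""
    entries = []
    for item in annotation.split(","):
        parts = item.strip().split(":")
        if len(parts) < 4:
            continue  # empty piece after a trailing comma, or an unexpected layout
        gene, transcript, exon = parts[:3]
        change = ":".join(parts[3:])
        # Handle both "c.A314G.H105R" and "c.A314G:p.H105R".
        tokens = [t for t in re.split(r"[:.]", change) if t not in ("", "c", "p")]
        entries.append({
            "gene": gene,
            "transcript": transcript,
            "exon": exon,
            "cdna": tokens[0] if tokens else "",
            "protein": tokens[1] if len(tokens) > 1 else "",
        })
    return entries


def alt_fraction(alt_depth, tot_depth):
    """Fraction of reads at the position that show the alternate allele."""
    return alt_depth / tot_depth if tot_depth else 0.0


first = parse_annotation("TUBB8:NM_177987:exon4:c.A314G.H105R,")
print(first)

# Second example row from the thread: 11 of 33 reads show the alternate allele (called het).
print(round(alt_fraction(11, 33), 2))
```

A heterozygous call would be expected to sit somewhere near half the reads and a homozygous call near all of them, so the fraction is a quick sanity check against the hom_het column.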


                    • ulz_peter
                      Senior Member
                      • Feb 2010
                      • 219

                      #11
                      Originally posted by xied75:
                      I was just guessing that he might be feeding the Excel file directly to whatever programs you have mentioned, rather than creating new text files in a format that these programs can read. (But if I'm wrong, then ignore this.)

                      Best,

                      dong
                      That's what I am guessing too, however his files seem to be annotated already...


                      • tahamasoodi
                        Success
                        • May 2012
                        • 130

                        #12
                        If I select the particular genes involved in CRC, I think an Excel filter can then help in screening for deleterious SNPs.
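The same gene-based selection can be scripted, which may be easier than repeating an Excel filter over many files. The sketch below is only an illustration: the panel is a placeholder containing three genes commonly reported as mutated in CRC, not a curated list, and the "exonic" row is invented. The gene field is parsed on the assumption that intergenic rows look like the posted ones, with neighbouring genes and distances in brackets.

```python
import re

# Placeholder panel for illustration only; replace with a properly curated gene list.
CRC_PANEL = {"APC", "KRAS", "TP53"}


def genes_in_field(gene_field):
    """Extract gene names from e.g. 'NONE(dist=NONE),TUBB8(dist=31455)' or 'APC'."""
    names = []
    for item in gene_field.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name and name != "NONE":
            names.append(name)
    return names


def in_panel(row, panel=CRC_PANEL):
    """Keep rows that lie in (not merely near) a gene from the panel."""
    if row["region"] == "intergenic":
        return False
    return any(gene in panel for gene in genes_in_field(row["gene"]))


rows = [
    # Copied from the thread: intergenic, so TUBB8 is only the nearest gene.
    {"region": "intergenic", "gene": "NONE(dist=NONE),TUBB8(dist=31455)"},
    # Hypothetical row, invented to show a match.
    {"region": "exonic", "gene": "APC"},
]

selected = [row for row in rows if in_panel(row)]
print(selected)
```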
                        Thanks,


                        • swbarnes2
                          Senior Member
                          • May 2008
                          • 910

                          #13
                          There is no perfect algorithm that goes from primary amino acid change -> functional effect. So you'll want to use a combo of programs like PolyPhen-2, pathway analysis, comparison to the 1K Genomes SNP set, stuff like that.
