  • How to deal with multi-sample NGS data?

    Hello everyone, I'm new to NGS. My boss gave me the fq files for about 180 different samples, and I need to do alignment and SNP/indel calling for them. I need help with two questions:

    1. Is there a more efficient way to do the alignment and variant calling for these 180 samples, or should I analyze them one by one?

    2. Each sample may produce a single VCF file. How can I combine the calls from all 180 samples and get the frequency of each SNP?

    Please help. Thanks a lot.

  • #2
    You can use either of these methods:
    1. The samtools mpileup function: http://samtools.sourceforge.net/mpileup.shtml
    2. SOAPsnp. You can request the multiple-individual version by email.

    You can analyze the 180 samples together, and the software will then create a single combined VCF file.
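For anyone reading this later, here is a minimal sketch of what "analyze the samples together" can look like. It assumes a current bcftools install, where multi-sample pileup is done with `bcftools mpileup` piped into `bcftools call` (the samtools 0.1.x commands on the linked page differ). The file names and both helper functions are made up for illustration, and nothing is executed unless you call `run_pipeline` yourself.

```python
import subprocess
from pathlib import Path


def write_bam_list(bam_paths, list_file):
    """Write one BAM path per line, sorted so the input order is reproducible."""
    ordered = sorted(str(p) for p in bam_paths)
    Path(list_file).write_text("\n".join(ordered) + "\n")
    return ordered


def joint_calling_commands(reference, bam_list_file, out_vcf):
    """Return the two argv lists for a joint 'mpileup | call' run."""
    mpileup = [
        "bcftools", "mpileup",
        "-f", str(reference),      # indexed reference FASTA
        "-b", str(bam_list_file),  # text file with one BAM path per line
        "-Ou",                     # uncompressed BCF, suited to piping
    ]
    call = [
        "bcftools", "call",
        "-m",                      # multiallelic caller
        "-v",                      # report variant sites only
        "-Oz", "-o", str(out_vcf), # bgzip-compressed VCF output
    ]
    return mpileup, call


def run_pipeline(mpileup, call):
    """Pipe the first command into the second; returns both exit codes."""
    producer = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    consumer = subprocess.run(call, stdin=producer.stdout)
    producer.stdout.close()
    return producer.wait(), consumer.returncode
```

All 180 BAMs go into one list file, so the caller sees every sample at each site and writes one VCF with one genotype column per sample.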

    • #3
      Originally posted by zhanglu295
      You can use either of these methods:
      1. The samtools mpileup function: http://samtools.sourceforge.net/mpileup.shtml
      2. SOAPsnp. You can request the multiple-individual version by email.

      You can analyze the 180 samples together, and the software will then create a single combined VCF file.
      Oh, that's great. I'll try as you advise. Thank you very much.

      • #4
        You can also use the GATK.

        In case you're new to the whole thing: I recommend making a template for each analysis step you take, and then running a script that replaces the placeholders with your sample info. I do hope you have some compute power at your disposal; 180 samples may take a while to analyse :P
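A rough sketch of the template idea, in case it helps. The aligner (bwa mem), the one-FASTQ-per-sample layout, the sample-sheet column names and the output paths are all assumptions for illustration, not something stated in this thread. The read group's SM tag is set to the sample name so that downstream tools can tell the samples apart.

```python
import csv
import io
from string import Template

# One template per analysis step; $name placeholders are filled per sample.
ALIGN_TEMPLATE = Template(
    "bwa mem -t $threads -R '@RG\\tID:$sample\\tSM:$sample' "
    "$reference $fastq > $outdir/$sample.sam"
)


def render_commands(sample_sheet_text, reference, outdir, threads=4):
    """Fill the alignment template once per row of a tab-separated sample sheet.

    The sheet needs a header row with 'sample' and 'fastq' columns
    (hypothetical column names).
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text), delimiter="\t")
    commands = []
    for row in reader:
        commands.append(
            ALIGN_TEMPLATE.substitute(
                sample=row["sample"],
                fastq=row["fastq"],
                reference=reference,
                outdir=outdir,
                threads=threads,
            )
        )
    return commands


if __name__ == "__main__":
    sheet = "sample\tfastq\nS001\tS001.fq\nS002\tS002.fq\n"
    for cmd in render_commands(sheet, "ref.fa", "aln"):
        print(cmd)
```

`Template.substitute` raises `KeyError` when a placeholder has no value, which catches an incomplete sample sheet before anything is submitted.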

        I also recommend incorporating some decent logging so you can easily find where things went wrong. Include versioning (the version of each tool used, but also the version of your complete analysis pipeline); that will help too. It might be a bigger job than you expect! Also think in advance about how you would like to structure files and directories, and which intermediate files are worth keeping.
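One possible shape for the logging and versioning, as a sketch only. `PIPELINE_VERSION`, `tool_version` and `run_step` are hypothetical names, and the assumption that a tool prints its version on the first line of a version flag's output will not hold for every program.

```python
import logging
import subprocess
import sys

PIPELINE_VERSION = "0.1.0"  # hypothetical version string for your own pipeline


def tool_version(argv):
    """Return the first line a tool prints for its version flag, or 'unknown'."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:  # e.g. the tool is not installed
        return "unknown"
    lines = (done.stdout or done.stderr).strip().splitlines()
    return lines[0] if lines else "unknown"


def run_step(logger, sample, step, argv):
    """Run one step, logging the command and any failure; True on success."""
    logger.info(
        "pipeline=%s sample=%s step=%s cmd=%s",
        PIPELINE_VERSION, sample, step, " ".join(argv),
    )
    done = subprocess.run(argv, capture_output=True, text=True)
    if done.returncode != 0:
        logger.error(
            "pipeline=%s sample=%s step=%s exit=%d stderr=%s",
            PIPELINE_VERSION, sample, step, done.returncode, done.stderr.strip(),
        )
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("pipeline")
    log.info("python=%s", tool_version([sys.executable, "--version"]))
    run_step(log, "S001", "demo", [sys.executable, "-c", "print('hello')"])
```

Because the sample and step appear in every log line, a failure among 180 samples can be found with a plain text search.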

        In case you're not new to the whole thing: perhaps it helps others

        • #5
          Has anyone looked at using generic databases for this kind of question? It seems a lot of people are doing large-scale exon or whole-genome analysis these days.

          • #6
            Originally posted by Bruins
            You can also use the GATK.

            In case you're new to the whole thing: I recommend making a template for each analysis step you take, and then running a script that replaces the placeholders with your sample info. I do hope you have some compute power at your disposal; 180 samples may take a while to analyse :P

            I also recommend incorporating some decent logging so you can easily find where things went wrong. Include versioning (the version of each tool used, but also the version of your complete analysis pipeline); that will help too. It might be a bigger job than you expect! Also think in advance about how you would like to structure files and directories, and which intermediate files are worth keeping.

            In case you're not new to the whole thing: perhaps it helps others
            Thanks, the suggestion is very useful. I have had the unforgettable experience of debugging a pipeline. It really cost me plenty of time.

            • #7
              If those fq's are from mammalian samples, the alignment alone is going to take forever.

              It would be worth spending some time asking around and looking around yourself to see whether someone has already done the alignments. Even if it takes you a week to find them, you will probably save yourself a lot of time.

              And yes, you can give a pile of .bams to samtools' mpileup command, and it will give you a combined .vcf file. The lines look something like this (with 11 samples, all on one line, of course):

              chr3 23987415 . A C 999 . DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1
              GT:PL:GQ
              0/0:0,107,0:3
              1/1:76,255,0:75
              1/1:112,255,0:99
              1/1:71,255,0:70
              0/0:0,172,29:30
              0/0:0,181,18:19
              0/0:0,158,43:44
              0/0:0,188,12:13
              1/1:16,255,0:15
              1/1:5,236,0:6
              0/0:0,224,16:17

              Learning what all that means is a whole other project.
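As a starting point for the frequency question, here is a small sketch that counts alleles from the called genotypes (the GT entries) in one record like the one above. It splits on any whitespace so it accepts the record as pasted here; real VCF files are tab-separated. It counts every non-reference allele together, so a multi-allelic site would need more care, and `alt_allele_frequency` is just an illustrative name.

```python
def alt_allele_frequency(vcf_line):
    """Count non-reference alleles across all sample genotypes in one record.

    Returns (alt_count, called_alleles, frequency); frequency is None
    when no genotype was called.
    """
    fields = vcf_line.split()
    gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
    alt = total = 0
    for sample_field in fields[9:]:
        genotype = sample_field.split(":")[gt_index]
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":  # missing call
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt, total, (alt / total if total else None)


RECORD = (
    "chr3 23987415 . A C 999 . "
    "DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;"
    "MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1 GT:PL:GQ "
    "0/0:0,107,0:3 1/1:76,255,0:75 1/1:112,255,0:99 1/1:71,255,0:70 "
    "0/0:0,172,29:30 0/0:0,181,18:19 0/0:0,158,43:44 0/0:0,188,12:13 "
    "1/1:16,255,0:15 1/1:5,236,0:6 0/0:0,224,16:17"
)

if __name__ == "__main__":
    print(alt_allele_frequency(RECORD))
```

For this record, five of the eleven samples are called 1/1 and six are 0/0, which gives 10 alternate alleles out of 22. The AF1 value in the INFO column is the caller's own estimate and, as far as I understand it, is derived from the genotype likelihoods rather than from the hard calls, so the two numbers need not match exactly.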

              • #8
                Originally posted by swbarnes2
                If those fq's are from mammalian samples, the alignment alone is going to take forever.

                It would be worth spending some time asking around and looking around yourself to see whether someone has already done the alignments. Even if it takes you a week to find them, you will probably save yourself a lot of time.

                And yes, you can give a pile of .bams to samtools' mpileup command, and it will give you a combined .vcf file. The lines look something like this (with 11 samples, all on one line, of course):

                chr3 23987415 . A C 999 . DP=6832;AF1=0.475;CI95=0.2727,0.6364;DP4=315,231,3814,1978;MQ=37;FQ=28.2;PV4=0.00017,0,1.5e-147,1
                GT:PL:GQ
                0/0:0,107,0:3
                1/1:76,255,0:75
                1/1:112,255,0:99
                1/1:71,255,0:70
                0/0:0,172,29:30
                0/0:0,181,18:19
                0/0:0,158,43:44
                0/0:0,188,12:13
                1/1:16,255,0:15
                1/1:5,236,0:6
                0/0:0,224,16:17

                Learning what all that means is a whole other project.
                I'm afraid I have to do the alignment myself. This information is really wonderful; we need those frequency data to conduct follow-up genotyping in larger samples. Is the genotype order the same as the order of the input BAMs?

                Thanks
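One way to avoid guessing about the order: in a VCF file the sample columns are named on the `#CHROM` header line, so the names can be read from the file itself. My understanding, which is worth checking on your own output, is that those names come from the SM tag of each BAM's @RG header, with the file name used when no read group is present. A minimal sketch, with made-up sample names:

```python
def sample_names(vcf_text):
    """Return the sample column names from the #CHROM header line, in file order."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM ... FORMAT); the rest are samples.
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def genotypes_by_sample(names, record_line):
    """Pair each sample name with its genotype field from one tab-separated record."""
    return dict(zip(names, record_line.split("\t")[9:]))


if __name__ == "__main__":
    header = "\t".join(
        ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
         "S001", "S002"]
    )
    record = "\t".join(
        ["chr3", "23987415", ".", "A", "C", "999", ".", "DP=20", "GT:PL:GQ",
         "0/0:0,107,0:3", "1/1:76,255,0:75"]
    )
    names = sample_names("##fileformat=VCFv4.1\n" + header + "\n" + record + "\n")
    print(genotypes_by_sample(names, record))
```

Looking genotypes up by name rather than by position keeps the follow-up genotyping safe even if the BAMs are passed in a different order next time.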
