  • New to exome analysis: what software?

    Hi Guys

    I am new to exome analysis and was hoping for guidance on which software is accepted as the most robust pipeline for finding SNPs.

    cheers

    Julian



    • Need some suggestions for downstream analysis

      Hi, I need some advice on downstream analysis and variant calling. I am trying to establish a pipeline for my lab and am new to exome sequencing data analysis. I am testing it with a single paired-end sample from the 1000 Genomes Project (sample 96, to be precise). I have already done the alignment and everything up to indexing the sorted BAM file. The next step would be to use GATK to identify the target regions for realignment, realign the BAM for better indel calling, and then run the GATK tools that call SNPs and indels. My questions: I downloaded the latest version of GATK; is it advisable to work with that or with an older version? Also, in some forums I see the MarkDuplicates step skipped, and since GATK realigns the reads again for better indel calling, it seems this step could be skipped. Suggestions are welcome.
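The workflow described in this post can be sketched as a dry-run shell script. This is a minimal sketch assuming GATK 3.x (which provides the `-T RealignerTargetCreator`, `-T IndelRealigner` and `-T HaplotypeCaller` tools) and the unified Picard `picard.jar`; the `PICARD_JAR`, `GATK_JAR` and `REF` variables, the `run` and `call_variants` helpers, and all file names are hypothetical. GATK 4 no longer includes the indel realignment tools, so the two middle steps only apply to GATK 3.x. In this sketch duplicate marking stays a separate step, since local realignment does not flag duplicate reads.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the GATK 3.x-era workflow described above.
# PICARD_JAR, GATK_JAR, REF and all file names are hypothetical placeholders.
set -euo pipefail

PICARD_JAR="${PICARD_JAR:-picard.jar}"
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"
REF="${REF:-hg19.fa}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = coordinate-sorted, indexed BAM; $2 = prefix for the output files
call_variants() {
  local in_bam="$1" prefix="$2"

  # 1. Flag (not remove) PCR/optical duplicates.
  run java -jar "$PICARD_JAR" MarkDuplicates \
    INPUT="$in_bam" OUTPUT="$prefix.dedup.bam" \
    METRICS_FILE="$prefix.dup_metrics.txt" CREATE_INDEX=true

  # 2. Find intervals that may need local realignment around indels (GATK 3.x only).
  run java -jar "$GATK_JAR" -T RealignerTargetCreator \
    -R "$REF" -I "$prefix.dedup.bam" -o "$prefix.intervals"

  # 3. Realign the reads inside those intervals (GATK 3.x only).
  run java -jar "$GATK_JAR" -T IndelRealigner \
    -R "$REF" -I "$prefix.dedup.bam" \
    -targetIntervals "$prefix.intervals" -o "$prefix.realigned.bam"

  # 4. Call SNPs and indels together.
  run java -jar "$GATK_JAR" -T HaplotypeCaller \
    -R "$REF" -I "$prefix.realigned.bam" -o "$prefix.raw.vcf"
}
```

With `DRY_RUN=1 call_variants sample96.sorted.bam sample96` the four commands are only printed, which makes it easy to check that each step's output file feeds the next one before running anything.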


        • Thank you for the nice how-to guide.

          I have a couple of questions about it.

          1. The second step of section 2.2 (Actual Alignment) uses the -f flag. Is this to specify the name of the output .sai file? I've looked through the bwa manual page (http://bio-bwa.sourceforge.net/bwa.shtml) and it doesn't mention -f.

          2. This question is about the -r flag used in that command and the next one. The guide has:
          Code:
          bwa sampe -f out.sam -r "@RQ\tID:<ID>\tLB:<LIBRARY_NAME>\tSM:<SAMPLE_NAME>\tPL:ILLUMINA" hg19 input1.sai input2.sai input1.fq input2.fq
          I don't know how much it matters, but should there be "@RG" instead of "@RQ" after the open quotes?

          Thanks,
          Blake
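On the `@RQ` question: the SAM specification defines the read-group header record as `@RG`, so the read-group line passed to `bwa sampe -r` would normally start with `@RG`. A small sketch for building that line; the `make_rg_line` helper and the example ID, library and sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Build the read-group line for `bwa sampe -r`. The record type is @RG (SAM spec).
# make_rg_line and the example values below are hypothetical.
set -euo pipefail

# Usage: make_rg_line ID LIBRARY SAMPLE [PLATFORM]
make_rg_line() {
  local id="$1" lib="$2" sample="$3" platform="${4:-ILLUMINA}"
  if [ -z "$id" ] || [ -z "$sample" ]; then
    echo "ID and SM must not be empty" >&2
    return 1
  fi
  # Emit a literal backslash-t between fields (as in the guide's command),
  # not real tab characters.
  printf '@RG\\tID:%s\\tLB:%s\\tSM:%s\\tPL:%s' "$id" "$lib" "$sample" "$platform"
}

# Example (paired-end, same file names as in the guide):
# bwa sampe -f out.sam -r "$(make_rg_line sample96 lib1 sample96)" \
#   hg19 input1.sai input2.sai input1.fq input2.fq
```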



          • Hi ulz_peter and everybody,

            I have a problem when I try to execute the following command (a step in my exome analysis):

            java -Xmx4g -Djava.io.tmpdir=/tmp \
            -jar picard/SortSam.jar \
            SO=coordinate \
            INPUT=input.sam \
            OUTPUT=output.bam \
            VALIDATION_STRINGENCY=LENIENT \
            CREATE_INDEX=true

            The output of this is an error message: it cannot find or load the jarfile SortSam.jar. Sometimes the error message instead says that it could not load the main class (or something similar).
            But I can see the SortSam.jar file inside my picard-tools folder; I downloaded picard-tools correctly, with SortSam.jar included. I have tried different paths for SortSam.jar, but the problem is the same.

            What can I do? Could somebody please help me continue with my exome analysis?

            Looking forward to your answer. Thank you so much.

            JM



            • Hi JM,

              Newer versions of Picard no longer ship a separate jar file per command; instead there is a single unified jar file where you specify the command you want to execute (the version I currently use is Picard 1.128).

              So the command would now look something like:

              java -jar picard.jar SortSam SO=coordinate INPUT=input.sam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

              Besides that, it is hard to tell where the actual problem lies. You could try posting the exact command you are trying to run.
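A minimal wrapper around the unified jar, assuming Picard 1.128-style syntax (`java -jar picard.jar <Tool> KEY=VALUE ...`). The `picard` function, the `PICARD_JAR` variable and its default path are hypothetical; the point of the sketch is to fail with a readable message when the jar path is wrong, which is what Java's "Unable to access jarfile" error usually means.

```shell
#!/usr/bin/env bash
# Minimal wrapper for the unified picard.jar (Picard 1.128-style syntax).
# The picard function, PICARD_JAR and the default path are hypothetical.
set -euo pipefail

picard() {
  local jar="${PICARD_JAR:-$HOME/picard-tools-1.128/picard.jar}"
  # Java reports "Unable to access jarfile" when this path is wrong,
  # so check it first and say which path was tried.
  if [ ! -f "$jar" ]; then
    echo "picard.jar not found at: $jar" >&2
    return 1
  fi
  java -Xmx4g -Djava.io.tmpdir=/tmp -jar "$jar" "$@"
}

# Example (same arguments as the command above):
# picard SortSam SO=coordinate INPUT=input.sam OUTPUT=output.bam \
#   VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
```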



              • Thank you ulz_peter for your answer. Here is my input (following your instructions) and the corresponding output:

                ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ java -jar picard.jar SortSam SO=coordinate INPUT=input.sam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
                [Mon May 18 19:25:34 CEST 2015] picard.sam.SortSam INPUT=input.sam OUTPUT=output.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
                [Mon May 18 19:25:34 CEST 2015] Executing as ubuntu@ubuntu-Compaq-CQ58-Notebook-PC on Linux 3.13.0-52-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
                [Mon May 18 19:25:34 CEST 2015] picard.sam.SortSam done. Elapsed time: 0,01 minutes.
                Runtime.totalMemory()=60293120
                To get help, see http://broadinstitute.github.io/pica...ml#GettingHelp
                Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.sam
                at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:308)
                at picard.sam.SortSam.doWork(SortSam.java:71)
                at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
                at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
                at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
                ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$

                What do you make of this output?
                I don't understand "Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.sam". I ask because I completed the earlier steps of "A short guide to Exome seq. analysis using Illumina technology" (your PDF analysis guide) up to this step (SAM to BAM conversion), using my own two FASTQ files (paired end). I think the conversion to SAM went without problems. So why can't it read the file /home/ubuntu/Escritorio/picard-tools-1.128/input.sam? What is this input.sam?

                Do you think I can continue with the next steps of this short guide (marking PCR duplicates and the rest) using exactly the same code as in the guide? If the code is now different for the remaining steps, could you please send me the corrected version for all of them?

                Looking forward to your answer, and thank you so much for your help. (Sorry, I don't have experience with this pipeline.)

                Juan M.



                • @Juan M.: The error indicates that the file does not exist.

                  Can you see it with a directory listing? Post the output of

                  Code:
                  $ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
                  Note: Just because you went through the steps does not mean that the process worked right. Did you see any other errors upstream of this step?
                  Last edited by GenoMax; 05-18-2015, 10:45 AM.
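The same check can be scripted before each step. A sketch of a hypothetical `require_input` helper: it stops with a readable message when an input is missing or 0 bytes long, and prints the current directory, because Picard resolves a relative name such as `input.sam` against the directory `java` is run from (as the path in the error above shows).

```shell
#!/usr/bin/env bash
# Stop early if an input file is missing or empty (0 bytes).
# require_input is a hypothetical helper name.
set -euo pipefail

require_input() {
  local f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      # Relative names are resolved against the current directory.
      echo "missing input: $f (current directory: $PWD)" >&2
      return 1
    elif [ ! -s "$f" ]; then
      echo "empty input (0 bytes): $f" >&2
      return 1
    fi
  done
}

# Example:
# require_input out.sam && java -jar picard.jar SortSam SO=coordinate \
#   INPUT=out.sam OUTPUT=out.sorted.bam CREATE_INDEX=true
```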



                  • Hi GenoMax. Thank you for your answer.

                    You are right. This is the output:

                    ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
                    ls: no se puede acceder a /home/ubuntu/Escritorio/picard-tools-1.128/*.sam: No existe el archivo o el directorio

                    The output says: No such file or directory.

                    However, when I look inside the Escritorio (Desktop) directory, the output is:

                    ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/*.sam
                    -rw-rw-r-- 1 ubuntu ubuntu 0 may 17 21:28 /home/ubuntu/Escritorio/INPUT=input.sam
                    -rw-rw-r-- 1 ubuntu ubuntu 253M may 11 23:07 /home/ubuntu/Escritorio/out.sam

                    So what should I do about the SAM to BAM conversion step? Or must I start all the steps again from the beginning, with the BWA alignment?

                    Until now I have followed the exome analysis short guide (PDF) step by step. Do you know of any easy method or pipeline for exome analysis?

                    Looking forward to your answer. Thank you so much.

                    Juan M.



                    • Originally posted by j6163m
                      Hi GenoMax. Thank you for your answer.

                      You are right. This is the output:

                      ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.sam
                      ls: no se puede acceder a /home/ubuntu/Escritorio/picard-tools-1.128/*.sam: No existe el archivo o el directorio

                      The output says: No such file or directory.
                      So that mystery is solved: the file is not there, which is why you are getting the error.
                      However, when I look inside the Escritorio (Desktop) directory, the output is:

                      ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ ls -lh /home/ubuntu/Escritorio/*.sam
                      -rw-rw-r-- 1 ubuntu ubuntu 0 may 17 21:28 /home/ubuntu/Escritorio/INPUT=input.sam
                      -rw-rw-r-- 1 ubuntu ubuntu 253M may 11 23:07 /home/ubuntu/Escritorio/out.sam
                      Can you show the first 10 lines of out.sam by doing this?

                      Code:
                      $ head -10 out.sam
                      I am not sure why the above ls command is showing full paths in your file listing (perhaps your system is set up that way). Which PDF guide are you following? Is it at the beginning of this thread?
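Note that `head -10` only shows `@SQ` header lines, not any alignments. A sketch that counts header lines, alignment records and `@RG` lines in one pass with awk (`samtools view -H` and `samtools flagstat` report similar information if samtools is installed). The `sam_summary` name is hypothetical.

```shell
#!/usr/bin/env bash
# Quick sanity check of a SAM file: header lines, alignment records, @RG lines.
# sam_summary is a hypothetical helper name.
set -euo pipefail

sam_summary() {
  awk '
    /^@RG\t/ { rg++ }          # read-group header records
    /^@/     { hdr++; next }   # any header line (@HD, @SQ, @RG, @PG, ...)
             { recs++ }        # everything else is an alignment record
    END { printf "header_lines=%d records=%d read_groups=%d\n", hdr, recs, rg }
  ' "$1"
}

# Example:
# sam_summary out.sam
```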



                      • Hi again,

                        This is the output:

                        ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ cd /home/ubuntu/Escritorio
                        ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio$ head -10 out.sam
                        @SQ SN:chr10 LN:135534747
                        @SQ SN:chr11 LN:135006516
                        @SQ SN:chr12 LN:133851895
                        @SQ SN:chr13 LN:115169878
                        @SQ SN:chr14 LN:107349540
                        @SQ SN:chr15 LN:102531392
                        @SQ SN:chr16 LN:90354753
                        @SQ SN:chr17 LN:81195210
                        @SQ SN:chr18 LN:78077248
                        @SQ SN:chr19 LN:59128983

                        Yes, the PDF guide is at the beginning of this thread.

                        What should I do next?

                        Thank you

                        Juan M.



                        • That sam file looks ok. Are you at the start of section 2.3? In any case don't blindly follow steps in the document if you don't understand what is happening at that step.
                          Last edited by GenoMax; 05-18-2015, 02:52 PM.



                          • Yes, I am at the beginning of section 2.3 (SAM to BAM conversion).

                            If you look at section 2.2, there is a command like:

                            bwa sampe -f out.sam -r "@RQ\tID:<ID>\tLB:<LIBRARY_NAME>\tSM:<SAMPLE_NAME>\tPL:ILLUMINA" hg19 input1.sai input2.sai input1.fq input2.fq

                            This is the command I used (paired-end data). As you can see, it writes out.sam. Is this the out.sam I have?

                            So, can I continue with section 2.4 and use exactly the same code as in the following sections until the end, without changing anything? If any of the commands in the remaining steps need to be changed, please send me the corrected versions.

                            Looking forward to your answers to these questions. Thank you so much for your help.

                            Juan M.



                              @ulz_peter: The manual looks a bit confusing. The file from step 2.2 is called out.sam, but in 2.3 it is referred to as input.sam. Since the Picard syntax has changed, perhaps you should consider updating your manual.

                              @Juan M: That looks like the right file (but note that its name is not the same as in the manual). Since you are using a newer version of Picard, you should use the command @ulz_peter provided in post #126.
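Because the guide's file names do not line up (out.sam from step 2.2, input.sam in step 2.3), it can help to derive each step's input from the previous step's output. A sketch for steps 2.3 and 2.4, assuming Picard 1.128 syntax; the `sort_and_markdup` function and the `.sorted.bam`, `.sorted.marked.bam` and `.dup_metrics.txt` suffixes are hypothetical choices, not names from the guide.

```shell
#!/usr/bin/env bash
# Steps 2.3 (sort SAM -> BAM) and 2.4 (mark duplicates) with chained file names:
# each step's OUTPUT is the next step's INPUT. Names and suffixes are hypothetical.
set -euo pipefail

# $1 = SAM from the alignment step (e.g. out.sam); $2 = path to picard.jar
sort_and_markdup() {
  local sam="$1" jar="$2"
  local base="${sam%.sam}"
  [ -s "$sam" ] || { echo "no such (or empty) SAM: $sam" >&2; return 1; }

  java -jar "$jar" SortSam SO=coordinate \
    INPUT="$sam" OUTPUT="$base.sorted.bam" \
    VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

  java -jar "$jar" MarkDuplicates \
    INPUT="$base.sorted.bam" OUTPUT="$base.sorted.marked.bam" \
    METRICS_FILE="$base.dup_metrics.txt" \
    VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
}

# Example, run from ~/Escritorio where out.sam lives:
# sort_and_markdup out.sam picard-tools-1.128/picard.jar
```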



                              • Hi again,

                                This is my input and output for section 2.4 of the guide (marking PCR duplicates):

                                ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ java -jar picard.jar MarkDuplicates INPUT=input.bam OUTPUT=input.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
                                [Tue May 19 19:42:48 CEST 2015] picard.sam.markduplicates.MarkDuplicates INPUT=[input.bam] OUTPUT=input.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
                                [Tue May 19 19:42:48 CEST 2015] Executing as ubuntu@ubuntu-Compaq-CQ58-Notebook-PC on Linux 3.13.0-52-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
                                [Tue May 19 19:42:48 CEST 2015] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0,01 minutes.
                                Runtime.totalMemory()=60293120
                                To get help, see http://broadinstitute.github.io/pica...ml#GettingHelp
                                Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /home/ubuntu/Escritorio/picard-tools-1.128/input.bam
                                at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:308)
                                at htsjdk.samtools.util.IOUtil.assertFilesAreReadable(IOUtil.java:325)
                                at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:108)
                                at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
                                at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
                                at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)


                                And this is my search for BAM files on my system:

                                ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~/Escritorio/picard-tools-1.128$ ls -lh /home/ubuntu/Escritorio/picard-tools-1.128/*.bam
                                -rw-rw-r-- 1 ubuntu ubuntu 0 may 19 19:27 /home/ubuntu/Escritorio/picard-tools-1.128/OUTPUT=input.marked.bam

                                What do you think of the code I used? Is it right or wrong? Could you please help me correct it?

                                Thank you so much.

                                Juan M.
