  • carolW
    replied
    I don't know whether the following issue could be related to my bwa samse problem:

    I tried to run gatk for local realignment of a bam file. I used a 1000genomes bam file and NA12878.HiSeq.WGS.bwa.cleaned.recal.b37.20.bam, which is recommended in the gatk documentation. I get an error about incompatible contigs between the input reads and the reference file (see below). Basically, in the reference file downloaded from the cufflinks website, chromosomes are named with a chr prefix (chr1, chrX, etc.), whereas in the bam file they are named 1, X, etc.

    Does this mean that the index file downloaded from cufflinks is not really usable with other data sets, given these two problems (gatk and bwa samse), even though the problem types are different (location of the index file and chromosome naming)?

    I look forward to your advice,

    Carol
    ------------------------------------------------------
    ERROR MESSAGE: Input files reads and reference have incompatible contigs: No overlapping contigs found.
    ##### ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605]
    ##### ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
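
The mismatch in the error above is between two naming conventions: the BAM uses b37-style names (1, X, MT) and the reference uses UCSC-style names (chr1, chrX, chrM). A minimal sketch of the mapping follows; `to_ucsc_name` is a hypothetical helper, not part of GATK, bwa, or samtools. Renaming alone may not be enough, since contig order and the mitochondrial sequence can also differ between the two builds, so using the reference the BAM was aligned to is generally the safer route.

```shell
# Hypothetical helper: map a b37-style contig name to its UCSC-style name.
# Names without a simple chr-prefixed equivalent here (GL*, NC_007605)
# are passed through unchanged.
to_ucsc_name() {
    case "$1" in
        MT) echo "chrM" ;;
        [1-9]|1[0-9]|2[0-2]|X|Y) echo "chr$1" ;;
        *) echo "$1" ;;
    esac
}

# Example: show the mapping for a few of the contigs listed in the error.
for name in 1 22 X MT GL000207.1; do
    echo "$name -> $(to_ucsc_name "$name")"
done
```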



  • carolW
    replied
    I found the problem, but the solution doesn't help and causes another problem:

    When I uncompressed the index file from the cufflinks web site, it created several folders, and one of them contained a symbolic link to the index file. I was using that symbolic link, which caused the segfault.

    Now I use the right index file but bwa samse is unable to locate the index file although the path is correct:

    ../pgm/bwa-0.7.3a/bwa samse ~/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa SRR062641.filt.sai SRR062641.filt.fastq > SRR062641.filt.sam
    [bwa_sai2sam_se] fail to locate the index
    [main] Version: 0.7.3a-r367
    [main] CMD: ../pgm/bwa-0.7.3a/bwa samse ~/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa SRR062641.filt.sai SRR062641.filt.fastq

    ls -lt /home/carolw/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
    -rwxrwxr-x 1 carolw carolw 3157279232 Apr 26 10:52 /home/yasrebih/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa

    ls -lt /home/carolw/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/
    total 3083296
    -rwxrwxr-x 1 carolw carolw 3157279232 Apr 26 10:52 genome.fa
    -rwxrwxr-x 1 carolw carolw 3099 Apr 13 2012 genome.dict
    -rwxrwxr-x 1 carolw carolw 783 Mar 15 2012 genome.fa.fai
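
For context on the "fail to locate the index" message: `bwa samse` takes an index prefix, and `bwa index` normally writes its files next to that prefix with the extensions .amb, .ann, .bwt, .pac and .sa. The WholeGenomeFasta listing above shows none of these, which would be consistent with the error. A minimal sketch of a check; `missing_bwa_index_files` is a hypothetical helper, not a bwa command.

```shell
# Hypothetical helper: print every BWA index file that is missing for a prefix.
# Prints nothing when all five expected files exist.
missing_bwa_index_files() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        [ -e "$prefix.$ext" ] || echo "$prefix.$ext"
    done
}

# Example call; the path is the one used in the post above.
missing_bwa_index_files "$HOME/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
```

If this prints anything, the usual options are to pass samse the same prefix that was used for `bwa aln`, or to rebuild the index with `bwa index` on the FASTA file.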



  • GenoMax
    replied
    Originally posted by carolW:
        No to your first question.

    Not much left to try except to make sure that your copy of bwa is working right with a small dataset. You should try to find another machine with more memory.



  • carolW
    replied
    No to your first question.

    Yes, I am admin of the machine

    I tried to install the package that provides limit, but it seems the installation is incomplete:

    vlimit
    vlimit: vc_get_task_xid(): Function not implemented
    limit
    No command 'limit' found, did you mean:
    Command 'vlimit' from package 'util-vserver' (universe)
    limit: command not found
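
`limit` is a csh/tcsh builtin, which would explain the "command not found" output above under a Bourne-style shell such as bash. The rough equivalent there is the `ulimit` builtin, so no extra package should be needed. A minimal sketch, assuming bash or a similar shell:

```shell
# Core-file size limit for this shell; 0 means a crash leaves no core file,
# even when the shell prints "(core dumped)".
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"

# All resource limits for this shell, including the memory-related ones.
ulimit -a
```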



  • GenoMax
    replied
    Originally posted by carolW:
        Note that no core file is generated after the segmentation fault, if that answers your question.

    So we know that the process is not aborting in a way that produces a core dump.

    Originally posted by carolW:
        Just a question: to compile and generate the bwa executable, I just invoked "make". Was that sufficient to compile the files?

    That should be all you need as long as you have the compiler and libraries available.

    Have you been able to successfully use the bwa program on a different data set (look for some E. coli data from SRA if you need a test case), so we are sure it works otherwise?

    Are you the "administrator" of this machine? What does the output of command "limit" show?



  • carolW
    replied
    Note that no core file is generated after the segmentation fault, if that answers your question.

    Just a question: to compile and generate the bwa executable, I just invoked "make". Was that sufficient to compile the files?



  • carolW
    replied
    I'm not sure I understood your question. I don't get any files with "core" in the name. This is what I did:

    ../pgm/bwa-0.7.3a/bwa samse ../hg19/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/genome.fa ./SRR062641.filt.sai ./SRR062641.filt.fastq > ./SRR062641.filt.sam
    Segmentation fault (core dumped)



  • GenoMax
    replied
    Are you getting files that contain the word "core" in the name after the seg fault?



  • carolW
    replied
    Yes, I added ./ and got a segmentation fault right away.



  • GenoMax
    replied
    Just wanted to verify that bwa is executing properly on your system, which it seems to be doing.

    Are you issuing the samse command from the directory where you have the "SRR062641*" files? Have you tried adding ./ (./SRR062641.filt.sai and ./SRR062641.filt.fastq) to explicitly locate the files in the current directory?



  • carolW
    replied
    I get the same output

    ../pgm/bwa-0.7.3a/bwa samse
    Usage: bwa samse [-n max_occ] [-f out.sam] [-r RG_line] <prefix> <in.sai> <in.fq>

    ../pgm/bwa-0.7.3a/bwa

    Program: bwa (alignment via Burrows-Wheeler transformation)
    Version: 0.7.3a-r367
    Contact: Heng Li <[email protected]>

    Usage:   bwa <command> [options]

    Command: index         index sequences in the FASTA format
             mem           BWA-MEM algorithm
             fastmap       identify super-maximal exact matches
             pemerge       merge overlapping paired ends (EXPERIMENTAL)
             aln           gapped/ungapped alignment
             samse         generate alignment (single ended)
             sampe         generate alignment (paired ended)
             bwasw         BWA-SW for long queries

             fa2pac        convert FASTA to PAC format
             pac2bwt       generate BWT from PAC
             pac2bwtgen    alternative algorithm for generating BWT
             bwtupdate     update .bwt to the new format
             bwt2sa        generate SA from BWT and Occ


    I downloaded the HG index from http://cufflinks.cbcb.umd.edu/igenomes.html. I used it with bwa aln without any problem. I use only the following file from the extracted archive:
    hg19/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/genome.fa



  • GenoMax
    replied
    If you just issue the following commands (the lines starting with $), do you see output similar to what is posted here?

    Code:
    $ bwa samse
    Usage: bwa samse [-n max_occ] [-f out.sam] [-r RG_line] <prefix> <in.sai> <in.fq>
    $ bwa
    
    Program: bwa (alignment via Burrows-Wheeler transformation)
    Version: 0.7.4-r385
    Contact: Heng Li <[email protected]>
    
    Usage:   bwa <command> [options]
    
    Command: index         index sequences in the FASTA format
             mem           BWA-MEM algorithm
             fastmap       identify super-maximal exact matches
             pemerge       merge overlapping paired ends (EXPERIMENTAL)
    (output for second command truncated)

    Did you create the human genome index or download it from somewhere else?



  • carolW
    replied
    I ran the following command many times and never saw bwa in the list of processes, so it doesn't even seem to start. Can it check the available memory before starting? What prevents it from starting?

    ../pgm/bwa-0.7.3a/bwa samse ../hg19/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/genome.fa SRR062641.filt.sai SRR062641.filt.fastq > SRR062641.filt.sam

    Thanks,

    Carol



  • carolW
    replied
    Is there anything else that could be done?

    I look forward to your reply,

    Carol



  • carolW
    replied
    Originally posted by GenoMax:
        Are you using 32-bit or 64-bit OS? Did you compile bwa on this machine yourself?

        BWA-MEM is a new alignment algorithm (for long reads). See pre-print here: http://arxiv.org/abs/1303.3997

    64-bit, and I compiled bwa on this machine.

    Mastal thought that the segmentation fault comes from the read length and suggested that I use bwa mem, because the reads are longer than 100 bp, judging by the output of bwa aln (see below). Do you agree with this argument?

    [bwa_aln] 17bp reads: max_diff = 2
    [bwa_aln] 38bp reads: max_diff = 3
    [bwa_aln] 64bp reads: max_diff = 4
    [bwa_aln] 93bp reads: max_diff = 5
    [bwa_aln] 124bp reads: max_diff = 6
    [bwa_aln] 157bp reads: max_diff = 7
    [bwa_aln] 190bp reads: max_diff = 8
    [bwa_aln] 225bp reads: max_diff = 9
    [bwa_aln_core] 109811 sequences have been processed.
