  • dpryan
    replied
    Hi brentp, both should work fine. Try removing "-lmpich -lmpl" from the link line and replacing them with "-lmpi". I just replaced mpich2 with Open MPI and it then compiled (I'll add this to my list of things to fix). Of course, it now segfaults for me. I'll have to play around with Open MPI, since I've apparently never used it. Let me know if this works for you; if it does, that just means Open MPI isn't completely installed on my system.

    BTW, I think the most recent version of Bismark has an option to output directly to BAM (it just pipes to samtools).

  • brentp
    replied
    Hi Devon, this seems very interesting. I have been having trouble with the amount of plain-text output from Bismark.

    This seems to require MPICH specifically; Open MPI doesn't seem to work. Is that correct? Once I get past that, I get:

    Code:
    mpicc -c -Wall -O3   main.c -o main.o
    mpicc -Wall -O3  aux.o fastq.o genome.o slurp.o master.o common.o MPI_packing.o worker.o main.o -o bison -L/home/brentp/src/samtools  -lm -lpthread -lmpich -lmpl -lbam -lz
    aux.o: In function `quit':
    aux.c:(.text+0xcf2): undefined reference to `ompi_mpi_comm_world'
    aux.o: In function `effective_nodes':
    aux.c:(.text+0xef5): undefined reference to `ompi_mpi_comm_world'
    slurp.o: In function `slurp':
    slurp.c:(.text+0x1f9): undefined reference to `ompi_mpi_comm_world'
    slurp.c:(.text+0x1fe): undefined reference to `ompi_mpi_int'
    slurp.c:(.text+0x23d): undefined reference to `ompi_mpi_comm_world'
    slurp.c:(.text+0x24d): undefined reference to `ompi_mpi_int'
    slurp.c:(.text+0x277): undefined reference to `ompi_mpi_comm_world'
    slurp.c:(.text+0x287): undefined reference to `ompi_mpi_byte'
    slurp.c:(.text+0x347): undefined reference to `ompi_mpi_byte'
    slurp.c:(.text+0x34d): undefined reference to `ompi_mpi_comm_world'
    slurp.c:(.text+0x3a6): undefined reference to `ompi_mpi_comm_world'
    slurp.c:(.text+0x3c3): undefined reference to `ompi_mpi_byte'
    slurp.c:(.text+0x3ff): undefined reference to `ompi_mpi_byte'
    slurp.c:(.text+0x405): undefined reference to `ompi_mpi_comm_world'
    worker.o: In function `worker_node':
    worker.c:(.text+0x1c2): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x1cc): undefined reference to `ompi_mpi_int'
    worker.c:(.text+0x390): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x39b): undefined reference to `ompi_mpi_byte'
    worker.c:(.text+0x3dd): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x3ea): undefined reference to `ompi_mpi_byte'
    worker.c:(.text+0x453): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x45e): undefined reference to `ompi_mpi_byte'
    worker.c:(.text+0x5be): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x5c9): undefined reference to `ompi_mpi_int'
    worker.c:(.text+0x5e3): undefined reference to `ompi_mpi_comm_world'
    worker.c:(.text+0x5ee): undefined reference to `ompi_mpi_byte'
    collect2: ld returned 1 exit status
    make: *** [align] Error 1
    How can I get past that error?

  • dpryan
    replied
    Version 0.2.0 now available

    I've just posted version 0.2.0 to sourceforge. The big change in this release is the inclusion of bison_herd, which can use a semi-arbitrary number of nodes (e.g., I'm currently using 17 nodes to align several samples simultaneously). bison_herd can also accept a list of input files and will write the alignments for each to a separate file. This is useful when you have a number of samples and want to avoid the overhead of loading the bowtie2 index and the genome into memory more than once. bison_herd also skips writing in-silico converted reads to a file, further increasing performance. Other changes:
    • Added a note to the methylation summary statistics printed at the end of a run, stating that the numbers double count any site covered by both mates in a pair. These metrics are meant only for general information, not further analysis, so I don't consider this a bug (it's actually a deliberate design decision for the sake of performance).
    • --ignore-quals is no longer passed to bowtie2 by default. Specifying it will marginally decrease both correct and incorrect alignments, and will generally lower the alignment rate.
    • Fixed --unmapped; unmapped reads are now written to the directory specified by -o.
    • --maxins is no longer set by default, since bowtie2's default is already 500.
    • The methylation extractor now has a -phred option to exclude methylation calls from low-confidence base calls. The default threshold is 20.
    • Added a script to convert bedGraph files to a format suitable for BSseq.
    • Fixed a bug in bison_merge_CpGs
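
    The -phred filter amounts to a per-base check on the FASTQ quality character behind each methylation call. A minimal sketch, assuming Phred+33 (Sanger/Illumina 1.8+) encoding and an inclusive threshold; the names are hypothetical and this is not Bison's code:

```c
#include <stdbool.h>

/* Hypothetical constant mirroring the -phred default of 20. */
#define DEFAULT_MIN_PHRED 20

/* Convert one FASTQ quality character to a Phred score,
 * assuming Phred+33 (Sanger / Illumina 1.8+) encoding. */
static int phred_from_ascii(char q)
{
    return (int)q - 33;
}

/* Return true if a methylation call made from a base with quality
 * character `q` should be kept (assumed inclusive: score >= min_phred). */
bool keep_methylation_call(char q, int min_phred)
{
    return phred_from_ascii(q) >= min_phred;
}
```

    For example, the quality character '5' encodes Phred 20 and passes the default threshold, while '4' (Phred 19) is dropped.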


    The only thing left on my "To Do" list is to add support for filename globbing (e.g., sample_*_1.fastq.gz and sample_*_2.fastq.gz) to make it easier to feed multiple files to bison_herd (and the auxiliary scripts).
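
    Globbing of this kind can be done in C with POSIX glob(3). A minimal sketch of expanding a mate-1 and a mate-2 pattern and pairing the results, assuming mates are matched by their position in glob's sorted output; expand_mate_pairs is a hypothetical name, not something in Bison:

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a mate-1 pattern and a mate-2 pattern
 * (e.g. "sample_*_1.fastq.gz" / "sample_*_2.fastq.gz") with glob(3).
 * glob() returns matches in sorted order by default, so the i-th mate-1
 * file is paired with the i-th mate-2 file.
 * Returns the number of pairs (0 if nothing matched), or -1 on a glob
 * error or if the two patterns match different numbers of files.
 * The caller must globfree() both results in every case. */
int expand_mate_pairs(const char *pattern1, const char *pattern2,
                      glob_t *mates1, glob_t *mates2)
{
    /* Zero the structs so globfree() is safe even if glob() bails early. */
    memset(mates1, 0, sizeof *mates1);
    memset(mates2, 0, sizeof *mates2);

    int rc1 = glob(pattern1, 0, NULL, mates1);
    int rc2 = glob(pattern2, 0, NULL, mates2);

    /* GLOB_NOMATCH only means "no files"; anything else is a real error. */
    if ((rc1 != 0 && rc1 != GLOB_NOMATCH) || (rc2 != 0 && rc2 != GLOB_NOMATCH))
        return -1;

    size_t n1 = (rc1 == 0) ? mates1->gl_pathc : 0;
    size_t n2 = (rc2 == 0) ? mates2->gl_pathc : 0;
    if (n1 != n2) {
        fprintf(stderr, "mate patterns matched %zu and %zu files\n", n1, n2);
        return -1;
    }
    return (int)n1;
}
```

    Pairing by sorted position keeps the sketch short, but it silently mispairs files if one mate file is missing and another extra one is present; a real implementation would likely check that each pair shares the same sample name.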

  • dpryan
    replied
    Version 0.1.1 is now available

    I've finally gotten around to updating the version of bison on sourceforge (now 0.1.1) so it's current with my local version. Changes are as follows:
    • Fixed a number of minor bugs.
    • Added support for uncompressed fastq files, as well as bzipped files (previously, only gzipped fastq files worked properly).
    • --score-min is now parsed by Bison before being passed to bowtie2, and read MAPQ scores are recalculated accordingly using the same algorithm as bowtie2 (N.B., this only vaguely corresponds to -10*log10(probability the mapping position is wrong)!).
    • Added a bison_mbias function, to process the aligned BAM file and create a text file containing the percentage of methylated C's as a function of read position. For the utility of this, see: Hansen KD, Langmead B and Irizarry RA, BSmooth: from whole genome bisulfite sequencing to differentially methylated regions. Genome Biol 2012; 13(10):R83.
    • The methylation extractor now accepts the -q option, which sets the minimum MAPQ for a read to be included in the methylation results. The default is 20, which seems to be a reasonable threshold based on a few simulations.
    • In DEBUG mode, the output BAM files used to have fixed names, which was a problem when debugging multiple input files. Now each OT/OB/CTOT/CTOB.bam filename is prefixed with an appropriate string extracted from the input file name. In addition, the output directory is now respected in DEBUG mode.
    • Included an "auxiliary" directory containing functions for making an RRBS genome, along with other possibly useful functions.
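
    Bowtie2 specifies --score-min as a function of read length in the form F,B,A (function type, constant term, coefficient), so L,-0.6,-0.6 (bowtie2's end-to-end default) means f(x) = -0.6 + -0.6 * x. A minimal sketch of parsing and evaluating such a specification; it is not Bison's actual parser, the struct and function names are hypothetical, and only the constant (C) and linear (L) types are handled so the sketch needs no math library:

```c
#include <stdio.h>

/* A bowtie2-style function specification "F,B,A": f(x) = B + A * F(x). */
typedef struct {
    char type;   /* 'C' constant or 'L' linear (S and G omitted here) */
    double b;    /* constant term */
    double a;    /* coefficient */
} score_func;

/* Parse e.g. "L,-0.6,-0.6". Returns 0 on success, -1 on malformed input
 * or an unsupported function type. */
int parse_score_func(const char *spec, score_func *out)
{
    char type;
    double b, a;
    if (sscanf(spec, "%c,%lf,%lf", &type, &b, &a) != 3)
        return -1;
    if (type != 'C' && type != 'L')
        return -1;  /* 'S' would use sqrt(x) and 'G' log(x) */
    out->type = type;
    out->b = b;
    out->a = a;
    return 0;
}

/* Minimum valid alignment score for a read of length `len`. */
double eval_score_func(const score_func *f, int len)
{
    double term = (f->type == 'L') ? (double)len : 0.0;  /* 'C': f(x) = B */
    return f->b + f->a * term;
}
```

    For a 100 bp read, L,-0.6,-0.6 evaluates to -60.6; bowtie2 treats alignments scoring below this value as invalid.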

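
    bison_mbias reports the percentage of methylated Cs at each read position, which can reveal position-dependent biases, typically near read ends. The bookkeeping can be sketched as two per-position counters; this is a hypothetical illustration rather than bison_mbias's code, and it assumes calls arrive already extracted as (read position, methylated or not) pairs:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_READ_LEN 300  /* hypothetical upper bound on read length */

/* Per-position tallies of methylated and unmethylated calls. */
typedef struct {
    unsigned long meth[MAX_READ_LEN];
    unsigned long unmeth[MAX_READ_LEN];
} mbias_counts;

/* Record one methylation call at 0-based read position `pos`. */
void mbias_add(mbias_counts *m, size_t pos, bool methylated)
{
    if (pos >= MAX_READ_LEN)
        return;  /* ignore positions beyond the table */
    if (methylated)
        m->meth[pos]++;
    else
        m->unmeth[pos]++;
}

/* Percentage of methylated calls at `pos`, or -1.0 if there are none. */
double mbias_percent(const mbias_counts *m, size_t pos)
{
    if (pos >= MAX_READ_LEN)
        return -1.0;
    unsigned long total = m->meth[pos] + m->unmeth[pos];
    if (total == 0)
        return -1.0;
    return 100.0 * (double)m->meth[pos] / (double)total;
}
```

    Writing mbias_percent() for every position to a text file gives the kind of table described above.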

    Unless some bugs crop up, I expect the next release will support a semi-arbitrary number of nodes. As is, only 3 or 5 nodes can be used at a time. This is fine for me, since I'm usually processing a number of samples in parallel and our cluster is relatively small. I can easily envision others finding more nodes useful. I have a few implementation ideas for this.
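
    The 3-or-5 limit is consistent with one master node plus one aligner node per strand: directional libraries only need the original top and bottom strands (OT, OB), while non-directional libraries also need the complementary strands (CTOT, CTOB). A small sketch of that arithmetic and of one possible rank layout; the rank-to-strand mapping and all function names are hypothetical, not taken from Bison's source:

```c
#include <stdbool.h>
#include <stddef.h>

/* Strand names as used in Bison's DEBUG-mode BAM file names. */
static const char *const STRANDS[] = { "OT", "OB", "CTOT", "CTOB" };

/* Directional libraries need 2 strands; non-directional need all 4. */
int aligner_count(bool directional)
{
    return directional ? 2 : 4;
}

/* One master node plus one aligner node per strand: 3 or 5 nodes. */
int nodes_required(bool directional)
{
    return 1 + aligner_count(directional);
}

/* Hypothetical layout: rank 0 is the master, ranks 1..n each align to
 * one strand. Returns "master", a strand name, or NULL for a rank
 * outside the layout. */
const char *role_for_rank(int rank, bool directional)
{
    if (rank == 0)
        return "master";
    if (rank < 0 || rank > aligner_count(directional))
        return NULL;
    return STRANDS[rank - 1];
}
```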

  • dpryan
    started a topic Bison: BISulfite alignment On Nodes of a cluster

    Bison: BISulfite alignment On Nodes of a cluster

    Greetings all,

    I'd like to announce the general availability of a program that I've recently written called Bison (BISulfite alignment On Nodes of a cluster), which is intended for those who need to align bisulfite-converted reads and have access to a computer cluster. Bison is quite similar to Bismark (I'm a former Bismark user and wrote Bison to get my alignments sooner), with major differences as follows:
    • Bison is often 5-10x faster than Bismark, largely because it assigns a separate cluster node to align reads to each strand. Information from these “aligner nodes” is then combined on a separate “master node”.
    • Bison uses the samtools C API to output alignments directly to BAM format, thereby saving space and disk I/O.
    • Bison is written purely in C, which yields a further modest speed gain.
    • Bison also decides upon the correct alignment in a slightly different way than Bismark, resulting in fewer misaligned reads (0.02-0.03% versus ~0.6% for Bismark).
    • Bison requires only enough RAM for a single instance of bowtie2, as opposed to enough for 2-4 instances.

    Otherwise, Bison will be quite familiar to those of you already accustomed to using Bismark. Both directional and non-directional libraries are supported. Bowtie2 is used for alignment on each of the nodes. Both paired-end and single-end libraries are supported. A methylation extractor is included that outputs into bedGraph format (if people would like a different format or different information, just ask). For those doing RRBS, I should note that the methylation extractor can be told that you are doing RRBS (currently MspI and TaqI digested libraries are supported) and it will then ignore methylation calls of bases added experimentally during fragment end-repair (this avoids needing to trim them off prior to alignment).
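
    Bison, like Bismark, aligns in-silico converted reads. In Bismark's three-letter approach, reads are converted C->T and/or G->A and aligned against C->T- and G->A-converted copies of the genome, with each read/genome combination corresponding to one of the strands OT, OB, CTOT and CTOB. Assuming Bison converts reads the same way, the conversion step itself can be sketched as follows; convert_bases is a hypothetical name:

```c
#include <ctype.h>
#include <stddef.h>

/* In-place three-letter conversion of a nucleotide string: every
 * occurrence of base `from` becomes `to`, preserving upper/lower case.
 * Use ('C', 'T') for a C->T conversion and ('G', 'A') for G->A. */
void convert_bases(char *seq, char from, char to)
{
    char from_up = (char)toupper((unsigned char)from);
    char from_lo = (char)tolower((unsigned char)from);
    char to_up = (char)toupper((unsigned char)to);
    char to_lo = (char)tolower((unsigned char)to);

    for (size_t i = 0; seq[i] != '\0'; i++) {
        if (seq[i] == from_up)
            seq[i] = to_up;
        else if (seq[i] == from_lo)
            seq[i] = to_lo;
    }
}
```

    Because the original read must still be consulted to make methylation calls after alignment, a converted copy would normally be made rather than overwriting the only copy of the read.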

    I should note that Bison does not currently support color-space reads, as I've never actually had any. Further, it is generally less flexible than Bismark, so I encourage users interested in Bison to try some test data to see if Bison meets their needs.

    Bison source code and directions for compilation and usage are available via sourceforge. Samtools and Bowtie2 are required for installation. Likewise, MPI is required for compilation and usage, though you can run Bison on a single computer if desired. If people run into installation or usage problems, please feel free to post in this thread or submit a ticket on sourceforge.

    Devon Ryan