  • Michael Robinson
    Junior Member
    • Jul 2010
    • 7

    Crossbow 1.0.0 help please

    I am very new to Crossbow and all its tools.

    Following the Crossbow 1.0.0 manual, I installed it and all the required tools. I am running Ubuntu on a 4 GB laptop.

    I would like to run it on a single node without Hadoop for the moment.

    Per the manual, these are the commands I am using and the error I received.


    michael@michael-laptop:~/crossbow_1/crossbow-1.0.0-beta4/example/e_coli$
    perl $CROSSBOW_HOME/cb_local.pl -input=small.manifest -preprocess
    -pre-output=preproc_small -reference=$CROSSBOW_REFS/e_coli
    -output=output_small -cpus=1
    Died at /home/michael/crossbow_1/crossbow-1.0.0-beta4/cb_emr.pl line 1290.

    Any help will be appreciated.

    Michael
  • Ben Langmead
    Senior Member
    • Sep 2008
    • 200

    #2
    Hi Michael,

    Hmmm... Where did you get that version of Crossbow? I didn't release any versions between 0.1.3 and 1.0.4.

    At any rate, please try the latest version (1.0.4) available from the crossbow page:



    And let me know if there's still a problem,
    Ben


    • Michael Robinson
      Junior Member
      • Jul 2010
      • 7

      #3
      Thank you very much for your help.

      I downloaded version 1.0.4, installed it and all corresponding programs, ran it on a single computer using e_coli, and everything worked fine. Then I created a virtual machine (Ubuntu) and repeated the same steps with the same results.

      Now I am trying to run the same job using Hadoop (cb_hadoop), but I think I am missing at least one step.

      Following the Crossbow manual, I ran cb_hadoop and got:

      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
      Must specify -reference

      Then I ran:

      cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

      which is the location of the jar file for e_coli, and I got this error:

      -------------------
      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
      Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
      Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

      Crossbow job
      ------------
      Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
      Running...
      ==========================
      Stage 1 of 3. Align
      ==========================
      Sun Aug 15 17:54:31 EDT 2010
      packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/
      crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
      10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
      Streaming Job Failed!
      Non-zero exitlevel from Align streaming job
      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
      -------------------

      Could you please tell me where I can find documentation about the step(s) I am missing?

      My goal is to run Crossbow on multiple virtual machines using Hadoop.

      Thank you

      Michael


      • Ben Langmead
        Senior Member
        • Sep 2008
        • 200

        #4
        Hi Michael,

        Originally posted by Michael Robinson
        Following the Crossbow manual, I ran cb_hadoop and got:

        michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
        Must specify -reference

        Then I ran:

        cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

        which is the location of the jar file for e_coli, and I got this error:

        -------------------
        michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
        Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
        Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

        Crossbow job
        ------------
        Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
        Running...
        ==========================
        Stage 1 of 3. Align
        ==========================
        Sun Aug 15 17:54:31 EDT 2010
        packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/
        crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
        10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
        Streaming Job Failed!
        Non-zero exitlevel from Align streaming job
        michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
        -------------------
        You'll have to specify input and output directories using --input and --output as well. Depending on your version of Hadoop and how it's set up, you may need to specify HDFS URLs that include your namenode's address and port, e.g. --input=hdfs://localhost:9000/my/input.

        Hope this helps,
        Ben
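
The "Incomplete HDFS URI, no host" error quoted above can be reproduced without a cluster. As a minimal sketch (Python, not part of Crossbow; the helper name `hdfs_uri_problem` is made up for illustration), the following checks whether a URI carries a namenode host before it is passed to --input or --output:

```python
from urllib.parse import urlparse


def hdfs_uri_problem(uri):
    """Return a short description of what is wrong with an HDFS URI, or None if it looks complete."""
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        return "not an hdfs:// URI"
    if not parts.hostname:
        return "no namenode host"
    if not parts.path or parts.path == "/":
        return "no path"
    return None


# The URI from the error message has a scheme but no host.
print(hdfs_uri_problem("hdfs:/crossbow/intermediate/22704/align"))
# The form suggested in the reply; localhost:9000 is only an example address.
print(hdfs_uri_problem("hdfs://localhost:9000/my/input"))
```

The first call reports the missing host, which matches the `hdfs:/crossbow/...` path in the log. The second form, with host and port, passes. The right host and port depend on the local Hadoop configuration.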


        • Michael Robinson
          Junior Member
          • Jul 2010
          • 7

          #5
          Crossbow 1.1.0 with Hadoop 0.20.2 Help

          Hi,

          I am a newbie.

          I have Hadoop 0.20.2 running on a multi-node cluster (one server, two nodes).

          Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it with no problems.
          Now I want to install it (Bowtie and SOAPsnp) on the nodes following the same instructions:

          "If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

          Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes", how does that relate to the server install? Do you mean exactly the same path as the Crossbow path on the server?

          Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)."

          Also, when testing previous Crossbow versions I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?


          Thank you

          Michael


          • Ben Langmead
            Senior Member
            • Sep 2008
            • 200

            #6
            Hi Michael,

            Originally posted by Michael Robinson
            I have Hadoop 0.20.2 running on a multi-node cluster (one server, two nodes).

            Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it with no problems.
            Now I want to install it (Bowtie and SOAPsnp) on the nodes following the same instructions:

            "If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

            Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes", how does that relate to the server install? Do you mean exactly the same path as the Crossbow path on the server?
            Yes, it's best to install 'bowtie' and 'soapsnp' at the same path on all nodes, including the server. It's not strictly necessary to install those tools on the server at all, but if you don't, the "cb_hadoop --test" command will fail when run from the server.

            Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)."
            All I really mean is that you can set up an NFS share so that all computers in the cluster "see" the same files in certain directories. E.g. you might set up your cluster so that the '/share/crossbow' directory contains a Crossbow install and is NFS-shared across all nodes in the cluster. If you do so, the path '/share/crossbow/bin/linux64/bowtie', for example, will be present on all nodes and you can specify that path using the --bowtie option.

            Also, when testing previous Crossbow versions I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?
            You don't need samtools, no. You never needed R/Bioconductor for Crossbow - just for Myrna (a different though similar tool).

            Hope this helps,
            Ben
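
The same-path advice above can be checked on each node before launching a job. The following is a rough sketch in Python, not a Crossbow feature: the function name `missing_tools` is hypothetical, and the `bin/linux64` layout is taken from the example path in the reply (a 32-bit install would use `linux32`, as in the earlier log).

```python
import os


def missing_tools(install_dir, platform_dir="linux64", tools=("bowtie", "soapsnp")):
    """Return the expected tool paths under install_dir that are absent or not executable."""
    missing = []
    for tool in tools:
        path = os.path.join(install_dir, "bin", platform_dir, tool)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # '/share/crossbow' is the example NFS mount point from the reply above.
    for path in missing_tools("/share/crossbow"):
        print("missing or not executable:", path)
```

Run on every node, an empty result suggests the NFS share is mounted and the binaries are visible at the same path everywhere. It does not prove the binaries match the node's architecture.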


            • Michael Robinson
              Junior Member
              • Jul 2010
              • 7

              #7
              Crossbow 1.1.0 with Hadoop 0.20.2 Help

              Hi Ben,

              I am impressed by how fast you replied.

              Thanks very much

              Michael


              • Michael Robinson
                Junior Member
                • Jul 2010
                • 7

                #8
                Hi Ben,

                I went the NFS route, which I think is best because I will only need to update the server for future Crossbow releases. I can see the Crossbow folders from the client. Thanks.

                I also added this to my .profile on the server and the nodes:
                export CROSSBOW_HOME=<location where I installed Crossbow>

                Now I have a new challenge: when I run cb_hadoop --test I get "program not found".

                I can see cb_hadoop and I can also do a cat on it and read the code.

                hadoop@Hadoop-Server:~/crossbow/crossbow$ ls
                ?? contrib ??H@@ ReduceWrap.pl
                Align.pl Copy.pl LICENSE reftools
                AWS.pm Counters.pl LICENSE_APACHE2 soapsnp
                bin Counters.pm LICENSE_ARTISTIC Soapsnp.pl
                BinSort.pl crossbow-1.1.0.zip LICENSE_GPL2 Tools.pm
                cb_emr CrossbowIface.pm LICENSE_GPL3 TUTORIAL
                CBFinish.pl crossbow-manual-v1-1-0.odt LICENSES Util.pm
                cb_hadoop doc MANUAL VERSION
                cb_local example MapWrap.pl Wrap.pm
                CheckDirs.pl Get.pm NEWS
                hadoop@Hadoop-Server:~/crossbow/crossbow$




                Please tell me what I am doing wrong?

                Thanks

                Michael


                • Michael Robinson
                  Junior Member
                  • Jul 2010
                  • 7

                  #9
                  I found the solution to the cb_hadoop error:
                  I needed to add the location where I installed Hadoop to my PATH.

                  I am now running Crossbow on the e_coli sample data.

                  Thanks
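
For anyone hitting the same "program not found" message, a quick check along these lines may save time. This is only a sketch under the assumption, based on the fix reported above, that cb_hadoop needs the `hadoop` launcher to be resolvable on PATH. The function name `commands_not_on_path` is made up for illustration.

```python
import os
import shutil


def commands_not_on_path(commands, path=None):
    """Return the commands that cannot be resolved to an executable on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    # Assumption: 'hadoop' is the launcher cb_hadoop shells out to.
    for cmd in commands_not_on_path(["hadoop"]):
        print(f"'{cmd}' not found on PATH; add its bin directory, e.g. in ~/.profile")
    print("CROSSBOW_HOME =", os.environ.get("CROSSBOW_HOME", "(not set)"))
```

Passing `path=` explicitly makes it possible to test a candidate PATH string before editing .profile.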


                  • carze
                    Junior Member
                    • Nov 2009
                    • 2

                    #10
                    Hi Ben,

                    Sorry to hijack this thread but seeing as you have already answered questions in here I was wondering if it is possible to get bowtie to produce SAM output within the crossbow pipeline. Whenever I pass the '--sam' flag to bowtie using the '--bowtie-args' flag I get a segmentation fault during the align step.

                    Thanks!


                    • rtgood
                      Junior Member
                      • May 2009
                      • 1

                      #11
                      Hi Ben,
                      I've installed Crossbow on a Sun 64-bit server running Fedora 11 and I'm getting the error below, i.e. no shell script was produced.
                      Any idea what I've done wrong?

                      Rob
                      [rtgood1@imokurok CROSSBOW_HOME]$ cb_local --input=RAL306.fq --preprocess --reference=$CROSSBOW_REFS/d_mel --output=testcb --all-haploids --cpus=2
                      print() on closed filehandle JSON at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1329.
                      print() on closed filehandle SH at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1331.
                      print() on closed filehandle HADOOP at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1333.

                      Crossbow job
                      ------------
                      Local commands in: /tmp/crossbow/invoke.scripts/cb.28975.sh
                      Running...
                      sh: /tmp/crossbow/invoke.scripts/cb.28975.sh: No such file or directory

                      [rtgood1@imokurok tmp]$ cd crossbow/
                      [rtgood1@imokurok crossbow]$ ls
                      invoke.scripts
                      [rtgood1@imokurok crossbow]$ cd invoke.scripts/
                      [rtgood1@imokurok invoke.scripts]$ ls
                      [rtgood1@imokurok invoke.scripts]$


                      • av_d
                        Member
                        • Sep 2009
                        • 12

                        #12
                        crossbow error

                        I got some errors while running Crossbow.
                        I've tried both cb_local and cb_hadoop with the example E. coli dataset provided with Crossbow.

                        Command and parameters:

                        "cb_local --input=reads --output=out_small --reference=e_coli --all-haploid"

                        It's giving the following error:


                        Align.pl: Retrived 0 counters from previous stages
                        * Align.pl: Read first line of stdin:
                        * @SRR014475.1 :1:1:108:111
                        * Bad number of read tokens ; expected 3 or 5:
                        * @SRR014475.1 :1:1:108:111
                        ******
                        Fatal error 1.1.0:M140: Aborting because child with PID 15271 exited abnormally



                        Any suggestions?


                        • karve
                          Member
                          • Feb 2011
                          • 12

                          #13
                          Similar error in Hadoop - can make it work there

                          Another newbie here (to this stuff at least, though not to IT), so take my suggestions for what they're worth. On the other hand, I have got it to work all the way through the 4 stages.

                          I'm using Crossbow 1.1.1 btw.

                          I tried preprocess in both single machine and Hadoop modes and got this

                          Bad number of read tokens ; expected 3 or 5:

                          error in both modes as well. The output before and after that message was different for me, though.
                          Mine said:

                          Written 8909572 spots

                          From that it was easy to figure out what was happening. In Hadoop mode, for me, the input file (gut bacteria, is that right?) is broken up into 21 files: 18 are legitimate and contain data, 2 are empty but harmless, and one file, part_00002, didn't contain proper data; it held the text string above. So 20 tasks worked just fine, but the one trying to process part_00002 failed. I just deleted that file, edited the shell script to pick up at that point, and voila, in Hadoop mode it went all the way to the end.

                          I'm doing everything with the keep-all option so that all the intermediate files are kept, and I used dry-run mode so that the shell scripts that run things are kept too, so I can peek at them and edit them as needed.
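
Finding the offending part file by hand, as described above, can be tedious with many parts. Here is a rough sketch of an automated scan in Python. It assumes the preprocessed reads are tab-separated with 3 fields (unpaired) or 5 fields (paired), which is inferred from the "expected 3 or 5" message and is not confirmed anywhere in this thread. The `part*` pattern and both function names are likewise assumptions for illustration.

```python
import glob
import os


def bad_read_lines(path, allowed=(3, 5), limit=5):
    """Return up to `limit` (line_number, token_count) pairs for malformed lines in one file."""
    bad = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # blank lines are skipped rather than reported
            count = len(line.split("\t"))  # assumption: fields are tab-separated
            if count not in allowed:
                bad.append((number, count))
                if len(bad) >= limit:
                    break
    return bad


def scan_preprocessed(directory, pattern="part*"):
    """Map each matching file name to its malformed lines; clean files are omitted."""
    report = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        problems = bad_read_lines(path)
        if problems:
            report[os.path.basename(path)] = problems
    return report
```

Calling `scan_preprocessed()` on a local copy of the preprocess output directory would list only the files that need a closer look. Files stored in HDFS would need to be copied out first.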

                          Now for me, it's on to the next step: figuring out what this all means biologically :-)

                          Enjoy.

                          -Shantanu
                          Last edited by karve; 02-17-2011, 09:41 AM.


                          • narain
                            Banned
                            • Aug 2011
                            • 73

                            #14
                            Here is the command I am using:

                            $CROSSBOW_HOME/cb_local --input=small.manifest --preprocess --reference=/home/abi/bioinfo/crossbow/crossbow-1.2.0/crossbow-1.2.0/CROSSBOW_REFS/e_coli --output=output_small --all-haploids --cpus=1 --preprocess-output=preprocess_output --keep-all --fastq-dump=/home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin/fastq-dump

                            (I tried it with version 1.1.1 as well.)

                            I get problems with the SRA Toolkit, even though I pass its path on the command line, and I have tested that my SRA Toolkit install works.

                            ******
                            * Copy.pl: Retrived 0 counters from previous stages
                            * Copy.pl: Line: ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra 0
                            * Copy.pl: Not a comment line
                            * Copy.pl: Doing unpaired entry SRR014475.lite.sra
                            * Copy.pl: Fetching ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra SRR014475.lite.sra 0
                            * reporter:counter:Short read preprocessor,Read data fetched,0
                            * fastq-dump could not be found in SRATOOLKIT_HOME or PATH; please specify --sraconv
                            ******
                            Fatal error 1.1.1:M140: Aborting because child with PID 17272 exited abnormally

                            When requesting support, please include the full output printed here.
                            If a child process was the cause of the error, the output should
                            include the relevant error message from the child's error log. You may
                            be asked to provide additional files as well.
                            Non-zero exitlevel from Preprocess stage


                            • narain
                              Banned
                              • Aug 2011
                              • 73

                              #15
                              Okay, I fixed that error. I changed the code in Tools.pm at the relevant point.
