
**Ben Langmead** · 07-22-2010, 09:16 AM

Hi Michael,

Hmmm... Where did you get that version of Crossbow? I didn't release any versions between 0.1.3 and 1.0.4.

At any rate, please try the latest version (1.0.4) available from the crossbow page:

Crossbow: Whole Genome Resequencing Analysis in the Clouds

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

And let me know if there's still a problem,
Ben

**Michael Robinson** · 08-15-2010, 03:46 PM

Thanks very much for your help.

I downloaded version 1.0.4, installed it and all the required programs, and ran it on a single computer using the e_coli example; everything worked fine. Then I created an Ubuntu virtual machine and repeated the same steps with the same results.

Now I am trying to run the same job using Hadoop (cb_hadoop), but I think I am missing at least one step.

Following the Crossbow manual, I ran cb_hadoop and got:

michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
Must specify -reference

Then I ran:

cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

which is the location of the e_coli reference jar. Then I got this error:

-------------------
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

Crossbow job
------------
Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
Running...
==========================
Stage 1 of 3. Align
==========================
Sun Aug 15 17:54:31 EDT 2010
packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
Streaming Job Failed!
Non-zero exitlevel from Align streaming job
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
-------------------

Could you please tell me where I can find documentation about the step(s) I am missing?

My goal is to run Crossbow with Hadoop across multiple virtual machines.

Thank you

Michael

**Ben Langmead** · 08-16-2010, 06:53 AM

Hi Michael,


You'll have to specify input and output directories using --input and --output as well. Depending on your version of Hadoop and how it's set up, you may need to specify HDFS URLs that include your namenode's address and port, e.g. --input=hdfs://localhost:9000/my/input.
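As a minimal sketch of that advice, the snippet below builds fully qualified HDFS URLs and prints a cb_hadoop command line that uses them. The namenode address localhost:9000, the helper names hdfs_url and cb_hadoop_cmd, and all HDFS paths are assumptions for illustration; on Hadoop 0.20 the real address is whatever fs.default.name is set to in core-site.xml. The failing path in the log (hdfs:/crossbow/intermediate/...) looks like a default intermediate location, so the sketch also passes --intermediate, which the Crossbow manual lists among its Hadoop-mode options; check that your version accepts it.

```shell
#!/bin/sh
# Sketch: build fully qualified HDFS URLs (namenode host and port included)
# and print a cb_hadoop command line that uses them.
# NAMENODE is an assumption: use the host:port from fs.default.name
# in your Hadoop core-site.xml.
NAMENODE="${NAMENODE:-localhost:9000}"

# hdfs_url PATH: print hdfs://NAMENODE/PATH (hypothetical helper).
hdfs_url() {
  printf 'hdfs://%s%s\n' "$NAMENODE" "$1"
}

# cb_hadoop_cmd: print the command (without running it) so it can be
# checked first. The HDFS paths are placeholders for your own layout.
# --intermediate comes from the Crossbow manual's Hadoop options;
# confirm that your Crossbow version supports it.
cb_hadoop_cmd() {
  printf '%s ' "${CROSSBOW_HOME:-.}/cb_hadoop" \
    --preprocess \
    "--input=$(hdfs_url /crossbow/example/e_coli/small.manifest)" \
    "--output=$(hdfs_url /crossbow/example/e_coli/output_small)" \
    "--intermediate=$(hdfs_url /crossbow/intermediate)" \
    "--reference=$(hdfs_url /crossbow-refs/e_coli.jar)" \
    --all-haploids
  printf '\n'
}
```

Running cb_hadoop_cmd prints the full command for review; once the URLs look right, it can be run with eval "$(cb_hadoop_cmd)". Printing first is deliberate, since a wrong namenode address produces the same kind of "Incomplete HDFS URI" failure only after the job starts.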

Hope this helps,
Ben

**Michael Robinson** · 10-19-2010, 03:32 PM

Crossbow 1.1.0 with Hadoop 0.20.2 Help

Hi,

I am a newbie.

I have Hadoop 0.20.2 running on a multi-node cluster: one server and two nodes.

Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it with no problems.
Now I want to install it (Bowtie and SOAPsnp) on the nodes following the same instructions:

"If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes," how does that relate to the server install? Do you mean exactly the same path as the Crossbow path on the server?

Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)."

Also, when testing previous Crossbow versions, I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?

Thank you

Michael

**Ben Langmead** · 10-19-2010, 03:49 PM

Hi Michael,

Originally posted by Michael Robinson

I have Hadoop 0.20.2 running on a multi-node cluster: one server and two nodes.

Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it with no problems.
Now I want to install it (Bowtie and SOAPsnp) on the nodes following the same instructions:

"If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes," how does that relate to the server install? Do you mean exactly the same path as the Crossbow path on the server?

Yes, it's best to install 'bowtie' and 'soapsnp' at the same path on all nodes, including the server. It's not strictly necessary to install those tools on the server at all, but if you don't, the "cb_hadoop --test" command will fail when run from the server.
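A minimal sketch of "same path on all nodes" follows, assuming passwordless SSH from the server to each node. The node names node1/node2, the linux64 binary directory, and the helper names run and copy_tools are all hypothetical; with DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: copy bowtie and soapsnp to the *same* absolute path on every
# worker. TOOL_DIR and NODES are assumptions; set them for your cluster.
TOOL_DIR="${TOOL_DIR:-${CROSSBOW_HOME:-/opt/crossbow}/bin/linux64}"
NODES="${NODES:-node1 node2}"

# run: echo the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_tools: create TOOL_DIR on each node, then copy both binaries
# into it with scp -p (preserving permissions and timestamps).
copy_tools() {
  for node in $NODES; do
    run ssh "$node" mkdir -p "$TOOL_DIR"
    for tool in bowtie soapsnp; do
      run scp -p "$TOOL_DIR/$tool" "$node:$TOOL_DIR/$tool"
    done
  done
}
```

With DRY_RUN=1 set, copy_tools lists one mkdir and two scp commands per node; unset DRY_RUN to actually run them. Because the source and destination paths are the same variable, the binaries cannot end up at a different path on the workers than on the server.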

Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)."

All I really mean is that you can set up an NFS share so that all computers in the cluster "see" the same files in certain directories. E.g. you might set up your cluster so that the '/share/crossbow' directory contains a Crossbow install and is NFS-shared across all nodes in the cluster. If you do so, the path '/share/crossbow/bin/linux64/bowtie', for example, will be present on all nodes and you can specify that path using the --bowtie option.
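For illustration, here is a sketch of such a setup. The /etc/exports line and mount command are shown as comments because they need root and depend on your network; the subnet 192.168.1.0/24 and the server name hadoop-server are assumptions. The hypothetical check_shared_tools function is meant to be run on every node to confirm that the shared binaries are visible there.

```shell
#!/bin/sh
# Sketch of an NFS-shared Crossbow install (subnet and hostname assumed).
#
# On the server, add to /etc/exports:
#   /share/crossbow  192.168.1.0/24(ro,sync,no_subtree_check)
# then reload the export table:  exportfs -ra
#
# On each node:
#   mkdir -p /share/crossbow
#   mount -t nfs hadoop-server:/share/crossbow /share/crossbow

# check_shared_tools DIR: succeed only if bowtie and soapsnp exist and
# are executable under DIR; report the first missing one otherwise.
check_shared_tools() {
  dir="$1"
  for tool in bowtie soapsnp; do
    if [ ! -x "$dir/$tool" ]; then
      echo "missing or not executable: $dir/$tool" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}
```

If check_shared_tools /share/crossbow/bin/linux64 succeeds on every node, the same path can be given to cb_hadoop via --bowtie=/share/crossbow/bin/linux64/bowtie (and, assuming your version has the matching option described in the manual, --soapsnp for the SOAPsnp binary).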

Also, when testing previous Crossbow versions, I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?

You don't need samtools, no. You never needed R/Bioconductor for Crossbow - just for Myrna (a different though similar tool).

Hope this helps,
Ben

**Michael Robinson** · 10-19-2010, 04:21 PM

Crossbow 1.1.0 with Hadoop 0.20.2 Help

Hi Ben,

I am impressed by how fast you replied.

Thanks very much

Michael

**Michael Robinson** · 10-23-2010, 03:59 PM

Hi Ben,

I went the NFS route, which I think is best because I will only need to update the server with future releases of Crossbow. I can see the Crossbow folders from the clients. Thanks.

I also added the following to my .profile on the server and the nodes:

export CROSSBOW_HOME=<location where I installed Crossbow>

Now I have a new challenge: when I run cb_hadoop --test, I get "program not found".

I can see cb_hadoop and I can also do a cat on it and read the code.

hadoop@Hadoop-Server:~/crossbow/crossbow$ ls
?? contrib ??H@@ ReduceWrap.pl
Align.pl Copy.pl LICENSE reftools
AWS.pm Counters.pl LICENSE_APACHE2 soapsnp
bin Counters.pm LICENSE_ARTISTIC Soapsnp.pl
BinSort.pl crossbow-1.1.0.zip LICENSE_GPL2 Tools.pm
cb_emr CrossbowIface.pm LICENSE_GPL3 TUTORIAL
CBFinish.pl crossbow-manual-v1-1-0.odt LICENSES Util.pm
cb_hadoop doc MANUAL VERSION
cb_local example MapWrap.pl Wrap.pm
CheckDirs.pl Get.pm NEWS
hadoop@Hadoop-Server:~/crossbow/crossbow$


Could you please tell me what I am doing wrong?

Thanks

Michael

**Michael Robinson** · 10-24-2010, 02:12 PM

I found the solution to the cb_hadoop error: I needed to add the location where I installed Hadoop to my PATH.

I am now running Crossbow using the e_coli sample data.
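Combining the CROSSBOW_HOME setting with this PATH fix, the ~/.profile additions on the server and every node might look like the sketch below. Both install locations are assumptions; substitute your own. The case checks only avoid adding the same directory twice if .profile is sourced more than once.

```shell
# Sketch of ~/.profile additions (install paths below are assumptions).
export CROSSBOW_HOME="${CROSSBOW_HOME:-$HOME/crossbow/crossbow}"
export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Put Hadoop's bin directory (which holds the 'hadoop' launcher) on PATH,
# then Crossbow's own scripts; skip either if it is already present.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) ;;
  *) PATH="$HADOOP_HOME/bin:$PATH" ;;
esac
case ":$PATH:" in
  *":$CROSSBOW_HOME:"*) ;;
  *) PATH="$CROSSBOW_HOME:$PATH" ;;
esac
export PATH
```

After logging in again (or running . ~/.profile), command -v hadoop should print a path under $HADOOP_HOME/bin on each machine.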

Thanks

**carze** · 10-27-2010, 01:26 PM

Hi Ben,

Sorry to hijack this thread, but since you have already answered questions here, I was wondering whether it is possible to get Bowtie to produce SAM output within the Crossbow pipeline. Whenever I pass the '--sam' flag to Bowtie via the '--bowtie-args' option, I get a segmentation fault during the Align step.

Thanks!

**rtgood** · 11-03-2010, 08:16 PM

Hi Ben,
I've installed Crossbow on a Sun 64-bit server running Fedora 11 and I'm getting the error below; no shell script is produced.
Any idea what I've done wrong?

Rob
[rtgood1@imokurok CROSSBOW_HOME]$ cb_local --input=RAL306.fq --preprocess --reference=$CROSSBOW_REFS/d_mel --output=testcb --all-haploids --cpus=2
print() on closed filehandle JSON at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1329.
print() on closed filehandle SH at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1331.
print() on closed filehandle HADOOP at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1333.

Crossbow job
------------
Local commands in: /tmp/crossbow/invoke.scripts/cb.28975.sh
Running...
sh: /tmp/crossbow/invoke.scripts/cb.28975.sh: No such file or directory

[rtgood1@imokurok tmp]$ cd crossbow/
[rtgood1@imokurok crossbow]$ ls
invoke.scripts
[rtgood1@imokurok crossbow]$ cd invoke.scripts/
[rtgood1@imokurok invoke.scripts]$ ls
[rtgood1@imokurok invoke.scripts]$

**av_d** · 12-30-2010, 02:16 AM

crossbow error

I got some errors while running Crossbow.
I've tried both cb_local and cb_hadoop with the example E. coli dataset provided with Crossbow.

Command and parameters:

"cb_local --input=reads --output=out_small --reference=e_coli --all-haploid"

It gives the following error:

Align.pl: Retrived 0 counters from previous stages
* Align.pl: Read first line of stdin:
* @SRR014475.1 :1:1:108:111
* Bad number of read tokens ; expected 3 or 5:
* @SRR014475.1 :1:1:108:111
******
Fatal error 1.1.0:M140: Aborting because child with PID 15271 exited abnormally

Any suggestions?

**karve** · 02-17-2011, 09:13 AM

Similar error in Hadoop - can make it work there

Well, another newbie here (to this stuff at least, but not to IT), so take my suggestions FWIW. On the other hand, I have got it to work all the way through the 4 stages, so...

I'm using Crossbow 1.1.1 btw.

I tried preprocessing in both single-machine and Hadoop modes and got this

Bad number of read tokens ; expected 3 or 5:

error in both modes. The output before and after that message was different for me, though. Mine said:

Written 8909572 spots

From that it was easy to figure out what was happening. In Hadoop mode, the input gut bacteria file (is that right?) was broken up into 21 files for me: 18 were legitimate files with data, 2 were empty but benign, and one file, part_00002, didn't contain proper data; it held the text string above. So 20 tasks worked just fine, but the one trying to process part_00002 failed. I deleted that file, edited the shell script to pick up at that point, and voilà: in Hadoop mode it went all the way to the end.
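Spotting a part file like that can be automated. The sketch below takes the error message at its word (each read line should have 3 or 5 fields) and additionally assumes the fields are tab-separated, one read per line; it scans local copies of the part files (for example, fetched with hadoop fs -get) and prints any file containing a malformed non-empty line. The function name find_bad_parts is hypothetical.

```shell
#!/bin/sh
# find_bad_parts FILE...: print each file containing a non-empty line
# whose tab-separated field count is neither 3 nor 5.
# Assumption: preprocessed reads are tab-delimited, matching the
# "expected 3 or 5" tokens in the Crossbow error message.
find_bad_parts() {
  for f in "$@"; do
    # awk exits 0 only if it found a bad line; empty files pass.
    if awk 'BEGIN { FS = "\t" }
            NF != 0 && NF != 3 && NF != 5 { bad = 1; exit }
            END { exit !bad }' "$f"; then
      echo "$f"
    fi
  done
}
```

Run as find_bad_parts part-*, it would flag a file holding only "Written 8909572 spots" (one field) while leaving empty files alone, which matches the 18 good, 2 empty, 1 bad split described above.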

I'm running everything with the --keep-all option so that all intermediate files are kept, and I used --dry-run mode so that the shell scripts that run each step are kept and I can inspect and edit them as needed.
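That dry-run workflow can be sketched as follows. The script directory is the one that appears in the logs earlier in this thread (/tmp/crossbow/invoke.scripts); the helper latest_script and the SCRIPT_DIR override are hypothetical.

```shell
#!/bin/sh
# Directory where Crossbow wrote its generated scripts in the logs
# above; override SCRIPT_DIR if yours differs.
SCRIPT_DIR="${SCRIPT_DIR:-/tmp/crossbow/invoke.scripts}"

# latest_script: print the most recently modified cb.*.sh script
# (matches both cb.PID.sh and cb.PID.hadoop.sh); prints nothing if none.
latest_script() {
  ls -t "$SCRIPT_DIR"/cb.*.sh 2>/dev/null | head -n 1
}
```

After a run with --dry-run --keep-all, open "$(latest_script)" in an editor, make any changes, and then run it with sh, which is how the logs above show Crossbow invoking these scripts.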

Now for me, it's on to the next step and to figure out what this all means in the biology aspect :-)

Enjoy.

-Shantanu

**narain** · 05-06-2013, 02:22 PM

Here is the command I am using:

$CROSSBOW_HOME/cb_local --input=small.manifest --preprocess --reference=/home/abi/bioinfo/crossbow/crossbow-1.2.0/crossbow-1.2.0/CROSSBOW_REFS/e_coli --output=output_small --all-haploids --cpus=1 --preprocess-output=preprocess_output --keep-all --fastq-dump=/home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin/fastq-dump

(I tried it with version 1.1.1 as well.)

I get problems with the SRA Toolkit, even though it is at the path specified on the command line, and I have tested that my SRA Toolkit installation works.

******
* Copy.pl: Retrived 0 counters from previous stages
* Copy.pl: Line: ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra 0
* Copy.pl: Not a comment line
* Copy.pl: Doing unpaired entry SRR014475.lite.sra
* Copy.pl: Fetching ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra SRR014475.lite.sra 0
* reporter:counter:Short read preprocessor,Read data fetched,0
* fastq-dump could not be found in SRATOOLKIT_HOME or PATH; please specify --sraconv
******
Fatal error 1.1.1:M140: Aborting because child with PID 17272 exited abnormally

When requesting support, please include the full output printed here.
If a child process was the cause of the error, the output should
include the relevant error message from the child's error log. You may
be asked to provide additional files as well.
Non-zero exitlevel from Preprocess stage

**narain** · 05-07-2013, 08:44 AM

Okay, I fixed that error by changing the code in Tools.pm at the relevant point.
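Editing Tools.pm works, but the error message itself points to an alternative that leaves the Crossbow code untouched: make fastq-dump findable on PATH (it also mentions SRATOOLKIT_HOME and --sraconv). Note that the log reports version 1.1.1 and asks for --sraconv, which may mean the Crossbow being run via $CROSSBOW_HOME is not the 1.2.0 install the --fastq-dump option was meant for. A minimal sketch; the helper name add_fastq_dump_to_path is hypothetical.

```shell
#!/bin/sh
# add_fastq_dump_to_path DIR: prepend DIR to PATH if it contains an
# executable fastq-dump, then print where the shell now resolves it.
add_fastq_dump_to_path() {
  dir="$1"
  if [ ! -x "$dir/fastq-dump" ]; then
    echo "no executable fastq-dump in $dir" >&2
    return 1
  fi
  PATH="$dir:$PATH"
  export PATH
  command -v fastq-dump
}
```

For example, add_fastq_dump_to_path /home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin in the same shell, then rerun cb_local from that shell so the child processes inherit the updated PATH.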


Crossbow 1.0.0 help please
