  • Ashwinub
    replied
    Quick update to previous post

    Hello,

    I would like to update the previous post. I managed to complete 3 steps of Crossbow via the EMR command line, but I am getting an error in the final step, 'Get Counters'.

    I hope someone can help me out with this.

    controller
    Code:
    2013-10-08T03:39:40.661Z INFO Fetching jar file.
    2013-10-08T03:39:42.169Z INFO Working dir /mnt/var/lib/hadoop/steps/5
    2013-10-08T03:39:42.169Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/hadoop-core-0.20.205.jar:/home/hadoop/hadoop-tools-0.20.205.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/5 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/5/tmp -Djava.library.path=/home/hadoop/native/Linux-amd64-64 org.apache.hadoop.util.RunJar /home/hadoop/contrib/streaming/hadoop-streaming-0.20.205.jar -D mapred.reduce.tasks=1 -input s3n://crossbow-emr/dummy-input -output s3n://ashwin-test/crossbow-emr-cli_crossbow_counters/ignoreme1 -mapper cat -reducer s3n://crossbow-emr/1.2.1/Counters.pl  --output=S3N://ashwin-test/crossbow-emr-cli_crossbow_counters -cacheFile s3n://crossbow-emr/1.2.1/Get.pm#Get.pm -cacheFile s3n://crossbow-emr/1.2.1/Counters.pm#Counters.pm -cacheFile s3n://crossbow-emr/1.2.1/Util.pm#Util.pm -cacheFile s3n://crossbow-emr/1.2.1/Tools.pm#Tools.pm -cacheFile s3n://crossbow-emr/1.2.1/AWS.pm#AWS.pm
    2013-10-08T03:39:45.175Z INFO Execution ended with ret val 1
    2013-10-08T03:39:45.176Z WARN Step failed with bad retval
    2013-10-08T03:39:46.681Z INFO Step created jobs:
    syslog

    Code:
    2013-10-08 03:39:42,458 WARN org.apache.hadoop.conf.Configuration (main): DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
    2013-10-08 03:39:43,393 INFO org.apache.hadoop.mapred.JobClient (main): Default number of map tasks: null
    2013-10-08 03:39:43,393 INFO org.apache.hadoop.mapred.JobClient (main): Setting default number of map tasks based on cluster size to : 56
    2013-10-08 03:39:43,393 INFO org.apache.hadoop.mapred.JobClient (main): Default number of reduce tasks: 1
    2013-10-08 03:39:44,940 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (main): Loaded native gpl library
    2013-10-08 03:39:44,943 WARN com.hadoop.compression.lzo.LzoCodec (main): Could not find build properties file with revision hash
    2013-10-08 03:39:44,943 INFO com.hadoop.compression.lzo.LzoCodec (main): Successfully loaded & initialized native-lzo library [hadoop-lzo rev UNKNOWN]
    2013-10-08 03:39:44,950 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library is available
    2013-10-08 03:39:44,951 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library loaded
    2013-10-08 03:39:45,047 INFO org.apache.hadoop.mapred.JobClient (main): Cleaning up the staging area hdfs://10.159.25.174:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201310080306_0004
    stderr
    Code:
    Exception in thread "main" Status Code: 403, AWS Request ID: 2977B25629DD5007, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: OcPQrMLKUHBKHfdh4ICR5BgEWNzDtUEzc8H2km55h0nCL92RKph4rFXSCEY9y6vq
    	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:544)
    	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:284)
    	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:169)
    	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2619)
    	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:708)
    	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:688)
    	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:100)
    	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    	at org.apache.hadoop.fs.s3native.$Proxy3.retrieveMetadata(Unknown Source)
    	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:730)
    	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:783)
    	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:808)
    	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:185)
    	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1026)
    	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1018)
    	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:396)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
    	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
    	at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1010)
    	at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:127)
    	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    	at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    stdout

    Code:
    packageJobJar: [/mnt/var/lib/hadoop/tmp/hadoop-unjar9002137556695792672/] [] /mnt/var/lib/hadoop/steps/5/tmp/streamjob4081705531014015666.jar tmpDir=null
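
    The stderr 403 makes me suspect the step's credentials cannot read the S3 location (maybe related to the uppercase S3N:// scheme in the --output URL above, though I am not sure). One check I plan to run is whether my credentials can even HEAD the bucket, mirroring the getObjectMetadata call in the trace; a minimal sketch with boto3, using the bucket name from the step above:

    Code:
    # a minimal sketch, assuming boto3 and the same credentials the job uses
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        # mirrors the HEAD request that failed (getObjectMetadata in the trace)
        s3.head_bucket(Bucket="ashwin-test")
        print("credentials can reach the bucket")
    except ClientError as e:
        # a 403 here would point at bucket permissions rather than Crossbow itself
        print("blocked:", e.response["Error"]["Code"])
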
    Thanks



  • Ashwinub
    replied
    Originally posted by Ben Langmead:
    Hi all,

    The Crossbow paper, Searching for SNPs with cloud computing, came out in provisional form today. Take a look if you're interested.

    Thanks,
    Ben
    Hello Ben,

    I am trying to run Crossbow through the web interface with 5 instances. My EMR job keeps failing at Crossbow Step 1: Align with Bowtie. Can you please suggest something so that I can make it work?

    A total of 7 tasks were run, of which 2 failed after 4 attempts each.
    Here are the syslog and stderr of a failed task:

    stderr
    Code:
    Warning: No TOOLNAME file in tool directory: Bin
    Align.pl: s3cmd: found: /usr/bin/s3cmd, given: 
    Align.pl: jar: found: /usr/lib/jvm/java-6-sun/bin/jar, given: 
    Align.pl: hadoop: found: /home/hadoop/.versions/0.20.205/libexec/../bin/hadoop, given: 
    Align.pl: wget: found: /usr/bin/wget, given: 
    Align.pl: s3cfg: 
    Align.pl: bowtie: found: ./bowtie, given: 
    Align.pl: partition len: 1000000
    Align.pl: ref: S3N://crossbow-refs/hg18.jar
    Align.pl: quality: phred33
    Align.pl: truncate at: 0
    Align.pl: discard mate: 0
    Align.pl: discard reads < truncate len: 0
    Align.pl: SAM passthrough: 0
    Align.pl: Straight through: 0
    Align.pl: local index path: 
    Align.pl: counters: 
    Align.pl: dest dir: /mnt/15049
    Align.pl: bowtie args: --partition 1000000 --mm -t --hadoopout --startverbose -m 1
    Align.pl: ls -al
    Align.pl: total 4
    drwxr-xr-x 3 hadoop hadoop 4096 Oct  8 00:12 .
    drwxr-xr-x 3 hadoop hadoop   17 Oct  8 00:12 ..
    lrwxrwxrwx 1 hadoop hadoop   94 Oct  8 00:12 .job.jar.crc -> /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/.job.jar.crc
    lrwxrwxrwx 1 hadoop hadoop  118 Oct  8 00:12 AWS.pm -> /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/-4647442751582941863_-1803196130_294487920/crossbow-emr/1.2.1/AWS.pm
    lrwxrwxrwx 1 hadoop hadoop  117 Oct  8 00:12 Align.pl -> /mnt3/var/lib/hadoop/mapred/taskTracker/distcache/-949217784113535430_-79486178_294487920/crossbow-emr/1.2.1/Align.pl
    lrwxrwxrwx 1 hadoop hadoop  122 Oct  8 00:12 Counters.pm -> /mnt3/var/lib/hadoop/mapred/taskTracker/distcache/3768974508281659116_-1504224494_294492920/crossbow-emr/1.2.1/Counters.pm
    lrwxrwxrwx 1 hadoop hadoop  116 Oct  8 00:12 Get.pm -> /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/1742673893144693291_-703943522_294499920/crossbow-emr/1.2.1/Get.pm
    lrwxrwxrwx 1 hadoop hadoop   90 Oct  8 00:12 META-INF -> /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/META-INF
    lrwxrwxrwx 1 hadoop hadoop  119 Oct  8 00:12 Tools.pm -> /mnt1/var/lib/hadoop/mapred/taskTracker/distcache/2561649607146715239_-2000054114_294501920/crossbow-emr/1.2.1/Tools.pm
    lrwxrwxrwx 1 hadoop hadoop  118 Oct  8 00:12 Util.pm -> /mnt/var/lib/hadoop/mapred/taskTracker/distcache/-6230906815329997085_-1944427886_294502920/crossbow-emr/1.2.1/Util.pm
    lrwxrwxrwx 1 hadoop hadoop  119 Oct  8 00:12 bowtie -> /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/8717311255330235240_-1327012900_303598920/crossbow-emr/1.2.1/bowtie64
    lrwxrwxrwx 1 hadoop hadoop   89 Oct  8 00:12 job.jar -> /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/job.jar
    lrwxrwxrwx 1 hadoop hadoop   85 Oct  8 00:12 org -> /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/org
    drwxr-xr-x 2 hadoop hadoop    6 Oct  8 00:12 tmp
    Align.pl: Read first line of stdin:
    FN:human.125.1M.1.fq;RN:@chr10_77883106_77883509_0:0:0_4:0:0_0/2	GTTTCTGAGATGCTGCAGAATGCTGCCTCACATCCACCTCTGAGTGAAAGAATTCCTTCACAGATTATATATATTCAGAGAAGGACTATCCTAACCTACAGTTTCGAAGCTTTTATGTCTAAAGA	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	AAAAAAAAAAAATTAGCCAGGTATGGCGGCTCACACCTGCGGTCCCAGCTACTTGGGAGACTAAGGTGGGAGGATCACCTGAGCCTGGGAGGTCGAGGCTGCAGTGAGCTGTGATTGTGCCACTG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    Align.pl: Read last line of stdin:
    FN:human.125.1M.1.fq;RN:@chr2_196190269_196190796_1:0:0_4:0:0_9041/2	TATTAAAGCCAGGTGGAGAATAAAACCTGCCTACATTAATTCTATCACCTTCCCTAATTCCTAATTGCCATTTAACCATGGGAAGCCATAACTACCAAAAAGCGGGGCAGAGAAAGCAGAAGATA	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	GATAGGAATTGTTAAATTATTTATATAAGTCAAATGAAGCTTTGCAGTCCTGTACTAAAACACTATTTAGTGGGAATAGAATGTAAGAAGCTCTAGAAAATCAATTTGCCACAGTACTCTTATTT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    Align.pl: 500000 reads downloaded
    Align.pl: 500000 reads downloaded
    Align.pl: head -4 .tmp.11427:
    Align.pl: r	GTTTCTGAGATGCTGCAGAATGCTGCCTCACATCCACCTCTGAGTGAAAGAATTCCTTCACAGATTATATATATTCAGAGAAGGACTATCCTAACCTACAGTTTCGAAGCTTTTATGTCTAAAGA	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	AAAAAAAAAAAATTAGCCAGGTATGGCGGCTCACACCTGCGGTCCCAGCTACTTGGGAGACTAAGGTGGGAGGATCACCTGAGCCTGGGAGGTCGAGGCTGCAGTGAGCTGTGATTGTGCCACTG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	TGCCTGCTCTAGGGTTGAGTAAGCGCAGAAAACTCCTAGCTCACCCTCCATCCTCTGCTGCATTTATTGGGGTGGAGTGGGGAACAGGGAGTTGGACCTTGATAAACTGGGACAGCTGGGCTGAG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	AGTAACCACCTCCTCTCCTTCAACACATCCTTACCTCCCTCCCACCCCAGGTGCCATGGAGAGGTGGGAGGGAGGCAGTGGGCCAGGCAGGGAGATCGATGGCATTCGTGGCCTCTGGCCCAGGG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	GATAGAATGAAATCGAAAAAAATCAGTGAAAGGAATCTAATGGAATCATCATCGAATGGAATCGAATGGAATCATCATCGAATAGAATCGAATGGAATCCTCAAAAGTAATTGAATGGAAAAAAC	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	ATTCGAGGATTCCATTCGTTTCCACTCGATGTTTATTCCATTCGATTAACTTTGACGATTCCATTCAATTCATTCGTTGATGATTCCTTTCGATTCCATTTGATGATGATTGCATTAGGTATCAT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	GACATCTATCTTCTAGATGACCCCCTGTCTTCAGTGGATGCTCATGTACGAAAACGTATTTTTAATAAGGTCTTGGGCCCCAATGGCCTGTTCAAAGGCAAGGTGAGAAATCATTGACCATGATG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	GTCTTAGAAATGTCTTCAGATTCTTAGCAAGCTCTCCTTTTTTGGCCAGGAGAGCACTGTAGGATCCTTTCTCTACAATTGTTCCATTCCCCAGAACTACAATCTCATCCACTTGAGGAAGAAAG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    Align.pl: tail -4 .tmp.11427:
    Align.pl: r	AGAGTGGAGAGTTCTCCTCTCCGTGCTTAAAAACCCCTGAGACTTCAAGAATACTCAAAATAGTACAGATCAAAAGCCCTAAAAATGCATGTACTCCCAGAACAACATAAGCAACTTTAAGAAAT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	GGTTGTCTCTAATTTTTCTTCTGTTGCAAACAGGGGGGCAAAGAATAAACTCATACATGTCATTGCCTACAAGTACAGATGTATTGCTAAGATAAATTCCTAGAAGTGGAATTGTTGAGTCAAAG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	GAGAAAAGGTCAAATGGAAATGTTAGAAATAAAAAAAACATGATATCAAGCATAAAGGATTCTATTAATGAGTTCATCCATAACTTTGGCACAGTTGAAGAAGAATCAGTAAAACTGAAGATAGG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	TTATGATGAGATATCTGCCATGATTCTTATTTTTTTCCTTTGTATATTATGTATCTTTTTCTGTCTGCTGTCCAGATTTTTTCTTTGGGTTTTAGAAGTTTGCTGTGATGTGTCTAGGGGTGTGT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	TGGCAAGCTGGAGACCCAGGTGAACTGATGGTGTAGTTCCATACGGAATGCCGGCAGGCTTGAGACCCAGAAAGAGCTGATGTTTAGTCTGAGGCTGAAGGCAGGAAGGAACTGATGTCCCAGCT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	GGCAGCAATTAACTCAAGTAAATGCTCTCCTCTGAAGCCTGAAGACCAGAGTTCTGTAGGGTTGATTGCAGAAAGAATCATCAGCTTTTGTTGCAAAACTACTTGGCCGAAAAGTTGGCCTTCTG	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    r	TATTAAAGCCAGGTGGAGAATAAAACCTGCCTACATTAATTCTATCACCTTCCCTAATTCCTAATTGCCATTTAACCATGGGAAGCCATAACTACCAAAAAGCGGGGCAGAGAAAGCAGAAGATA	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	GATAGGAATTGTTAAATTATTTATATAAGTCAAATGAAGCTTTGCAGTCCTGTACTAAAACACTATTTAGTGGGAATAGAATGTAAGAAGCTCTAGAAAATCAATTTGCCACAGTACTCTTATTT	22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
    Align.pl:   ...ensuring reference jar is installed first
    Get.pm:ensureFetched: called on "S3N://crossbow-refs/hg18.jar"
    Get.pm:ensureFetched: base name "hg18.jar"
    ls -al /mnt/15049/*hg18.jar* /mnt/15049/.*hg18.jar*
    -rw-r--r-- 1 hadoop hadoop          0 Oct  8 00:02 /mnt/15049/.hg18.jar.done
    -rw-r--r-- 1 hadoop hadoop          0 Oct  7 23:55 /mnt/15049/.hg18.jar.lock
    -rw-r--r-- 1 hadoop hadoop 3896493171 Oct  7 23:57 /mnt/15049/hg18.jar
    Pid 11427: Checking for done file /mnt/15049/.hg18.jar.done
    Pid 11427: done file /mnt/15049/.hg18.jar.done was there already; continuing
    Align.pl: Running: ./bowtie --partition 1000000 --mm -t --hadoopout --startverbose -m 1 --12 .tmp.11427 /mnt/15049/index/index 2>.tmp.Align.pl.11427.err
    ./bowtie --partition 1000000 --mm -t --hadoopout --startverbose -m 1 --12 .tmp.11427 /mnt/15049/index/index 2>.tmp.Align.pl.11427.err
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
    	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:372)
    	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:582)
    	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:396)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    	at org.apache.hadoop.mapred.Child.main(Child.java:249)
    syslog
    Code:
    2013-10-08 00:12:31,100 WARN org.apache.hadoop.conf.Configuration (main): DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
    2013-10-08 00:12:31,779 INFO org.apache.hadoop.util.NativeCodeLoader (main): Loaded the native-hadoop library
    2013-10-08 00:12:31,842 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/8717311255330235240_-1327012900_303598920/crossbow-emr/1.2.1/bowtie64 <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/bowtie
    2013-10-08 00:12:31,848 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/1742673893144693291_-703943522_294499920/crossbow-emr/1.2.1/Get.pm <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/Get.pm
    2013-10-08 00:12:31,852 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt3/var/lib/hadoop/mapred/taskTracker/distcache/3768974508281659116_-1504224494_294492920/crossbow-emr/1.2.1/Counters.pm <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/Counters.pm
    2013-10-08 00:12:31,856 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt/var/lib/hadoop/mapred/taskTracker/distcache/-6230906815329997085_-1944427886_294502920/crossbow-emr/1.2.1/Util.pm <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/Util.pm
    2013-10-08 00:12:31,859 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt1/var/lib/hadoop/mapred/taskTracker/distcache/2561649607146715239_-2000054114_294501920/crossbow-emr/1.2.1/Tools.pm <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/Tools.pm
    2013-10-08 00:12:31,863 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt2/var/lib/hadoop/mapred/taskTracker/distcache/-4647442751582941863_-1803196130_294487920/crossbow-emr/1.2.1/AWS.pm <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/AWS.pm
    2013-10-08 00:12:31,867 INFO org.apache.hadoop.mapred.TaskRunner (main): Creating symlink: /mnt3/var/lib/hadoop/mapred/taskTracker/distcache/-949217784113535430_-79486178_294487920/crossbow-emr/1.2.1/Align.pl <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/Align.pl
    2013-10-08 00:12:31,874 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (main): Creating symlink: /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/job.jar <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/job.jar
    2013-10-08 00:12:31,878 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (main): Creating symlink: /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/.job.jar.crc <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/.job.jar.crc
    2013-10-08 00:12:31,881 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (main): Creating symlink: /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/META-INF <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/META-INF
    2013-10-08 00:12:31,885 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (main): Creating symlink: /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/jars/org <- /mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/org
    2013-10-08 00:12:32,166 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi already exists!
    2013-10-08 00:12:32,264 INFO org.apache.hadoop.mapred.MapTask (main): Host name: ip-10-181-4-225.ec2.internal
    2013-10-08 00:12:32,282 INFO org.apache.hadoop.util.ProcessTree (main): setsid exited with exit code 0
    2013-10-08 00:12:32,290 INFO org.apache.hadoop.mapred.Task (main):  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b980630
    2013-10-08 00:12:32,418 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (main): Loaded native gpl library
    2013-10-08 00:12:32,431 WARN com.hadoop.compression.lzo.LzoCodec (main): Could not find build properties file with revision hash
    2013-10-08 00:12:32,431 INFO com.hadoop.compression.lzo.LzoCodec (main): Successfully loaded & initialized native-lzo library [hadoop-lzo rev UNKNOWN]
    2013-10-08 00:12:32,440 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library is available
    2013-10-08 00:12:32,441 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library loaded
    2013-10-08 00:12:32,450 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory (main): Successfully loaded & initialized native-zlib library
    2013-10-08 00:12:32,451 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 0
    2013-10-08 00:12:32,537 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed exec [/mnt3/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201310072347_0002/attempt_201310072347_0002_m_000001_2/work/./Align.pl, --discard-reads=0, --ref=S3N://crossbow-refs/hg18.jar, --destdir=/mnt/15049, --partlen=1000000, --qual=phred33, --truncate=0, --, --partition, 1000000, --mm, -t, --hadoopout, --startverbose, -m, 1]
    2013-10-08 00:12:32,631 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-10-08 00:12:32,632 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-10-08 00:12:32,637 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-10-08 00:12:33,169 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-10-08 00:12:36,782 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10000/0/0 in:2500=10000/4 [rec/s] out:0=0/4 [rec/s]
    2013-10-08 00:13:13,024 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100000/0/0 in:2500=100000/40 [rec/s] out:0=0/40 [rec/s]
    2013-10-08 00:13:52,968 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=200000/0/0 in:2500=200000/80 [rec/s] out:0=0/80 [rec/s]
    2013-10-08 00:14:32,916 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=300000/0/0 in:2500=300000/120 [rec/s] out:0=0/120 [rec/s]
    2013-10-08 00:15:12,966 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=400000/0/0 in:2500=400000/160 [rec/s] out:0=0/160 [rec/s]
    2013-10-08 00:15:52,985 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=500000/0/0 in:2500=500000/200 [rec/s] out:0=0/200 [rec/s]
    2013-10-08 00:25:58,776 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-14): MRErrorThread done
    2013-10-08 00:25:58,777 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed failed!
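
    If it helps with diagnosis: exit code 143 looks like the bowtie subprocess was killed rather than crashing on its own, since statuses above 128 usually encode 128 + signal number, and 143 - 128 = 15 = SIGTERM. I gather that often means the task tracker or the OS terminated it (task timeout or memory pressure). A minimal sketch of that decoding:

    Code:
    # decode a Hadoop streaming subprocess exit status; 143 -> killed by SIGTERM
    import signal

    status = 143  # from PipeMapRed.waitOutputThreads above
    if status > 128:
        print("killed by signal", signal.Signals(status - 128).name)
    else:
        print("exited normally with code", status)
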
    Looking forward to a reply.

    Thanks



  • narain
    replied
    How to work with Local files in Crossbow

    Dear All

    I am using version 1.2.0 of Crossbow. I have managed to get it working for a file specified in a .manifest input file that points to an FTP server.

    I am wondering how you would get it to work for input data that is already on your local computer, such as a .fastq, .fastq.gz, or .sra file? This would avoid the internet dependency.
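
    For concreteness, below is a sketch of the kind of manifest I have in mind, written out by a small script. The file:// paths are made up, and the URL/MD5 layout (with 0 apparently meaning "skip the checksum") is just my reading of the manual:

    Code:
    # a sketch of a manifest for local paired-end reads (hypothetical paths);
    # each line is URL-1 <TAB> MD5-1 <TAB> URL-2 <TAB> MD5-2 per my reading
    # of the Crossbow docs, with 0 in the MD5 field to skip checksum checks
    pairs = [
        ("file:///data/reads/sample_1.fq.gz", "file:///data/reads/sample_2.fq.gz"),
    ]
    with open("local.manifest", "w") as mf:
        for mate1, mate2 in pairs:
            mf.write(mate1 + "\t0\t" + mate2 + "\t0\n")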

    Also, I tried the bowtie option by specifying the path to bowtie2, and Crossbow did not work. Does Crossbow work only with Bowtie 1 and no other aligner?

    Narain



  • Serena Rhie
    replied
    My job fails on Step 3. Yeah, it's quite confusing; I started a new job for simplicity.

    I've got 4 attempts under "task attempts", and each stderr and syslog gives almost the same message. It seems like Hadoop is throwing a NullPointerException.

    stderr:
    Code:
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
    log4j:WARN Please initialize the log4j system properly.
    s3cmd: found: /usr/bin/s3cmd, given: 
    jar: found: /usr/lib/jvm/java-6-sun/bin/jar, given: 
    hadoop: found: /home/hadoop/bin/../bin/hadoop, given: 
    wget: found: /usr/bin/wget, given: 
    s3cfg: 
    cmap_file: 
    cmap_jar: S3N://crossbow-refs/e_coli.jar
    local destination dir: /mnt/14270
    Output dir: S3N://pings-ewha/e-coli/crossbow-mediated
    Ensuring cmap jar is installed
    Get.pm:ensureFetched: called on "S3N://crossbow-refs/e_coli.jar"
    Get.pm:ensureFetched: base name "e_coli.jar"
    mkdir -p /mnt/14270 >&2 2>/dev/null
    ls -al /mnt/14270/*e_coli.jar* /mnt/14270/.*e_coli.jar*
    -rw-r--r-- 1 hadoop hadoop 9990385 2010-08-17 01:16 /mnt/14270/e_coli.jar
    -rw-r--r-- 1 hadoop hadoop       0 2010-08-17 01:16 /mnt/14270/.e_coli.jar.done
    -rw-r--r-- 1 hadoop hadoop       0 2010-08-17 01:16 /mnt/14270/.e_coli.jar.lock
    Pid 22358: Checking for done file /mnt/14270/.e_coli.jar.done
    Pid 22358: done file /mnt/14270/.e_coli.jar.done was there already; continuing
    Examining extracted files
    find /mnt/14270
    /mnt/14270
    /mnt/14270/META-INF
    /mnt/14270/META-INF/MANIFEST.MF
    /mnt/14270/snps
    /mnt/14270/sequences
    /mnt/14270/sequences/chr0.fa
    /mnt/14270/index
    /mnt/14270/index/index.rev.1.ebwt
    /mnt/14270/index/index.1.ebwt
    /mnt/14270/index/index.rev.2.ebwt
    /mnt/14270/index/index.2.ebwt
    /mnt/14270/index/index.4.ebwt
    /mnt/14270/index/index.3.ebwt
    /mnt/14270/.e_coli.jar.lock
    /mnt/14270/cmap.txt
    /mnt/14270/.e_coli.jar.done
    /mnt/14270/e_coli.jar
    java.lang.RuntimeException: java.lang.NullPointerException
    	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:386)
    	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:582)
    	at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:477)
    	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    	at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: java.lang.NullPointerException
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:549)
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:490)
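
    If I read the trace right, the NullPointerException comes from MRErrorThread.incrCounter, i.e. the thread that watches the subprocess's stderr for counter updates. For context, Hadoop streaming counters are just specially formatted lines printed to stderr, roughly like this (a minimal sketch; the group and counter names are made up):

    Code:
    # Hadoop streaming counter protocol: the framework's MRErrorThread parses
    # "reporter:counter:<group>,<name>,<amount>" lines printed to stderr
    import sys

    def incr_counter(group, name, amount=1):
        sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, name, amount))

    incr_counter("Crossbow", "Reads processed")  # hypothetical names
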
    And here is the syslog:

    Code:
    2010-08-17 01:29:12,507 INFO org.apache.hadoop.metrics.jvm.JvmMetrics (main): Initializing JVM Metrics with processName=SHUFFLE, sessionId=
    2010-08-17 01:29:12,701 INFO org.apache.hadoop.mapred.ReduceTask (main): Host name: domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:14,770 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (main): Loaded native gpl library
    2010-08-17 01:29:14,771 INFO com.hadoop.compression.lzo.LzoCodec (main): Successfully loaded & initialized native-lzo library
    2010-08-17 01:29:14,781 INFO org.apache.hadoop.mapred.ReduceTask (main): ShuffleRamManager: MemoryLimit=488066240, MaxSingleShuffleLimit=122016560
    2010-08-17 01:29:14,786 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,786 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,787 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,788 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,789 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,790 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,791 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,792 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,792 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,793 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,794 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,795 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,796 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,796 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,798 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,798 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,799 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,800 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,801 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,802 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
    2010-08-17 01:29:14,805 INFO org.apache.hadoop.mapred.ReduceTask (Thread for merging on-disk files): attempt_201008170108_0004_r_000001_0 Thread started: Thread for merging on-disk files
    2010-08-17 01:29:14,805 INFO org.apache.hadoop.mapred.ReduceTask (Thread for merging on-disk files): attempt_201008170108_0004_r_000001_0 Thread waiting: Thread for merging on-disk files
    2010-08-17 01:29:14,806 INFO org.apache.hadoop.mapred.ReduceTask (Thread for merging in memory files): attempt_201008170108_0004_r_000001_0 Thread started: Thread for merging in memory files
    2010-08-17 01:29:14,807 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Need another 32 map output(s) where 0 is already in progress
    2010-08-17 01:29:14,807 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:14,807 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0 Thread started: Thread for polling Map Completion Events
    2010-08-17 01:29:14,815 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 9 new map-outputs
    2010-08-17 01:29:17,828 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:19,822 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,829 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000000_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,829 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000000_0
    2010-08-17 01:29:19,833 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000000_0
    2010-08-17 01:29:19,833 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000000_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,835 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,839 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000001_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,840 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000001_0
    2010-08-17 01:29:19,840 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000001_0
    2010-08-17 01:29:19,840 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000001_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,840 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,847 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): header: attempt_201008170108_0004_m_000002_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,847 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000002_0
    2010-08-17 01:29:19,848 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Read 2 bytes from map-output for attempt_201008170108_0004_m_000002_0
    2010-08-17 01:29:19,848 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Rec #1 from attempt_201008170108_0004_m_000002_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,848 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,886 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): header: attempt_201008170108_0004_m_000003_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,886 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000003_0
    2010-08-17 01:29:19,887 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Read 2 bytes from map-output for attempt_201008170108_0004_m_000003_0
    2010-08-17 01:29:19,887 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Rec #1 from attempt_201008170108_0004_m_000003_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,887 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,889 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.15): header: attempt_201008170108_0004_m_000004_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,889 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.15): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000004_0
    2010-08-17 01:29:19,899 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.15): Read 2 bytes from map-output for attempt_201008170108_0004_m_000004_0
    2010-08-17 01:29:19,899 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.15): Rec #1 from attempt_201008170108_0004_m_000004_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,899 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,901 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): header: attempt_201008170108_0004_m_000005_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,901 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000005_0
    2010-08-17 01:29:19,902 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Read 2 bytes from map-output for attempt_201008170108_0004_m_000005_0
    2010-08-17 01:29:19,902 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Rec #1 from attempt_201008170108_0004_m_000005_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,902 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,904 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): header: attempt_201008170108_0004_m_000006_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,904 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000006_0
    2010-08-17 01:29:19,904 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Read 2 bytes from map-output for attempt_201008170108_0004_m_000006_0
    2010-08-17 01:29:19,904 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Rec #1 from attempt_201008170108_0004_m_000006_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,905 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,907 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.14): header: attempt_201008170108_0004_m_000007_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,907 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.14): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000007_0
    2010-08-17 01:29:19,907 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.14): Read 2 bytes from map-output for attempt_201008170108_0004_m_000007_0
    2010-08-17 01:29:19,907 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.14): Rec #1 from attempt_201008170108_0004_m_000007_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,907 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,909 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): header: attempt_201008170108_0004_m_000008_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,909 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000008_0
    2010-08-17 01:29:19,910 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Read 2 bytes from map-output for attempt_201008170108_0004_m_000008_0
    2010-08-17 01:29:19,910 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Rec #1 from attempt_201008170108_0004_m_000008_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:19,910 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:19,912 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): header: attempt_201008170108_0004_m_000009_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:19,912 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000009_0
    2010-08-17 01:29:19,913 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Read 2 bytes from map-output for attempt_201008170108_0004_m_000009_0
    2010-08-17 01:29:19,913 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Rec #1 from attempt_201008170108_0004_m_000009_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:20,851 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:23,864 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:24,914 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:24,920 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): header: attempt_201008170108_0004_m_000010_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:24,920 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000010_0
    2010-08-17 01:29:24,920 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Read 2 bytes from map-output for attempt_201008170108_0004_m_000010_0
    2010-08-17 01:29:24,921 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.13): Rec #1 from attempt_201008170108_0004_m_000010_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:24,921 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:24,925 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): header: attempt_201008170108_0004_m_000011_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:24,925 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000011_0
    2010-08-17 01:29:24,925 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Read 2 bytes from map-output for attempt_201008170108_0004_m_000011_0
    2010-08-17 01:29:24,925 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Rec #1 from attempt_201008170108_0004_m_000011_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:26,880 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:29,895 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:29,928 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:29,936 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000012_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:29,937 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000012_0
    2010-08-17 01:29:29,937 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000012_0
    2010-08-17 01:29:29,937 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000012_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:29,937 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:29,941 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): header: attempt_201008170108_0004_m_000013_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:29,941 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000013_0
    2010-08-17 01:29:29,941 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Read 2 bytes from map-output for attempt_201008170108_0004_m_000013_0
    2010-08-17 01:29:29,942 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Rec #1 from attempt_201008170108_0004_m_000013_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:32,904 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:34,943 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:34,948 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): header: attempt_201008170108_0004_m_000014_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:34,948 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000014_0
    2010-08-17 01:29:34,949 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Read 2 bytes from map-output for attempt_201008170108_0004_m_000014_0
    2010-08-17 01:29:34,949 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Rec #1 from attempt_201008170108_0004_m_000014_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:35,912 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:38,935 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:39,950 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:39,971 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): header: attempt_201008170108_0004_m_000015_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:39,971 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000015_0
    2010-08-17 01:29:39,971 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Read 2 bytes from map-output for attempt_201008170108_0004_m_000015_0
    2010-08-17 01:29:39,972 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.12): Rec #1 from attempt_201008170108_0004_m_000015_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:39,972 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:39,978 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): header: attempt_201008170108_0004_m_000016_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:39,978 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000016_0
    2010-08-17 01:29:39,979 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Read 2 bytes from map-output for attempt_201008170108_0004_m_000016_0
    2010-08-17 01:29:39,979 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.1): Rec #1 from attempt_201008170108_0004_m_000016_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:41,942 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:44,952 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:44,984 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:44,990 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): header: attempt_201008170108_0004_m_000017_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:44,990 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000017_0
    2010-08-17 01:29:44,990 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Read 2 bytes from map-output for attempt_201008170108_0004_m_000017_0
    2010-08-17 01:29:44,990 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Rec #1 from attempt_201008170108_0004_m_000017_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:44,991 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:44,993 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): header: attempt_201008170108_0004_m_000018_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:44,993 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000018_0
    2010-08-17 01:29:44,994 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Read 2 bytes from map-output for attempt_201008170108_0004_m_000018_0
    2010-08-17 01:29:44,994 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Rec #1 from attempt_201008170108_0004_m_000018_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:47,962 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:50,000 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:50,004 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): header: attempt_201008170108_0004_m_000019_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:50,004 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000019_0
    2010-08-17 01:29:50,005 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Read 2 bytes from map-output for attempt_201008170108_0004_m_000019_0
    2010-08-17 01:29:50,005 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Rec #1 from attempt_201008170108_0004_m_000019_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:50,970 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:53,979 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:55,006 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:55,011 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): header: attempt_201008170108_0004_m_000020_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:55,011 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000020_0
    2010-08-17 01:29:55,011 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Read 2 bytes from map-output for attempt_201008170108_0004_m_000020_0
    2010-08-17 01:29:55,012 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.10): Rec #1 from attempt_201008170108_0004_m_000020_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:55,012 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:29:55,016 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): header: attempt_201008170108_0004_m_000021_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:29:55,016 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000021_0
    2010-08-17 01:29:55,016 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Read 2 bytes from map-output for attempt_201008170108_0004_m_000021_0
    2010-08-17 01:29:55,016 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Rec #1 from attempt_201008170108_0004_m_000021_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:29:56,986 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:29:59,997 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:00,017 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:00,023 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): header: attempt_201008170108_0004_m_000022_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:00,024 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000022_0
    2010-08-17 01:30:00,024 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Read 2 bytes from map-output for attempt_201008170108_0004_m_000022_0
    2010-08-17 01:30:00,024 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.5): Rec #1 from attempt_201008170108_0004_m_000022_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:00,024 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:00,031 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): header: attempt_201008170108_0004_m_000023_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:00,031 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000023_0
    2010-08-17 01:30:00,031 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Read 2 bytes from map-output for attempt_201008170108_0004_m_000023_0
    2010-08-17 01:30:00,031 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.16): Rec #1 from attempt_201008170108_0004_m_000023_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:03,005 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:05,033 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:05,038 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): header: attempt_201008170108_0004_m_000024_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:05,038 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000024_0
    2010-08-17 01:30:05,039 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Read 2 bytes from map-output for attempt_201008170108_0004_m_000024_0
    2010-08-17 01:30:05,039 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.18): Rec #1 from attempt_201008170108_0004_m_000024_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:06,011 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:09,018 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:10,039 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:10,044 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): header: attempt_201008170108_0004_m_000025_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:10,045 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000025_0
    2010-08-17 01:30:10,045 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Read 2 bytes from map-output for attempt_201008170108_0004_m_000025_0
    2010-08-17 01:30:10,045 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Rec #1 from attempt_201008170108_0004_m_000025_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:10,046 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:10,050 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.17): header: attempt_201008170108_0004_m_000026_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:10,050 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.17): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000026_0
    2010-08-17 01:30:10,050 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.17): Read 2 bytes from map-output for attempt_201008170108_0004_m_000026_0
    2010-08-17 01:30:10,050 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.17): Rec #1 from attempt_201008170108_0004_m_000026_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:12,026 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:15,049 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:15,052 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Need another 5 map output(s) where 0 is already in progress
    2010-08-17 01:30:15,052 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:15,060 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000027_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:15,060 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000027_0
    2010-08-17 01:30:15,060 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000027_0
    2010-08-17 01:30:15,060 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000027_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:15,061 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:15,070 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000028_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:15,070 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000028_0
    2010-08-17 01:30:15,076 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000028_0
    2010-08-17 01:30:15,076 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000028_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:18,063 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:20,077 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:20,082 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000029_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:20,082 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000029_0
    2010-08-17 01:30:20,082 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000029_0
    2010-08-17 01:30:20,082 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000029_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:21,080 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:24,086 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): attempt_201008170108_0004_r_000001_0: Got 1 new map-outputs
    2010-08-17 01:30:25,084 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:25,091 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): header: attempt_201008170108_0004_m_000030_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:25,091 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000030_0
    2010-08-17 01:30:25,092 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Read 2 bytes from map-output for attempt_201008170108_0004_m_000030_0
    2010-08-17 01:30:25,092 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.0): Rec #1 from attempt_201008170108_0004_m_000030_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:25,093 INFO org.apache.hadoop.mapred.ReduceTask (main): attempt_201008170108_0004_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-08-17 01:30:25,104 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): header: attempt_201008170108_0004_m_000031_0, compressed len: 18, decompressed len: 2
    2010-08-17 01:30:25,104 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Shuffling 2 bytes (18 raw bytes) into RAM from attempt_201008170108_0004_m_000031_0
    2010-08-17 01:30:25,104 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Read 2 bytes from map-output for attempt_201008170108_0004_m_000031_0
    2010-08-17 01:30:25,105 INFO org.apache.hadoop.mapred.ReduceTask (MapOutputCopier attempt_201008170108_0004_r_000001_0.19): Rec #1 from attempt_201008170108_0004_m_000031_0 -> (-1, -1) from domU-12-31-39-0C-68-41.compute-1.internal
    2010-08-17 01:30:26,097 INFO org.apache.hadoop.mapred.ReduceTask (Thread for polling Map Completion Events): GetMapEventsThread exiting
    2010-08-17 01:30:26,097 INFO org.apache.hadoop.mapred.ReduceTask (main): getMapsEventsThread joined.
    2010-08-17 01:30:26,097 INFO org.apache.hadoop.mapred.ReduceTask (main): Closed ram manager
    2010-08-17 01:30:26,097 INFO org.apache.hadoop.mapred.ReduceTask (main): Interleaved on-disk merge complete: 0 files left.
    2010-08-17 01:30:26,098 INFO org.apache.hadoop.mapred.ReduceTask (main): In-memory merge complete: 32 files left.
    2010-08-17 01:30:26,160 INFO org.apache.hadoop.mapred.Merger (main): Merging 32 sorted segments
    2010-08-17 01:30:26,161 INFO org.apache.hadoop.mapred.Merger (main): Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-17 01:30:26,173 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new compressor
    2010-08-17 01:30:26,184 INFO org.apache.hadoop.mapred.ReduceTask (main): Merged 32 segments, 64 bytes to disk to satisfy reduce memory limit
    2010-08-17 01:30:26,185 INFO org.apache.hadoop.mapred.ReduceTask (main): Merging 1 files, 22 bytes from disk
    2010-08-17 01:30:26,185 INFO org.apache.hadoop.mapred.ReduceTask (main): Merging 0 segments, 0 bytes from memory into reduce
    2010-08-17 01:30:26,186 INFO org.apache.hadoop.mapred.Merger (main): Merging 1 sorted segments
    2010-08-17 01:30:26,196 INFO org.apache.hadoop.mapred.Merger (main): Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-17 01:30:26,293 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed exec [/mnt3/var/lib/hadoop/mapred/taskTracker/jobcache/job_201008170108_0004/attempt_201008170108_0004_r_000001_0/work/./CBFinish.pl, --cmapjar=S3N://crossbow-refs/e_coli.jar, --destdir=/mnt/14270, --output=S3N://pings-ewha/e-coli/crossbow-mediated]
    2010-08-17 01:30:27,372 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Creating new file 's3n://pings-ewha/e-coli/crossbow-mediated/ignoreme2/part-00001' in S3
    2010-08-17 01:30:27,374 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Outputstream for key 'e-coli/crossbow-mediated/ignoreme2/part-00001' writing to tempfile '/mnt/var/lib/hadoop/s3,/mnt1/var/lib/hadoop/s3,/mnt2/var/lib/hadoop/s3,/mnt3/var/lib/hadoop/s3/output-4764919983786119736.tmp'
    2010-08-17 01:30:27,380 WARN org.apache.hadoop.streaming.PipeMapRed (Thread-34): java.lang.NullPointerException
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:549)
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:490)
    
    2010-08-17 01:30:27,390 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed failed!
    2010-08-17 01:30:27,392 WARN org.apache.hadoop.mapred.TaskTracker (main): Error running child
    java.lang.RuntimeException: java.lang.NullPointerException
    	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:386)
    	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:582)
    	at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:477)
    	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    	at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: java.lang.NullPointerException
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:549)
    	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:490)
    2010-08-17 01:30:27,395 INFO org.apache.hadoop.mapred.TaskRunner (main): Runnning cleanup for the task
    2010-08-17 01:30:27,395 INFO org.apache.hadoop.mapred.DirectFileOutputCommitter (main): Nothing to clean up on abort since there are no temporary files written



  • Ben Langmead
    replied
    Originally posted by Serena Rhie View Post
    Here are my commands on the web interface:

    Job type: Crossbow
    s3n://pings-ewha/e-coli/read-data/small.manifest
    s3n://pings-ewha/e-coli/crossbow-mediated
    Input type: Manifest
    Genome/Annotation: E. coli O157:H7
    ...
    Chromosome ploidy: All are haploid
    EC2 instances: 1

    Other options: set as default
    That set of options works for me. Does your job fail on step 2 or step 3? Your earlier post said step 2 but your newer one said step 3.

    Could you send me the "stderr" and "syslog" logs from one of the "task attempts" that fail? This document explains how to do this using the AWS Console interface:



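    If the console route is fiddly, you can also usually pull the same logs straight from the S3 log bucket configured for the job flow. A minimal sketch, assuming EMR's standard log layout (the bucket name and IDs below are placeholders, not your actual ones):

    Code:
    # List the task-attempt logs for the job flow
    hadoop fs -ls s3n://my-log-bucket/j-XXXXXXXXXXXX/task-attempts/
    # Dump one failing attempt's stderr and syslog
    hadoop fs -cat s3n://my-log-bucket/j-XXXXXXXXXXXX/task-attempts/job_XXXX/attempt_XXXX_r_000000_0/stderr
    hadoop fs -cat s3n://my-log-bucket/j-XXXXXXXXXXXX/task-attempts/job_XXXX/attempt_XXXX_r_000000_0/syslog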
    Thanks,
    Ben



  • Serena Rhie
    replied
    Here are my commands on the web interface:

    Job type: Crossbow
    s3n://pings-ewha/e-coli/read-data/small.manifest
    s3n://pings-ewha/e-coli/crossbow-mediated
    Input type: Manifest
    Genome/Annotation: E. coli O157:H7
    ...
    Chromosome ploidy: All are haploid
    EC2 instances: 1

    Other options: set as default



  • Ben Langmead
    replied
    Originally posted by Serena Rhie View Post
    The manifest file is too long to send, so I tried the sample provided in the Crossbow (via web interface) manual instead.
    I used the manifest file included in


    I fail on Step 3 with an stderr msg


    Here is the stdout:


    Ben, can you help me? I really want to see the output results!
    Hi Serena,

    Please send me the exact command used.

    Thanks,
    Ben



  • Serena Rhie
    replied
    The manifest file is too long to send, so I tried the sample provided in the Crossbow (via web interface) manual instead.
    I used the manifest file included in
    $CROSSBOW_HOME/example/e_coli/small.manifest
    I fail on Step 3 with this stderr msg:
    Streaming Command Failed!
    Here is the stdout:
    packageJobJar: [/mnt/var/lib/hadoop/tmp/hadoop-unjar3365379216250869521/] [] /mnt/var/lib/hadoop/steps/5/tmp/streamjob208563616485322347.jar tmpDir=null
    Ben, can you help me? I really want to see the output results!



  • Ben Langmead
    replied
    Originally posted by Serena Rhie View Post
    I've read your paper and tried to reproduce part of the experiment you ran on the YH genome, using your web-based GUI with some of the paired-end read data and 10 instances, with the Job type option "Just preprocess reads".

    However, AWS EC2 failed on the "2. Preprocess short reads" step.
    The stderr log appeared as follows:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.apache.hadoop.mapred.lib.NLineInputFormat.getSplits(NLineInputFormat.java:100)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:833)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:804)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:753)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1012)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

    Seems like there is a memory leak or something...
    Can you give me some advice?
    Yikes, that's odd. Can you send me the manifest file you were using? I'll try to recreate this and let you know what I find. (I did encounter some issues with memory exhaustion when I moved over to Hadoop 0.20, but I thought I had fixed that by turning off JVM reuse - we'll see.)
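
    For reference, here is a rough sketch of the two knobs involved, in case you want to experiment before a fix lands. The property and environment variable are standard Hadoop 0.20; whether they cure this particular failure is an assumption on my part:

    Code:
    # Disable JVM reuse for a streaming job (1 = run each task in a fresh JVM)
    hadoop jar hadoop-streaming.jar -D mapred.job.reuse.jvm.num.tasks=1 ...
    # Raise the heap (in MB) of the JVM that submits the job;
    # NLineInputFormat.getSplits() builds the entire split list client-side,
    # which is where your stack trace originates
    export HADOOP_HEAPSIZE=2000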

    Moreover, do I need to unzip the read files before uploading them to S3?
    I would really rather not - the transfer time and cost would become monstrous...
    No, you don't. The input to the preprocessing step can be compressed with gzip or bzip2.
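
    For illustration, compressed entries in a manifest look roughly like this (the URLs are made up, the fields are tab-separated, and "0" skips the MD5 check; see the Crossbow manual for the exact field rules):

    Code:
    # Unpaired: URL <tab> MD5
    s3n://mybucket/reads/lane1.fq.gz    0
    # Paired: URL1 <tab> MD5-1 <tab> URL2 <tab> MD5-2
    s3n://mybucket/reads/lane2_1.fq.bz2    0    s3n://mybucket/reads/lane2_2.fq.bz2    0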

    Thanks,
    Ben



  • Serena Rhie
    replied
    Whole-human resequencing with Crossbow

    Hi Ben,

    Originally posted by Ben Langmead View Post
    Hi all,

    The Crossbow paper, Searching for SNPs with cloud computing, came out in provisional form today. Take a look if you're interested.

    Thanks,
    Ben
    I've read your paper and tried to reproduce part of the experiment you ran on the YH genome, using your web-based GUI with some of the paired-end read data and 10 instances, with the Job type option "Just preprocess reads".

    However, AWS EC2 failed on the "2. Preprocess short reads" step.
    The stderr log appeared as follows:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.apache.hadoop.mapred.lib.NLineInputFormat.getSplits(NLineInputFormat.java:100)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:833)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:804)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:753)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1012)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

    Seems like there is a memory leak or something...
    Can you give me some advice?

    Moreover, do I need to unzip the read files before uploading them to S3?
    I would really rather not - the transfer time and cost would become monstrous...

    Thanks,
    Serena Rhie



  • jchien
    replied
    Crossbow-1.0.4 preprocess and SOAPsnp

    The preprocess step is parallel like the other steps. Can you tell me exactly what parameters you used, and what makes you conclude it's running single-threaded?
    I specified cpus=12, but during the preprocess there is only one process running, as opposed to the bowtie and soapsnp steps, where I see 12 processes running in parallel. That's why I thought the preprocess step was single-threaded.

    There should be another message that gives you more information about the error; could you send me the entire output?
    Here is the full output of the error:

    Pid 14294 processing input task-00023 [24 of 48]...
    Aborting master loop because child failed
    Pid 16156 processing input task-00024 [25 of 48]...
    -- Reduce counters --
    SOAPsnp Alignments read 9664618
    SOAPsnp Paired alignments read 19492260
    SOAPsnp Positions called 1358172823
    SOAPsnp Positions called uncovered by any alignments 780183126
    SOAPsnp Positions called uncovered by unique alignments 789447449
    SOAPsnp Positions called with known SNP info 0
    SOAPsnp Positions with non-reference allele called 194467
    SOAPsnp Unique alignments read 9292965
    SOAPsnp Unpaired alignments read 0
    SOAPsnp wrapper Alignments processed 9746226
    SOAPsnp wrapper Out-of-range SNPs trimmed 9
    SOAPsnp wrapper Ranges processed 1477
    SOAPsnp wrapper SNP files missing 1477
    SOAPsnp wrapper SNPs reported 194458
    ==========================
    Stage 4 of 4. Postprocess
    ==========================
    Mon Aug 2 11:42:48 CDT 2010
    === Reduce ===
    # parallel reducers: 12
    # reduce tasks: 1
    Input: /tmp/crossbow/intermediate/10029/snps
    Output: /data/101b_full/crossbow_results
    Intermediate: /data/101b_full/crossbow_results.reduce.pre
    # bin, sort fields: 1, 2
    Total allowed sort memory footprint: 0
    Options: [ -keep-all -force ]
    Could not create new directory /data/101b_full/crossbow_results at /home/jchien/crossbow-1.0.4/ReduceWrap.pl line 81.
    Non-zero exitlevel from Postprocess stage
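
    One note for anyone hitting the same wall: that last error is a plain local-filesystem directory-creation failure, so the usual suspects are a missing parent directory or permissions. A quick check, using the paths from the message above:

    Code:
    # Does the parent exist, and can the user running Crossbow write to it?
    ls -ld /data/101b_full
    # If not, create it (or fix ownership) and re-run;
    # Crossbow should then be able to create crossbow_results itself
    sudo mkdir -p /data/101b_full
    sudo chown $USER /data/101b_full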



  • Ben Langmead
    replied
    Hi Jeremy,

    Originally posted by jchien View Post
    I run Crossbow on a local server running Ubuntu. Some of the scripts included with the tool use #!/bin/sh, which on my system resolves to /bin/dash. Dash does not support the pushd and popd commands called from the scripts. By changing the scripts to explicitly use /bin/bash, I was able to resolve these minor hiccups.

    Is there a particular reason why /bin/sh was declared rather than /bin/bash?
    Our mistake! We'll fix this in the next release.
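
    In the meantime, a minimal workaround sketch for anyone on a Dash-default system (the script path below is a placeholder, not a real Crossbow file):

    Code:
    # On Ubuntu, /bin/sh points to Dash, which lacks pushd/popd
    ls -l /bin/sh
    # Option 1: change the affected script's first line from #!/bin/sh to #!/bin/bash
    # Option 2: run it explicitly under bash without editing it
    bash $CROSSBOW_HOME/some_script.sh
    # Option 3 (system-wide, Debian/Ubuntu): answer "No" to using Dash as /bin/sh
    sudo dpkg-reconfigure dash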

    My second comment is more of a wish-list item. I have huge data files from several whole human genomes (each genome dataset contains >2.4 billion reads). I used the Crossbow preprocess routine, and it appears to be single-threaded. Is that correct? It takes several hours just to preprocess the data on a server with 16 cores. Is there a way to speed up the process, or is it limited by disk access speed?
    The preprocess step is parallel like the other steps. Can you tell me exactly what parameters you used, and what makes you conclude it's running single-threaded?

    My third question relates to SOAPsnp. After the alignment finishes, the aligned data is split into 48 tasks. I am getting the following error while processing input task-00023 [24 of 48]: "Aborting master loop because child failed."
    There should be another message that gives you more information about the error; could you send me the entire output?

    Thanks,
    Ben



  • jchien
    replied
    Comments on Crossbow 1.0.4

    Originally posted by Ben Langmead View Post
    Hi Vix,

    Yes, several people contacted me with similar questions. We have since released a version of Crossbow (v1.0.4) that we think has a much better Hadoop mode. It also has a single-computer mode, which does not require Hadoop (or Java). If you have time, please give that a shot.

    Sorry for the trouble,
    Ben
    Hi Ben,

    Thank you for writing this wonderful software. I tried the latest release (v1.0.4) and ran into some minor hiccups.

    I run Crossbow on a local server running Ubuntu. Some of the scripts included with the tool use #!/bin/sh, which on my system resolves to /bin/dash. Dash does not support the pushd and popd commands called from the scripts. By changing the scripts to explicitly use /bin/bash, I was able to resolve these minor hiccups.

    Is there a particular reason why /bin/sh was declared rather than /bin/bash?

    My second comment is more of a wish-list item. I have huge data files from several whole human genomes (each genome dataset contains >2.4 billion reads). I used the Crossbow preprocess routine, and it appears to be single-threaded. Is that correct? It takes several hours just to preprocess the data on a server with 16 cores. Is there a way to speed up the process, or is it limited by disk access speed?

    My third question relates to SOAPsnp. After the alignment finishes, the aligned data is split into 48 tasks. I am getting the following error while processing input task-00023 [24 of 48]: "Aborting master loop because child failed."

    Jeremy



  • Ben Langmead
    replied
    Originally posted by VIX_Z View Post
    Hi Ben,
    I want to try Crossbow on my local Hadoop-enabled cluster. Can you share the data you used for the "small version of the experiment on local Hadoop cluster"? I am running into various errors when using other read data.

    With thanks,
    Vix
    Hi Vix,

    Yes, several people contacted me with similar questions. We have since released a version of Crossbow (v1.0.4) that we think has a much better Hadoop mode. It also has a single-computer mode, which does not require Hadoop (or Java). If you have time, please give that a shot.

    Sorry for the trouble,
    Ben



  • xinwu
    replied
    Hi all,
    My question is the same as Dan326's. CloudBurst uses the RMAP algorithm with Hadoop, so Dan's question can be summarized as: how can I run CloudBurst with the Bowtie algorithm rather than RMAP? As the Bowtie paper indicates, Bowtie is much faster than other mapping tools, so combining it with Hadoop should give the fastest solution so far. Correct me if I am wrong. I am also trying to find this kind of short-read mapping solution.
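
    As far as I understand, that is essentially what Crossbow already does: it drives Bowtie through Hadoop Streaming. A rough sketch of the pattern I mean (the flags and paths here are my own illustrative guesses, not Crossbow's actual invocation):

    Code:
    # Bowtie as a Hadoop Streaming mapper (illustrative only)
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input hdfs:///reads/preprocessed \
        -output hdfs:///alignments \
        -mapper 'bowtie e_coli --12 -' \
        -reducer NONE \
        -file /usr/local/bin/bowtie
    # (the e_coli index files would also need distributing to the
    #  workers, e.g. via -cacheArchive)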
