A Hadoop cluster can be launched within the HPC environment and managed by the PBS/Slurm job scheduler using the MyHadoop framework developed by the San Diego Supercomputer Center. (Please see https://www.grid.tuc.gr/fileadmin/users_data/grid/documents/hadoop/Krish...)
Availability and Restrictions
Versions
The following versions of Hadoop are available on OSC systems:
Version | Owens
---|---
3.0.0-alpha1 | X*

* Current default version
You can use module spider hadoop to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
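For example, to list all available Hadoop modules and then view the details of a specific version, you can run the following (a minimal sketch using the standard Lmod commands; the exact output depends on the system):

module spider hadoop
module spider hadoop/3.0.0-alpha1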
Access
Hadoop is available to all OSC users. If you have any questions, please contact OSC Help.
Publisher/Vendor/Repository and License Type
Apache Software Foundation, Open source
Usage
Set-up
To configure your environment for Hadoop, run the following command:
module load hadoop
To access a particular version of Hadoop, run the following command:
module load hadoop/3.0.0-alpha1
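To confirm that the module loaded correctly, you can ask Hadoop to report its version (hadoop version is a standard Hadoop CLI subcommand):

hadoop version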
Using Hadoop
To run Hadoop in batch mode, reference the example batch script below. This script requests 6 nodes on the Owens cluster for 1 hour of walltime.
#!/bin/bash
#SBATCH --job-name hadoop-example
#SBATCH --nodes=6 --ntasks-per-node=28
#SBATCH --time=01:00:00
#SBATCH --account <account>

export WORK=$SLURM_SUBMIT_DIR

module load hadoop/3.0.0-alpha1
module load myhadoop/v0.40

# Configure a transient Hadoop cluster on the allocated nodes
export HADOOP_CONF_DIR=$TMPDIR/mycluster-conf-$SLURM_JOBID
cd $TMPDIR
myhadoop-configure.sh -c $HADOOP_CONF_DIR -s $TMPDIR

# Start HDFS and check that the datanodes are up
$HADOOP_HOME/sbin/start-dfs.sh
hadoop dfsadmin -report

# Stage the input data into HDFS
hadoop dfs -mkdir data
hadoop dfs -put $HADOOP_HOME/README.txt data/
hadoop dfs -ls data

# Run the wordcount example and copy the results back to the submit directory
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar wordcount data/README.txt wordcount-out
hadoop dfs -ls wordcount-out
hadoop dfs -copyToLocal -f wordcount-out $WORK

# Shut down HDFS and clean up the transient cluster
$HADOOP_HOME/sbin/stop-dfs.sh
myhadoop-cleanup.sh
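Once the script is saved, for example as hadoop-example.slurm (the file name here is just an illustration), it can be submitted and monitored with the standard Slurm commands. After the job finishes, the wordcount results appear in the submit directory; the sketch below assumes the default MapReduce output file name part-r-00000:

sbatch hadoop-example.slurm
squeue -u $USER
cat wordcount-out/part-r-00000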
Example Jobs
Please check the /usr/local/src/hadoop/3.0.0-alpha1/test.osc directory for more examples of Hadoop jobs.