A Hadoop cluster can be launched within the HPC environment and managed by the PBS/Slurm job scheduler using the myHadoop framework developed by the San Diego Supercomputer Center. (Please see https://www.grid.tuc.gr/fileadmin/users_data/grid/documents/hadoop/Krish...)

Availability and Restrictions


The following versions of Hadoop are available on OSC systems: 

Version         Owens
3.0.0-alpha1    X*

* Current default version

You can use module spider hadoop to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.


Hadoop is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

Apache Software Foundation, open source



To configure your environment for Hadoop, run the following command:

module load hadoop

To access a particular version of Hadoop, run the following command:

module load hadoop/3.0.0-alpha1
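
After loading the module, it can help to confirm that the hadoop command is actually on your PATH before running anything else. The following is a minimal sketch, not an OSC-provided tool: check_hadoop_env is a hypothetical helper name, and it assumes the module sets HADOOP_HOME (the batch script on this page relies on that variable).

```shell
# Sketch: confirm the Hadoop module is active in the current shell.
# check_hadoop_env is a hypothetical helper, not part of the hadoop module.
check_hadoop_env() {
    if command -v hadoop >/dev/null 2>&1; then
        # Show which hadoop executable will be used and where HADOOP_HOME points.
        echo "hadoop found at: $(command -v hadoop)"
        echo "HADOOP_HOME=${HADOOP_HOME:-<unset>}"
        return 0
    else
        echo "hadoop not found on PATH; run 'module load hadoop' first" >&2
        return 1
    fi
}
```

Calling check_hadoop_env right after module load hadoop prints the path of the executable that was picked up; a non-zero return status means the module is not loaded in the current shell.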

Using Hadoop

To run Hadoop in batch, use the example batch script below as a reference. This script requests 6 nodes on the Owens cluster for 1 hour of walltime.

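#!/bin/bash
#SBATCH --job-name=hadoop-example
#SBATCH --nodes=6 --ntasks-per-node=28
#SBATCH --time=01:00:00
#SBATCH --account=<account>

module load hadoop/3.0.0-alpha1
module load myhadoop/v0.40
export HADOOP_CONF_DIR=$TMPDIR/mycluster-conf-$SLURM_JOBID

# Write a per-job Hadoop configuration that uses $TMPDIR as scratch space
myhadoop-configure.sh -c "$HADOOP_CONF_DIR" -s "$TMPDIR"

# myhadoop-configure.sh only writes the configuration; start the HDFS daemons
"$HADOOP_HOME/sbin/start-dfs.sh"

# Report the state of the HDFS cluster
hdfs dfsadmin -report

# Stage the input file in HDFS (-p also creates missing parent directories)
hdfs dfs -mkdir -p data
hdfs dfs -put "$HADOOP_HOME/README.txt" data/
hdfs dfs -ls data

# Run the bundled wordcount example on the staged file
hadoop jar "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar" wordcount data/README.txt wordcount-out

# List the results and copy them out of HDFS before the job ends
hdfs dfs -ls wordcount-out
hdfs dfs -copyToLocal -f wordcount-out "$WORK"

# Stop the HDFS daemons and remove the per-job configuration
"$HADOOP_HOME/sbin/stop-dfs.sh"
myhadoop-cleanup.sh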
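
Once the job has finished, the copied results can be inspected with ordinary shell tools. The following is a rough sketch under stated assumptions: top_words is a hypothetical helper name, and it assumes the usual MapReduce output layout, in which the output directory holds part-r-* files with one "word<TAB>count" pair per line.

```shell
# Sketch: print the N most frequent words from a wordcount output directory.
# top_words is a hypothetical helper; it assumes part-r-* files that contain
# tab-separated "word<TAB>count" lines.
top_words() {
    dir=$1
    n=${2:-10}
    # Sort on the count column (numeric, descending), break ties by word,
    # then keep only the first N lines.
    cat "$dir"/part-r-* | sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Example call (the path assumes the copy step placed the results under $WORK):
# top_words "$WORK/wordcount-out" 5
```

Sorting outside Hadoop is reasonable here because the wordcount output for a single README file is small; for large outputs it would be better to do the ranking inside the cluster.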

Example Jobs

See the /usr/local/src/hadoop/3.0.0-alpha1/test.osc directory for more examples of Hadoop jobs.

