Ollama

Ollama is an open-source inference server supporting a number of generative AI models.  This module also includes Open-WebUI, which provides an easy-to-use web interface.

Ollama is in an early user testing phase; not all functionality is guaranteed to work. Contact oschelp@osc.edu with any questions.

Availability and Restrictions

Versions

Ollama is available on OSC Clusters. The versions currently available at OSC are:

Version    Cardinal    Ascend
0.5.13     X

You can use module spider ollama to view available modules for a given machine.

Access

All OSC users may use Ollama and Open-WebUI, but individual models may have their own license restrictions.

Publisher/Vendor/Repository and License Type

https://github.com/ollama/ollama, MIT license.

https://github.com/open-webui/open-webui, BSD-3-Clause license.

Prerequisites

  • GPU Usage: Ollama should be run with a GPU for best performance. 
  • OnDemand Desktop Session: If using the Open-WebUI web interface, you will first need to start an OnDemand Desktop session on Cardinal with a GPU.

Running Ollama and Open-WebUI

Ollama is available through the module system and must be loaded prior to running any of the commands below:

Loading the Ollama module:
module load ollama/0.5.13
Starting Ollama:
ollama_start
Starting Open-WebUI:
open_webui_start

In your OnDemand Desktop session, navigate to localhost:8080 in a browser to access the Open-WebUI interface. Ollama and Open-WebUI must be started first, and a model must be installed before it is available; see Model Management below.
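
To confirm that the Ollama server is up, you can query its HTTP API on the default port (this example assumes Ollama's standard port of 11434):

Checking the Ollama server:
curl http://localhost:11434/api/tags

This returns a JSON list of the locally installed models, which will be empty until you pull one.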

Model Management

Installing a model:
ollama_pull <modelname>

The list of supported models can be found at ollama.com/library. Ollama must be running prior to pulling a new model.  By default, models are saved to $HOME/.ollama/models, but this is customizable through the use of environment variables.  See module show ollama/0.5.13 for more details.
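
For example, to install a model from the library (the model name here is only an illustration; browse ollama.com/library for current options) and then confirm it is present:

Installing an example model:
ollama_pull llama3.2
Listing installed models:
ollama list

Note that ollama list assumes the plain ollama client is available from the module; if it is not, the /api/tags endpoint shown above provides the same information.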

Some models require licensing agreements or are otherwise restricted and require a Hugging Face account and login. With the Ollama module loaded, use the huggingface-cli tool to log in:

huggingface-cli login

For more details, see https://huggingface.co/docs/huggingface_hub/en/guides/cli.

 

Deleting a model:
ollama_rm <modelname>

Ollama must be running prior to deleting a model.

First Time Setup

1. Start Ollama

2. Pull a model

3. Start Open-WebUI (optional)

4. Configure Open-WebUI to connect to Ollama (optional)

The first time you use Open-WebUI, you may need to configure it to connect to your Ollama instance. To do this, click Admin Panel -> Settings -> Connections. The Ollama API connection should be set to http://localhost:11434.

If you want to use OpenAI services, you can enter your API token here; otherwise, disable this connection type. Note that this connection type uses external commercial services that are not provided by OSC and are not subject to OSC data policies.
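
Putting the first-time steps together, a session on a GPU-enabled OnDemand desktop might look like the following (the model name is just an example; substitute whichever model you need):

module load ollama/0.5.13
ollama_start
ollama_pull llama3.2
open_webui_start

Then navigate to localhost:8080 in the desktop session's browser and, if needed, set the Ollama API connection as described above.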

Interactive vs. Batch Usage

Ollama can be used interactively by loading the module and starting the service(s) as described above.

Requesting a GPU-enabled desktop session and using Open-WebUI is one possible use case.

The Ollama module can also be used in batch mode by loading the module in your batch script. For example, you may want to run offline inference with a script that relies on a local inference endpoint.
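
As a sketch, a batch job might look like the following; the account, resource requests, model, and inference script are all placeholders, and you may need to adjust how the server is started for your workflow (see module show ollama/0.5.13 for details on the wrapper scripts):

#!/bin/bash
#SBATCH --job-name=ollama-batch
#SBATCH --account=PAS1234          # placeholder project account
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1

module load ollama/0.5.13

# Start the Ollama server and install the model the inference script expects
ollama_start
ollama_pull llama3.2               # example model; substitute your own

# Run a script that sends requests to the local endpoint at localhost:11434
python run_inference.py            # placeholder inference script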

Ollama provides an OpenAI-compatible API endpoint that can be accessed by Open-WebUI or any other OpenAI-compatible client, so you can bring existing clients or write your own. As long as your client can send requests to localhost:11434, this should work and supports a wide variety of workflows.
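
For example, once Ollama is running and a model has been pulled, any HTTP client can send an OpenAI-style chat completion request to the local endpoint (the model name below is a placeholder for whatever you have installed):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Summarize what Ollama does in one sentence."}]
      }'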

Please note that this software is in early user testing and might not function as desired; reach out to oschelp@osc.edu with any issues.

Jupyter Usage

This is not yet tested but might work; contact oschelp@osc.edu if you're interested in this functionality.

     
