OSC has various storage systems to fulfill different HPC research needs. Information on each filesystem can be found in the data storage technical documentation.
Review the overview of the filesystems, storage hardware, and the storage documentation.
Review information about storing data with strict security needs.
OSC's data storage is continually updated and expanded. View some of the major changes.
Visit known issues and filter by the filesystem category to view current known issues with filesystems.
OSC has several different file systems where you can create files and directories. The characteristics of those systems and the policies associated with them determine their suitability for any particular purpose. This section describes the characteristics and policies that you should take into consideration in selecting a file system to use.
The various file systems are described in subsequent sections.
Most of our file systems are shared. Directories and files on the shared file systems are accessible from all OSC HPC systems. By contrast, local storage is visible only on the node it is located on. Each compute node has a local disk with scratch file space.
Some of our storage environments are intended for long-term storage; files are never deleted by the system or OSC staff. Some are intended as scratch space, with files deleted as soon as the associated job exits. Others fall somewhere in between, with expected data lifetimes of a few months to a couple of years.
Some of the file systems are backed up to tape; some are considered temporary storage and are not backed up. Backup schedules differ for different systems.
In no case do we make an absolute guarantee about our ability to recover data. Please read the official OSC data management policies for details. That said, we have never lost backed-up data and have rarely had an accidental loss of non-backed-up data.
The permanent (backed-up) and scratch file systems all have quotas limiting the amount of file space and the number of files that each user or group can use. Your usage and quota information are displayed every time you log in to one of our HPC systems. You can also check your home directory quota using the quota command. We encourage you to pay attention to these numbers because your file operations, and probably your compute jobs, will fail if you exceed them. If you have extremely large files, you will also have to pay attention to the amount of local file space available on different compute nodes.
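As a sketch, usage can be checked from a login shell. The exact output format of quota varies by filesystem, and the command may not be available everywhere, so it is guarded here:

```shell
# Check usage against quotas (output format varies by filesystem;
# guard the call in case quota is not installed on this node)
command -v quota >/dev/null && quota -s
du -sh "$HOME"    # approximate usage of your home directory
```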
File systems have different performance characteristics including read/write speeds and behavior under heavy load. Performance matters a lot if you have I/O-intensive jobs. Choosing the right file system can have a significant impact on the speed and efficiency of your computations. You should never do heavy I/O in your home or project directories, for example.
Each file system is configured differently to serve a different purpose:
| | Home Directory | Project | Local Disk | Scratch (global) | Backup |
|---|---|---|---|---|---|
| Path | /users/project/userID | /fs/ess | /tmp | /fs/scratch | N/A |
| Environment Variable | $HOME or ~ | N/A | $TMPDIR | $PFSDIR | N/A |
| Space Purpose | Permanent storage | Long-term storage | Temporary | Temporary | Backup; replicated in Cleveland |
| Backed Up? | Daily | Daily | No | No | Yes |
| Flushed | No | No | End of job when $TMPDIR is used | End of job when $PFSDIR is used | No |
| Visibility | Login and compute nodes | Login and compute nodes | Compute node | Login and compute nodes | N/A |
| Quota/Allocation | 500 GB of storage and 1,000,000 files | Typically 1-5 TB of storage and 100,000 files per TB | Varies, depending on node | 100 TB of storage and 25,000,000 files | N/A |
| Total Size | 1.9 PB | /fs/ess: 13.5 PB | Varies, depending on system | /fs/scratch: 3.5 PB | N/A |
| Bandwidth | 40 GB/s | Reads: 60 GB/s; Writes: 50 GB/s | Varies, depending on system | Reads: 170 GB/s; Writes: 70 GB/s | N/A |
| Type | NetApp WAFL | GPFS | Varies, depending on system | GPFS | N/A |
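For example, an I/O-heavy job can stage its files through $TMPDIR so that the heavy reads and writes hit node-local disk rather than the shared home or project filesystems. This is a sketch only; the scheduler directive, program name, and input file are hypothetical:

```shell
#!/bin/bash
#SBATCH --time=1:00:00
# Hypothetical job sketch: stage data through node-local $TMPDIR

cp input.dat "$TMPDIR"/             # copy input to fast local disk
cd "$TMPDIR"
my_program input.dat > output.dat   # heavy I/O happens on local disk
cp output.dat "$SLURM_SUBMIT_DIR"/  # save results before $TMPDIR is flushed at job end
```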
The storage at OSC consists of servers, data storage subsystems, and networks providing a number of storage services to OSC HPC systems. Major changes to this configuration are described below.
On July 12th, 2016 OSC migrated its old GPFS and Lustre filesystems to new Project and Scratch services, respectively. We've moved 1.22 PB of data, and the new capacities are 3.4 PB for Project, and 1.1 PB for Scratch. If you store data on these services, there are a few important details to note.
The Project service is now available at /fs/project, and the Scratch service is available at /fs/scratch. We have created symlinks on the Oakley and Ruby clusters to ensure that existing job scripts continue to function; however, the symlinks will not be available on future systems, such as Owens. No action is required on your part to continue using your existing job scripts on current clusters. However, you may wish to start updating your paths accordingly, in preparation for Owens being available later this year.
Project space allocations and Scratch space data were migrated automatically to the new services. For data on the Project service, ACLs, Xattrs, and Atimes were all preserved. However, Xattrs were not preserved for data on the Scratch service.
Additionally, Biomedical Informatics at The Ohio State University had some data moved from a temporary location to its permanent location on the Project service. We had prepared for this and already provided symlinks so that the data appeared to be in its final location prior to the July 12th downtime, so the move should be mostly transparent to users. However, ACLs, Xattrs, and Atimes were not preserved for this data.
| File system | Transfer method | ACLs preserved | Xattrs preserved | Atime preserved |
|---|---|---|---|---|
| /fs/project | AFM | Yes | Yes | Yes |
| /fs/lustre | rsync | Yes | No | Yes |
| /users/bmi | rsync | No | No | No |
Full details and documentation of the new service capacities and capabilities are available at https://www.osc.edu/supercomputing/storage-environment-at-osc/
In March 2020, OSC expanded the existing project and scratch storage filesystems by 8.6 petabytes. Added to the existing storage capacity at OSC, this brings the total storage capacity to ~14 petabytes.
The new project and scratch storage is available using the path /fs/ess/<project-code> for project space and /fs/ess/scratch/<project-code> for scratch space. Existing data can be reached using the existing paths /fs/project and /fs/scratch.
Any new project storage allocation requests will be granted on the new storage, as long as the project does not already have existing project space. Any new storage allocations will use the file path /fs/ess/<project-code>.
Some projects will have access to the new scratch space at /fs/ess/scratch/<project-code>. We will work with each individual group if access to /fs/ess/scratch/ is granted for that group.
Existing project and scratch storage may be required to move to the new storage space. If this happens, OSC can optionally set up a symlink or redirect so that compatibility for programs and job scripts is maintained for some time. However, redirects are not a permanent solution and will be removed after some time. The members of the project should make sure that removal of the redirect will not negatively affect their work at OSC.
In October 2022, OSC retired the Data Direct Networks (DDN) GRIDScaler system deployed in 2016 and expanded the IBM Elastic Storage System (ESS) for both the Project and global Scratch services. This expands the total capacity of Project and Scratch storage at OSC to ~16 petabytes, with better performance.
All project and scratch storage is available using the path /fs/ess/<project-code> for project space and /fs/scratch/<project-code> for scratch space.
OSC has been migrating all current Project and Scratch data to the new services since September 2022 and ran the final synchronization of the data during the October 11, 2022 downtime. ACLs and extended attributes for the data were preserved in the migration.
During the December 13, 2022 downtime, OSC cleaned the scratch directories for users who used to have scratch space on both the DDN and ESS storage (/fs/scratch/<project-code> and /fs/ess/scratch/<project-code>). All directories under /fs/ess/scratch/ point to /fs/scratch/, so they are essentially the same storage.
OSC has set up symlinks for the data on the storage so that compatibility for programs and job scripts is maintained. Please start updating your existing scripts to replace /fs/project/<project-code> with /fs/ess/<project-code> for project, and to replace /fs/ess/scratch/<project-code> with /fs/scratch/<project-code> for scratch. Use /fs/ess/<project-code> for project storage and /fs/scratch/<project-code> for scratch storage in all future job scripts.

For users who used to have project space on the DDN storage, you will see /fs/ess/<project-code> instead of /fs/project/<project-code>. Please use the directory /fs/ess/<project-code>, which is your current project space location and includes all of your previous project data.
For users who used to have scratch space on the ESS storage, you will see /fs/scratch/<project-code> instead of /fs/ess/scratch/<project-code>. Please use the directory /fs/scratch/<project-code>, which is your current scratch space location and includes all of your scratch data.
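The path substitutions above can be applied to an existing job script with sed. The script name and project code here are hypothetical; the scratch prefix is rewritten first so the project rule cannot re-match the output:

```shell
# Hypothetical job script still using the old paths
cat > myjob.sh <<'EOF'
cd /fs/project/PAS1234
cp results.dat /fs/ess/scratch/PAS1234/
EOF

# Rewrite scratch first, then project; keep a .bak backup of the original
sed -i.bak \
    -e 's|/fs/ess/scratch/|/fs/scratch/|g' \
    -e 's|/fs/project/|/fs/ess/|g' \
    myjob.sh

cat myjob.sh
# cd /fs/ess/PAS1234
# cp results.dat /fs/scratch/PAS1234/
```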
OSC's Protected Data Service (PDS) is designed to address the most common security control requirements encountered by researchers while also reducing the workload on individual PIs and research teams to satisfy these requirements.
The OSC cybersecurity program is based upon the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-53, Revision 4 requirements for security, and reflects the additional requirements of established Information Technology (IT) security practices.
OSC currently supports the following protected data types.
If you need support for a data type that is not listed, please contact OSC Help to discuss.
OSC's PDS was developed with the intent of meeting the security control requirements of your research agreements and to eliminate the burden placed on PIs who would otherwise be required to maintain their own compliance infrastructure with certification and reporting requirements.
In order to begin a project at OSC with data protection requirements, please follow these steps:
Send an email to oschelp@osc.edu and describe the project's data requirements.
You will hear back from OSC to set up an initial consultation to discuss your project and your data. Based on your project and the data being used, we may request the necessary documentation (data use agreements, BAA, MOU, etc).
Once OSC receives the necessary documentation, the request to store data on the PDS will be reviewed, and if appropriate, approved.
All PDS projects require multi-factor authentication (MFA). MFA will be set by OSC when the project is created.
OSC will help set up the project and the storage used to store the protected data. Protected data must be stored only in the project's /fs/ess/PDEXXXX and /fs/scratch/PDEXXXX directories.

There are other storage locations at OSC, but none of the following locations can be used to store protected data because they do not have the proper controls and requirements to safely store it:
/users/<project-code>
/fs/ess/<non-PDS-project>
/fs/scratch/<non-PDS-project>
The directory permissions where protected data are stored are set up to prevent regular users from changing the permissions or access control entries on the top-level directories. Only members of the project are authorized to access the data; users are not permitted to attempt to share data with unauthorized users.
The protected data environment will be monitored for unauthorized changes to permissions and access control.
Protected data directories will be set with permissions that restrict access to project users only. Project users are determined by group membership. For example, project PDE1234 has a protected data location at /fs/ess/PDE1234, and only users in the group PDE1234 may access data in that directory.
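The effect of this setup can be illustrated locally; the directory name below is a stand-in for the real project paths, which only OSC staff can configure:

```shell
# Local illustration of group-restricted permissions (demo directory only)
mkdir -p demo_pde
chmod 2770 demo_pde   # owner+group rwx, setgid, no access for others
ls -ld demo_pde       # shows drwxrws--- with the owning group
```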
Adding a user to a project in the OSC client portal adds the group to their user account; likewise, removing the user from the project removes their group. See our page on inviting, adding, and removing users.
Do not share accounts/passwords, ever.
A user that logs in with another person's account is able to perform actions on behalf of that person, including unauthorized actions mentioned above.
Transferring files securely to OSC involves understanding which commands/applications to use and which directory to use.
Before transferring files, one should ensure that the proper permissions will be applied once transferred, such as verifying the permissions and ACLs of the destination directory for a transferred file.
Install the FileZilla client software and use the FileZilla tutorial to transfer files. Use the host sftp://sftp.osc.edu. Select the login type as interactive, as multi-factor authentication is required to log in for protected data projects.
Confirm that the destination is the proper protected data directory, such as /fs/ess/secure_dir, before starting the file transfer. Protected Data Service projects must use the OSC high assurance endpoint or transfers may fail. See the Globus high assurance page for more information. Also, ensure protected data is being shared in accordance with its requirements.
There is a guide for using Globus on our Globus page.
You can use the OnDemand file explorer for upload and download of protected data as well as the integrated Globus High Assurance application.
There is a guide for using OnDemand file transfer.
Files and directories can also be transferred manually on the command line.
scp src <username>@sftp.osc.edu:/fs/ess/secure_dir
sftp <username>@sftp.osc.edu ## then run sftp transfer commands (get, put, etc.)
rsync --progress -r local-dir <username>@sftp.osc.edu:/fs/ess/secure_dir