HOWTO: Use 'rclone' to Upload Data

rclone is a tool that can be used to upload and download files to a cloud storage (like Microsoft OneDrive, BuckeyeBox) from the command line. It's shipped as a standalone binary, but requires some user configuration before using. In this page, we will provide instructions on how to use rclone to upload data to OneDrive. For instructions with other cloud storage, check rclone Online documentation.

You can use "Globus" feature of OnDemand to perform data transfer between OneDrive and other storage. See this File Transfer and Management page for more information. 

Setup

Before configuration, please first log into OSC OnDemand and request a Pitzer Lightweight Desktop session. Walltime of 1 hour should be sufficient to finish the configuration.  

Note: this does not work with the 'konqueror' browser present on OSC Systems. Please set default to Firefox first before you do the setup following the instructions below:
* xfce: Applications (Top left corner) -> Settings -> Preferred Applications
* mate: System (top bar towards the left) -> Preferences -> Preferred Applications

Once the session is ready, open a terminal. In the terminal, run the command

rclone config

It prompts you with a bunch of questions:

  • It shows "No remotes found -- make a new one" or list available remotes you made before
    •  Answer "n" for "New remote"
  • "name>" (the name for the new remote)
    • Type "OneDrive" (or whatever else you want to call this remote)
  • "Storage>" (the storage type of the new remote)
    • This should display a list to choose from. Enter the number corresponding to the "Microsoft OneDrive" storage type, which is "26".
    • (It is "6" for BuckeyeBox)
  • "client_id>"
    • Leave this blank (just press enter).
  • "client_secret>"
    • Leave this blank (just press enter).
  • "Choose national cloud region for OneDrive."
    • This should display a list to choose from. Enter the number corresponding to the "Microsoft Cloud Global" region, which is "1".
  • "Edit advanced config?"
    • Type "n" for no
  • "Use auto config?"
    • Answer "y" for yes
  • A web browser window should pop up allowing you to log into box. It is a good idea at this point to verify that the url is actually OneDrive before entering any credentials 
    • Enter your OSU email
    • This should take you to the OSU login page. Login with your OSU credentials 
    • Go back to the terminal once "Success" is displayed.
  • "Your choice>"
    • One of five options to locate the drive you wish to use.
    • Type "1" to use your personal or business OneDrive
  • "Choose drive to use"
    • Type "0"
  • "Is this Okay? y/n>"
    • Type "y" to confirm the drive you wish to use is correct.
  • "y/e/d>"
    • Type "y" to confirm you wish to add this remote to rclone.

Testing rclone

Note: you do not need to use Pitzer Lightweight Desktop when you run 'rclone'. You can test the data transfer with a small file using login nodes (either Pitzer or Owens), or request a regular compute node to do the data transfer with large files. 

Create an empty hello.txt file and upload it to OneDrive using 'rclone copy' as below in a terminal:

touch hello.txt
rclone copy hello.txt OneDrive:/test

This creates a toplevel directory in OneDrive called 'test' if it does not already exist, and uploads the file hello.txt to it.

To verify the uploading is successful, you can either login to OneDrive in a web browser to check the file, or use rclone ls command in the terminal as:

rclone ls OneDrive:/test
Note: be careful when using ls on a large directory, because it's recursive. You can add a '--max-depth 1' flag to stop the recursion. 

Downloading from OneDrive to OSC

Copy the contents of a source directory from a configured OneDrive remote, OneDrive:/src/dir/path, into a destination directory in your OSC session, /dest/dir/path, using the code below:

rclone copy OneDrive:/src/dir/path /dest/dir/path

Identical files on the source and destination directories are not transferred. Only the contents of the provided source directory are copied, not the directory name and contents.

copy does not delete files from the destination. To delete files from the destination directory in order to match the source directory, use the sync command instead.

If only one file is being transferred, use the copyto command instead.

Note: The --no-traverse option can be used to increase efficiency by stopping rclone from listing the destination. It should be used when copying a small number of files and/or have a large number of files on the destination, but not when a large number of files are being copied.
Note: Shared folders will not appear when listing a directory they are filed in. They are still accessible and data can be move to/from them. For example, the commands rclone ls OneDrive:/path/to/shared_folder and rclone copy OneDrive:/path/to/shared_folder /dest/dir/path will work normally even though the shared folder does not appear when listing their source directory.

Limitations

If rclone remains unused for 90 days, the refresh token will expire, leading to issues with authorization. This can be easily resolved by executing the rclone config reconnect remote: command, which generates a fresh token and refresh token.

Naming

It's important to note OneDrive is case insensitive which prohibits the coexistence files such as "Hello.doc" and "hello.doc". Certain characters are prohibited from being in OneDrive filenames and are commonly encountered on non-Windows platforms. Rclone addresses this by converting these filenames to their visually equivalent Unicode alternatives.

File Sizes

The largest allowed file size is 250 GiB for both OneDrive Personal and OneDrive for Business (Updated 13 Jan 2021).

Path Length

The entire path, including the file name, must contain fewer than 400 characters for OneDrive, OneDrive for Business and SharePoint Online. It is important to know the limitation when encrypting file and folder names with rclone, as the encrypted names are typically longer than the original ones.

Number of Files

OneDrive seems to be OK with at least 50,000 files in a folder, but at 100,000 rclone will get errors listing the directory like couldn’t list files: UnknownError:.

Reference

 

Supercomputer: