# 2. Preparing for Fine-tuning

For a smooth tutorial experience, the following specifications are recommended:

  • CPU: 16 cores or more

  • Memory: 512GB or more

  • MAF version: 24.11.0

  • Storage: 500GB or more

Please verify that your environment meets these requirements before starting the tutorial.
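If you want to script this verification, the numeric requirements above can be compared against the current machine. The helper below is an illustrative sketch only (the function name and constants are ours, mirroring the list above; it is not part of the MoAI tooling):

```python
import os

# Minimums from the recommended specifications above
# (illustrative helper, not part of the MoAI tooling).
MIN_CPU_CORES = 16
MIN_MEMORY_GB = 512
MIN_STORAGE_GB = 500

def meets_requirements(cores, memory_gb, storage_gb):
    """Return True when the machine satisfies the recommended specs."""
    return (
        cores >= MIN_CPU_CORES
        and memory_gb >= MIN_MEMORY_GB
        and storage_gb >= MIN_STORAGE_GB
    )

# Example: the core count can be read locally; memory and storage
# would come from /proc/meminfo and `df` on Linux.
print(meets_requirements(os.cpu_count() or 0, 512, 500))
```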

# Getting Started

To start, you'll need to obtain a container or virtual machine on the MoAI Platform from your infrastructure provider, for example a public cloud service built on the MoAI Platform.

After accessing the platform via SSH, run the moreh-smi command to ensure the MoAI Accelerator is properly recognized. Note that device names may vary depending on the system.

# Verifying the MoAI Accelerator

For this tutorial, which involves training a large-scale language model (LLM) like Llama3, selecting the appropriate size of MoAI Accelerator is crucial. First, use the moreh-smi command to check the current MoAI Accelerator in use.

Details on the specific MoAI Accelerator settings required for training will be provided in 3. Model Fine-tuning.

$ moreh-smi
+----------------------------------------------------------------------------------------------------+
|                                                  Current Version: 24.11.0  Latest Version: 24.11.0 |
+----------------------------------------------------------------------------------------------------+
|  Device  |        Name         |      Model     |  Memory Usage  |  Total Memory  |  Utilization   |
+====================================================================================================+
|  * 0     |   MoAI Accelerator  |  xLarge.512GB  |  -             |  -             |  -             |
+----------------------------------------------------------------------------------------------------+

Setting up the PyTorch script execution environment on the MoAI Platform is similar to working on a standard GPU server.

# Checking PyTorch Installation

Once you’ve accessed the container via SSH, check if PyTorch is installed in the current conda environment by running:

$ conda list torch
...
# Name                    Version                   Build  Channel
torch                     2.1.0+cu118.moreh24.11.0          pypi_0    pypi
...

The version name includes both the PyTorch version and the MoAI version required to run it.
In the example above, it indicates that PyTorch 2.1.0+cu118 is installed with MoAI version 24.11.0.
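If you want to extract the two parts programmatically, for example in a setup script, the combined string can be split on the `.moreh` separator shown above. A minimal sketch (the function name is ours):

```python
def split_torch_version(version):
    """Split a combined version string such as "2.1.0+cu118.moreh24.11.0"
    into the PyTorch part and the MoAI part, using the ".moreh" separator."""
    base, sep, moreh = version.partition(".moreh")
    return base, (moreh if sep else "")

print(split_torch_version("2.1.0+cu118.moreh24.11.0"))
# → ('2.1.0+cu118', '24.11.0')
```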

Follow the instructions in the Prepare Fine-tuning on MoAI Platform document to create a conda environment if any of the following applies:

  • the message conda: command not found appears

  • the torch package is not listed

  • the torch package exists but does not include "moreh" in its version name

If the moreh version is not 24.11.0 but a different version, run the following command to update it:

$ update-moreh --target 24.11.0 --torch 2.1.0
Currently installed: 24.9.0
Possible upgrading version: 24.11.0
  
Do you want to upgrade? (y/n, default:n)
y
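Note that deciding whether an upgrade is needed requires comparing the dotted versions numerically: as plain strings, "24.11.0" would sort before "24.9.0". A sketch of that comparison (our own helper, not part of update-moreh):

```python
def version_tuple(v):
    # "24.9.0" -> (24, 9, 0), so versions compare numerically
    # rather than lexicographically.
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(installed, target):
    return version_tuple(installed) < version_tuple(target)

print(needs_upgrade("24.9.0", "24.11.0"))  # → True
```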

# Verifying PyTorch Functionality

Run the following to ensure the torch package is properly imported and that the MoAI Accelerator is recognized:

$ python
Python 3.8.20 (default)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
...
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
[info] Requesting resources for MoAI Accelerator from the server...
[info] Initializing the worker daemon for MoAI Accelerator
[info] [1/1] Connecting to resources on the server (192.168.110.00:24158)...
[info] Establishing links to the resources...
[info] MoAI Accelerator is ready to use.
'MoAI Accelerator'
>>> quit()

# Downloading the Training Script

For this tutorial, we will use the train_llama3.py script located in the tutorial directory of the quickstart GitHub repository. Download it by running:

$ sudo apt-get install git
$ git clone https://github.com/moreh-dev/quickstart.git
$ cd quickstart
~/quickstart$ ls tutorial
...  train_llama3.py  ...

# Installing Required Python Packages

Install third-party Python packages needed to run the script by executing:

$ pip install -r requirements/requirements_llama3.txt
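After installation, you can confirm that the installed packages are importable before launching the training script. A generic check (the package names passed in below are placeholders; the actual list comes from requirements_llama3.txt):

```python
import importlib.util

def missing_packages(names):
    # Return the names that cannot be found in the current environment.
    return [n for n in names if importlib.util.find_spec(n) is None]

# Placeholder names; substitute the packages from requirements_llama3.txt.
print(missing_packages(["transformers", "datasets"]))
```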

# Acquire Access to the Model

To access and download the Llama3 70B model checkpoint from Hugging Face Hub, you will need to agree to the community license and provide your Hugging Face token information.

First, enter the necessary information and agree to the license on the Hugging Face website.

meta-llama/Meta-Llama-3-70B · Hugging Face
https://huggingface.co/meta-llama/Meta-Llama-3-70B

Once you've submitted the agreement form, check the model page to confirm that your access request has been granted.

Now you can authenticate your Hugging Face token with the following command:

$ huggingface-cli login
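Hugging Face user access tokens begin with the hf_ prefix, so a quick format check can catch copy-paste mistakes before logging in. A small sketch (our own helper, not part of huggingface-cli):

```python
def looks_like_hf_token(token):
    # User access tokens issued by Hugging Face start with "hf_".
    return token.startswith("hf_") and len(token) > len("hf_")

print(looks_like_hf_token("hf_example123"))  # → True
```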