Managing Large Files with LFS
This article describes how to use Git LFS to manage large files, covering both GitLab instance configuration and Git client configuration.
Applicable scenarios:
- Using LFS to manage executable files in code repositories, such as compiled JAR packages
- Using LFS to manage AI large model files in code repositories
GitLab Instance Configuration
Prerequisites
- A GitLab instance has been deployed according to the GitLab Instance Deployment documentation.
GitLab supports two methods for configuring LFS, with the main difference being the storage location of LFS files.
- Using local storage to save LFS files
- Using object storage to save LFS files
Since LFS typically stores large files, storage capacity usage can be significant. Therefore, it's important to plan storage capacity according to requirements before deployment.
Using Local Storage for LFS Files
By default, the deployed GitLab instance has already enabled the LFS feature and uses local storage to save LFS files.
Locally stored LFS files are saved in the attachment storage, with the path: shareds/lfs-objects.
The attachment storage varies depending on the deployment method:
- GitLab instances deployed using the HostPath method use node storage. Due to the difficulty of expanding node storage, this method is not recommended for production environments.
- GitLab instances deployed using a storage class or a specified PVC use a PVC as the storage medium.
Using Object Storage for LFS Files (Recommended)
GitLab officially recommends using object storage to save LFS files.
GitLab supports multiple types of object storage. The following example uses MinIO to illustrate how to configure GitLab to use object storage.
Create the following buckets in MinIO:
- git-lfs
- gitlab-uploads
The method to create buckets using the mc cli command is as follows:
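A minimal sketch of the bucket creation, assuming the mc client is installed and can reach the MinIO server. The helper name create_gitlab_buckets, the alias myminio, the endpoint URL, and the credentials are placeholders for illustration, not values from this article.

```shell
# Create the two buckets that GitLab needs in MinIO.
# Arguments (all placeholders): alias, endpoint URL, access key, secret key.
create_gitlab_buckets() {
  alias_name="$1"
  endpoint="$2"
  access_key="$3"
  secret_key="$4"
  # Register the MinIO server under a local alias.
  mc alias set "$alias_name" "$endpoint" "$access_key" "$secret_key" || return 1
  # One bucket for LFS objects, one for uploads.
  mc mb "$alias_name/git-lfs" || return 1
  mc mb "$alias_name/gitlab-uploads"
}

# Example (hypothetical endpoint and credentials):
# create_gitlab_buckets myminio http://minio.example.com:9000 "$ACCESS_KEY" "$SECRET_KEY"
```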
Prepare a MinIO configuration file named rails-storage.yaml with the following content:
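A sketch of what such a file might look like, built from the parameters this section describes. The access key values and the endpoint address are placeholders and must be replaced with the values for your MinIO deployment.

```shell
# Write the MinIO connection settings for GitLab to rails-storage.yaml.
# The credentials and the endpoint below are placeholders.
cat > rails-storage.yaml <<'EOF'
provider: AWS
region: us-east-1
aws_access_key_id: REPLACE_WITH_ACCESS_KEY_ID
aws_secret_access_key: REPLACE_WITH_SECRET_ACCESS_KEY
endpoint: "http://minio.example.com:9000"
path_style: true
EOF
```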
Where:
- provider is the type of object storage; MinIO uses the fixed value AWS
- region is the region of the object storage; MinIO uses the fixed value us-east-1
- aws_access_key_id is the access key ID for the object storage
- aws_secret_access_key is the secret access key for the object storage
- endpoint is the access address for the object storage
- path_style is the access method for the object storage; the fixed value true is used here
Save the configuration file as a secret in the cluster, noting that the namespace must match that of the GitLab instance:
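One way to create the secret, sketched with kubectl. The helper name create_storage_secret, the default secret name gitlab-rails-storage, and the key name connection are assumptions for illustration; use whatever names your instance configuration refers to.

```shell
# Store rails-storage.yaml in a secret in the GitLab instance's namespace.
# $1: namespace of the GitLab instance
# $2: secret name (hypothetical default: gitlab-rails-storage)
create_storage_secret() {
  namespace="$1"
  secret_name="${2:-gitlab-rails-storage}"
  # The file is stored under the key "connection" (an assumed key name).
  kubectl create secret generic "$secret_name" \
    --namespace "$namespace" \
    --from-file=connection=rails-storage.yaml
}

# Example (hypothetical namespace):
# create_storage_secret gitlab-system
```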
Modify the GitLab instance configuration by adding the following to enable object storage:
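The exact configuration is not reproduced here. As a rough sketch, assuming the instance passes its settings through to the upstream GitLab Helm chart, the object storage values could look like the fragment below. The file name, the secret name gitlab-rails-storage, and the key connection are assumptions, and where the fragment belongs inside the instance resource depends on how the instance was deployed. Check the key names against the chart version you run.

```shell
# Write a Helm-values style fragment that enables object storage.
# Assumes upstream GitLab chart keys; secret name and key are placeholders.
cat > object-storage-values.yaml <<'EOF'
global:
  appConfig:
    object_store:
      enabled: true
      connection:
        secret: gitlab-rails-storage
        key: connection
    lfs:
      bucket: git-lfs
    uploads:
      bucket: gitlab-uploads
EOF
```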
Resource and Parameter Configuration
LFS uploads and downloads consume more CPU in the workhorse component than regular API requests do, so the resources allocated to workhorse directly affect push and pull performance.
Modify the GitLab instance configuration by adding the following to adjust the workhorse component resources:
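A hedged sketch of such an adjustment, again assuming the instance accepts upstream GitLab Helm chart values. The CPU and memory figures are illustrative placeholders, not recommendations from this article; size them from your own push and pull load.

```shell
# Write a Helm-values style fragment that raises workhorse resources.
# Assumes upstream GitLab chart keys; all numbers are illustrative.
cat > workhorse-resources-values.yaml <<'EOF'
gitlab:
  webservice:
    workhorse:
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 2Gi
EOF
```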
In addition, you need to increase the timeout for the webservice component. The configuration method is as follows:
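As an illustration only, assuming the instance accepts upstream GitLab Helm chart values, the webservice worker timeout could be raised with a fragment like the one below. The value 600 is a placeholder, and the setting your deployment actually exposes may be named differently.

```shell
# Write a Helm-values style fragment that raises the webservice timeout.
# Assumes the upstream chart key global.webservice.workerTimeout.
cat > webservice-timeout-values.yaml <<'EOF'
global:
  webservice:
    workerTimeout: 600   # seconds; illustrative value
EOF
```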
Git Client Configuration
Prerequisites
- The Git client is installed. Run the git version command to check the Git version.
- The git-lfs client is installed. Run the git-lfs version command to check the git-lfs version.
Configuring Git Client Parameters
Execute the following command to save clone credentials, avoiding password input for every pull and push:
Execute the following command to set the concurrent transfer count for LFS files, which can effectively improve the stability of push and pull, especially when pushing a large number of files at once:
Execute the following command to set the LFS activity timeout. This parameter must be added if GitLab uses object storage:
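The three client settings above can be applied together. The sketch below uses the Git credential helper store and the git-lfs options lfs.concurrenttransfers and lfs.activitytimeout; the helper name configure_git_lfs_client and the default values 3 and 3600 are illustrative assumptions, so tune them for your network and repository size.

```shell
# Apply the Git client settings described in this section.
# $1: number of concurrent LFS transfers (illustrative default: 3)
# $2: LFS activity timeout in seconds (illustrative default: 3600)
configure_git_lfs_client() {
  transfers="${1:-3}"
  activity_timeout="${2:-3600}"
  # Save credentials so pull and push do not prompt every time.
  git config --global credential.helper store || return 1
  # Limit how many LFS objects are transferred in parallel.
  git config --global lfs.concurrenttransfers "$transfers" || return 1
  # Allow longer idle periods before an LFS transfer is aborted.
  git config --global lfs.activitytimeout "$activity_timeout"
}

# Example:
# configure_git_lfs_client 3 3600
```

Note that the store helper keeps credentials unencrypted on disk, so it is best suited to machines you control.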
Git Project Configuration
Execute the following commands to clone the Git repository to your local machine and run git lfs install to install the LFS git hook:
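A minimal sketch of this step. The helper name clone_with_lfs and the repository URL in the example are hypothetical.

```shell
# Clone a repository and install the LFS hooks in it.
# $1: repository URL (placeholder), $2: target directory
clone_with_lfs() {
  repo_url="$1"
  target_dir="$2"
  git clone "$repo_url" "$target_dir" || return 1
  # Install the LFS git hooks for this repository.
  (cd "$target_dir" && git lfs install)
}

# Example (hypothetical URL):
# clone_with_lfs https://gitlab.example.com/group/models.git models
```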
Execute the following command to set the tracking pattern for LFS files, for example, to track all .safetensors files:
The above operation will generate a .gitattributes file. Execute the following commands to commit this file to the remote repository first:
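The tracking and commit steps can be sketched as one helper, run from inside the cloned repository. The helper name track_with_lfs and the commit message are illustrative choices.

```shell
# Track a file pattern with LFS and publish the resulting .gitattributes.
# $1: file pattern to track, for example "*.safetensors"
track_with_lfs() {
  pattern="$1"
  # Records the pattern in .gitattributes.
  git lfs track "$pattern" || return 1
  git add .gitattributes || return 1
  git commit -m "Track $pattern with Git LFS" || return 1
  git push
}

# Example:
# track_with_lfs "*.safetensors"
```

Quote the pattern so the shell does not expand it before git lfs sees it.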
Afterwards, adding or updating .safetensors files to the repository will automatically save them using LFS.
To verify if a file is saved using LFS, you can view the file in the GitLab repository. If there is a small LFS icon next to the file name, it indicates that the file is saved using LFS.
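The same check can be done from the command line. Git stores an LFS-managed file in the repository as a small text pointer whose first line names the LFS spec version, so inspecting the committed blob shows whether LFS is in use. The helper name is_lfs_pointer and the file names in the example are hypothetical.

```shell
# Succeeds if the given file is a Git LFS pointer file.
# $1: path to a file holding the blob as stored in the repository
is_lfs_pointer() {
  head -n 1 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}

# Example (hypothetical file name), run inside the repository:
# git cat-file -p HEAD:model.safetensors > blob.txt
# is_lfs_pointer blob.txt && echo "stored with LFS"
```

Running git lfs ls-files inside the repository also lists the files that LFS manages.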
Common Issues
Clone Failure for Very Large Repositories
When cloning very large repositories, if the client has insufficient resources, the client may be killed by the system after running for a while. The solution is:
- Set the GIT_LFS_SKIP_SMUDGE=1 environment variable when cloning to skip downloading LFS file contents
- Enter the local code directory and execute the git lfs pull command to pull LFS files to the local machine
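The two steps can be sketched as follows. The helper name clone_then_pull_lfs and the repository URL in the example are hypothetical.

```shell
# Clone without LFS content first, then fetch the LFS files separately.
# $1: repository URL (placeholder), $2: target directory
clone_then_pull_lfs() {
  repo_url="$1"
  target_dir="$2"
  # Check out pointer files only; no LFS content is downloaded here.
  GIT_LFS_SKIP_SMUDGE=1 git clone "$repo_url" "$target_dir" || return 1
  # Download the LFS files as a separate step.
  (cd "$target_dir" && git lfs pull)
}

# Example (hypothetical URL):
# clone_then_pull_lfs https://gitlab.example.com/group/models.git models
```

Splitting the clone this way keeps the memory-heavy LFS download out of the checkout, and git lfs pull can be rerun if it is interrupted.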