This article describes how to use Git LFS (Large File Storage) to manage large files, covering both GitLab instance configuration and Git client configuration.
Applicable scenarios:
GitLab supports two methods for configuring LFS, with the main difference being the storage location of LFS files.
Since LFS typically stores large files, it can consume significant storage capacity. Plan storage capacity according to your requirements before deployment.
By default, the deployed GitLab instance has already enabled the LFS feature and uses local storage to save LFS files.
Locally stored LFS files are saved in the attachment storage, under the path `shared/lfs-objects`.
The attachment storage varies depending on the deployment method:
GitLab officially recommends using object storage to save LFS files.
GitLab supports multiple types of object storage. The following example uses MinIO to illustrate how to configure GitLab to use object storage.
Create the following buckets in MinIO:

- `git-lfs`
- `gitlab-uploads`
The buckets can be created with the `mc` CLI as follows:
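A minimal sketch using `mc`; the alias name, endpoint, and credentials below are placeholders:

```shell
# Register the MinIO server under an alias (endpoint and credentials are placeholders)
mc alias set myminio http://minio.example.com:9000 ACCESS_KEY SECRET_KEY

# Create the two buckets
mc mb myminio/git-lfs
mc mb myminio/gitlab-uploads
```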
Prepare a MinIO configuration file named `rails-storage.yaml` with the following content:
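For example (the endpoint and credentials are placeholders):

```yaml
provider: AWS
region: us-east-1
aws_access_key_id: <your-access-key-id>
aws_secret_access_key: <your-secret-access-key>
endpoint: http://minio.example.com:9000
path_style: true
```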
Where:

- `provider`: the type of object storage; MinIO uses the fixed value `AWS`
- `region`: the region of the object storage; MinIO uses the fixed value `us-east-1`
- `aws_access_key_id`: the access key ID for the object storage
- `aws_secret_access_key`: the secret access key for the object storage
- `endpoint`: the access address for the object storage
- `path_style`: the access method for the object storage; use the fixed value `true`

Save the configuration file as a secret in the cluster, noting that the namespace must match that of the GitLab instance:
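A sketch using `kubectl`; the secret name `gitlab-rails-storage` and namespace `gitlab` are assumptions, while the key name `connection` follows the GitLab Helm chart convention:

```shell
# Create the secret from the prepared file; the namespace must match the GitLab instance
kubectl create secret generic gitlab-rails-storage \
  --from-file=connection=rails-storage.yaml \
  -n gitlab
```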
Modify the GitLab instance configuration by adding the following to enable object storage:
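A minimal sketch assuming a Helm-chart-style configuration, and assuming the connection secret is named `gitlab-rails-storage` with the key `connection`:

```yaml
global:
  appConfig:
    lfs:
      enabled: true
      bucket: git-lfs
      connection:
        secret: gitlab-rails-storage
        key: connection
    uploads:
      bucket: gitlab-uploads
      connection:
        secret: gitlab-rails-storage
        key: connection
```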
Unlike regular API requests, uploading and downloading LFS files causes the workhorse component to consume significantly more CPU. The resources allocated to the workhorse component directly affect push and pull performance.
| Workhorse Component Resources | Push Peak Bandwidth | CPU Usage | Memory Usage |
|---|---|---|---|
| 1C 500Mi | 70 MBps | 1C (100%) | 100Mi (20%) |
| 2C 500Mi | 140 MBps | 2C (100%) | 100Mi (20%) |
Modify the GitLab instance configuration by adding the following to adjust the workhorse component resources:
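A sketch assuming a Helm-chart-style configuration, where workhorse resources sit under the webservice component; the 2C/500Mi values match the second row of the table above:

```yaml
gitlab:
  webservice:
    workhorse:
      resources:
        requests:
          cpu: 2
          memory: 500Mi
        limits:
          cpu: 2
          memory: 500Mi
```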
In addition, you need to increase the timeout for the webservice component. The configuration method is as follows:
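A sketch assuming a Helm-chart-style configuration; `workerTimeout` is the webservice request timeout in seconds, and 600 is an illustrative value:

```yaml
gitlab:
  webservice:
    workerTimeout: 600
```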
- Execute the `git version` command to check the Git version
- Execute the `git-lfs version` command to check the git-lfs version

Execute the following command to save clone credentials, avoiding password entry on every pull and push:
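For example, using the built-in `store` credential helper:

```shell
# Cache credentials on disk so pull/push no longer prompt for a password
git config --global credential.helper store
```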
Execute the following command to set the concurrent transfer count for LFS files, which can effectively improve the stability of push and pull, especially when pushing a large number of files at once:
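For example, using the `lfs.concurrenttransfers` setting; the value 4 is illustrative and should be tuned for your network:

```shell
# lfs.concurrenttransfers controls the number of parallel LFS uploads/downloads
git config --global lfs.concurrenttransfers 4
```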
Execute the following command to set the LFS activity timeout. This parameter must be added if GitLab uses object storage:
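For example, using the `lfs.activitytimeout` setting; the value 3600 (seconds) is illustrative:

```shell
# lfs.activitytimeout: how long an LFS HTTP connection may be idle before timing out
git config --global lfs.activitytimeout 3600
```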
Execute the following commands to clone the Git repository to your local machine and run `git lfs install` to install the LFS Git hooks:
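For example (the repository URL and name are placeholders):

```shell
git clone https://gitlab.example.com/my-group/my-repo.git
cd my-repo
# Install the LFS hooks (pre-push, post-checkout, etc.) for this repository
git lfs install
```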
Execute the following command to set the tracking pattern for LFS files, for example, to track all `.safetensors` files:
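Run inside the cloned repository:

```shell
# Track all .safetensors files with LFS; this writes the pattern into .gitattributes
git lfs track "*.safetensors"
```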
The above operation generates a `.gitattributes` file. Execute the following commands to commit this file to the remote repository first:
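For example:

```shell
git add .gitattributes
git commit -m "Track *.safetensors files with Git LFS"
git push
```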
Afterwards, any `.safetensors` files added to or updated in the repository will automatically be stored using LFS.
To verify if a file is saved using LFS, you can view the file in the GitLab repository. If there is a small LFS icon next to the file name, it indicates that the file is saved using LFS.
When cloning very large repositories on a client with insufficient resources, the clone process may be killed by the system after running for a while. The solution is:

- Add the `GIT_LFS_SKIP_SMUDGE=1` environment variable when cloning to skip downloading LFS file contents
- After the clone completes, execute the `git lfs pull` command to pull the LFS files to the local machine
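The two steps above can be sketched as follows (the repository URL is a placeholder):

```shell
# Clone without downloading LFS content; only LFS pointer files are checked out
GIT_LFS_SKIP_SMUDGE=1 git clone https://gitlab.example.com/my-group/my-repo.git
cd my-repo
# Fetch the actual LFS file contents separately
git lfs pull
```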