SeaweedFS Migration
Migration guide for moving an on-premise storage backend from MinIO to SeaweedFS.
Preparation
Create SeaweedFS deployments for Core and Hubble, sized according to your storage needs.
Template:
# =============================================================================
# SeaweedFS Storage Values Template for FOSSA Deployment
# =============================================================================
#
# FOSSA requires two SeaweedFS instances for S3-compatible object storage:
# 1. core-seaweedfs - Stores analysis artifacts for FOSSA Core
# 2. hubble-seaweedfs - Stores compliance data for FOSSA Hubble
#
# Adjust storage sizes based on your expected workload. See the "Sizing Guide"
# at the bottom of this file for recommendations.
#
# Prerequisites:
# - A Kubernetes cluster with a default StorageClass configured
# - Helm 3.x
#
# Installation:
# helm repo add fossa https://charts.fossa.com
# helm repo update
#
# helm install core-seaweedfs fossa/seaweedfs \
#   --namespace <namespace> \
#   --version 4.17.0 \
#   -f core-seaweedfs-values.yaml
#
# helm install hubble-seaweedfs fossa/seaweedfs \
#   --namespace <namespace> \
#   --version 4.17.0 \
#   -f hubble-seaweedfs-values.yaml
#
# Upgrade:
# helm upgrade core-seaweedfs fossa/seaweedfs \
#   --namespace <namespace> \
#   --version 4.17.0 \
#   -f core-seaweedfs-values.yaml
#
# helm upgrade hubble-seaweedfs fossa/seaweedfs \
#   --namespace <namespace> \
#   --version 4.17.0 \
#   -f hubble-seaweedfs-values.yaml
#
# =============================================================================
# =============================================================================
# Sizing Guide
# =============================================================================
#
# The main values to adjust are volume.dataDirs[].size fields. These control
# how much raw data each SeaweedFS instance can store.
#
# Recommended volume sizes by deployment scale:
#
# Deployment Size | Core volume.size | Hubble volume.size | Notes
# ------------------------|------------------|--------------------|-------------------
# Small (<100 projects) | 50Gi | 25Gi | Dev/eval environments
# Medium (<500 projects) | 100Gi | 50Gi | Standard production
# Large (<2000 projects) | 250Gi | 100Gi | Large organizations
# XLarge (2000+ projects) | 500Gi+ | 250Gi+ | Enterprise scale
#
# Other parameters:
#
# master.data.size - Volume metadata. 1Gi is sufficient for all but the
#                    largest deployments (10k+ volumes).
#
# filer.data.size  - File index. 1Gi handles millions of objects. Increase
#                    to 5-10Gi only if you expect tens of millions of files.
#
# maxVolumes       - Upper limit on the number of logical volumes SeaweedFS
#                    can create. The defaults (10000 for core, 6000 for
#                    hubble) are generous and rarely need adjustment.
#
# storageClass     - Uncomment and set if you need a specific Kubernetes
#                    StorageClass (e.g., for SSD-backed storage, specific
#                    provisioners, or cloud-specific volume types). If left
#                    commented out, the cluster default StorageClass is used.
#
# Storage can be expanded after initial deployment by increasing the PVC size,
# provided your StorageClass supports volume expansion
# (allowVolumeExpansion: true).
# =============================================================================
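The tiers in the sizing table can be sketched as a tiny helper if you want to script the choice (illustrative only; the thresholds and sizes are the table's, the function name `core_size_for` is ours):

```shell
# Illustrative helper mirroring the sizing table above: pick a core
# volume.size tier from an expected project count.
core_size_for() {
  projects=$1
  if [ "$projects" -lt 100 ]; then
    echo "50Gi"     # Small: dev/eval environments
  elif [ "$projects" -lt 500 ]; then
    echo "100Gi"    # Medium: standard production
  elif [ "$projects" -lt 2000 ]; then
    echo "250Gi"    # Large: large organizations
  else
    echo "500Gi"    # XLarge: enterprise scale; grow further as needed
  fi
}

core_size_for 350   # prints 100Gi
```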
Example:
# =============================================================================
# Core SeaweedFS Values
# =============================================================================
# S3-compatible object storage for FOSSA Core analysis artifacts.
#
# After installation, the S3 endpoint is available at:
# http://core-seaweedfs-s3:8333
#
# See seaweedfs-values-template.yaml for installation instructions and the
# sizing guide.
# =============================================================================
nameOverride: core-seaweedfs
global:
  serviceAccountName: core-seaweedfs
  # Existing pull secret created by fossa-core when the release name is "test".
  # To list existing pull secrets:
  #   kubectl get secrets --field-selector type=kubernetes.io/dockerconfigjson
  imagePullSecrets: test-fossa-core-quay.io
# -- S3 API Configuration
s3:
  enabled: true
  enableAuth: true
  createBuckets:
    - name: fossa.test
      anonymousRead: false
# -- Master Server Storage
# Stores volume metadata and cluster topology. Lightweight; 1Gi is
# sufficient for most deployments.
master:
  data:
    type: persistentVolumeClaim
    size: 1Gi # <-- Adjust if needed (rarely necessary)
    # storageClass: "" # <-- Uncomment to use a specific StorageClass
# -- Filer Storage
# Stores the file-to-chunk mapping index. 1Gi is sufficient for most
# deployments; increase if you expect millions of objects.
filer:
  data:
    type: persistentVolumeClaim
    size: 1Gi # <-- Adjust if needed
    # storageClass: "" # <-- Uncomment to use a specific StorageClass
  logs:
    type: emptyDir
# -- Volume Server Storage (primary data storage)
# This is where actual file data is stored. Size this based on the total
# amount of analysis artifact data you expect to store.
volume:
  dataDirs:
    - name: data1
      type: persistentVolumeClaim
      size: 100Gi # <-- ADJUST: Set based on expected data volume
      maxVolumes: 10000
      # storageClass: "" # <-- Uncomment to use a specific StorageClass
# =============================================================================
# Hubble SeaweedFS Values
# =============================================================================
# S3-compatible object storage for FOSSA Hubble compliance data.
#
# After installation, the S3 endpoint is available at:
# http://hubble-seaweedfs-s3:8333
#
# See seaweedfs-values-template.yaml for installation instructions and the
# sizing guide.
# =============================================================================
nameOverride: hubble-seaweedfs
global:
  serviceAccountName: hubble-seaweedfs
  # Existing pull secret created by fossa-core when the release name is "test".
  # To list existing pull secrets:
  #   kubectl get secrets --field-selector type=kubernetes.io/dockerconfigjson
  imagePullSecrets: test-hubble-quay.io
# -- S3 API Configuration
s3:
  enabled: true
  enableAuth: true
  createBuckets:
    - name: hubble.fossa.test
      anonymousRead: false
# -- Master Server Storage
# Stores volume metadata and cluster topology. Lightweight; 1Gi is
# sufficient for most deployments.
master:
  data:
    type: persistentVolumeClaim
    size: 1Gi # <-- Adjust if needed (rarely necessary)
    # storageClass: "" # <-- Uncomment to use a specific StorageClass
# -- Filer Storage
# Stores the file-to-chunk mapping index. 1Gi is sufficient for most
# deployments; increase if you expect millions of objects.
filer:
  data:
    type: persistentVolumeClaim
    size: 1Gi # <-- Adjust if needed
    # storageClass: "" # <-- Uncomment to use a specific StorageClass
  logs:
    type: emptyDir
# -- Volume Server Storage (primary data storage)
# This is where actual file data is stored. Size this based on the total
# amount of compliance data you expect to store.
volume:
  dataDirs:
    - name: data1
      type: persistentVolumeClaim
      size: 100Gi # <-- ADJUST: Set based on expected data volume
      maxVolumes: 6000
      # storageClass: "" # <-- Uncomment to use a specific StorageClass
Execution
- Schedule a maintenance window
- Set the application to maintenance mode (assuming the namespace is fossa)
kubectl config set-context --current --namespace=fossa
helm upgrade -i fossa fossa/fossa-core --values fossa-core-config.yml --set global.maintenanceMode.enabled=true --version "^4.0.0"
- Deploy SeaweedFS for Core and Hubble
helm upgrade -i core-seaweedfs fossa/seaweedfs --values core-seaweedfs-values.yaml
helm upgrade -i hubble-seaweedfs fossa/seaweedfs --values hubble-seaweedfs-values.yaml
- Transfer data to SeaweedFS from the Core and Hubble MinIO instances
The following Job migrates all data from MinIO to SeaweedFS. Update it to match your MinIO and SeaweedFS configuration.
In the following example we have set the following:
- Core:
  - Bucket = fossa.test
  - MinIO: access_key_id=minio, secret_access_key=minio123, endpoint=http://fossa-core-minio:80
  - SeaweedFS: access_key_id=seaweedfs, secret_access_key=seaweedfs123, endpoint=http://core-seaweedfs-s3:8333
- Hubble:
  - Bucket = hubble.fossa.test
  - MinIO: access_key_id=minio, secret_access_key=minio123, endpoint=http://fossa-hubble-minio:80
  - SeaweedFS: access_key_id=seaweedfs, secret_access_key=seaweedfs123, endpoint=http://hubble-seaweedfs-s3:8333
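If your endpoints or credentials differ, you can render a remote stanza in the same shape as the rclone.conf in the ConfigMap with a small helper (a sketch; `make_remote` is our name, the values are this example's):

```shell
# Hypothetical helper: emit an rclone S3 remote stanza for a given
# name/endpoint/credential set, matching the layout used in rclone.conf.
make_remote() {
  name=$1; endpoint=$2; key=$3; secret=$4
  cat <<EOF
[$name]
type = s3
provider = Other
access_key_id = $key
secret_access_key = $secret
endpoint = $endpoint
region = us-east-1
force_path_style = true
EOF
}

make_remote seaweedfs-core http://core-seaweedfs-s3:8333 seaweedfs seaweedfs123
```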
apiVersion: v1
kind: ConfigMap
metadata:
  name: rclone-migration-config
data:
  rclone.conf: |
    [minio-core]
    type = s3
    provider = Minio
    access_key_id = minio
    secret_access_key = minio123
    endpoint = http://fossa-core-minio:80
    region = minio

    [minio-hubble]
    type = s3
    provider = Minio
    access_key_id = minio
    secret_access_key = minio123
    endpoint = http://fossa-hubble-minio:80
    region = minio

    [seaweedfs-core]
    type = s3
    provider = Other
    access_key_id = seaweedfs
    secret_access_key = seaweedfs123
    endpoint = http://core-seaweedfs-s3:8333
    region = us-east-1
    force_path_style = true

    [seaweedfs-hubble]
    type = s3
    provider = Other
    access_key_id = seaweedfs
    secret_access_key = seaweedfs123
    endpoint = http://hubble-seaweedfs-s3:8333
    region = us-east-1
    force_path_style = true
  core-bucket: "fossa.test"
  hubble-bucket: "hubble.fossa.test"
  migrate.sh: |
    #!/bin/sh
    set -e
    echo "============================================"
    echo " MinIO -> SeaweedFS Migration"
    echo " Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "============================================"

    # ---------- tuning knobs (adjust for production) ----------
    # TRANSFERS:      number of file transfers in parallel
    # CHECKERS:       number of parallel hash-checking threads
    # S3_CONCURRENCY: multipart upload concurrency per file
    # S3_CHUNK:       multipart chunk size (bigger = fewer API calls for large files)
    # BUFFER:         in-memory buffer per transfer
    TRANSFERS="${RCLONE_TRANSFERS:-16}"
    CHECKERS="${RCLONE_CHECKERS:-8}"
    S3_CONCURRENCY="${RCLONE_S3_UPLOAD_CONCURRENCY:-4}"
    S3_CHUNK="${RCLONE_S3_CHUNK_SIZE:-64M}"
    BUFFER="${RCLONE_BUFFER_SIZE:-32M}"
    COMMON_FLAGS="--transfers=$TRANSFERS \
      --checkers=$CHECKERS \
      --s3-upload-concurrency=$S3_CONCURRENCY \
      --s3-chunk-size=$S3_CHUNK \
      --buffer-size=$BUFFER \
      --stats=10s \
      --stats-one-line \
      --log-level=INFO \
      --checksum \
      --retries=3 \
      --retries-sleep=5s \
      --low-level-retries=10"

    # ---------- pre-flight checks ----------
    echo ""
    echo ">>> Pre-flight: listing source buckets"
    echo "--- minio-core ---"
    rclone lsd minio-core: 2>&1
    echo "--- minio-hubble ---"
    rclone lsd minio-hubble: 2>&1
    echo "--- seaweedfs-core ---"
    rclone lsd seaweedfs-core: 2>&1
    echo "--- seaweedfs-hubble ---"
    rclone lsd seaweedfs-hubble: 2>&1

    echo ""
    echo ">>> Source object counts"
    CORE_SRC=$(rclone size minio-core:$CORE_BUCKET --json 2>/dev/null)
    HUBBLE_SRC=$(rclone size minio-hubble:$HUBBLE_BUCKET --json 2>/dev/null)
    echo " core:   $CORE_SRC"
    echo " hubble: $HUBBLE_SRC"

    # ---------- migration: core ----------
    echo ""
    echo "============================================"
    echo " [1/2] Migrating core: minio -> seaweedfs"
    echo "============================================"
    # Capture the exit code without letting `set -e` abort the script
    # on a failed sync (a bare command followed by $? would exit early).
    CORE_RC=0
    eval rclone sync minio-core:$CORE_BUCKET seaweedfs-core:$CORE_BUCKET $COMMON_FLAGS 2>&1 || CORE_RC=$?
    echo " core migration exit code: $CORE_RC"

    # ---------- migration: hubble ----------
    echo ""
    echo "============================================"
    echo " [2/2] Migrating hubble: minio -> seaweedfs"
    echo "============================================"
    HUBBLE_RC=0
    eval rclone sync minio-hubble:$HUBBLE_BUCKET seaweedfs-hubble:$HUBBLE_BUCKET $COMMON_FLAGS 2>&1 || HUBBLE_RC=$?
    echo " hubble migration exit code: $HUBBLE_RC"

    # ---------- post-migration validation ----------
    echo ""
    echo "============================================"
    echo " Post-migration validation"
    echo "============================================"
    echo ""
    echo ">>> Destination object counts"
    CORE_DST=$(rclone size seaweedfs-core:$CORE_BUCKET --json 2>/dev/null)
    HUBBLE_DST=$(rclone size seaweedfs-hubble:$HUBBLE_BUCKET --json 2>/dev/null)
    echo " core src:   $CORE_SRC"
    echo " core dst:   $CORE_DST"
    echo " hubble src: $HUBBLE_SRC"
    echo " hubble dst: $HUBBLE_DST"

    echo ""
    echo ">>> Checking for differences (core)..."
    CORE_CHECK=0
    rclone check minio-core:$CORE_BUCKET seaweedfs-core:$CORE_BUCKET --one-way $COMMON_FLAGS 2>&1 || CORE_CHECK=$?
    echo ""
    echo ">>> Checking for differences (hubble)..."
    HUBBLE_CHECK=0
    rclone check minio-hubble:$HUBBLE_BUCKET seaweedfs-hubble:$HUBBLE_BUCKET --one-way $COMMON_FLAGS 2>&1 || HUBBLE_CHECK=$?

    echo ""
    echo "============================================"
    echo " Migration Summary"
    echo " Finished: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "============================================"
    echo " Core   sync=$CORE_RC check=$CORE_CHECK"
    echo " Hubble sync=$HUBBLE_RC check=$HUBBLE_CHECK"
    if [ "$CORE_RC" -ne 0 ] || [ "$HUBBLE_RC" -ne 0 ] || [ "$CORE_CHECK" -ne 0 ] || [ "$HUBBLE_CHECK" -ne 0 ]; then
      echo " STATUS: FAILED (see logs above)"
      exit 1
    fi
    echo " STATUS: SUCCESS"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: rclone-minio-to-seaweedfs
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 86400
  template:
    metadata:
      labels:
        app: rclone-migration
    spec:
      restartPolicy: OnFailure
      containers:
        - name: rclone
          image: rclone/rclone:latest
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "/scripts/migrate.sh"]
          env:
            - name: CORE_BUCKET
              valueFrom:
                configMapKeyRef:
                  name: rclone-migration-config
                  key: core-bucket
            - name: HUBBLE_BUCKET
              valueFrom:
                configMapKeyRef:
                  name: rclone-migration-config
                  key: hubble-bucket
            # --- Tune these for production (1-4TB) ---
            # For a Kind test cluster, keep these conservative to avoid
            # overwhelming the node.
            - name: RCLONE_TRANSFERS
              value: "8"
            - name: RCLONE_CHECKERS
              value: "4"
            - name: RCLONE_S3_UPLOAD_CONCURRENCY
              value: "4"
            - name: RCLONE_S3_CHUNK_SIZE
              value: "16M"
            - name: RCLONE_BUFFER_SIZE
              value: "16M"
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
            limits:
              cpu: "2"
              memory: 1Gi
          volumeMounts:
            - name: rclone-config
              mountPath: /config/rclone
              readOnly: true
            - name: scripts
              mountPath: /scripts
              readOnly: true
      volumes:
        - name: rclone-config
          configMap:
            name: rclone-migration-config
            items:
              - key: rclone.conf
                path: rclone.conf
        - name: scripts
          configMap:
            name: rclone-migration-config
            items:
              - key: migrate.sh
                path: migrate.sh
                mode: 0755
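The job's summary prints the raw `rclone size --json` output for each source and destination; if you want to compare the counts yourself, the "count" field can be extracted with something like this (a sketch using sed so it works without jq; `json_count` is our name and the JSON values below are illustrative, not real migration output):

```shell
# Extract the numeric "count" field from `rclone size --json` output.
json_count() {
  echo "$1" | sed -n 's/.*"count":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example values in the same shape rclone emits (illustrative numbers).
CORE_SRC='{"count":1523,"bytes":73400320}'
CORE_DST='{"count":1523,"bytes":73400320}'

if [ "$(json_count "$CORE_SRC")" = "$(json_count "$CORE_DST")" ]; then
  echo "core object counts match"
else
  echo "core object counts differ"
fi
```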
kubectl config set-context --current --namespace=fossa
kubectl apply -f rclone-migration-minio-seaweedfs.yaml
- Follow the progress of the data migration from MinIO to SeaweedFS
kubectl logs jobs/rclone-minio-to-seaweedfs --follow
- Update your values file fossa-core-config.yml with iam.region, iam.kind, and the SeaweedFS storage settings
.
.
.
iam:
  region: seaweedfs
  kind: AccessKey
storage:
  forcePathStyle: true
  endpoint: http://core-seaweedfs-s3:8333
  bucket: fossa.test
  auth:
    type: AccessKey
    accessKey: seaweedfs
    secretKey: seaweedfs123
.
.
.
hubble:
  iam:
    region: seaweedfs
    kind: AccessKey
  storage:
    forcePathStyle: true
    endpoint: http://hubble-seaweedfs-s3:8333
    bucket: hubble.fossa.test
    auth:
      type: AccessKey
      accessKey: seaweedfs
      secretKey: seaweedfs123
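For context on forcePathStyle: with path-style addressing the bucket name goes in the URL path rather than the hostname, which is what a single-hostname S3 gateway like the SeaweedFS service expects inside the cluster. A quick illustration (URLs constructed by hand; the object key is made up):

```shell
endpoint="http://core-seaweedfs-s3:8333"
bucket="fossa.test"
key="reports/scan.json"

# Path-style: the bucket is a path segment on the fixed endpoint host.
echo "path-style:   $endpoint/$bucket/$key"
# Virtual-hosted style: the bucket becomes part of the hostname, which
# would require in-cluster DNS for <bucket>.core-seaweedfs-s3 to resolve.
echo "virtual-host: http://$bucket.core-seaweedfs-s3:8333/$key"
```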
- Upon completion of the data migration, upgrade the deployment to use SeaweedFS and disable maintenance mode
helm upgrade -i fossa fossa/fossa-core --values fossa-core-config.yml --set global.maintenanceMode.enabled=false --version "^5.0.0"
- Verify your data is still available by accessing the console
- After a week or two, decommission the MinIO deployments
helm delete fossa-core-minio
helm delete fossa-hubble-minio
If you're certain that the MinIO data will no longer be needed, you can proceed to delete its PVCs
kubectl delete pvc fossa-core-minio
kubectl delete pvc fossa-hubble-minio