SeaweedFS Migration

Migration guide for moving the on-premises storage backend from MinIO to SeaweedFS.

Preparation

Create SeaweedFS deployments for Core and Hubble according to your storage needs.

Template:

# =============================================================================
# SeaweedFS Storage Values Template for FOSSA Deployment
# =============================================================================
#
# FOSSA requires two SeaweedFS instances for S3-compatible object storage:
#   1. core-seaweedfs   - Stores analysis artifacts for FOSSA Core
#   2. hubble-seaweedfs - Stores compliance data for FOSSA Hubble
#
# Adjust storage sizes based on your expected workload. See the "Sizing Guide"
# at the bottom of this file for recommendations.
#
# Prerequisites:
#   - A Kubernetes cluster with a default StorageClass configured
#   - Helm 3.x
#
# Installation:
#   helm repo add fossa https://charts.fossa.com
#   helm repo update
#
#   helm install core-seaweedfs fossa/seaweedfs \
#     --namespace <namespace> \
#     --version 4.17.0 \
#     -f core-seaweedfs-values.yaml
#
#   helm install hubble-seaweedfs fossa/seaweedfs \
#     --namespace <namespace> \
#     --version 4.17.0 \
#     -f hubble-seaweedfs-values.yaml
#
# Upgrade:
#   helm upgrade core-seaweedfs fossa/seaweedfs \
#     --namespace <namespace> \
#     --version 4.17.0 \
#     -f core-seaweedfs-values.yaml
#
#   helm upgrade hubble-seaweedfs fossa/seaweedfs \
#     --namespace <namespace> \
#     --version 4.17.0 \
#     -f hubble-seaweedfs-values.yaml
#
# =============================================================================


# =============================================================================
# Sizing Guide
# =============================================================================
#
# The main values to adjust are the volume.dataDirs[].size fields. These control
# how much raw data each SeaweedFS instance can store.
#
# Recommended volume sizes by deployment scale:
#
#   Deployment Size         | Core volume.size | Hubble volume.size | Notes
#   ------------------------|------------------|--------------------|-------------------
#   Small  (<100 projects)  |  50Gi            |  25Gi              | Dev/eval environments
#   Medium (<500 projects)  | 100Gi            |  50Gi              | Standard production
#   Large  (<2000 projects) | 250Gi            | 100Gi              | Large organizations
#   XLarge (2000+ projects) | 500Gi+           | 250Gi+             | Enterprise scale
#
# Other parameters:
#
#   master.data.size   - Volume metadata. 1Gi is sufficient for all but the
#                        largest deployments (10k+ volumes).
#
#   filer.data.size    - File index. 1Gi handles millions of objects. Increase
#                        to 5-10Gi only if you expect tens of millions of files.
#
#   maxVolumes         - Upper limit on the number of logical volumes SeaweedFS
#                        can create. The defaults (10000 for core, 6000 for
#                        hubble) are generous and rarely need adjustment.
#
#   storageClass       - Uncomment and set if you need a specific Kubernetes
#                        StorageClass (e.g., for SSD-backed storage, specific
#                        provisioners, or cloud-specific volume types). If left
#                        commented out, the cluster default StorageClass is used.
#
# Storage can be expanded after initial deployment by increasing the PVC size,
# provided your StorageClass supports volume expansion
# (allowVolumeExpansion: true).
# =============================================================================
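
As a concrete sketch of the expansion path, the patch below raises the volume server's storage request (the PVC name and sizes are illustrative; find the real name with `kubectl get pvc`):

```yaml
# pvc-expand.yaml -- illustrative patch; only the storage request changes.
# Apply with: kubectl patch pvc <volume-pvc-name> --patch-file pvc-expand.yaml
spec:
  resources:
    requests:
      storage: 250Gi   # previously 100Gi
```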

Example:

# =============================================================================
# Core SeaweedFS Values
# =============================================================================
# S3-compatible object storage for FOSSA Core analysis artifacts.
#
# After installation, the S3 endpoint is available at:
#   http://core-seaweedfs-s3:8333
#
# See seaweedfs-values-template.yaml for installation instructions and the
# sizing guide.
# =============================================================================

nameOverride: core-seaweedfs

global:
  serviceAccountName: core-seaweedfs
  # Existing pull secret created by fossa-core when the release name is "test".
  # To list pull secrets: kubectl get secrets --field-selector type=kubernetes.io/dockerconfigjson
  imagePullSecrets: test-fossa-core-quay.io

# -- S3 API Configuration
s3:
  enabled: true
  enableAuth: true
  createBuckets:
    - name: fossa.test
      anonymousRead: false

# -- Master Server Storage
# Stores volume metadata and cluster topology. Lightweight; 1Gi is
# sufficient for most deployments.
master:
  data:
    type: persistentVolumeClaim
    size: 1Gi                            # <-- Adjust if needed (rarely necessary)
    # storageClass: ""                   # <-- Uncomment to use a specific StorageClass

# -- Filer Storage
# Stores the file-to-chunk mapping index. 1Gi is sufficient for most
# deployments; increase if you expect millions of objects.
filer:
  data:
    type: persistentVolumeClaim
    size: 1Gi                            # <-- Adjust if needed
    # storageClass: ""                   # <-- Uncomment to use a specific StorageClass
  logs:
    type: emptyDir

# -- Volume Server Storage (primary data storage)
# This is where actual file data is stored. Size this based on the total
# amount of analysis artifact data you expect to store.
volume:
  dataDirs:
    - name: data1
      type: persistentVolumeClaim
      size: 100Gi                        # <-- ADJUST: Set based on expected data volume
      maxVolumes: 10000
      # storageClass: ""                 # <-- Uncomment to use a specific StorageClass

# =============================================================================
# Hubble SeaweedFS Values
# =============================================================================
# S3-compatible object storage for FOSSA Hubble compliance data.
#
# After installation, the S3 endpoint is available at:
#   http://hubble-seaweedfs-s3:8333
#
# See seaweedfs-values-template.yaml for installation instructions and the
# sizing guide.
# =============================================================================

nameOverride: hubble-seaweedfs

global:
  serviceAccountName: hubble-seaweedfs
  # Existing pull secret created by fossa-core when the release name is "test".
  # To list pull secrets: kubectl get secrets --field-selector type=kubernetes.io/dockerconfigjson
  imagePullSecrets: test-hubble-quay.io


# -- S3 API Configuration
s3:
  enabled: true
  enableAuth: true
  createBuckets:
    - name: hubble.fossa.test
      anonymousRead: false

# -- Master Server Storage
# Stores volume metadata and cluster topology. Lightweight; 1Gi is
# sufficient for most deployments.
master:
  data:
    type: persistentVolumeClaim
    size: 1Gi                            # <-- Adjust if needed (rarely necessary)
    # storageClass: ""                   # <-- Uncomment to use a specific StorageClass

# -- Filer Storage
# Stores the file-to-chunk mapping index. 1Gi is sufficient for most
# deployments; increase if you expect millions of objects.
filer:
  data:
    type: persistentVolumeClaim
    size: 1Gi                            # <-- Adjust if needed
    # storageClass: ""                   # <-- Uncomment to use a specific StorageClass
  logs:
    type: emptyDir

# -- Volume Server Storage (primary data storage)
# This is where actual file data is stored. Size this based on the total
# amount of compliance data you expect to store.
volume:
  dataDirs:
    - name: data1
      type: persistentVolumeClaim
      size: 100Gi                         # <-- ADJUST: Set based on expected data volume
      maxVolumes: 6000
      # storageClass: ""                 # <-- Uncomment to use a specific StorageClass

Execution

  1. Schedule a maintenance window.
  2. Set the application to maintenance mode (assuming the namespace is fossa):
kubectl config set-context --current --namespace=fossa
helm upgrade -i fossa fossa/fossa-core --values fossa-core-config.yml --set global.maintenanceMode.enabled=true --version "^4.0.0"
  3. Deploy SeaweedFS for Core and Hubble:
helm upgrade -i core-seaweedfs fossa/seaweedfs --values core-seaweedfs-values.yaml
helm upgrade -i hubble-seaweedfs fossa/seaweedfs --values hubble-seaweedfs-values.yaml
  4. Transfer the data from the Core and Hubble MinIO instances to SeaweedFS

    Using the following Job, you can migrate all the data from MinIO to
    SeaweedFS. Update the Job to match your MinIO and SeaweedFS configuration.
    The example below uses the following settings:

  • Core:
    • Bucket = fossa.test
    • MinIO:
      • access_key_id = minio
      • secret_access_key = minio123
      • endpoint = http://fossa-core-minio:80
    • SeaweedFS:
      • access_key_id = seaweedfs
      • secret_access_key = seaweedfs123
      • endpoint = http://core-seaweedfs-s3:8333
  • Hubble:
    • Bucket = hubble.fossa.test
    • MinIO:
      • access_key_id = minio
      • secret_access_key = minio123
      • endpoint = http://fossa-hubble-minio:80
    • SeaweedFS:
      • access_key_id = seaweedfs
      • secret_access_key = seaweedfs123
      • endpoint = http://hubble-seaweedfs-s3:8333

apiVersion: v1
kind: ConfigMap
metadata:
  name: rclone-migration-config
data:
  rclone.conf: |
    [minio-core]
    type = s3
    provider = Minio
    access_key_id = minio
    secret_access_key = minio123
    endpoint = http://fossa-core-minio:80
    region = minio

    [minio-hubble]
    type = s3
    provider = Minio
    access_key_id = minio
    secret_access_key = minio123
    endpoint = http://fossa-hubble-minio:80
    region = minio

    [seaweedfs-core]
    type = s3
    provider = Other
    access_key_id = seaweedfs
    secret_access_key = seaweedfs123
    endpoint = http://core-seaweedfs-s3:8333
    region = us-east-1
    force_path_style = true

    [seaweedfs-hubble]
    type = s3
    provider = Other
    access_key_id = seaweedfs
    secret_access_key = seaweedfs123
    endpoint = http://hubble-seaweedfs-s3:8333
    region = us-east-1
    force_path_style = true

  core-bucket: "fossa.test"
  hubble-bucket: "hubble.fossa.test"
  migrate.sh: |
    #!/bin/sh
    set -e

    echo "============================================"
    echo "  MinIO -> SeaweedFS Migration"
    echo "  Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "============================================"

    # ---------- tuning knobs (adjust for production) ----------
    # TRANSFERS:  number of file transfers in parallel
    # CHECKERS:   number of parallel hash-checking threads
    # S3_CONCURRENCY: multipart upload concurrency per file
    # S3_CHUNK:   multipart chunk size (bigger = fewer API calls for large files)
    # BUFFER:     in-memory buffer per transfer
    TRANSFERS="${RCLONE_TRANSFERS:-16}"
    CHECKERS="${RCLONE_CHECKERS:-8}"
    S3_CONCURRENCY="${RCLONE_S3_UPLOAD_CONCURRENCY:-4}"
    S3_CHUNK="${RCLONE_S3_CHUNK_SIZE:-64M}"
    BUFFER="${RCLONE_BUFFER_SIZE:-32M}"

    COMMON_FLAGS="--transfers=$TRANSFERS \
      --checkers=$CHECKERS \
      --s3-upload-concurrency=$S3_CONCURRENCY \
      --s3-chunk-size=$S3_CHUNK \
      --buffer-size=$BUFFER \
      --stats=10s \
      --stats-one-line \
      --log-level=INFO \
      --checksum \
      --retries=3 \
      --retries-sleep=5s \
      --low-level-retries=10"

    # ---------- pre-flight checks ----------
    echo ""
    echo ">>> Pre-flight: listing source buckets"
    echo "--- minio-core ---"
    rclone lsd minio-core: 2>&1
    echo "--- minio-hubble ---"
    rclone lsd minio-hubble: 2>&1
    echo "--- seaweedfs-core ---"
    rclone lsd seaweedfs-core: 2>&1
    echo "--- seaweedfs-hubble ---"
    rclone lsd seaweedfs-hubble: 2>&1

    echo ""
    echo ">>> Source object counts"
    CORE_SRC=$(rclone size minio-core:$CORE_BUCKET --json 2>/dev/null)
    HUBBLE_SRC=$(rclone size minio-hubble:$HUBBLE_BUCKET --json 2>/dev/null)
    echo "  core:   $CORE_SRC"
    echo "  hubble: $HUBBLE_SRC"

    # ---------- migration: core ----------
    echo ""
    echo "============================================"
    echo "  [1/2] Migrating core: minio -> seaweedfs"
    echo "============================================"
    # capture the exit code without tripping `set -e`
    CORE_RC=0
    eval rclone sync minio-core:$CORE_BUCKET seaweedfs-core:$CORE_BUCKET $COMMON_FLAGS 2>&1 || CORE_RC=$?
    echo "  core migration exit code: $CORE_RC"

    # ---------- migration: hubble ----------
    echo ""
    echo "============================================"
    echo "  [2/2] Migrating hubble: minio -> seaweedfs"
    echo "============================================"
    # capture the exit code without tripping `set -e`
    HUBBLE_RC=0
    eval rclone sync minio-hubble:$HUBBLE_BUCKET seaweedfs-hubble:$HUBBLE_BUCKET $COMMON_FLAGS 2>&1 || HUBBLE_RC=$?
    echo "  hubble migration exit code: $HUBBLE_RC"

    # ---------- post-migration validation ----------
    echo ""
    echo "============================================"
    echo "  Post-migration validation"
    echo "============================================"

    echo ""
    echo ">>> Destination object counts"
    CORE_DST=$(rclone size seaweedfs-core:$CORE_BUCKET --json 2>/dev/null)
    HUBBLE_DST=$(rclone size seaweedfs-hubble:$HUBBLE_BUCKET --json 2>/dev/null)
    echo "  core src:    $CORE_SRC"
    echo "  core dst:    $CORE_DST"
    echo "  hubble src:  $HUBBLE_SRC"
    echo "  hubble dst:  $HUBBLE_DST"

    echo ""
    echo ">>> Checking for differences (core)..."
    CORE_CHECK=0
    rclone check minio-core:$CORE_BUCKET seaweedfs-core:$CORE_BUCKET --one-way $COMMON_FLAGS 2>&1 || CORE_CHECK=$?

    echo ""
    echo ">>> Checking for differences (hubble)..."
    HUBBLE_CHECK=0
    rclone check minio-hubble:$HUBBLE_BUCKET seaweedfs-hubble:$HUBBLE_BUCKET --one-way $COMMON_FLAGS 2>&1 || HUBBLE_CHECK=$?

    echo ""
    echo "============================================"
    echo "  Migration Summary"
    echo "  Finished: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "============================================"
    echo "  Core   sync=$CORE_RC   check=$CORE_CHECK"
    echo "  Hubble sync=$HUBBLE_RC check=$HUBBLE_CHECK"

    if [ "$CORE_RC" -ne 0 ] || [ "$HUBBLE_RC" -ne 0 ] || [ "$CORE_CHECK" -ne 0 ] || [ "$HUBBLE_CHECK" -ne 0 ]; then
      echo "  STATUS: FAILED (see logs above)"
      exit 1
    fi

    echo "  STATUS: SUCCESS"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: rclone-minio-to-seaweedfs
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 86400
  template:
    metadata:
      labels:
        app: rclone-migration
    spec:
      restartPolicy: OnFailure
      containers:
      - name: rclone
        image: rclone/rclone:latest
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "/scripts/migrate.sh"]
        env:
        - name: CORE_BUCKET
          valueFrom:
            configMapKeyRef:
              name: rclone-migration-config
              key: core-bucket
        - name: HUBBLE_BUCKET
          valueFrom:
            configMapKeyRef:
              name: rclone-migration-config
              key: hubble-bucket
        # --- Tune these for production (1-4TB) ---
        # For Kind test, keep conservative to avoid overwhelming the node
        - name: RCLONE_TRANSFERS
          value: "8"
        - name: RCLONE_CHECKERS
          value: "4"
        - name: RCLONE_S3_UPLOAD_CONCURRENCY
          value: "4"
        - name: RCLONE_S3_CHUNK_SIZE
          value: "16M"
        - name: RCLONE_BUFFER_SIZE
          value: "16M"
        resources:
          requests:
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: "2"
            memory: 1Gi
        volumeMounts:
        - name: rclone-config
          mountPath: /config/rclone
          readOnly: true
        - name: scripts
          mountPath: /scripts
          readOnly: true
      volumes:
      - name: rclone-config
        configMap:
          name: rclone-migration-config
          items:
          - key: rclone.conf
            path: rclone.conf
      - name: scripts
        configMap:
          name: rclone-migration-config
          items:
          - key: migrate.sh
            path: migrate.sh
            mode: 0755
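
Before running the Job against production data, it helps to estimate how long the sync will take. A back-of-the-envelope sketch (all numbers are illustrative; measure your own sustained throughput first):

```shell
# Rough ETA: total data divided by aggregate throughput. Illustrative values.
DATA_GB=500          # total data to migrate, in GiB
THROUGHPUT_MBS=50    # sustained MiB/s per transfer stream you measured
TRANSFERS=8          # parallel streams (matches RCLONE_TRANSFERS in the Job)
SECS=$(( DATA_GB * 1024 / (THROUGHPUT_MBS * TRANSFERS) ))
echo "estimated sync time: ${SECS}s (~$(( SECS / 60 )) min)"
# prints: estimated sync time: 1280s (~21 min)
```

Use the estimate to size the maintenance window with headroom for the post-migration `rclone check` pass.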

Save the manifests above as rclone-migration-minio-seaweedfs.yaml, then apply them:

kubectl config set-context --current --namespace=fossa
kubectl apply -f rclone-migration-minio-seaweedfs.yaml

  5. Follow the progress of the data migration from MinIO to SeaweedFS:
kubectl logs jobs/rclone-minio-to-seaweedfs --follow
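
The job logs print `rclone size --json` output for source and destination; the object counts should match before you proceed. A minimal sketch of extracting and comparing the counts from two such JSON blobs (the sample values are illustrative; substitute the blobs from your logs):

```shell
# Sample `rclone size --json` outputs; paste the real blobs from the job logs.
CORE_SRC='{"count":1523,"bytes":73400320}'
CORE_DST='{"count":1523,"bytes":73400320}'

# Extract the "count" field without jq (busybox-compatible sed).
count() { printf '%s' "$1" | sed -n 's/.*"count": *\([0-9]*\).*/\1/p'; }

if [ "$(count "$CORE_SRC")" = "$(count "$CORE_DST")" ]; then
  echo "core: object counts match ($(count "$CORE_SRC"))"
else
  echo "core: object count mismatch" >&2
fi
# prints: core: object counts match (1523)
```

A mismatch usually means the sync was interrupted; re-run the Job (rclone sync is idempotent) rather than proceeding.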

  6. Update your values file fossa-core-config.yml with iam.region, iam.kind, and the SeaweedFS storage settings:
.
.
.

iam:
  region: seaweedfs
  kind: AccessKey

storage:
  forcePathStyle: true
  endpoint: http://core-seaweedfs-s3:8333
  bucket: fossa.test
  auth:
    type: AccessKey
    accessKey: seaweedfs
    secretKey: seaweedfs123
.
.
.
hubble:
  iam:
    region: seaweedfs
    kind: AccessKey    

  storage:
    endpoint: http://hubble-seaweedfs-s3:8333
    bucket: hubble.fossa.test
    auth:
      type: AccessKey
      accessKey: seaweedfs
      secretKey: seaweedfs123
  7. Once the data migration is complete, upgrade the deployment to use SeaweedFS and disable maintenance mode:
helm upgrade -i fossa fossa/fossa-core --values fossa-core-config.yml --set global.maintenanceMode.enabled=false --version "^5.0.0"
  8. Verify your data is still available by accessing the console.
  9. After a week or two, decommission the MinIO deployments:
helm delete fossa-core-minio
helm delete fossa-hubble-minio

If you're certain the MinIO data will no longer be needed, you can also delete its PVCs:

kubectl delete pvc fossa-core-minio
kubectl delete pvc fossa-hubble-minio