From s3bkp to VolSync

I Knew About VolSync (and Dismissed It)
When I first built s3bkp, I was aware VolSync existed. I looked at it briefly, decided it wasn’t mature enough for what I needed, and moved on. My custom tool worked: it backed up PVCs to S3, restored them during blue/green cluster migrations, and gave me full control over every aspect. Why fix what isn’t broken?
The Video That Changed My Mind
Then I watched Mircea Anton’s video: How I Backup My Kubernetes Cluster the GitOps way (Volsync).
Mircea’s setup looked familiar. Too familiar. He was using VolSync with Flux Kustomize Components, per-app postBuild variables, S3 storage with restic, automatic restore on first PVC provision via dataSourceRef. It was essentially the same architecture I had built with s3bkp: declarative, GitOps-native, restore-as-code. But without the 2,400 lines of bash. Without the Kyverno injection policies. Without the custom container image I had to maintain, test, and update.
That was the moment it clicked. I hadn’t just built a backup tool. I had rebuilt VolSync from scratch, with worse consistency guarantees (s3bkp backs up live data; VolSync takes a VolumeSnapshot first) and a permanent resource overhead (24/7 sidecar vs. temporary mover Jobs).
Why Not Just Use Velero?
This was the first question I asked myself. Velero is the standard Kubernetes backup tool, and I do use it for disaster recovery. But Velero’s restore model is fundamentally imperative: you run velero restore create, it restores an entire namespace (or a filtered subset), and then you deploy your app on top. That works for disaster recovery, but it doesn’t work for blue/green cluster migrations where I need restore-as-code.
What I needed was: commit an app to git, Flux deploys it on the new cluster, and the data is automatically restored from the old cluster’s backup before the app starts. No manual velero restore command. No runbook. No human in the loop. Just deploy and go.
Velero can’t do this because:
- No dataSourceRef integration. Velero doesn’t participate in Kubernetes volume populators. You can’t point a PVC at a Velero backup and have it auto-populate on first provision.
- Restores are one-shot, not desired state. Velero does have a Restore CRD, and you can technically apply it declaratively. But a Restore CR is inherently one-shot and immutable: once it transitions to Completed, it’s done. You can’t commit it to git as persistent desired state the way you would a HelmRelease. If Flux re-applies it, Velero won’t re-run it. There’s no “keep this PVC restored from this backup” primitive.
VolSync solves both. The ReplicationDestination CRD is the restore intent committed to git. The PVC’s dataSourceRef is the declarative link. Delete the PVC, Flux recreates it, VolSync repopulates it from the latest backup. That’s restore-as-code.
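A minimal sketch of that restore intent, written in Python that builds the two objects as plain dicts and prints them as JSON (which kubectl apply also accepts). Field names follow VolSync’s restic mover and the PVC dataSourceRef API as I understand them; the app name, the Secret name restic-config-myapp, the namespace and the 5Gi size are hypothetical placeholders, not values from my repo.

```python
import json


def replication_destination(app: str, namespace: str, capacity: str) -> dict:
    """Restore intent: pull the latest restic backup for `app` from S3."""
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationDestination",
        "metadata": {"name": f"{app}-dst", "namespace": namespace},
        "spec": {
            # A manual trigger runs one sync; changing the string runs another.
            "trigger": {"manual": "restore-once"},
            "restic": {
                # Hypothetical Secret with RESTIC_REPOSITORY, RESTIC_PASSWORD
                # and the S3 credentials.
                "repository": f"restic-config-{app}",
                "copyMethod": "Snapshot",
                "accessModes": ["ReadWriteOnce"],
                "capacity": capacity,
            },
        },
    }


def restored_pvc(app: str, namespace: str, capacity: str) -> dict:
    """PVC that VolSync's volume populator fills from the destination above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{app}-data", "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": capacity}},
            # The declarative link: populate from the ReplicationDestination.
            "dataSourceRef": {
                "apiGroup": "volsync.backube",
                "kind": "ReplicationDestination",
                "name": f"{app}-dst",
            },
        },
    }


if __name__ == "__main__":
    for obj in (replication_destination("myapp", "default", "5Gi"),
                restored_pvc("myapp", "default", "5Gi")):
        print(json.dumps(obj, indent=2))
```

Both objects live in git; deleting the PVC and letting Flux re-apply it is what triggers a fresh population.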
What VolSync Does Better
The core difference is architecture. s3bkp runs as a Kyverno-injected sidecar that lives inside every backed-up pod, permanently consuming resources. VolSync is an operator: you declare a ReplicationSource resource, and on each scheduled trigger it takes a VolumeSnapshot of the PVC for consistency, spawns a temporary mover Job that backs that snapshot up to S3, and then lets the Job terminate. No permanent sidecar. No pod-level coupling.
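For comparison with the sidecar, here is a hedged sketch of the backup side: a ReplicationSource built as a Python dict. The hourly schedule, the retention counts, and the PVC and Secret names are illustrative assumptions; copyMethod Snapshot is the setting that makes VolSync snapshot the PVC before the mover Job reads it.

```python
import json


def replication_source(app: str, namespace: str,
                       schedule: str = "0 * * * *") -> dict:
    """Backup intent for one PVC (names and retention are hypothetical)."""
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "sourcePVC": f"{app}-data",
            # Cron schedule: each firing snapshots the PVC and runs a mover Job.
            "trigger": {"schedule": schedule},
            "restic": {
                "repository": f"restic-config-{app}",
                # Snapshot first, then back up from the point-in-time copy.
                "copyMethod": "Snapshot",
                "pruneIntervalDays": 7,
                "retain": {"hourly": 24, "daily": 7},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(replication_source("myapp", "default"), indent=2))
```

Nothing here touches the app’s Deployment, which is exactly the decoupling the sidecar never had.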
For restores, VolSync uses Kubernetes-native dataSourceRef volume populators. You create a PVC that points at a ReplicationDestination, and VolSync’s populator fills the volume from the most recent backup that ReplicationDestination has pulled from S3, before the PVC is even available to mount. The app literally cannot start until the restore is complete. With s3bkp, I had to carefully manage restore timing in the init container to avoid race conditions where the app would overwrite restored data.
| Aspect | s3bkp | VolSync |
|---|---|---|
| Architecture | Kyverno-injected sidecar | CRD-based operator |
| Backup consistency | Live filesystem (app is writing) | VolumeSnapshot first |
| Resource overhead | Permanent sidecar per pod | Temporary Job, then gone |
| Restore mechanism | Init container (timing-sensitive) | Volume populator (atomic) |
| Configuration | Pod labels + annotations | CRD resources |
| Maintenance | ~2,400 lines of bash | Upstream Helm chart |
| Monitoring | 25+ custom metrics | Native operator metrics |
Evaluating the Migration
Before committing, I ran a proof of concept with a single app. The evaluation checklist was straightforward:
- Can VolSync back up a PVC to my existing S3 storage (RustFS)?
- Can it restore on a fresh PVC via dataSourceRef?
- Can it restore cross-cluster (blue cluster reading from green’s bucket)?
- Does it work with my existing Flux + Kustomize patterns?
- Can I monitor it with Prometheus?
Every box checked. The POC app was fully migrated in a single session. The Flux Kustomize Component pattern from Mircea’s setup fit naturally into my existing repo structure.
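Two of those checklist items (backing up to RustFS and restoring cross-cluster) come down to the restic repository Secret that ReplicationSource and ReplicationDestination reference. The sketch below assumes a hypothetical layout with one bucket per cluster (volsync-blue, volsync-green) and one prefix per app behind an S3-compatible endpoint; the endpoint URL, Secret names and credentials are placeholders. Cross-cluster restore then amounts to giving the new cluster’s ReplicationDestination a Secret whose RESTIC_REPOSITORY points at the other cluster’s bucket.

```python
import json


def restic_secret(name: str, namespace: str, app: str, bucket: str,
                  endpoint: str, password: str, access_key: str,
                  secret_key: str) -> dict:
    """Secret in the env-var format VolSync's restic mover reads."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": {
            # restic's S3 backend URL: s3:<endpoint>/<bucket>/<prefix>
            "RESTIC_REPOSITORY": f"s3:{endpoint.rstrip('/')}/{bucket}/{app}",
            "RESTIC_PASSWORD": password,
            "AWS_ACCESS_KEY_ID": access_key,
            "AWS_SECRET_ACCESS_KEY": secret_key,
        },
    }


if __name__ == "__main__":
    endpoint = "https://rustfs.example.internal"  # hypothetical endpoint
    # On the blue cluster: back up to blue's bucket, restore from green's.
    backup = restic_secret("myapp-restic-backup", "default", "myapp",
                           "volsync-blue", endpoint, "changeme", "AK", "SK")
    restore = restic_secret("myapp-restic-restore", "default", "myapp",
                            "volsync-green", endpoint, "changeme", "AK", "SK")
    print(json.dumps([backup, restore], indent=2))
```

The restore Secret must carry the same RESTIC_PASSWORD the other cluster used when it created the repository, otherwise restic cannot open it.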
The Migration
With the POC validated, I worked through the remaining apps systematically. Each migration follows the same two-phase pattern:
Phase 1: Deploy VolSync backup alongside s3bkp (both run in parallel). Wait for the first VolSync backup to land in S3. Copy the backup to the other cluster’s bucket for cross-cluster restore readiness.
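The “wait for the first backup” step can be scripted instead of eyeballed. This is a minimal sketch, assuming you feed it the parsed JSON from kubectl get replicationsource <name> -o json: it treats a backup as landed once status.lastSyncTime is set and, when VolSync reports status.latestMoverStatus, its result is Successful. The function name is hypothetical.

```python
def first_backup_landed(rs: dict) -> bool:
    """True once a ReplicationSource reports a completed, successful sync."""
    status = rs.get("status") or {}
    if not status.get("lastSyncTime"):
        return False  # no sync has completed yet
    mover = status.get("latestMoverStatus")
    # If the mover status field is absent, fall back to lastSyncTime alone.
    return mover is None or mover.get("result") == "Successful"
```

Polling this until it returns True is enough to gate the copy to the other cluster’s bucket.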
Phase 2: Remove s3bkp labels, scale the app down, refresh the VolSync restore snapshot, delete the old PVC, and let Flux recreate it with dataSourceRef. The volume populator restores data before the app starts. Verify, done.
The two-phase approach means there’s always a fallback. If VolSync’s backup fails in Phase 1, s3bkp is still running. Only after verifying VolSync works do I cut over in Phase 2.
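The Phase 2 cutover can be sketched as an ordered list of commands. This is a dry-run generator, not my actual tooling, and it rests on several assumptions: the app is a single Deployment; the restore snapshot is refreshed by changing the ReplicationDestination’s spec.trigger.manual value, with VolSync recording the completed value in status.lastManualSync (if that value lives in git, you would change it there instead of patching); the kubectl is recent enough to support wait --for=jsonpath; and the app’s Flux Kustomization sits in the flux CLI’s default flux-system namespace. All resource names are hypothetical. Removing the s3bkp labels happens in git beforehand and is not shown.

```python
import json
import shlex
import time


def cutover_commands(namespace, deployment, pvc, rd, flux_ks, token=None):
    """Ordered Phase 2 commands (dry run only; nothing is executed here)."""
    # A fresh manual-trigger value makes VolSync run one more restore sync.
    token = token or f"restore-{int(time.time())}"
    patch = json.dumps({"spec": {"trigger": {"manual": token}}})
    return [
        # 1. Stop the app so nothing writes to the old PVC.
        ["kubectl", "-n", namespace, "scale", f"deployment/{deployment}",
         "--replicas=0"],
        # 2. Refresh the restore snapshot from the latest backup in S3.
        ["kubectl", "-n", namespace, "patch", "replicationdestination", rd,
         "--type=merge", "-p", patch],
        # 3. Block until VolSync reports this manual sync as completed.
        ["kubectl", "-n", namespace, "wait", f"replicationdestination/{rd}",
         f"--for=jsonpath={{.status.lastManualSync}}={token}",
         "--timeout=30m"],
        # 4. Drop the old PVC; the new one will come from the populator.
        ["kubectl", "-n", namespace, "delete", "pvc", pvc],
        # 5. Let Flux recreate the PVC with its dataSourceRef.
        ["flux", "reconcile", "kustomization", flux_ks, "--with-source"],
    ]


if __name__ == "__main__":
    for cmd in cutover_commands("default", "myapp", "myapp-data",
                                "myapp-dst", "myapp"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps a human check between steps; how the app scales back up depends on how its replicas are managed and is left out.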
Lessons Learned Along the Way
The migration wasn’t entirely smooth. A few things I learned:
- VolumeSnapshot before backup is not just a nice-to-have. s3bkp backed up live data, which meant occasional inconsistencies with apps that write frequently. VolSync’s snapshot-first approach eliminates this entire class of problems.
- Restore timing matters more than you think. With s3bkp, the init container could race with the app. VolSync’s dataSourceRef populator runs before the PVC is even mountable, making restore atomic by design.
- Kyverno mutation persistence is surprising. Removing the s3bkp label from a pod template doesn’t make Kyverno remove its previously injected containers. Crossplane and Helm’s SSA both preserve fields they don’t explicitly manage. The fix: delete the Deployment and let the controller recreate it clean.
- Multi-PVC apps need multiple Flux Kustomizations. Kustomize components can’t be included twice in the same Kustomization (upstream limitation). For apps with multiple volumes, each PVC gets its own Flux Kustomization CR pointing at the same shared component.
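Here is a hedged sketch of that workaround: a Python generator that emits one Flux Kustomization per PVC, each including the same shared VolSync component and passing per-PVC values through postBuild.substitute. The component path ../../components/volsync, the variable names APP, VOLSYNC_PVC and VOLSYNC_CAPACITY, the app path and the GitRepository name are assumptions for illustration, and the component is assumed to template its resource names from those variables.

```python
import json


def volsync_kustomizations(app: str, namespace: str, path: str,
                           pvcs: dict) -> list:
    """One Flux Kustomization per PVC, all including the same component."""
    objs = []
    for pvc, capacity in pvcs.items():
        objs.append({
            "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
            "kind": "Kustomization",
            # Distinct names per PVC so the shared component is never
            # included twice in the same Kustomization.
            "metadata": {"name": f"{app}-volsync-{pvc}",
                         "namespace": "flux-system"},
            "spec": {
                "interval": "1h",
                "path": path,
                "prune": True,
                "sourceRef": {"kind": "GitRepository", "name": "flux-system"},
                "targetNamespace": namespace,
                "components": ["../../components/volsync"],
                # Per-PVC values consumed by ${...} placeholders in the component.
                "postBuild": {"substitute": {
                    "APP": app,
                    "VOLSYNC_PVC": pvc,
                    "VOLSYNC_CAPACITY": capacity,
                }},
            },
        })
    return objs


if __name__ == "__main__":
    ks = volsync_kustomizations("myapp", "default", "./apps/myapp/volsync",
                                {"config": "1Gi", "data": "20Gi"})
    print(json.dumps(ks, indent=2))
```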
Current Status
All 11 apps are migrated. s3bkp’s Kyverno policies have been archived and the repository decommissioned.
I don’t regret building s3bkp. It taught me the real complexity behind backup tooling and served reliably for several months. But maintaining a custom solution when a community-supported one exists is a cost I no longer need to pay. Sometimes the best code you write is the code you eventually delete.