# Changelog
All notable changes to HyperCache are recorded here. The format follows Keep a Changelog, and the project adheres to Semantic Versioning.
## [Unreleased]
### Added
- Client API auth v2: multi-token, scoped, mTLS-capable. New `pkg/httpauth` package with `Policy`, `TokenIdentity`, `CertIdentity`, and `Scope` types and a scope-enforcing fiber middleware. Replaces the single-token `bearerAuth` helper in `cmd/hypercache-server/main.go`. Three credential classes are resolved in priority order (bearer → mTLS cert → `ServerVerify` hook), with a constant-time multi-token compare that visits every configured token even on an early match to prevent token-cardinality timing leaks. Per-route scope enforcement: GET/HEAD/owners-lookup/batch-get require `ScopeRead`; PUT/DELETE/batch-put/batch-delete require `ScopeWrite`. An anonymous identity (with `AllowAnonymous: true`) receives all scopes — used by the binary to preserve the zero-config dev posture.
- YAML auth config + legacy env-var coexistence. `HYPERCACHE_AUTH_CONFIG=/etc/hypercache/auth.yaml` (new) loads a multi-token policy with per-identity scopes:

  ```yaml
  tokens:
    - id: app-prod
      token: "<secret>"
      scopes: [read, write]
    - id: ops
      token: "<secret>"
      scopes: [admin]
  cert_identities:
    - subject_cn: app.internal
      scopes: [read]
  allow_anonymous: false
  ```

  The legacy `HYPERCACHE_AUTH_TOKEN` keeps working byte-identically: it synthesizes one identity with all three scopes. The two env vars are NOT mutually exclusive — `HYPERCACHE_AUTH_CONFIG` governs the client API, while `HYPERCACHE_AUTH_TOKEN` continues to drive the dist transport's symmetric peer auth (single trust domain). Both can be set in the same deployment without conflict. A missing or malformed config file makes the binary exit non-zero rather than falling through to permissive open mode — fail-closed by design.
- mTLS on the client API. New env vars `HYPERCACHE_API_TLS_CERT`, `HYPERCACHE_API_TLS_KEY`, and `HYPERCACHE_API_TLS_CLIENT_CA` wrap the listener with `tls.NewListener`. With the CA set, `RequireAndVerifyClientCert` is enabled and the verified peer cert's Subject CN is matched against the policy's `CertIdentities` to resolve the calling identity. Plaintext, standard-TLS, and mTLS shapes all share one listener path. End-to-end coverage at `cmd/hypercache-server/mtls_e2e_test.go` drives a real handshake against an in-process CA / server-cert / client-cert chain and asserts CN-to-identity resolution works in both directions (matching CN → 200, non-matching CN → 401).
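
  A hedged sketch of the three listener shapes, using only the standard library. The env-var names come from this entry; the function layout and error handling are illustrative, not the binary's actual wiring:

  ```go
  // apiListener returns a plaintext, TLS, or mTLS listener depending on which
  // HYPERCACHE_API_TLS_* variables are set. Illustrative sketch only.
  package main

  import (
  	"crypto/tls"
  	"crypto/x509"
  	"log"
  	"net"
  	"os"
  )

  func apiListener(addr string) (net.Listener, error) {
  	ln, err := net.Listen("tcp", addr)
  	if err != nil {
  		return nil, err
  	}
  	certFile, keyFile := os.Getenv("HYPERCACHE_API_TLS_CERT"), os.Getenv("HYPERCACHE_API_TLS_KEY")
  	if certFile == "" || keyFile == "" {
  		return ln, nil // plaintext shape
  	}
  	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
  	if err != nil {
  		return nil, err
  	}
  	cfg := &tls.Config{Certificates: []tls.Certificate{cert}, MinVersion: tls.VersionTLS12}
  	if caFile := os.Getenv("HYPERCACHE_API_TLS_CLIENT_CA"); caFile != "" {
  		caPEM, err := os.ReadFile(caFile)
  		if err != nil {
  			return nil, err
  		}
  		pool := x509.NewCertPool()
  		pool.AppendCertsFromPEM(caPEM)
  		cfg.ClientCAs = pool
  		cfg.ClientAuth = tls.RequireAndVerifyClientCert // mTLS shape
  	}
  	return tls.NewListener(ln, cfg), nil // standard-TLS or mTLS shape
  }

  func main() {
  	ln, err := apiListener(":8080")
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer ln.Close()
  	// hand ln to the HTTP server here
  }
  ```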
### Security
- Constant-time bearer-token compare on the client API. Replaced the plaintext `got != want` check at `cmd/hypercache-server/main.go` with `crypto/subtle.ConstantTimeCompare` to defeat timing side-channels. A naive string compare returns as soon as the first differing byte is found, leaking per-byte equality of `HYPERCACHE_AUTH_TOKEN` to a remote attacker who can measure response time. The fix mirrors the dist transport's existing constant-time check at `pkg/backend/dist_http_server.go:144-152`. No public API change; the env-var contract and the back-compatible "empty token → open mode" behavior are unchanged. A new auth-test suite at `cmd/hypercache-server/auth_test.go` pins the contract: missing/wrong/malformed/lowercase/wrong-length bearer headers all return 401, public meta routes (`/healthz`, `/v1/openapi.yaml`) stay reachable without credentials, and every protected route fires the wrapper. The new `newAuthedServer` helper drives `registerClientRoutes` directly so future wiring regressions are caught (the existing `handlers_test.go::newTestServer` deliberately bypasses auth for handler-correctness coverage).
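
  A minimal sketch of the pattern, assuming `got` is the presented bearer token and `want` is the configured `HYPERCACHE_AUTH_TOKEN` — illustrative, not the handler's actual code:

  ```go
  package main

  import (
  	"crypto/subtle"
  	"fmt"
  )

  // tokenMatches compares in time that depends only on token length, never on
  // how many leading bytes happen to match. Length itself is not treated as a
  // secret; the token bytes are.
  func tokenMatches(got, want string) bool {
  	return len(got) == len(want) &&
  		subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
  }

  func main() {
  	fmt.Println(tokenMatches("s3cr3t", "s3cr3t")) // true
  	fmt.Println(tokenMatches("s3cr3u", "s3cr3t")) // false, same timing profile
  }
  ```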
### Added
- OpenAPI 3.1 specification + drift detection. The `hypercache-server` binary now embeds its own contract via `cmd/hypercache-server/openapi.yaml` (`//go:embed`) and serves it at `GET /v1/openapi.yaml` — every running node is self-describing. The spec covers all nine client routes (single-key PUT/GET/HEAD/DELETE, owners lookup, three batch operations, plus the `/healthz` and `/v1/openapi.yaml` meta endpoints), with reusable `ErrorResponse`, `ItemEnvelope`, and batch-operation schemas, the `bearerAuth` security scheme, and an `operationId` on every operation for codegen-friendliness. A drift detector at `cmd/hypercache-server/openapi_test.go` drives `registerClientRoutes` directly and asserts every fiber-registered route has a matching path in the spec — and vice versa — so the contract cannot silently fall out of sync with the binary. Two CI workflows back this up at `.github/workflows/openapi.yml`: `redocly lint` validates the schema against the OpenAPI 3.1 meta-spec, and the Go drift test runs on every change to `main.go` or the spec. The docs site renders the same spec inline on the new API Reference page via the `mkdocs-swagger-ui-tag` plugin — a single source of truth for the binary, the docs, and any client codegen that points at a live cluster.
- Documentation site on GitHub Pages, built with MkDocs Material and published automatically on every push to `main`. Eight navigated pages — landing, quickstart, 5-node cluster tutorial, Helm chart guide, server-binary reference, distributed-backend architecture, operations runbook, RFC index — plus the CHANGELOG and `cmd/hypercache-server/README.md` pulled in via the include-markdown plugin so they don't drift. A build-time hook at `_mkdocs/hooks.py` rewrites repo-relative source-code references (`../pkg/foo.go`) into canonical GitHub URLs so the same markdown renders correctly both on github.com and on the rendered Pages site. The workflow at `.github/workflows/docs.yml` builds with `--strict` on every PR (catching broken docs-internal links on submission) and deploys via `actions/deploy-pages@v4` on pushes to main. The README now links to the rendered site. A polishing pass on the existing markdown surface relaxed the `mdl` rules that fight MkDocs/frontmatter idioms (MD041 for YAML-frontmatter pages, MD010 for Go's tab-in-code-blocks convention, MD033/MD032 for Material's grid-cards HTML).
- Richer client API — metadata inspection, JSON envelopes, batch operations. Three additions to the `cmd/hypercache-server` HTTP surface: `HEAD /v1/cache/:key` returns the value's metadata in `X-Cache-*` response headers (Version, Origin, Last-Updated, TTL-Ms, Expires-At, Owners, Node) with no body — fast existence + TTL inspection without paying the value-transfer cost; 200 if present, 404 if not. `GET /v1/cache/:key` now honors `Accept: application/json` and returns an `itemEnvelope` with the same metadata as HEAD plus the base64-encoded value. The bare-`curl` default remains raw bytes via `application/octet-stream` — current clients are unaffected. `POST /v1/cache/batch/{get,put,delete}` enables bulk operations in a single round-trip. Each request carries an array; the response carries one result entry per item with per-item status, owners, and error reporting. `batch-put` items accept either UTF-8 strings (default) or base64-encoded byte payloads via `value_encoding: "base64"`. Per-item errors are surfaced in `error` + `code` fields without failing the whole batch. Six unit tests at `cmd/hypercache-server/handlers_test.go` pin the contracts: HEAD present/missing, Accept-JSON envelope shape, default-raw round-trip, mixed-encoding batch-put, batch-get found/missing, batch-delete cycle.
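
  A hedged request sketch for the batch-put route from a Go client. The route path and `value_encoding` field come from this entry; the `items`/`key`/`value` field names and the port are assumptions — the embedded spec at `GET /v1/openapi.yaml` on a running node is the authoritative shape:

  ```go
  package main

  import (
  	"bytes"
  	"encoding/json"
  	"fmt"
  	"net/http"
  )

  func main() {
  	// One UTF-8 string item (default) and one base64-encoded byte item.
  	body, err := json.Marshal(map[string]any{
  		"items": []map[string]any{
  			{"key": "greeting", "value": "world"},
  			{"key": "blob", "value": "3q2+7w==", "value_encoding": "base64"},
  		},
  	})
  	if err != nil {
  		panic(err)
  	}

  	req, err := http.NewRequest(http.MethodPost, "http://localhost:8081/v1/cache/batch/put", bytes.NewReader(body))
  	if err != nil {
  		panic(err)
  	}
  	req.Header.Set("Content-Type", "application/json")
  	req.Header.Set("Authorization", "Bearer <token>") // omit in zero-config (open) dev mode

  	resp, err := http.DefaultClient.Do(req)
  	if err != nil {
  		panic(err)
  	}
  	defer resp.Body.Close()
  	fmt.Println("status:", resp.StatusCode) // per-item results arrive in the JSON response body
  }
  ```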
- SWIM self-refutation + cross-process gossip dissemination. Closes the last `experimental` marker on the heartbeat path. Three pieces:
  - `acceptGossip` self-refute — incoming entries that reference the local node as Suspect or Dead at an incarnation ≥ ours now bump the local incarnation and re-mark the node Alive. Higher-incarnation-wins propagation in the same function disseminates the refutation cluster-wide, so a falsely suspected node can clear suspicion through gossip alone (pre-fix, the only path was a fresh probe).
  - HTTP gossip wire — new `Gossip(ctx, targetID, members)` method on `DistTransport`, new `POST /internal/gossip` server endpoint (auth-wrapped), new `GossipMember` wire DTO. `runGossipTick` now falls through to the HTTP path when the transport isn't an `InProcessTransport`, so cross-process clusters disseminate membership state — pre-Phase-E this was an in-process-only no-op.
  - The `experimental` qualifier is removed from `heartbeatLoop`'s comment and the heartbeat-section field doc; SWIM-style indirect probes (Phase B.1) and self-refutation (this round) together provide the SWIM properties the marker was tracking.

  Regression coverage at `tests/integration/dist_swim_refute_test.go`: `TestDistSWIM_HTTPGossipExchange` exercises the wire (A pushes membership to B over HTTP; B's view converges); `TestDistSWIM_SelfRefute` drives a forged "you are suspect" gossip into a node's `/internal/gossip` and asserts the local incarnation bumps and the state returns to Alive.
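
  An illustrative sketch of the self-refutation rule — the type and field names here (`memberState`, `gossipEntry`, `incarnation`) are assumptions, not the dist backend's internals:

  ```go
  package swimsketch

  type memberState int

  const (
  	alive memberState = iota
  	suspect
  	dead
  )

  type gossipEntry struct {
  	NodeID      string
  	State       memberState
  	Incarnation uint64
  }

  type node struct {
  	id          string
  	state       memberState
  	incarnation uint64
  }

  // acceptGossipEntry applies one incoming membership entry to the local view.
  func (n *node) acceptGossipEntry(e gossipEntry) {
  	if e.NodeID == n.id && e.State != alive && e.Incarnation >= n.incarnation {
  		// Someone claims we are Suspect/Dead: out-bid the claim so that
  		// higher-incarnation-wins gossip spreads the refutation cluster-wide.
  		n.incarnation = e.Incarnation + 1
  		n.state = alive
  		return
  	}
  	// Normal higher-incarnation-wins merge for entries about other nodes elided.
  }
  ```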
- End-to-end resilience test at `scripts/tests/20-test-cluster-resilience.sh` — kills a docker container mid-run, asserts the surviving 4 nodes still serve every previously-written key AND every key written during the outage, then restarts the killed node and asserts it converges on the full state within 60 s. Validates Phase B.2 (hint-replay) and the post-restart anti-entropy paths against the actual docker network — a class of bugs in-process tests can't reach. 24 assertions across 6 phases. Wired into both `make test-cluster` (runs after the smoke, exit code propagated through the same teardown trap) and the `cluster` CI workflow as a follow-up step.
- Cross-process cluster smoke in CI — `.github/workflows/cluster.yml` boots the 5-node `docker-compose.cluster.yml` stack on every PR/push, waits for `/healthz` on every node, then runs the assertion script at `scripts/tests/10-test-cluster-api.sh`. Container logs are dumped on failure for debuggability without a re-run. This catches the class of bugs that escaped the previous PR (factory dropped `DistMemoryOptions`, seeds without IDs, `json.RawMessage` on non-owner GET) — none would have been detected by unit/integration tests because they only exercised in-process behavior. A `make test-cluster` Makefile target mirrors the CI flow for local development: brings the cluster up, waits, runs the smoke, and tears down on the way out (preserving the smoke's exit code). `scripts/tests/wait-for-cluster.sh` is the polling helper that blocks until every node's `/healthz` returns 200, with a default 30-second deadline configurable via `TIMEOUT_SECS`; used by both the Makefile and the CI workflow so the assertion script downstream never races the listener bind. `scripts/tests/10-test-cluster-api.sh` is hardened from a print-only smoke into a real regression test: 17 explicit assertions across propagation / wire-encoding / cross-node delete, color-coded OK/FAIL output, exit code reflecting the total failure count. `cmd/hypercache-server/main_test.go` adds fast Go unit tests pinning the wire-encoding contracts on `writeValue`/`decodeBase64Bytes`: covers `[]byte` (writer path), `string` (replica path), `json.RawMessage` (non-owner-GET path), and the base64-heuristic length floors. Runs without docker for tight feedback during development.
- GitHub Release automation — `.github/workflows/release.yml` triggers on `v*.*.*` tag pushes and creates the GitHub Release page via `softprops/action-gh-release@v2`. The release body pins readers to the matching container image tag in GHCR and the CHANGELOG.md at that ref; PR-since-previous-tag notes are appended automatically. Pre-release tags (`v1.2.3-rc1`, `v1.2.3-beta`) are flagged via the `prerelease` field; `workflow_dispatch` lets operators (re-)create a release for an existing tag without re-tagging.
- Helm chart for k8s deployment at `chart/hypercache/`. Renders into a StatefulSet (stable per-pod hostnames so the `id@addr` seed list resolves deterministically), a headless Service for peer DNS, separate client and management Services, an optional chart-managed Secret for the auth token (or an external Secret reference for production rotation), a PodDisruptionBudget (default `minAvailable: 4`), pod anti-affinity, and a hardened pod security context (non-root, read-only rootfs, all caps dropped). The ServiceAccount + Service + StatefulSet composition matches what `helm install` emits — validated via `helm lint` and `helm template` against any kube-version. Configure cluster size, replication factor, capacity, heartbeat, hint TTL, rebalance interval, and resources via standard Helm values — see `chart/hypercache/values.yaml` for the full surface.
- Pre-commit excludes Helm templates from `check-yaml` and `yamllint`. Both validators choke on Go-template `{{ ... }}` syntax inside the chart manifests; `helm lint` is the right validator for those, and CI runs that separately.
- Multi-arch container image workflow — `.github/workflows/image.yml` builds the `hypercache-server` Docker image for `linux/amd64` and `linux/arm64` via buildx + QEMU, publishing to GHCR (`ghcr.io/<owner>/<repo>/hypercache-server`). PR triggers build-only (no registry pollution), `main` pushes publish `:main` and `:sha-<short>`, and semver tag pushes (`v*.*.*`) publish `:v1.2.3`, `:1.2.3`, `:1.2`, `:1`, and `:latest`. `:latest` is deliberately restricted to semver tag pushes — production deployments pinning `:latest` always get a stable release, never an in-flight `main` commit. GHA cache speeds re-builds when only Go source has changed.
### Fixed
- Cluster propagation was completely broken. The `DistMemoryBackendConstructor.Create` factory in `factory.go` silently discarded `cfg.DistMemoryOptions` and called `backend.NewDistMemory(ctx)` with no arguments. Every `WithDistNode`, `WithDistSeeds`, `WithDistReplication`, etc. that callers wired through `hypercache.NewConfig` was a silent no-op, leaving every node with a default standalone configuration that only knew itself. The factory now forwards `cfg.DistMemoryOptions...` like every other backend constructor does. This was the production-blocking bug — a Set on one node never reached its peers because the other nodes weren't actually in any node's ring.
- Seed addresses without node IDs produced a broken ring. `initStandaloneMembership` added every seed to membership with an empty `NodeID`, so the consistent-hash ring was built over empty-string owners. `Set` would resolve owners as `["", "", "self"]`, fan-outs to `""` failed with `ErrBackendNotFound`, the writer self-promoted, and the data never reached its peers. The HTTP transport has no node-discovery protocol, so the only way to populate node IDs in the ring is at configuration time. Seeds now accept an optional `id@addr` syntax (`node-2@hypercache-2:7946`) — bare `addr` keeps the legacy empty-ID behavior for in-process tests. Production deployments must use `id@addr` (an illustrative parser for the syntax follows after this list).
- Remove from a non-primary owner skipped the primary. `removeImpl` checked `dm.ownsKeyInternal(key)` (true for any ring owner) and ran `applyRemove` locally — but `applyRemove`'s fan-out only covers `owners[1:]` under the assumption the caller is `owners[0]`. When a replica initiated the remove, the primary never got the delete. The Remove path now mirrors Set: non-primary callers forward to the primary, and the primary applies + fans out. Tombstones now propagate cluster-wide regardless of which node receives the DELETE.
- Client API responses were unhelpful. Set/Remove returned `204 No Content` with empty bodies; errors were raw text via `SendString`. Replaced with structured JSON: PUT/DELETE return `{key, stored|deleted, bytes, node, owners}` so operators can immediately see where the value landed; errors return `{error, code}` with stable code strings (`BAD_REQUEST`, `NOT_FOUND`, `DRAINING`, `INTERNAL`). Added `GET /v1/owners/:key` for client-side ring visibility.
- GET response leaked base64 on replicas. `[]byte` values round-trip through JSON as base64 strings; replica nodes that received a value via the dist HTTP transport stored it as a `string` and returned it raw, so a `PUT world` on node-A resulted in `d29ybGQ=` from `GET` on node-B. The client GET handler now base64-decodes string values when they look like valid byte content, restoring writer-receiver symmetry.
- GET on non-owner nodes returned a JSON-quoted base64 string. The dist HTTP transport's `decodeGetBody` decodes `Item.Value` as `json.RawMessage` to preserve wire-bytes type fidelity. The client GET handler's type switch only matched `[]byte` and `string`, so non-owner GETs (which always go through the forward-fetch path) fell to the `default` branch and re-emitted the value as JSON — producing `"d29ybGQ="` instead of `world`. Added an explicit `json.RawMessage` case that interprets the raw JSON as a string when possible, then base64-decodes if applicable. Verified end-to-end against the 5-node Docker cluster, where two of the five nodes are non-owners for any given key.
- Race in `queueHint` between hint enqueue and hint replay. Pre-fix, the metric write `dm.metrics.hintedBytes.Store(dm.hintBytes)` happened after releasing `hintsMu`, so a concurrent `adjustHintAccounting` call from the replay loop could race the read. Capturing the value under the lock closes the race. Surfaced when migration failures began funneling through `queueHint` (Phase B.2 below) — previously the migration path swallowed errors silently, so the hint enqueue rate from rebalance ticks was much lower.
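
For the `id@addr` seed fix above, an illustrative parser for the syntax — not the backend's actual code, just the rule it describes:

```go
package main

import (
	"fmt"
	"strings"
)

type seed struct {
	ID   string // empty for legacy bare-address seeds (in-process tests only)
	Addr string
}

func parseSeed(s string) seed {
	if id, addr, ok := strings.Cut(s, "@"); ok {
		return seed{ID: id, Addr: addr}
	}
	return seed{Addr: s} // bare addr → empty NodeID → broken ring in production
}

func main() {
	fmt.Println(parseSeed("node-2@hypercache-2:7946")) // {node-2 hypercache-2:7946}
	fmt.Println(parseSeed("hypercache-2:7946"))        // { hypercache-2:7946}
}
```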
### Added (earlier in this cycle)
- Structured logging on the dist backend. New `WithDistLogger(*slog.Logger)` option wires a structured logger into the dist backend's background loops (heartbeat, hint replay, rebalance, merkle sync) and operational error surfaces (HTTP listener bind failures, serve-goroutine exits, failed migrations during rebalance, dropped hints, peer state transitions). The library default is silent — if `WithDistLogger` is not called, a `slog.DiscardHandler` is installed so the dist backend never writes to stderr unless the caller opts in. Every record is pre-bound with `component=dist_memory` and `node_id=<id>` attributes for grep/filter. Phase A.1 of the production-readiness work. (A provider-wiring sketch for the observability options follows at the end of this list.)
- OpenTelemetry tracing on the dist backend. New `WithDistTracerProvider(trace.TracerProvider)` option opens spans on every public `Get`/`Set`/`Remove`, with child spans (`dist.replicate.set`/`dist.replicate.remove`) per peer during fan-out. Span attributes include `cache.key.length`, `dist.consistency`, `dist.owners.count`, `dist.acks`, `cache.hit`, and `peer.id`. Cache key values are intentionally never recorded on spans — keys can be PII (user IDs, session tokens). The library default is a no-op tracer (`noop.NewTracerProvider`), so spans cost nothing unless the caller opts in. A new `ConsistencyLevel.String()` method renders consistency levels human-readably for log/span attrs. Phase A.2 of the production-readiness work.
- OpenTelemetry metrics on the dist backend. New `WithDistMeterProvider(metric.MeterProvider)` option registers an observable instrument for every field on `DistMetrics` — counters for cumulative totals (`dist.write.attempts`, `dist.forward.*`, `dist.hinted.*`, `dist.merkle.syncs`, `dist.rebalance.*`, etc.), gauges for current state (`dist.members.alive`, `dist.tombstones.active`, `dist.hinted.bytes`, last-operation latencies in nanoseconds, etc.). A single registered callback observes all instruments from one `Metrics()` snapshot per collection cycle, so there is no per-operation overhead beyond the existing atomic counters. Names use the `dist.` prefix so a Prometheus exporter renders them under a single subsystem. `Stop` unregisters the callback so the SDK does not invoke it against a stopped backend. The library default is a no-op meter, so metrics cost nothing unless the caller opts in. Phase A.3 of the production-readiness work.
- SWIM-style indirect heartbeat probes. New `WithDistIndirectProbes(k, timeout)` option enables the indirect-probe refutation path: when a direct heartbeat to a peer fails, this node asks `k` random alive peers to probe the target on its behalf, and only marks the target suspect if every relay also fails. Filters caller-side network blips (transient NIC reset, single stuck connection in this node's pool) that would otherwise cause spurious suspect/dead transitions. New transport method `IndirectHealth(ctx, relayNodeID, targetNodeID)` and HTTP endpoint `GET /internal/probe?target=<id>` carry the probe; auth-wrapped identically to the rest of `/internal/*`. New metrics `dist.heartbeat.indirect_probe.success`, `.failure`, and `.refuted` expose probe outcomes. `k = 0` (default) preserves the pre-Phase-B behavior. Phase B.1 of the production-readiness work — note that the heartbeat path still carries the `experimental` marker until self-refutation via incarnation-disseminating gossip lands in a later phase.
- Migration failures now retry through the hint queue. When a rebalance forwards a key to its new primary and the transport returns any error (not just `ErrBackendNotFound`), the item is enqueued onto the existing hint-replay queue keyed by the new primary, instead of being logged and dropped. The hint-replay loop drains it on its configured schedule until the hint TTL expires. The same broadening applies to the `replicateTo` fan-out on the primary `Set` path — transient HTTP failures (timeout, 5xx, connection reset) no longer silently drop replicas. Phase B.2 of the production-readiness work.
- On-wire compression for the dist HTTP transport. New `DistHTTPLimits.CompressionThreshold` field opts the auto-created HTTP client into gzip-compressing Set request bodies whose serialized payload exceeds the configured byte threshold. The client sets `Content-Encoding: gzip` and the server transparently decompresses (via fiber v3's auto-decoding `Body()`). Threshold `0` (default) preserves the pre-Phase-B wire format byte-for-byte. Operators on bandwidth-constrained links with values above ~1 KiB typically see meaningful reductions; below-threshold values pay no compression cost. Roll out the threshold to all peers before raising it on any peer — a server with compression disabled will reject a gzip body with HTTP 400. Phase B.3 of the production-readiness work.
- Drain endpoint for graceful shutdown. New `DistMemory.Drain(ctx)` method and `POST /dist/drain` HTTP endpoint mark the node for shutdown: `/health` returns 503 so load balancers stop routing, `Set`/`Remove` return `sentinel.ErrDraining`, and `Get` continues to serve so in-flight reads complete. New `IsDraining()` accessor for dashboards. New metric `dist.drains` records transitions. Drain is one-way and idempotent. Phase C.1 of the production-readiness work.
- Cursor-based key enumeration replaces the pre-Phase-C testing-only `/internal/keys` endpoint. The endpoint now returns shard-level pages with a `next_cursor` token; clients walk the cursor chain to enumerate the full key set. A new `?limit=<n>` query parameter truncates within a shard for clusters with very large shards (the response then carries `truncated=true` and the same `next_cursor`). The `DistHTTPTransport.ListKeys` helper now walks pages internally so existing callers (anti-entropy fallback, tests) keep their full-set semantics unchanged. Phase C.2 of the production-readiness work.
- Operations runbook at docs/operations.md covering split-brain, hint-queue overflow, rebalance under load, replica loss, observability wiring (logger/tracer/meter), drain procedure, and capacity-planning notes. Cross-links each failure mode to the metrics that surface it. Phase C.3 of the production-readiness work.
- Production server binary at `cmd/hypercache-server`. Wraps DistMemory via HyperCache and exposes three HTTP listeners per node: the client REST API (`PUT/GET/DELETE /v1/cache/:key`), management HTTP (`/health`, `/stats`, `/config`, `/dist/metrics`, `/cluster/*`), and the inter-node dist HTTP. 12-factor configuration via `HYPERCACHE_*` environment variables — the same binary runs in Docker, k8s, and bare-metal. Graceful shutdown on SIGTERM/SIGINT runs Drain → API stop → HyperCache Stop with a 30 s deadline. JSON-formatted slog logger pre-bound with `node_id`. A multi-stage `Dockerfile` builds a distroless static image (`gcr.io/distroless/static-debian12:nonroot`).
- 5-node local cluster compose at `docker-compose.cluster.yml` — five hypercache-server nodes on a shared `hypercache-cluster` Docker network, each knowing the other four as seeds, replication=3. Client APIs exposed on host ports 8081–8085, management HTTP on 9081–9085. Includes a smoke-test recipe in the server README. Phase D of the production-readiness work.
- `HyperCache.DistDrain(ctx)` convenience method in hypercache_dist.go — calls Drain on the underlying DistMemory backend when one is configured; no-op on in-memory / Redis backends. Lets the server binary trigger drain without type-asserting through the unexported backend field.
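
For the Phase A observability options above (`WithDistLogger`, `WithDistTracerProvider`, `WithDistMeterProvider`) and the Phase B.1 probe option, a hedged sketch of building the dependencies they accept. The provider construction is standard `slog`/OpenTelemetry SDK code; the final option calls are shown only as comments because the exact constructor they feed into is not spelled out here:

```go
package main

import (
	"log/slog"
	"os"

	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// The backend pre-binds component=dist_memory and node_id=<id>; the caller
	// only supplies the handler.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	tp := sdktrace.NewTracerProvider() // exporters/samplers elided
	mp := sdkmetric.NewMeterProvider() // a Prometheus exporter would render the dist.* instruments

	_, _, _ = logger, tp, mp

	// Pass them through the dist options named above, e.g.:
	//   WithDistLogger(logger)
	//   WithDistTracerProvider(tp)
	//   WithDistMeterProvider(mp)
	//   WithDistIndirectProbes(3, 500*time.Millisecond)
	// (exact wiring depends on how the backend/config is constructed).
}
```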
## [0.5.0] — 2026-05-05
### Security
- Fixed silent inbound auth bypass when `DistHTTPAuth.ClientSign` was set without a matching inbound verifier. Previously, a config of `DistHTTPAuth{ClientSign: hmacSign}` flipped the internal `configured` predicate to true (causing the auto-client to sign outbound traffic), but `verify()` had no inbound material and silently allowed every request — so an operator wiring half of an HMAC scheme could end up with signed-out / open-in nodes that looked authenticated. The internal predicate is now split into `inboundConfigured()` / outbound-path checks, and `NewDistMemory` rejects this shape at construction with `sentinel.ErrInsecureAuthConfig`. Operators who legitimately want signed-out / open-in deployments (e.g. inbound is gated by an L4 firewall or service mesh below this server) must opt in via the new `DistHTTPAuth.AllowAnonymousInbound` field. All other configurations (`Token`-only, `Token`+`ServerVerify`, `Token`+`ClientSign`, `ServerVerify`-only) are unaffected. Reported by the post-tag security review; addressed before any v0.5.0 public announcement.
### Added
- `DistHTTPAuth.AllowAnonymousInbound` — explicit opt-in for asymmetric signed-out / open-in configurations.
- `sentinel.ErrInsecureAuthConfig` — surfaced from `NewDistMemory` when the auth policy would silently disable inbound enforcement.
## [0.4.3] — 2026-05-04
A modernization release. The headline themes:
- Eviction is now sharded by default for concurrency-friendly throughput.
- The distributed-memory backend (`DistMemory`) gained body limits, TLS, bearer-token auth, lifecycle-context cancellation, and surfaced listener errors.
- A typed wrapper (`Typed[T, V]`) is available for compile-time type-safe access without the caller-side type assertions of the untyped API.
- The legacy `pkg/cache` v1 store and the `longbridgeapp/assert` test dependency are gone.
The full course-correction plan (Phase 0 baseline → Phase 6 file split, plus Phase 5a–5e DistMemory hardening) is in commit history. The two RFCs that informed the design decisions live under docs/rfcs/.
### Breaking changes
- `pkg/cache` v1 removed. All callers must use `pkg/cache/v2`.
- `longbridgeapp/assert` test dependency removed. Tests now use `stretchr/testify/require`. Internal test code only — no impact on library consumers, but downstream contributors authoring tests against this codebase must use `require`.
- `sentinel.ErrMgmtHTTPShutdownTimeout` removed. `ManagementHTTPServer.Shutdown` now calls `app.ShutdownWithContext` and returns the underlying ctx error directly. Callers comparing against the removed sentinel must switch to `errors.Is(err, context.DeadlineExceeded)` or equivalent.
- Sharded eviction is default-on (32 shards). Items no longer evict in strict global LRU/LFU order — the algorithm operates independently within each shard. Total capacity is honored within ±32 (one slot of slack per shard). Use `WithEvictionShardCount(1)` to restore strict-global ordering at the cost of single-mutex contention.
- `hypercache.go` decomposed into 6 files (`hypercache.go`, `hypercache_io.go`, `hypercache_eviction.go`, `hypercache_expiration.go`, `hypercache_dist.go`, `hypercache_construct.go`). No public API change; third-party patches against line numbers in the prior single-file layout will not apply.
- `ManagementHTTPServer` constructor-order fix. `WithMgmtReadTimeout` and `WithMgmtWriteTimeout` previously mutated struct fields after `fiber.New` had locked in the defaults — the options were silent no-ops. Construction order is now correct, so any code relying on the silent no-op (e.g., setting absurd values knowing they would be ignored) will see those values take effect.
### Performance
Measurements on Apple M4 Pro, `go test -bench`, `count=5`, benchstat.
- Per-shard atomic `Count`. `BenchmarkConcurrentMap_Count`: 53 → ~10 ns/op. `_CountParallel`: 1181 → ~13 ns/op. Eliminates the lock-storm that previously serialized on a single mutex during eviction-loop count checks.
- Sharded eviction algorithm (`pkg/eviction/sharded.go`). Replaces the global eviction-algorithm mutex with 32 per-shard mutexes routed by the same hash `ConcurrentMap` uses, so a key's data shard and eviction shard align (cache-locality on Set).
- `iter.Seq2` migration replacing the channel-based `IterBuffered`. `BenchmarkConcurrentMap_All` (renamed from `_IterBuffered`): 757 µs → 26.5 µs/op (-96.51%). Bytes/op: 1.73 MiB → 0 B. Allocs/op: 230 → 0. Eliminated 32 goroutines + 32 channels per iteration.
- xxhash consolidation (`pkg/cache/v2/hash.go`). Replaced inlined FNV-1a with `xxhash.Sum64String` folded to 32 bits. `BenchmarkConcurrentMap_GetShard`: 10.07 → 3.46 ns/op (-65.63%).
- Sharded item-aware eviction was tried and rejected per RFC 0001. The hypothesis (duplicate-map overhead is the bottleneck) was falsified — sharded contention dominates. Code removed; lessons preserved in the RFC for future contributors.
### Features
- `hypercache.Typed[T, V]` wrapper for compile-time type-safe cache access. Wraps an existing `HyperCache[T]`; multiple `Typed` views can share one underlying cache over disjoint keyspaces. Includes `Set`, `Get`, `GetTyped` (explicit `ErrTypeMismatch`), `GetWithInfo`, `GetOrSet`, `GetMultiple`, `Remove`, `Clear`. See hypercache_typed.go and RFC 0002 Phase 1. Phase 2 (deep `Item[V]` generics) is v3 territory, conditional on adoption signal. (A self-contained sketch of the idea follows after this list.)
- `WithDistHTTPLimits(DistHTTPLimits)` option for the dist transport: server `BodyLimit`/`ReadTimeout`/`WriteTimeout`/`IdleTimeout`/`Concurrency`, plus client `ResponseLimit`/`ClientTimeout`. Defaults: 16 MiB request/response body cap, 5 s read/write/client timeout, 60 s idle, fiber's 256 KiB concurrency cap. Partial overrides are honored — zero fields inherit defaults.
- `WithDistHTTPAuth(DistHTTPAuth)` option for bearer-token auth on `/internal/*` and `/health` (`Token` for the common case; `ServerVerify`/`ClientSign` hooks for JWT, mTLS-derived identity, HMAC, etc.). Constant-time token compare on the server side. The auto-created HTTP client signs every outgoing request with the same token. Mismatched-token peers are rejected with HTTP 401 (`sentinel.ErrUnauthorized`).
- TLS support via `DistHTTPLimits.TLSConfig`. The server wraps its listener with `tls.NewListener`; the auto-created HTTP client attaches the same `*tls.Config` to its `Transport.TLSClientConfig` with ALPN forced to `http/1.1` (fiber/fasthttp doesn't speak h2). The same `*tls.Config` configures both sides — operators applying it consistently across the cluster get encrypted intra-cluster traffic out of the box. Plaintext peers handshake-fail.
- Dist server lifecycle context — `DistMemory.LifecycleContext()` exposes a context derived from the constructor's that is canceled on `Stop()`. Replaces the prior pattern where handlers captured the constructor's `context.Background()` and never observed cancellation. In-flight handlers and replica forwards see `Done()` the moment `Stop` is called.
- `LastServeError()` accessor on both `distHTTPServer` and `ManagementHTTPServer`. Replaces the prior `_ = serveErr` pattern that silently swallowed listener-loop crashes — operators can now surface the failure to logs/alerts.
- `Stop()` goroutine-leak fix. Both `distHTTPServer.stop` and `ManagementHTTPServer.Shutdown` now call `app.ShutdownWithContext(ctx)` directly instead of wrapping `app.Shutdown()` in a goroutine and racing it against ctx done (which leaked the goroutine when ctx fired first).
- New sentinels: `sentinel.ErrTypeMismatch`, `sentinel.ErrUnauthorized`.
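
For the `Typed[T, V]` entry above, a self-contained sketch of the idea — NOT HyperCache's actual API or constructor, just the pattern of a generic view over an untyped store that replaces caller-side type assertions with a compile-time `V` and a typed mismatch error:

```go
package typedsketch

import "errors"

var (
	ErrNotFound     = errors.New("key not found")
	ErrTypeMismatch = errors.New("stored value has a different type")
)

// store stands in for the untyped cache: values live as `any`.
type store struct{ m map[string]any }

func newStore() *store { return &store{m: make(map[string]any)} }

// Typed is a compile-time-typed view over one store; multiple views with
// different V can share the same store over disjoint keyspaces.
type Typed[V any] struct{ s *store }

func NewTyped[V any](s *store) Typed[V] { return Typed[V]{s: s} }

func (t Typed[V]) Set(key string, value V) { t.s.m[key] = value }

func (t Typed[V]) Get(key string) (V, error) {
	var zero V
	raw, ok := t.s.m[key]
	if !ok {
		return zero, ErrNotFound
	}
	v, ok := raw.(V) // the only assertion lives here, behind the typed API
	if !ok {
		return zero, ErrTypeMismatch
	}
	return v, nil
}
```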
### Internal
Worth surfacing for contributors:
- v2 module layout is the file split listed under "Breaking changes" above — readability win, no API change.
- Test helpers introduced under `tests/`: `tests/dist_cluster_helper.go::SetupInProcessCluster[RF]`, `tests/merkle_node_helper.go`, `pkg/backend/dist_memory_test_helpers.go::EnableHTTPForTest` (build tag `test`).
- Lint discipline: 35 `nolint` directives total across the repo, each with a one-line justification. golangci-lint v2.12.1 runs clean with `--build-tags test`.
### Removed
- `pkg/cache` v1 (see "Breaking changes").
- `longbridgeapp/assert` test dependency (see "Breaking changes").
- `sentinel.ErrMgmtHTTPShutdownTimeout` (see "Breaking changes").
- Experimental `WithItemAwareEviction` option / `IAlgorithmItemAware` interface / `LRUItemAware` / `ShardedItemAware` types — landed briefly during the RFC 0001 spike, then torn out per the RFC's own discipline when the perf gate failed. The RFC document preserves the measurement and the lessons.
[Unreleased]: https://github.com/hyp3rd/hypercache/compare/v0.5.0...HEAD

Released: 0.5.0