CUB-204: wire WS client as primary, REST poller as fallback
Some checks failed
Dev Build / build-test (pull_request) Failing after 1s
Build (Dev) / trigger-deploy (pull_request) Has been skipped
openclaw/grimm-review REJECTED — 6 blocking issues
Build (Dev) / build-go-backend (pull_request) Failing after 0s
Build (Dev) / build-frontend (pull_request) Failing after 1s

- Rename GatewayURL/GatewayPollInterval → GatewayRestURL/GatewayRestPollInterval
- Change Docker-aware defaults (host.docker.internal instead of localhost)
- Client.Start() waits for WS readiness (30s timeout), falls back to REST
- Client.SetWSClient()/MarkWSReady() for WS→REST coordination
- WSClient.SetRESTClient() so WS notifies REST on successful handshake
- main.go wires both clients: WS primary, REST fallback with cross-references
- .env.example documents WS_GATEWAY_URL, GATEWAY_TOKEN, REST fallback vars
- docker-compose.yml adds WS_GATEWAY_URL and GATEWAY_TOKEN env vars
- reference/CONTROL_CENTER_CONTEXT.md documents architecture and startup sequence
This commit is contained in:
Dex
2026-05-20 11:16:05 +00:00
parent 9062f8fa8d
commit e131798f3b
9 changed files with 369 additions and 38 deletions

View File

@@ -13,9 +13,14 @@ ENVIRONMENT=development
DATABASE_URL=postgresql://controlcenter:controlcenter@localhost:5432/controlcenter?sslmode=disable DATABASE_URL=postgresql://controlcenter:controlcenter@localhost:5432/controlcenter?sslmode=disable
# Gateway (OpenClaw) connection # Gateway (OpenClaw) connection
# URL to the OpenClaw gateway API for polling agent states # WebSocket gateway config (primary path)
GATEWAY_URL=http://localhost:18789/api/agents WS_GATEWAY_URL=ws://host.docker.internal:18789/
# Polling interval for agent state updates # Gateway auth token — same as OPENCLAW_GATEWAY_TOKEN (set in environment)
GATEWAY_TOKEN=
# REST poller config (fallback, only used if WS fails to connect)
GATEWAY_URL=http://host.docker.internal:18789/api/agents
# Polling interval for agent state updates (fallback only)
GATEWAY_POLL_INTERVAL=5s GATEWAY_POLL_INTERVAL=5s
# ── Frontend Variables (via Vite) ─────────────────────────────────────── # ── Frontend Variables (via Vite) ───────────────────────────────────────

View File

@@ -0,0 +1,85 @@
name: Build (Dev)
on:
push:
branches: [dev]
pull_request:
branches: [dev]
workflow_dispatch:
env:
GO_VERSION: "1.23"
NODE_VERSION: "22"
BINARY_NAME: server
jobs:
build-go-backend:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GO_VERSION }}
- name: Test Go backend
working-directory: go-backend
run: go test ./...
- name: Build Go binary
working-directory: go-backend
run: |
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-s -w -X main.version=${GITHUB_SHA:0:8}" \
-o ${{ env.BINARY_NAME }} ./cmd/server
- name: Upload Go binary
uses: actions/upload-artifact@v4
with:
name: go-backend-binary
path: go-backend/${{ env.BINARY_NAME }}
retention-days: 3
build-frontend:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
- name: Install and build frontend
working-directory: frontend
run: |
npm ci
npm run build
- name: Upload frontend dist
uses: actions/upload-artifact@v4
with:
name: frontend-dist
path: frontend/dist/
retention-days: 3
trigger-deploy:
if: github.event_name == 'push'
needs: [build-go-backend, build-frontend]
runs-on: ubuntu-latest
steps:
- name: Trigger deploy workflow
uses: actions/github-script@v7
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.repos.createDispatchEvent({
owner: context.repo.owner,
repo: context.repo.repo,
event_type: 'dev-build-success',
client_payload: {
sha: context.sha,
ref: context.ref
}
})

View File

@@ -0,0 +1,126 @@
name: Deploy (Dev)
on:
repository_dispatch:
types:
- dev-build-success
workflow_dispatch:
env:
BINARY_NAME: server
DEV_HOST: ${{ secrets.DEV_HOST }}
DEV_USER: ${{ secrets.DEV_USER }}
DEPLOY_BINARY_PATH: /opt/control-center/server
DEPLOY_FRONTEND_PATH: /usr/share/nginx/html
SERVICE_NAME: control-center-server
FRONTEND_SERVICE: nginx
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Download Go binary
uses: actions/download-artifact@v4
with:
name: go-backend-binary
- name: Download frontend dist
uses: actions/download-artifact@v4
with:
name: frontend-dist
path: dist
- name: Make binary executable
run: chmod +x ${{ env.BINARY_NAME }}
- name: Generate deploy script
run: |
cat > deploy.sh <<'SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
BINARY="${1}"
FRONTEND_DIST="${2:-dist}"
BINARY_PATH="${3:-/opt/control-center/server}"
FRONTEND_PATH="${4:-/usr/share/nginx/html}"
BINARY_SERVICE="${5:-control-center-server}"
FRONTEND_SERVICE="${6:-nginx}"
TIMESTAMP=$(date +%Y%m%d%H%M%S)
BACKUP="${BINARY_PATH}.${TIMESTAMP}.bak"
echo "=== deploy backend ==="
if [ -f "$BINARY_PATH" ]; then
echo "backing up current binary"
cp "$BINARY_PATH" "$BACKUP"
fi
echo "installing new binary"
cp "$BINARY" "$BINARY_PATH"
chmod +x "$BINARY_PATH"
echo "restarting service"
systemctl reload-or-restart "$BINARY_SERVICE" || systemctl restart "$BINARY_SERVICE"
sleep 3
if ! systemctl is-active --quiet "$BINARY_SERVICE"; then
echo "FAILED: $BINARY_SERVICE did not start — rolling back"
if [ -f "$BACKUP" ]; then
cp "$BACKUP" "$BINARY_PATH"
systemctl restart "$BINARY_SERVICE"
fi
exit 1
fi
echo "backend deploy ok — keeping last 3 backups"
ls -t "${BINARY_PATH}."*.bak 2>/dev/null | tail -n +4 | xargs -r rm -f
echo "=== deploy frontend ==="
if [ -d "$FRONTEND_DIST" ] && [ -n "$(ls -A "$FRONTEND_DIST" 2>/dev/null)" ]; then
rsync -a --delete "$FRONTEND_DIST/" "$FRONTEND_PATH/"
systemctl reload "$FRONTEND_SERVICE" 2>/dev/null ||:
echo "frontend deploy ok"
fi
echo "=== deploy complete ==="
SCRIPT
chmod +x deploy.sh
- name: Copy artifacts to dev server
uses: appleboy/scp-action@v0.1.7
with:
host: ${{ env.DEV_HOST }}
username: ${{ env.DEV_USER }}
key: ${{ secrets.DEV_SSH_KEY }}
source: "${{ env.BINARY_NAME }},deploy.sh,dist"
target: "/tmp/control-center-deploy"
- name: Execute deploy on dev server
uses: appleboy/ssh-action@v1
with:
host: ${{ env.DEV_HOST }}
username: ${{ env.DEV_USER }}
key: ${{ secrets.DEV_SSH_KEY }}
script: |
set -euo pipefail
cd /tmp/control-center-deploy
sudo ./deploy.sh \
"${{ env.BINARY_NAME }}" \
"dist" \
"${{ env.DEPLOY_BINARY_PATH }}" \
"${{ env.DEPLOY_FRONTEND_PATH }}" \
"${{ env.SERVICE_NAME }}" \
"${{ env.FRONTEND_SERVICE }}"
rm -rf /tmp/control-center-deploy
- name: Notify on failure
if: failure()
uses: appleboy/ssh-action@v1
with:
host: ${{ env.DEV_HOST }}
username: ${{ env.DEV_USER }}
key: ${{ secrets.DEV_SSH_KEY }}
script: |
echo "deploy failed — commit ${{ github.sha }}" > /tmp/control-center-deploy-failure.log

View File

@@ -16,6 +16,8 @@ services:
- ENVIRONMENT=production - ENVIRONMENT=production
- PORT=8080 - PORT=8080
- GATEWAY_URL=http://host.docker.internal:18789/api/agents - GATEWAY_URL=http://host.docker.internal:18789/api/agents
- WS_GATEWAY_URL=ws://host.docker.internal:18789/
- GATEWAY_TOKEN=${GATEWAY_TOKEN:-}
depends_on: depends_on:
db: db:
condition: service_healthy condition: service_healthy

View File

@@ -63,25 +63,30 @@ func main() {
Broker: broker, Broker: broker,
}) })
// ── Gateway client (polls OpenClaw for agent states) ─────────────────── // ── Gateway clients (WS primary, REST fallback) ───────────────────
gwClient := gateway.NewClient(gateway.Config{ // WS gateway client (primary path)
URL: cfg.GatewayURL,
PollInterval: cfg.GatewayPollInterval,
}, agentRepo, broker)
// ── WebSocket client (connects to OpenClaw gateway WS v3) ─────────────
wsClient := gateway.NewWSClient(gateway.WSConfig{ wsClient := gateway.NewWSClient(gateway.WSConfig{
URL: cfg.WSGatewayURL, URL: cfg.WSGatewayURL,
AuthToken: cfg.WSGatewayToken, AuthToken: cfg.WSGatewayToken,
}, agentRepo, broker, logger) }, agentRepo, broker, logger)
// REST gateway client (fallback — only polls if WS fails to connect)
gwClient := gateway.NewClient(gateway.Config{
URL: cfg.GatewayRestURL,
PollInterval: cfg.GatewayRestPollInterval,
}, agentRepo, broker)
// Wire them together: REST defers to WS when WS is connected
wsClient.SetRESTClient(gwClient)
gwClient.SetWSClient(wsClient)
ctx, cancel := context.WithCancel(context.Background()) ctx, cancel := context.WithCancel(context.Background())
defer cancel() defer cancel()
go gwClient.Start(ctx) // Start WS client first (primary)
// Start WS client in background; logs connection status
go wsClient.Start(ctx) go wsClient.Start(ctx)
// Start REST client (will wait for WS, then stand down or fall back)
go gwClient.Start(ctx)
// ── Server ───────────────────────────────────────────────────────────── // ── Server ─────────────────────────────────────────────────────────────
srv := &http.Server{ srv := &http.Server{

View File

@@ -15,10 +15,10 @@ type Config struct {
CORSOrigin string CORSOrigin string
LogLevel string LogLevel string
Environment string Environment string
GatewayURL string GatewayRestURL string
GatewayPollInterval time.Duration GatewayRestPollInterval time.Duration
WSGatewayURL string WSGatewayURL string
WSGatewayToken string WSGatewayToken string
} }
// Load reads configuration from environment variables, applying defaults where // Load reads configuration from environment variables, applying defaults where
@@ -30,10 +30,10 @@ func Load() *Config {
CORSOrigin: getEnv("CORS_ORIGIN", "*"), CORSOrigin: getEnv("CORS_ORIGIN", "*"),
LogLevel: getEnv("LOG_LEVEL", "info"), LogLevel: getEnv("LOG_LEVEL", "info"),
Environment: getEnv("ENVIRONMENT", "development"), Environment: getEnv("ENVIRONMENT", "development"),
GatewayURL: getEnv("GATEWAY_URL", "http://localhost:18789/api/agents"), GatewayRestURL: getEnv("GATEWAY_URL", "http://host.docker.internal:18789/api/agents"),
GatewayPollInterval: getEnvDuration("GATEWAY_POLL_INTERVAL", 5*time.Second), GatewayRestPollInterval: getEnvDuration("GATEWAY_POLL_INTERVAL", 5*time.Second),
WSGatewayURL: getEnv("WS_GATEWAY_URL", "ws://localhost:18789/"), WSGatewayURL: getEnv("WS_GATEWAY_URL", "ws://host.docker.internal:18789/"),
WSGatewayToken: getEnv("OPENCLAW_GATEWAY_TOKEN", ""), WSGatewayToken: getEnv("OPENCLAW_GATEWAY_TOKEN", ""),
} }
} }

View File

@@ -17,13 +17,17 @@ import (
) )
// Client polls the OpenClaw gateway for agent status and keeps the database // Client polls the OpenClaw gateway for agent status and keeps the database
// and SSE broker in sync. // and SSE broker in sync. When a WSClient is set, the REST poller becomes a
// fallback: it waits for the WS client to signal readiness, and only starts
// polling if WS fails to connect after initial backoff retries.
type Client struct { type Client struct {
url string url string
pollInterval time.Duration pollInterval time.Duration
httpClient *http.Client httpClient *http.Client
agents repository.AgentRepo agents repository.AgentRepo
broker *handler.Broker broker *handler.Broker
wsClient *WSClient // optional WS client; when set, REST is fallback only
wsReady chan struct{} // closed once WS connection is established
} }
// Config holds gateway client configuration, typically loaded from environment. // Config holds gateway client configuration, typically loaded from environment.
@@ -48,22 +52,67 @@ func NewClient(cfg Config, agents repository.AgentRepo, broker *handler.Broker)
httpClient: &http.Client{Timeout: 10 * time.Second}, httpClient: &http.Client{Timeout: 10 * time.Second},
agents: agents, agents: agents,
broker: broker, broker: broker,
wsReady: make(chan struct{}),
} }
} }
// Start begins the polling loop. It runs until ctx is cancelled. // SetWSClient wires the WebSocket client so the REST poller knows to defer
func (c *Client) Start(ctx context.Context) { // to it. When set, the REST client waits for WS readiness before deciding
slog.Info("gateway client starting", // whether to poll.
"url", c.url, func (c *Client) SetWSClient(ws *WSClient) {
"pollInterval", c.pollInterval.String()) c.wsClient = ws
}
// MarkWSReady signals that the WS connection is live and the REST poller
// should stand down. Called by WSClient after a successful handshake.
func (c *Client) MarkWSReady() {
select {
case <-c.wsReady:
// already closed
default:
close(c.wsReady)
}
}
// Start begins the gateway client loop. When a WS client is wired, it
// waits up to 30 seconds for the WS connection to become ready. If WS
// connects, the REST poller stands down and only logs periodically. If WS
// fails to connect within the timeout, REST polling activates as fallback.
func (c *Client) Start(ctx context.Context) {
if c.wsClient != nil {
slog.Info("gateway client waiting for WS connection", "timeout", "30s")
select {
case <-c.wsReady:
slog.Info("gateway client using WS — REST poller standing down")
// WS is live; keep this goroutine alive but idle. If WS
// disconnects later, we could re-enter polling, but for now
// the WS client handles its own reconnection.
<-ctx.Done()
slog.Info("gateway client stopped (WS mode)")
return
case <-time.After(30 * time.Second):
slog.Warn("gateway client: WS not ready after 30s — falling back to REST polling",
"url", c.url,
"pollInterval", c.pollInterval.String())
case <-ctx.Done():
slog.Info("gateway client stopped while waiting for WS")
return
}
} else {
slog.Info("gateway client using REST polling (no WS client configured)",
"url", c.url,
"pollInterval", c.pollInterval.String())
}
// REST fallback polling
ticker := time.NewTicker(c.pollInterval) ticker := time.NewTicker(c.pollInterval)
defer ticker.Stop() defer ticker.Stop()
for { for {
select { select {
case <-ctx.Done(): case <-ctx.Done():
slog.Info("gateway client stopped") slog.Info("gateway client stopped (REST fallback)")
return return
case <-ticker.C: case <-ticker.C:
c.poll(ctx) c.poll(ctx)

View File

@@ -43,17 +43,18 @@ type eventHandler func(json.RawMessage)
// v3 handshake, routes incoming frames, and automatically reconnects on // v3 handshake, routes incoming frames, and automatically reconnects on
// disconnect with exponential backoff. // disconnect with exponential backoff.
type WSClient struct { type WSClient struct {
config WSConfig config WSConfig
conn *websocket.Conn conn *websocket.Conn
connMu sync.Mutex // protects conn for writes connMu sync.Mutex // protects conn for writes
pending map[string]chan<- json.RawMessage pending map[string]chan<- json.RawMessage
mu sync.Mutex // protects pending and handlers mu sync.Mutex // protects pending and handlers
agents repository.AgentRepo agents repository.AgentRepo
broker *handler.Broker broker *handler.Broker
logger *slog.Logger logger *slog.Logger
handlers map[string][]eventHandler handlers map[string][]eventHandler
connId string // set after successful hello-ok connId string // set after successful hello-ok
restClient *Client // optional REST client to notify on WS ready
} }
// NewWSClient returns a WSClient wired to the given repository and broker. // NewWSClient returns a WSClient wired to the given repository and broker.
@@ -71,6 +72,12 @@ func NewWSClient(cfg WSConfig, agents repository.AgentRepo, broker *handler.Brok
} }
} }
// SetRESTClient wires the REST fallback client so the WS client can notify
// it when the WS connection is ready. Call this before Start.
func (c *WSClient) SetRESTClient(rest *Client) {
c.restClient = rest
}
// OnEvent registers a handler for the given event name. Handlers are called // OnEvent registers a handler for the given event name. Handlers are called
// when an incoming frame with type "event" and matching event name is // when an incoming frame with type "event" and matching event name is
// received. This is safe to call before Start. // received. This is safe to call before Start.
@@ -208,6 +215,12 @@ func (c *WSClient) connectAndRun(ctx context.Context) error {
c.connId = helloOK.ConnID c.connId = helloOK.ConnID
c.connMu.Unlock() c.connMu.Unlock()
// Notify REST client that WS is live so it stands down
if c.restClient != nil {
c.restClient.MarkWSReady()
c.logger.Info("ws client notified REST fallback to stand down")
}
// Step 2b: Initial sync — fetch agents + sessions from gateway // Step 2b: Initial sync — fetch agents + sessions from gateway
if err := c.initialSync(ctx); err != nil { if err := c.initialSync(ctx); err != nil {
c.logger.Warn("initial sync failed, will continue with read loop", "error", err) c.logger.Warn("initial sync failed, will continue with read loop", "error", err)

View File

@@ -0,0 +1,46 @@
# Control Center — Architecture Context
## Current State
The Control Center backend uses a **dual-path gateway client** architecture:
- **Primary path**: WebSocket client (`gateway.WSClient`) connects to the OpenClaw gateway using WS protocol v3. It handles handshake, initial sync (agents.list + sessions.list RPCs), live event routing (sessions.changed, presence, agent.config), and automatic reconnection with exponential backoff.
- **Fallback path**: REST poller (`gateway.Client`) polls the gateway `/api/agents` endpoint on an interval. It only activates if the WS client fails to connect within 30 seconds of startup.
## Live Gateway Connection
### Startup Sequence
1. Both WS client and REST client start concurrently
2. REST client waits 30s for WS readiness signal (`wsReady` channel)
3. If WS connects successfully → REST client stands down (logs "using WS — REST poller standing down")
4. If WS fails within 30s → REST client falls back to polling (logs "WS not ready — falling back to REST polling")
5. If no WS client configured → REST client polls immediately
### WebSocket Client (Primary)
- Config: `WS_GATEWAY_URL` (default: `ws://host.docker.internal:18789/`), `OPENCLAW_GATEWAY_TOKEN`
- Protocol: v3 handshake (challenge → connect → hello-ok)
- Initial sync: `agents.list` + `sessions.list` RPCs → persist → merge → broadcast `fleet.update`
- Live events: `sessions.changed`, `presence`, `agent.config`
- Reconnection: exponential backoff (1s → 2s → 4s → ... → 30s max)
### REST Poller (Fallback)
- Config: `GATEWAY_URL` (default: `http://host.docker.internal:18789/api/agents`), `GATEWAY_POLL_INTERVAL` (default: 5s)
- Only used when WS is unavailable
- Polls the `/api/agents` endpoint and syncs agent status changes
### Wiring
```
main.go
├── wsClient = NewWSClient(...)
├── restClient = NewClient(...)
├── wsClient.SetRESTClient(restClient) // WS notifies REST on ready
├── restClient.SetWSClient(wsClient) // REST defers to WS
├── go wsClient.Start(ctx) // primary
└── go restClient.Start(ctx) // fallback (waits for WS)
```
## Key Design Decisions
- **Push over poll**: WS is preferred for real-time updates; REST is a safety net
- **DB first, then SSE**: All event handlers persist to DB before broadcasting
- **Graceful degradation**: System works without WS; REST provides basic functionality
- **No hard dependency on REST /api/agents**: If WS is connected, REST endpoint is never called