洪 民憙 (Hong Minhee) :nonbinary:

@hongminhee@hollo.social

I've been thinking about adding federation health monitoring to Fedify, not as a separate data store or custom API, but by extending the existing OpenTelemetry integration. The idea is to expose delivery outcomes, signature verification failures, and per-remote-host error rates as OpenTelemetry metrics alongside the spans Fedify already emits. If you already have a Prometheus or Grafana setup, you'd get federation observability basically for free. Circuit breaker behavior (temporarily skipping a remote server that's been consistently unreachable) could surface as OpenTelemetry events, keeping everything in the same trace context rather than scattered across separate logs.
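To make the per-remote-host idea concrete, here is a minimal sketch of how delivery outcomes could be counted and turned into an error rate. It assumes nothing about Fedify's actual API: `InMemoryCounter` is a hypothetical stand-in that only mimics the `add(value, attributes)` shape of an OpenTelemetry counter so the example runs without dependencies, and the attribute names `remote.host` and `outcome` are made up for illustration.

```typescript
// Hypothetical stand-in for an OpenTelemetry counter. It is NOT the real
// OpenTelemetry API; it only mirrors the add(value, attributes) shape.
type Attributes = Record<string, string>;

class InMemoryCounter {
  private readonly totals = new Map<string, number>();

  add(value: number, attributes: Attributes = {}): void {
    const key = InMemoryCounter.keyOf(attributes);
    this.totals.set(key, (this.totals.get(key) ?? 0) + value);
  }

  get(attributes: Attributes = {}): number {
    return this.totals.get(InMemoryCounter.keyOf(attributes)) ?? 0;
  }

  // Sort attribute names so that the order of keys does not matter.
  private static keyOf(attributes: Attributes): string {
    return Object.keys(attributes)
      .sort()
      .map((name) => `${name}=${attributes[name]}`)
      .join(",");
  }
}

// Hypothetical counter of delivery attempts; not an existing Fedify metric.
const deliveries = new InMemoryCounter();

type DeliveryOutcome = "success" | "failure";

// Record one delivery attempt, labeled by remote host and outcome.
function recordDelivery(host: string, outcome: DeliveryOutcome): void {
  deliveries.add(1, { "remote.host": host, outcome });
}

// Derive the per-remote-host error rate from the labeled counts.
function errorRate(host: string): number {
  const ok = deliveries.get({ "remote.host": host, outcome: "success" });
  const failed = deliveries.get({ "remote.host": host, outcome: "failure" });
  const total = ok + failed;
  return total === 0 ? 0 : failed / total;
}
```

In a real setup the counting would live in the metrics backend, and the ratio would more likely be computed in a Prometheus or Grafana query than in application code. The sketch only shows which labels such a metric would need to carry.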

Does this sound useful to you? I'm curious whether people building on Fedify—or running federated servers in general—would actually reach for this, and what kinds of things you'd most want to observe. Happy to hear any thoughts.

Julian Fietkau

@julian@fietkau.social · Reply to Fedify: ActivityPub server framework's post

@fedify As a Mastodon server admin and user, I look at the Sidekiq diagnostic interface whenever I notice something is off – for example when I'm not seeing a post which I know should exist. I don't monitor connection health proactively. Maybe people who are admins for larger servers do that.

For Fedify, I might use something like what you describe on rare occasions, and would accordingly see it as a nice-to-have feature, but lower priority.

Emelia

@thisismissem@activitypub.space · Reply to Fedify: ActivityPub server framework's post

I think you may want circuit breakers to be independent of observability, but this level of detail for observability would still be good. Keep in mind that at scale you can usually only sample x% of events in OTel.
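One way to read the independence point: the breaker keeps its own state and makes its decisions without consulting any telemetry, and reporting is an optional hook on the side. The sketch below is an assumption-laden illustration, not Fedify code; `HostCircuitBreaker`, `BreakerEvent`, the failure threshold, and the cooldown are all hypothetical names and parameters.

```typescript
// Hypothetical event shape; a listener could forward it to OpenTelemetry.
type BreakerEvent = { host: string; state: "open" | "closed" };

class HostCircuitBreaker {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly onEvent: (event: BreakerEvent) => void;
  private readonly failures = new Map<string, number>();
  private readonly openedAt = new Map<string, number>();

  constructor(
    threshold: number,
    cooldownMs: number,
    onEvent: (event: BreakerEvent) => void = () => {},
  ) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.onEvent = onEvent; // optional: the breaker works without it
  }

  // True if a delivery to this host should be attempted right now.
  shouldAttempt(host: string, now: number): boolean {
    const opened = this.openedAt.get(host);
    if (opened === undefined) return true;
    // Once the cooldown has passed, allow a trial delivery.
    return now - opened >= this.cooldownMs;
  }

  // A success resets the failure count and closes the breaker if it was open.
  recordSuccess(host: string): void {
    this.failures.delete(host);
    if (this.openedAt.delete(host)) {
      this.onEvent({ host, state: "closed" });
    }
  }

  // Consecutive failures open the breaker once they reach the threshold.
  recordFailure(host: string, now: number): void {
    const count = (this.failures.get(host) ?? 0) + 1;
    this.failures.set(host, count);
    if (count >= this.threshold) {
      const wasOpen = this.openedAt.has(host);
      this.openedAt.set(host, now); // a failed trial restarts the cooldown
      if (!wasOpen) this.onEvent({ host, state: "open" });
    }
  }
}
```

Because the listener is only notified and never consulted, dropping or sampling those events would not change which hosts get skipped.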