Implementing WebSockets with AWS


If you’ve ever built a stateful WebSocket server, moving that mental model onto AWS Lambda can feel… weird. Lambdas are stateless, short-lived, and concurrent, while a WebSocket feels like it wants a cozy long-running process with in-memory connection maps. The trick is to split “socket lifetime” from “compute lifetime.” Unlike a traditional WebSocket server (one URL you both read and write on), API Gateway + Lambda uses a dual-URL model:

  • Client WebSocket URL (wss://): what the browser/app dials to start the socket, e.g. wss://<api-id>.execute-api.<region>.amazonaws.com/<stage>. API Gateway owns this TCP/WebSocket connection and emits lifecycle events to your Lambda.

  • Management HTTPS endpoint (https://): what your Lambda functions call to push frames to a specific connection, e.g. https://<domainName>/<stage>/@connections/{connectionId}. You don’t “write to the socket”; you POST to this management endpoint.
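
To make the relationship between the two URLs concrete, here is a minimal sketch that turns a client wss:// URL into the management URL for one connection. It assumes the default execute-api hostname, where both URLs share the same host and stage; a custom domain may map paths differently. The function name managementURL, the API ID, and the connection ID are placeholders for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// managementURL converts a client wss:// URL into the https:// management URL
// for a single connection. Assumption: default execute-api hostname, so the
// host and stage path are identical on both URLs.
func managementURL(clientURL, connectionID string) (string, error) {
	u, err := url.Parse(clientURL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "wss" {
		return "", fmt.Errorf("expected a wss:// URL, got scheme %q", u.Scheme)
	}
	u.Scheme = "https"
	u.Path = strings.TrimSuffix(u.Path, "/") + "/@connections/" + connectionID
	return u.String(), nil
}

func main() {
	// Placeholder API ID and connection ID, for illustration only.
	got, err := managementURL("wss://abc123.execute-api.us-east-1.amazonaws.com/prod", "L0SM9cOFvHcCIhw=")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

In the handlers later in this guide the SDK builds this URL for you; the sketch only shows that the two addresses differ in scheme and in the /@connections/{connectionId} suffix.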

From this dual-URL framing, a simple pattern falls out:

  • Store connection state externally on $connect: write a record keyed by connectionId (DynamoDB is a good fit for this).

  • Clean up on $disconnect: delete the record (or let TTL handle this).

  • Push by connectionId: when you need to send, query your registry for target connectionIds, then POST to https://…/@connections/{id} (not the wss:// URL).

Once you accept this separation of wss:// (client) and https:// (management), WebSockets on Lambda get pretty friendly—and you gain the ability for your Lambda functions to send data to connections that may have been established on a user machine long before the current invocation.

Three common “WebSockets on AWS” patterns

Before we proceed, note that this post covers API Gateway WebSocket API + Lambda. That’s one of several ways to “do WebSockets on AWS”:

  1. API Gateway WebSocket API + Lambda (managed connections, serverless compute) ← this guide
  2. Application Load Balancer (ALB) with WebSockets terminating on ECS/EKS/EC2 (you run long-lived processes)
  3. Managed real-time overlays that use WebSockets under the hood (e.g., AWS AppSync GraphQL subscriptions, AWS IoT Core over WebSockets)

The other two options are beyond the scope of this guide, but they are worth knowing about if you would rather run long-lived server processes or use a higher-level managed real-time service.

WebSocket “routing” on API Gateway

With HTTP, routing to server-side handlers happens based on the endpoint you call (path + method). With WebSockets on API Gateway, routing happens based on the message payload you send over the connection.

API Gateway evaluates a route selection expression (commonly $request.body.action) on each incoming frame to pick the handler, then invokes your Lambda with requestContext.routeKey. Three routes are built-in: $connect (triggered automatically on new connection), $disconnect (triggered automatically when a client disconnects), and $default (triggered when no custom route matches). By "built-in," we mean these route keys are automatically recognized by API Gateway, but you still need to configure your Lambdas to handle them.

Custom route keys are driven by payload fields. For example, for a frame such as

{ "action": "sendMessage", "targetUserId": "u_456", "text": "hi" }

the $request.body.action selection expression would evaluate to route key "sendMessage", setting requestContext.routeKey to that value.
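
The selection logic can be approximated in a few lines of Go. This is only a sketch of what API Gateway does on its side, not code you deploy: selectRoute and the configured set are hypothetical names, and the real service evaluates whatever selection expression you configured rather than a hard-coded "action" field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectRoute approximates the selection expression $request.body.action:
// it reads the "action" field of a JSON frame and returns it if a route with
// that key is configured, otherwise it falls back to "$default".
func selectRoute(body string, configured map[string]bool) string {
	var frame struct {
		Action string `json:"action"`
	}
	if err := json.Unmarshal([]byte(body), &frame); err != nil {
		return "$default" // non-JSON frames cannot match a custom route
	}
	if configured[frame.Action] {
		return frame.Action
	}
	return "$default"
}

func main() {
	routes := map[string]bool{"sendMessage": true}
	fmt.Println(selectRoute(`{ "action": "sendMessage", "targetUserId": "u_456", "text": "hi" }`, routes))
	fmt.Println(selectRoute(`{ "action": "unknown" }`, routes))
}
```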

In any case, the built-in route keys $connect and $disconnect are the most critical to understanding the connection lifecycle pattern that we are trying to describe in this guide.

Core pattern

Think of a Connection Registry you control (e.g., DynamoDB):

  • Each live connection has a connectionId (API Gateway gives you this).
  • You can attach metadata: user ID, room/channel, device ID, TTL.
  • On $connect → store the connection record.
  • On $disconnect → remove the record.
  • On message → query the registry for target connectionIds, then send via the management endpoint for each.

Important: you don’t write back to the original wss://… URL the client used to open the socket. Instead, you call a separate HTTPS management endpoint—think of it as a hotline to that specific open connection on a user’s device.

This decoupling lets any Lambda invocation (now or later) reach connections that were established in the past, even if no handler is currently “holding” them.
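
The registry contract itself is small. The sketch below expresses it as a hypothetical Registry interface with an in-memory implementation, purely to show the three operations; on Lambda the data has to live in an external store such as DynamoDB, because an in-memory map does not survive across invocations or containers.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Registry is a hypothetical interface capturing the three lifecycle operations.
type Registry interface {
	Put(connectionID, userID string)       // on $connect
	Delete(connectionID string)            // on $disconnect
	ConnectionsFor(userID string) []string // on message: who do we send to?
}

// memRegistry is an illustration only; it is not suitable for Lambda.
type memRegistry struct {
	mu    sync.Mutex
	users map[string]string // connectionID -> userID
}

func newMemRegistry() *memRegistry {
	return &memRegistry{users: make(map[string]string)}
}

func (r *memRegistry) Put(connectionID, userID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.users[connectionID] = userID
}

func (r *memRegistry) Delete(connectionID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.users, connectionID)
}

func (r *memRegistry) ConnectionsFor(userID string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	var ids []string
	for id, u := range r.users {
		if u == userID {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // stable order for callers and tests
	return ids
}

func main() {
	var reg Registry = newMemRegistry()
	reg.Put("c1", "u_456") // first device connects
	reg.Put("c2", "u_456") // second device, same user
	reg.Delete("c1")       // first device disconnects
	fmt.Println(reg.ConnectionsFor("u_456"))
}
```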


Note that in the code that follows we are making the following design choices:

  • DynamoDB as the connection registry.
  • Go with the aws-sdk-go-v2.

AWS Resources required

These resources will need to be provisioned (e.g., via CloudFormation, CDK, Terraform, or the AWS Console):

  • DynamoDB table (e.g., ws_connections) with primary key (PK, SK) and GSI1 (GSI1PK, GSI1SK)
  • Lambda function (Go runtime) with environment variable CONNECTIONS_TABLE set, referring to the DDB table name (see above), and APIGW_MANAGEMENT_ENDPOINT set to the management endpoint URL that comes from having provisioned the API Gateway WebSocket API (see below).
  • API Gateway WebSocket API that points to the above Lambda for $connect, $disconnect, and any custom routes (e.g., broadcastAll).
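
The handlers in this guide only write PK and SK; the GSI1 attributes are left unused. One way they could be populated, sketched below, is to index connections by user so that "all connections for user X" becomes a query instead of a scan. The USER# prefix and the keysFor helper are assumptions for illustration, not part of the code that follows.

```go
package main

import "fmt"

// connKeys holds the key attributes for one connection item.
type connKeys struct {
	PK, SK         string
	GSI1PK, GSI1SK string
}

// keysFor builds the keys for a connection. PK and SK follow the layout used
// by the handlers in this guide; the GSI1 layout is a hypothetical choice.
func keysFor(connectionID, userID string) connKeys {
	return connKeys{
		PK:     "CONN#" + connectionID,
		SK:     "CONN",
		GSI1PK: "USER#" + userID,       // assumption: group connections by user
		GSI1SK: "CONN#" + connectionID, // assumption: one index entry per connection
	}
}

func main() {
	fmt.Printf("%+v\n", keysFor("L0SM9cOFvHcCIhw=", "u_456"))
}
```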

Go (aws-sdk-go-v2) — handlers & helpers

We’ll wire a single Lambda to handle $connect, $disconnect, and a simple custom route called broadcastAll (purely to demo “query connection registry → send via management endpoint”). In practice, any Lambda—with access to the DynamoDB table and the management endpoint—can do the sending.

Module imports

    package main

    import (
        "context"
        "encoding/json"
        "errors"
        "fmt"
        "log/slog"
        "os"
        "strconv"
        "time"

        "github.com/aws/aws-lambda-go/events"
        "github.com/aws/aws-lambda-go/lambda"
        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        apigwm "github.com/aws/aws-sdk-go-v2/service/apigatewaymanagementapi"
        "github.com/aws/aws-sdk-go-v2/service/dynamodb"
        "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
    )

Item model (our Connection Registry entry)

Remember: API Gateway gives us a connectionId. We store it externally (DynamoDB) with optional metadata and TTL so zombies are auto-reaped.

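    type ConnItem struct {
        PK           string // "CONN#" + connectionId
        SK           string // constant "CONN"
        ConnectionID string
        UserID       string // optional metadata
        ConnectedAt  string // RFC 3339 timestamp
        TTL          *int64 // Unix epoch seconds; nil means no expiry
    }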

Bootstrapping SDK clients

    var (
        log              = slog.Default()
        ddbClient        *dynamodb.Client
        connectionsTable string
        defaultTTL       *int64 // nil if not set
    )

    func init() {
        cfg, err := config.LoadDefaultConfig(context.Background())
        if err != nil {
            panic(err)
        }
        ddbClient = dynamodb.NewFromConfig(cfg)

        connectionsTable = os.Getenv("CONNECTIONS_TABLE")
        if connectionsTable == "" {
            panic("CONNECTIONS_TABLE is required")
        }

        // Optional TTL. Caveat: init runs once per cold start, so this expiry is
        // fixed at that moment. A warm container that reuses it will stamp later
        // connections with the same, eventually stale, timestamp.
        if v := os.Getenv("CONNECTION_TTL_SECONDS"); v != "" {
            if secs, err := strconv.ParseInt(v, 10, 64); err == nil && secs > 0 {
                t := time.Now().Add(time.Duration(secs) * time.Second).Unix()
                defaultTTL = &t
            }
        }
    }

One Lambda, three routes

API Gateway WebSocket invokes us with APIGatewayWebsocketProxyRequest. We switch on requestContext.RouteKey. (Recall from earlier: built-ins are "$connect", "$disconnect", "$default"; custom routes come from the route selection expression, commonly $request.body.action.)

    func main() {
        lambda.Start(router)
    }

    func router(ctx context.Context, evt events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
        switch evt.RequestContext.RouteKey {
        case "$connect":
            return onConnect(ctx, evt)
        case "$disconnect":
            return onDisconnect(ctx, evt)
        case "broadcastAll": // simple custom route to demo "query → send"
            return onBroadcastAll(ctx, evt)
        default:
            return onDefault(ctx, evt)
        }
    }

$connect — store connection record (the registry)

We persist the connectionId (derived from the event) + optional metadata. This is the split of socket vs. compute lifetimes: the socket (and associated connectionId) lives in API Gateway; our compute invocation is brief and just registers it.

    // ttlFromNow computes the expiry at connect time. defaultTTL is fixed at cold
    // start, so reusing it from a warm container would write stale timestamps.
    func ttlFromNow() *int64 {
        secs, err := strconv.ParseInt(os.Getenv("CONNECTION_TTL_SECONDS"), 10, 64)
        if err != nil || secs <= 0 {
            return nil // TTL not configured
        }
        t := time.Now().Add(time.Duration(secs) * time.Second).Unix()
        return &t
    }

    func onConnect(ctx context.Context, evt events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
        connID := evt.RequestContext.ConnectionID
        if connID == "" {
            return events.APIGatewayProxyResponse{StatusCode: 400}, fmt.Errorf("missing connectionId")
        }

        item := ConnItem{
            PK:           "CONN#" + connID,
            SK:           "CONN",
            ConnectionID: connID,
            ConnectedAt:  time.Now().UTC().Format(time.RFC3339),
            TTL:          ttlFromNow(), // nil if you don't want TTL
        }

        av := map[string]types.AttributeValue{
            "PK":           &types.AttributeValueMemberS{Value: item.PK},
            "SK":           &types.AttributeValueMemberS{Value: item.SK},
            "connectionId": &types.AttributeValueMemberS{Value: item.ConnectionID},
            "connectedAt":  &types.AttributeValueMemberS{Value: item.ConnectedAt},
        }
        if item.TTL != nil {
            av["ttl"] = &types.AttributeValueMemberN{Value: strconv.FormatInt(*item.TTL, 10)}
        }

        _, err := ddbClient.PutItem(ctx, &dynamodb.PutItemInput{
            TableName:           aws.String(connectionsTable),
            Item:                av,
            ConditionExpression: aws.String("attribute_not_exists(PK)"), // avoid accidental overwrites
        })
        if err != nil {
            log.Error("put connection failed", "err", err, "connectionId", connID)
            return events.APIGatewayProxyResponse{StatusCode: 500}, nil
        }

        // We don't send a frame here. Returning 200 lets the handshake succeed.
        return events.APIGatewayProxyResponse{StatusCode: 200}, nil
    }

$disconnect — remove the record (or let TTL do it)

When API Gateway drops a socket, it triggers $disconnect. We mirror that by deleting the registry entry. Delivery of $disconnect is best-effort, which is why the TTL is a useful backstop.

    func onDisconnect(ctx context.Context, evt events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
        connID := evt.RequestContext.ConnectionID
        if connID == "" {
            return events.APIGatewayProxyResponse{StatusCode: 400}, fmt.Errorf("missing connectionId")
        }

        _, err := ddbClient.DeleteItem(ctx, &dynamodb.DeleteItemInput{
            TableName: aws.String(connectionsTable),
            Key: map[string]types.AttributeValue{
                "PK": &types.AttributeValueMemberS{Value: "CONN#" + connID},
                "SK": &types.AttributeValueMemberS{Value: "CONN"},
            },
        })
        if err != nil {
            log.Error("delete connection failed", "err", err, "connectionId", connID)
            // Still return 200: the socket is gone regardless; worst case TTL reaps it.
        }
        return events.APIGatewayProxyResponse{StatusCode: 200}, nil
    }

$default — optional: ack unknown routes

Not essential, but helpful during development.

    func onDefault(ctx context.Context, evt events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
        log.Info("default route", "body", evt.Body)
        return events.APIGatewayProxyResponse{StatusCode: 200}, nil
    }

Sending messages (the management endpoint loop)

Key idea recap: you never write back to wss://…. You POST frames to https://…/@connections/{connectionId}. Any Lambda with the table name and management endpoint can:

  1. Query the connection registry.
  2. Loop over the connectionIds.
  3. POST to each one via the API Gateway Management API.

Below we demo a trivial broadcastAll route that scans for all CONN#* items, then sends a simple payload to each connection.

Utility: create a management client for a specific endpoint

  • When invoked from a WebSocket route, you may derive the endpoint from the event (domain + stage).
  • When invoked from a non-route Lambda, pull APIGW_MANAGEMENT_ENDPOINT from env.

    func managementClientFor(ctx context.Context, endpoint string) (*apigwm.Client, error) {
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            return nil, err
        }
        // aws-sdk-go-v2 lets us override the endpoint for the management API.
        // Newer SDK releases mark EndpointResolver as deprecated and offer
        // o.BaseEndpoint = aws.String(endpoint) instead.
        client := apigwm.NewFromConfig(cfg, func(o *apigwm.Options) {
            o.EndpointResolver = apigwm.EndpointResolverFromURL(endpoint)
        })
        return client, nil
    }

    // If you're inside a route handler and prefer deriving instead of env:
    func deriveMgmtEndpointFromEvent(evt events.APIGatewayWebsocketProxyRequest) string {
        return fmt.Sprintf("https://%s/%s", evt.RequestContext.DomainName, evt.RequestContext.Stage)
    }

Utility: query all current connections (toy example)

We only need the connectionId, so we project just the PK and peel off the "CONN#" prefix.

    type Connection struct {
        ID string
    }

    func listAllConnections(ctx context.Context) ([]Connection, error) {
        var out []Connection
        var lastKey map[string]types.AttributeValue

        for {
            res, err := ddbClient.Scan(ctx, &dynamodb.ScanInput{
                TableName:            aws.String(connectionsTable),
                ExclusiveStartKey:    lastKey,
                ProjectionExpression: aws.String("PK"),
            })
            if err != nil {
                return nil, err
            }

            for _, it := range res.Items {
                // Checked assertion: skip items whose PK is missing or not a string
                // instead of panicking.
                s, ok := it["PK"].(*types.AttributeValueMemberS)
                if !ok {
                    continue
                }
                pk := s.Value
                if len(pk) > 5 && pk[:5] == "CONN#" {
                    out = append(out, Connection{ID: pk[5:]})
                }
            }

            // Scan is paginated; keep going until there is no continuation key.
            if len(res.LastEvaluatedKey) == 0 {
                break
            }
            lastKey = res.LastEvaluatedKey
        }
        return out, nil
    }

Utility: send a JSON payload to one connection

    func postJSON(ctx context.Context, mgmt *apigwm.Client, connectionID string, payload any) error {
        b, err := json.Marshal(payload)
        if err != nil {
            return err
        }
        _, err = mgmt.PostToConnection(ctx, &apigwm.PostToConnectionInput{
            ConnectionId: aws.String(connectionID),
            Data:         b,
        })
        return err
    }

broadcastAll: minimal “send to everyone we know”

This demonstrates the registry → loop → management POST pattern from the intro. We purposefully ignore semantics like rooms, auth, etc.

    func onBroadcastAll(ctx context.Context, evt events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
        // 1) Find everyone currently in the registry
        conns, err := listAllConnections(ctx)
        if err != nil {
            log.Error("scan connections failed", "err", err)
            return events.APIGatewayProxyResponse{StatusCode: 500}, nil
        }
        if len(conns) == 0 {
            return events.APIGatewayProxyResponse{StatusCode: 200}, nil
        }

        // 2) Resolve the management endpoint; fall back to deriving it from the event
        mgmtEndpoint := os.Getenv("APIGW_MANAGEMENT_ENDPOINT")
        if mgmtEndpoint == "" {
            mgmtEndpoint = deriveMgmtEndpointFromEvent(evt)
        }
        mgmt, err := managementClientFor(ctx, mgmtEndpoint)
        if err != nil {
            log.Error("management client init failed", "err", err)
            return events.APIGatewayProxyResponse{StatusCode: 500}, nil
        }

        // 3) Minimal payload (anything JSON-serializable works)
        payload := map[string]any{
            "type": "broadcast",
            "time": time.Now().UTC().Format(time.RFC3339Nano),
            "note": "hello from Lambda via the management endpoint",
        }

        // 4) Loop through connectionIds and POST the message
        for _, c := range conns {
            err := postJSON(ctx, mgmt, c.ID, payload)
            if err == nil {
                continue
            }
            // If a connection is stale (410 Gone), remove it (mirrors `$disconnect` cleanup).
            // GoneException is defined in the service's types subpackage, not in the
            // client package, so we match on its error code to avoid another import.
            var apiErr interface{ ErrorCode() string }
            if errors.As(err, &apiErr) && apiErr.ErrorCode() == "GoneException" {
                _, _ = ddbClient.DeleteItem(ctx, &dynamodb.DeleteItemInput{
                    TableName: aws.String(connectionsTable),
                    Key: map[string]types.AttributeValue{
                        "PK": &types.AttributeValueMemberS{Value: "CONN#" + c.ID},
                        "SK": &types.AttributeValueMemberS{Value: "CONN"},
                    },
                })
                continue
            }
            log.Error("post failed", "connectionId", c.ID, "err", err)
        }

        return events.APIGatewayProxyResponse{StatusCode: 200}, nil
    }

Minimal client (conceptual)

On the client, you just dial the WebSocket URL (the one provisioned for your API/stage), e.g.:

wss://<api-id>.execute-api.<region>.amazonaws.com/<stage>

After $connect, your serverless side has the connectionId registered. Your Lambdas can now push frames using the management endpoint as we do in onBroadcastAll above.
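
To trigger the broadcastAll route from a client, the frame only needs an action field that matches the route key. The sketch below builds that frame; actually sending it requires a WebSocket client library, which is not shown here, and clientFrame and broadcastFrame are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clientFrame is the smallest payload that the $request.body.action
// selection expression can route on.
type clientFrame struct {
	Action string `json:"action"`
}

// broadcastFrame returns the JSON text frame that selects the broadcastAll route.
func broadcastFrame() ([]byte, error) {
	return json.Marshal(clientFrame{Action: "broadcastAll"})
}

func main() {
	b, err := broadcastFrame()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // prints {"action":"broadcastAll"}
}
```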


Conclusion

The shift that makes WebSockets on Lambda click is the dual-URL mental model:

  • wss:// is the client’s lane (API Gateway owns the socket).
  • https://…/@connections/{id} is your lane to talk back (the management endpoint).

From that, the implementation falls out neatly:

  1. $connect → store connectionId (+ metadata, TTL, endpoint) in DynamoDB.
  2. $disconnect → remove it (or let TTL reap).
  3. any Lambda → query IDs → POST messages via the management endpoint.

Happy broadcasting! 🚀
