Context
GPU nodes on AWS (g6, g5 and p4 families) cost between one and thirty dollars per hour. For labs and demonstrations, keeping these resources running around the clock means roughly $700 to over $20,000 per node in a 30-day month (720 hours at those rates). Manual shutdown works, until a forgotten weekend produces a significant cost overrun.
Conventional alternatives (CloudWatch schedules, Karpenter, Spot) are either too rigid or assume continuous workloads. The actual need was simple: a web button the user activates at the start of a session and releases when finished.
Architecture
The panel is served as a static site from S3 behind CloudFront, communicating with a protected endpoint on API Gateway. The Lambda function behind the endpoint fulfills three responsibilities:
- Validation of a simple token (signed cookie or header).
- Call to EKS to scale the GPU nodegroup’s desired-capacity.
- Return of current state to the panel to reflect transitions (starting, stopping, ready).
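The scaling call can be sketched as follows. This is a minimal illustration, not the panel's actual code: it assumes an EKS managed nodegroup, where the desired capacity is exposed as `desiredSize` inside `scalingConfig`, and the function name `set_gpu_nodes` is hypothetical. The client is passed in so the function can be exercised without AWS credentials; in a Lambda it would be `boto3.client("eks")`.

```python
from typing import Any


def set_gpu_nodes(eks: Any, cluster: str, nodegroup: str, desired: int) -> dict:
    """Scale a managed nodegroup to `desired` nodes and return the scaling config sent."""
    current = eks.describe_nodegroup(clusterName=cluster, nodegroupName=nodegroup)
    scaling = current["nodegroup"]["scalingConfig"]

    # Refuse targets outside the configured ceiling instead of silently raising it.
    if desired < 0 or desired > scaling["maxSize"]:
        raise ValueError(f"desired must be between 0 and {scaling['maxSize']}")

    new_scaling = {
        # desiredSize cannot be below minSize, so lower minSize when scaling down.
        "minSize": min(scaling["minSize"], desired),
        "maxSize": scaling["maxSize"],
        "desiredSize": desired,
    }
    eks.update_nodegroup_config(
        clusterName=cluster,
        nodegroupName=nodegroup,
        scalingConfig=new_scaling,
    )
    return new_scaling
```

Calling `set_gpu_nodes(eks, "lab-cluster", "gpu-nodes", 1)` at session start and with `0` at the end would map to the two positions of the button; both names are placeholders.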
The frontend polls the status endpoint every few seconds, providing a near real-time user experience.
Stack rationale
- No server to maintain: the S3 + CloudFront + Lambda combination is fully serverless and costs only cents per month at idle.
- CloudFront provides TLS, caching and adequate latency without the need for a load balancer.
- API Gateway isolates the Lambda and offers a single point to apply rate limiting or WAF when required.
- IAM grants the Lambda permission exclusively over the target nodegroup: if the function is ever compromised, the blast radius is limited to a single resource, not the entire account.
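As an illustration of the IAM scoping, the sketch below builds a policy document restricted to one nodegroup. It assumes the Lambda only needs `eks:DescribeNodegroup` and `eks:UpdateNodegroupConfig`; the helper name and all argument values are hypothetical, and the real policy may include other statements (logging, for instance).

```python
import json


def nodegroup_policy(region: str, account_id: str, cluster: str, nodegroup: str) -> dict:
    """Return an IAM policy document limited to a single EKS managed nodegroup."""
    # EKS appends a unique ID to nodegroup ARNs; the trailing wildcard matches it.
    arn = f"arn:aws:eks:{region}:{account_id}:nodegroup/{cluster}/{nodegroup}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "eks:DescribeNodegroup",
                    "eks:UpdateNodegroupConfig",
                ],
                "Resource": arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(json.dumps(nodegroup_policy("us-east-1", "123456789012", "lab-cluster", "gpu-nodes"), indent=2))
```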
Decisions that added value
State in the cluster, not in a database
Rather than maintaining an external table with each nodegroup’s state, the system queries EKS directly for desired-capacity. The source of truth lives in one place, so the panel cannot drift out of sync with the cluster.
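One way to derive the panel state from EKS alone is sketched below. The mapping is an assumption, not the author's exact logic: it combines the nodegroup `status` with `desiredSize`, and the extra labels `stopped` and `error` are introduced here for completeness. Note that an `ACTIVE` nodegroup with a non-zero desired size does not strictly guarantee the node has joined the cluster, so a stricter check might also consult the Kubernetes API.

```python
from typing import Any


def panel_state(eks: Any, cluster: str, nodegroup: str) -> str:
    """Map the live nodegroup description to a label the panel can display."""
    ng = eks.describe_nodegroup(clusterName=cluster, nodegroupName=nodegroup)["nodegroup"]
    desired = ng["scalingConfig"]["desiredSize"]
    status = ng["status"]

    if status == "UPDATING":
        # A scaling change is in flight; the direction follows the target size.
        return "starting" if desired > 0 else "stopping"
    if status == "ACTIVE":
        return "ready" if desired > 0 else "stopped"
    # CREATING, DELETING, DEGRADED and the *_FAILED statuses are not expected here.
    return "error"
```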
Automatic shutdown with timeout
An optional parameter lets the user start the node for a fixed window, for example “for two hours”. The Lambda schedules an EventBridge task that shuts the node down if the user forgets to do so manually. It is a minor design decision that consistently translates into savings.
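The timeout could be implemented in several ways; the sketch below assumes a one-time EventBridge Scheduler schedule that invokes the same Lambda with a stop request. The function name, the `{"action": "stop"}` payload and the ARNs are hypothetical. The client is injected; in a Lambda it would be `boto3.client("scheduler")`.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def schedule_shutdown(
    scheduler: Any,
    name: str,
    hours: float,
    lambda_arn: str,
    role_arn: str,
    now: Optional[datetime] = None,
) -> str:
    """Create a one-time schedule that fires `hours` from now; return its expression."""
    now = now or datetime.now(timezone.utc)
    fire_at = now.astimezone(timezone.utc) + timedelta(hours=hours)
    # One-time schedules use the form at(yyyy-mm-ddThh:mm:ss).
    expression = f"at({fire_at.strftime('%Y-%m-%dT%H:%M:%S')})"

    scheduler.create_schedule(
        Name=name,
        ScheduleExpression=expression,
        ScheduleExpressionTimezone="UTC",
        FlexibleTimeWindow={"Mode": "OFF"},
        # Remove the schedule once it has run so stale entries do not pile up.
        ActionAfterCompletion="DELETE",
        Target={
            "Arn": lambda_arn,
            "RoleArn": role_arn,  # role that EventBridge Scheduler assumes to invoke the target
            "Input": json.dumps({"action": "stop"}),  # hypothetical payload
        },
    )
    return expression
```

If the user stops the node manually before the timer fires, the pending schedule should be deleted or the stop request made idempotent, so that a late trigger is harmless.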
Deliberately simple authentication
For a personal or small-team panel, OAuth is overkill. A cookie signed with a shared secret is sufficient. The design allows for a future migration to Cognito or IAM Identity Center without requiring modifications to the rest of the architecture.
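A signed token of this kind can be sketched with the standard library alone. The format below (an expiry timestamp followed by an HMAC-SHA256 signature) is an assumption chosen for illustration; the function names are hypothetical and the actual panel may encode its cookie differently.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_token(secret: bytes, expires_at: int) -> str:
    """Return '<expiry>.<hex signature>' where the signature covers the expiry."""
    payload = str(expires_at)
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> bool:
    """Check the signature in constant time, then check that the token has not expired."""
    payload, sep, signature = token.partition(".")
    if not sep or not (payload.isascii() and payload.isdigit()):
        return False
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return int(payload) > current
```

The Lambda would call `verify_token` on the cookie or header value before touching EKS, and reject the request when it returns `False`.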
Reuse
I currently apply this pattern to the FortiAIGate lab, but the code is designed as a template: replacing the target nodegroup and branding extends its use to FortiEDR, malware analysis sandboxes or any GPU workload consumed in sessions.
Next steps
I am consolidating the panel into a Terraform module that takes the cluster and nodegroup as input and automatically provisions S3, CloudFront, API Gateway, Lambda and IAM. The goal is to enable any engineer to incorporate a “GPU power switch” into their EKS deployment with a single terraform apply.