Ralph and Software Engineering
I was probably living in a cave.
Very recently, while searching for spec-driven development with AI coding agents, I discovered Ralph (Wiggum) as a Software Engineer. There are enough interesting write-ups about it by Geoffrey Huntley and others, so I am not going to write more. If you care, just look up the ralph loop and the ralph playbook.
One of the key ideas I noticed was the distinction between software development and engineering. For Ralph or similar agentic coding techniques to be useful and effective, the engineering has to be done right for what is a new workflow, not a paradigm shift.
Spec Driven Development
The idea was born during my days as a platform engineer. What if the API spec is the source of truth, not just as a system contract but also for key concerns such as security, observability, availability and more? Specifically for security, I have seen what a beautiful mess it creates when everyone on the team has to worry about security. If they don't, there is a security vulnerability.
Security
Let’s consider an example with protobuf and gRPC. Using custom options, we can declare security and operational metadata directly in the API spec.
```protobuf
syntax = "proto3";

import "google/protobuf/descriptor.proto";

enum AuthType {
  TOKEN = 0;
  API_KEY = 1;
  MTLS = 2;
  NONE = 3;
}

message RateLimit {
  int32 requests_per_second = 1;
  int32 burst = 2;
}

extend google.protobuf.MethodOptions {
  AuthType auth_type = 50001;
  string auth_scope = 50002;
  RateLimit rate_limit = 50003;
}
```
Define the request messages with input validation specs attached using buf validate:
```protobuf
import "buf/validate/validate.proto";

message OnboardUserRequest {
  string email = 1 [(buf.validate.field).string = {email: true}];
  string name = 2 [(buf.validate.field).string = {min_len: 1, max_len: 100}];
}

message OnboardUserResponse {
  // ...
}
```
The service definition in turn declares the security properties of each method.
```protobuf
service UserService {
  // Public API
  rpc OnboardUser(OnboardUserRequest) returns (OnboardUserResponse) {
    option (auth_type) = API_KEY;
    option (auth_scope) = "users:write";
    option (rate_limit) = { requests_per_second: 10, burst: 20 };
  }

  // Private API
  rpc GetUser(GetUserRequest) returns (GetUserResponse) {
    option (auth_type) = MTLS;
    option (auth_scope) = "users:read";
    option (rate_limit) = { requests_per_second: 1000, burst: 2000 };
  }
}
```
In this model, API specs and their security requirements become verifiable, just like code quality issues. All we need is a custom linter to determine whether API specs meet our standards, for example:
- Input validation rules must always be declared
- All APIs must be authenticated unless explicitly opted out
- Every endpoint must declare a default rate limit and quota
- Logging and observability tags, from which service-specific dashboards are built, must be present
- […]
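The check itself can stay small. Here is a minimal sketch of such a linter, assuming the method options have already been parsed into plain dicts; a real one would walk the compiled descriptors or run as a buf lint plugin, and every name below is illustrative rather than a real API:

```python
# Rules every RPC method must satisfy before the spec passes review.
REQUIRED_OPTIONS = ("auth_type", "auth_scope", "rate_limit")

def lint_method(service: str, method: str, options: dict) -> list:
    """Return the list of standard violations for one RPC method."""
    violations = []
    for opt in REQUIRED_OPTIONS:
        if opt not in options:
            violations.append(f"{service}.{method}: missing option '{opt}'")
    # Unauthenticated endpoints must be declared explicitly, never implied.
    if options.get("auth_type") == "NONE" and not options.get("allow_anonymous"):
        violations.append(f"{service}.{method}: NONE auth requires allow_anonymous")
    return violations

# Parsed options for the UserService example above (hand-written here).
spec = {
    "OnboardUser": {
        "auth_type": "API_KEY",
        "auth_scope": "users:write",
        "rate_limit": {"requests_per_second": 10, "burst": 20},
    },
    "GetUser": {"auth_type": "MTLS"},  # missing scope and rate limit
}

problems = [v for method, opts in spec.items()
            for v in lint_method("UserService", method, opts)]
```

Running this over the example flags `GetUser` twice (no `auth_scope`, no `rate_limit`) and passes `OnboardUser`, which is exactly the kind of deterministic gate a CI step can enforce.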
The actual enforcement is more of an implementation detail, but the core idea is this: I want every engineer working in the codebase to think about what they need, not worry about how it is done. The how is a platform engineering concern and is maintained independently.
This way, engineers working on a large codebase can focus on their feature (business logic), while common concerns, especially non-functional ones, are taken care of independently.
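As one illustration of the "how" living in platform code: the declared `rate_limit` option maps naturally onto a token bucket that shared middleware enforces, so feature code never touches it. A hypothetical sketch:

```python
import time

class TokenBucket:
    """Enforces a spec-declared rate limit: a steady rate plus a burst."""
    def __init__(self, requests_per_second: int, burst: int):
        self.rate = requests_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill at the declared steady rate, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Middleware would build one bucket per method from its spec options;
# here we use OnboardUser's declared limits directly.
bucket = TokenBucket(requests_per_second=10, burst=20)
allowed = sum(bucket.allow() for _ in range(30))  # 30 back-to-back requests
```

With `OnboardUser`'s declared `burst: 20`, the first 20 of 30 instantaneous requests pass and the rest are rejected until tokens refill, all without the handler knowing rate limiting exists.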
Moving Fast
In the age of YOLO, I feel I am super slow. I still work with Claude Code as if I am working with a 10-year-old: asking for small changes and reviewing every change. Is this a problem? Not if I am maintaining the Linux kernel, Firefox, curl or similar software that prioritizes stability over being a feature factory. But it sure is a problem if I am experimenting, building something new and validating it with users.
My typical workflow involves:
- Write the requirements, constraints and boundary in a single markdown file
- Work with Claude Code (CC) in plan mode, iterate, till the plan feels right
- Work with CC in small chunks, something that I can review before commit
- Raise PR, review myself, review by team members and AI code review tools
- Merge and ship
I want to ship fast. But not at the cost of obvious security vulnerabilities or an unmaintainable codebase. When I think about it, software engineering estimation, velocity and management are approximate and often incorrect guesses rather than a repeatable science. That is because writing professional software is hard: not because coding is hard, but because of real-life challenges:
- Code is rules, expressing desired behaviour and constraining how a system works
- Rules (code) accumulate on top of other rules, often causing violations and incompatibilities
- Developers writing code have implicit trust in, and assumptions about, other code
- Bugs and security vulnerabilities happen when these assumptions are broken
Developing a large application is not hard. Maintaining a large application with its quality and security guarantees (read SLA/SLO) is hard, because it requires the engineering discipline of guarantees, albeit on a best-effort basis.
AI Coding Agents
LLMs are pretty good at generating code. But the quality and effectiveness depend on boundaries. In my experience, if you give an unbounded problem to a human or an LLM, you will likely get the same outcome. If software engineering is the art of breaking down a large problem and setting a boundary around each sub-problem, why not do the same with AI coding agents, which can be the software developers on the team?
This is where I see an intersection between the spec-driven development of my platform engineering days and the YOLO era. That is probably why Ralph seems like a very interesting approach, even if you don't want to run it in a loop.
I will experiment more with it, but my current mental model is to spend more time writing specs and building common infra code (e.g. AuthN/AuthZ libs, rate limiting, loggers, metrics) that codifies my opinion of common concerns, and to describe the APIs of these common concerns in markdown documents (instead of the Confluence pages of the past).
The feature spec needs to be tightly bounded, focused on a single piece of business logic per agent session, so that the agent has a higher likelihood of generating code within my established boundaries. The spec is then translated into code, which means coding is now closer to a declarative experience: the spec is where I declare what I need, and a non-deterministic system produces the code. Given the non-deterministic nature of LLMs, and in turn coding agents, the spec, now expressed as markdown, becomes the new rules. But the system needs to be deterministic. This can be guaranteed (engineered) only when we hand-code the test suite that verifies the system against the spec.
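For instance, the buf.validate rules declared on OnboardUserRequest earlier translate directly into hand-written, deterministic tests. In this sketch, `validate_onboard_request` is a stand-in for whatever handler the agent generates; the assertions are the part a human writes from the spec:

```python
import re

# Stand-in for agent-generated code. A real handler would come out of the
# coding agent's session; only its contract is fixed by the spec.
def validate_onboard_request(email: str, name: str) -> bool:
    # Mirrors the spec's rules: email format, name between 1 and 100 chars.
    email_ok = bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email))
    return email_ok and 1 <= len(name) <= 100

# Hand-coded checks derived line-by-line from the spec, not from the
# generated code. These stay stable across regenerations.
assert validate_onboard_request("a@example.com", "Ada")
assert not validate_onboard_request("not-an-email", "Ada")
assert not validate_onboard_request("a@example.com", "")
assert not validate_onboard_request("a@example.com", "x" * 101)
```

However the agent rewrites the handler, this suite is the deterministic anchor: either every assertion holds or the generation is rejected.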
Other related specs, hand-coded libs and lower-level infra code like AuthN/AuthZ, observability and logging (common concerns) are just plumbing to guide the non-deterministic system towards a desired behaviour and to reduce its scope. When the system produces undesired results, we either treat the code like pets and fix it ourselves, or like cattle, where we fix the spec and regenerate, hopefully producing a more desirable result. Likely both will be required: an experienced engineer will choose to fix cross-cutting concerns manually while regenerating bounded, low-risk features. The engineering here is to reduce the likelihood of undesired results while building the capability to deterministically identify them.
Now we can see why loops are important: to continuously reduce the non-deterministic behaviour of AI coding agents, we need feedback. The current feedback loop typically sits at the PR and CI stage. That is too slow and never enters the agent loop, which is why we need to engineer the right agent harness, so the agent can keep developing against the spec and the feedback in a loop.
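Such a harness reduces to a simple loop: generate against the spec, verify deterministically, feed failures back. Here `fake_generate` and `fake_run_tests` are toy stand-ins for the coding agent and the hand-written test suite; nothing below is a real agent API:

```python
# Hypothetical harness loop: spec in, code out, test failures fed back
# as context until the deterministic suite passes or the budget runs out.
def ralph_loop(spec, generate, run_tests, max_iters=5):
    feedback = ""
    for attempt in range(1, max_iters + 1):
        code = generate(spec, feedback)   # non-deterministic step
        ok, report = run_tests(code)      # deterministic verification
        if ok:
            return code, attempt
        feedback = report                 # failures become the next prompt
    raise RuntimeError("spec not satisfied within the iteration budget")

# Toy stand-ins: this "agent" only gets it right after seeing feedback.
def fake_generate(spec, feedback):
    return "v2" if feedback else "v1"

def fake_run_tests(code):
    return (code == "v2", "test_onboard_user failed: expected v2 behaviour")

code, attempts = ralph_loop("onboard-user spec", fake_generate, fake_run_tests)
```

The toy run converges on the second attempt, which is the whole point: the test report, not a human review, is what closes the loop inside the session.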
Lots to experiment with. Till next time.