Building a Markdown Compiler Agent for OpenAPI-to-Terraform Mappers
Written by
Nova Neural
I got obsessed with a very specific workflow bug: every time my team exported an OpenAPI spec into Terraform-friendly artifacts, the “glue” logic lived in hand-written Markdown docs and ad-hoc scripts. The docs were the source of truth—but the scripts were the ones people actually ran. That mismatch caused subtle drift: sometimes an endpoint got updated in OpenAPI, but the Markdown examples (and therefore the generated mappings) lagged behind.
So I built a domain-specific LLM pipeline that treats a particular kind of Markdown as an input language—specifically, YAML payloads under <!--openapi-to-terraform--> markers—and turns them into a deterministic Terraform mapping plan.
The main idea: don’t build a generic “chatbot.” Build a small “compiler” that understands a tiny, boring DSL embedded in Markdown, and only then let the LLM help with the messy bits (schema-to-type mapping, naming conventions, and example normalization).
What I built: a Markdown-to-Terraform compiler
My input Markdown looks like this:
- It contains sections like:

```markdown
<!--openapi-to-terraform-->
## endpoint: GET /v1/widgets/{widgetId}
resource: aws_lambda_function.widgets_get_widget
request:
  path:
    widgetId: string
response:
  status: 200
  body:
    type: Widget
    schemaRef: components.schemas.Widget
mappingRules:
  - name: widgetId
    from: path.widgetId
    as: var.widget_id
```

- The compiler:
  - Extracts those sections.
  - Parses the YAML payload under each `<!--openapi-to-terraform-->` block.
  - Uses an LLM to resolve `schemaRef` into Terraform type expressions.
  - Emits a Terraform mapping file with stable output formatting.
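To make the contract concrete, here's a minimal standalone sketch (not part of the compiler itself) of how one such block parses: pull the endpoint out of the `## endpoint:` header line, then feed the remainder to a YAML parser.

```python
import re
import yaml  # PyYAML

block = """\
## endpoint: GET /v1/widgets/{widgetId}
resource: aws_lambda_function.widgets_get_widget
request:
  path:
    widgetId: string
mappingRules:
  - name: widgetId
    from: path.widgetId
    as: var.widget_id
"""

# The header line carries the endpoint; everything after it is plain YAML.
header = re.search(r"^##\s*endpoint:\s*(.+?)\s*$", block, re.MULTILINE)
endpoint = header.group(1)
yaml_part = re.sub(r"^##\s*endpoint:.*$", "", block, flags=re.MULTILINE)
data = yaml.safe_load(yaml_part)

print(endpoint)                       # GET /v1/widgets/{widgetId}
print(data["resource"])               # aws_lambda_function.widgets_get_widget
print(data["mappingRules"][0]["as"])  # var.widget_id
```

This split—Markdown header for human readability, YAML body for machine parsing—is what makes the DSL feel at home in docs while staying mechanically extractable.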
The niche part is the Markdown compiler: most domain-specific LLM efforts stop at RAG (retrieval) or generic codegen. Mine treats Markdown as a formal-ish interface and compiles it like a program.
The “domain model” I taught the system
I needed an explicit contract so the LLM didn’t improvise.
Terms
- OpenAPI: an API specification format describing endpoints and JSON schemas.
- Terraform: an infrastructure-as-code tool; it wants typed expressions like `string`, `map(string)`, `object({ ... })`, etc.
- Domain DSL: the mini language I defined inside Markdown.
The strict input/output shape
The Markdown compiler always produces:
- `plan.tf.json` (Terraform JSON form, so it's easy to validate)
- a `diagnostics` list for any unresolved schemas
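One way to enforce that strict shape is to validate the plan before writing it. Here's a sketch using pydantic (which appears in requirements.txt but goes unused in main.py as shown); the model names `Diagnostic`, `MappingPlan`, and `Plan` are my own, and this assumes pydantic v2.

```python
from typing import Any, Dict, List
from pydantic import BaseModel

# Hypothetical models describing the compiler's output contract.
class Diagnostic(BaseModel):
    endpoint: str
    schemaRef: str
    resolution: str
    explanation: str

class MappingPlan(BaseModel):
    version: int
    mappings: List[Dict[str, Any]]

class Plan(BaseModel):
    terraform: Dict[str, MappingPlan]
    diagnostics: List[Diagnostic]

# Validation raises if the compiler ever emits a malformed plan.
plan = Plan.model_validate({
    "terraform": {"mappingPlan": {"version": 1, "mappings": []}},
    "diagnostics": [{
        "endpoint": "GET /v1/widgets/{widgetId}",
        "schemaRef": "components.schemas.Widget",
        "resolution": "object({id=string, name=string, tags=list(string)})",
        "explanation": "deterministic fallback",
    }],
})
print(plan.terraform["mappingPlan"].version)  # 1
```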
End-to-end code (working)
Below is a complete, runnable example in Python that:
- extracts the special Markdown sections
- parses the YAML inside them
- calls an LLM to resolve a schema reference into a Terraform type
- writes a Terraform plan JSON
This uses OpenAI’s API, but the structure works the same with any chat-completions provider.
1) Project layout
```
md-openapi-terraform-compiler/
  main.py
  sample.md
  schemas.json
  requirements.txt
```
requirements.txt:
```
openai
pyyaml
pydantic
```
Install:
```
pip install -r requirements.txt
```
2) Sample Markdown input
sample.md:
```markdown
# Release Notes

<!--openapi-to-terraform-->
## endpoint: GET /v1/widgets/{widgetId}
resource: aws_lambda_function.widgets_get_widget
request:
  path:
    widgetId: string
response:
  status: 200
  body:
    type: Widget
    schemaRef: components.schemas.Widget
mappingRules:
  - name: widgetId
    from: path.widgetId
    as: var.widget_id

<!--openapi-to-terraform-->
## endpoint: POST /v1/widgets
resource: aws_lambda_function.widgets_post_widget
request:
  body:
    schemaRef: components.schemas.NewWidget
response:
  status: 201
  body:
    schemaRef: components.schemas.WidgetCreated
mappingRules:
  - name: payload
    from: body
    as: var.new_widget
```
3) A tiny OpenAPI schemas “index”
I’m simulating the OpenAPI schema registry using a local JSON file that maps schema refs to a simplified JSON schema representation.
schemas.json:
```json
{
  "components.schemas.Widget": {
    "type": "object",
    "properties": {
      "id": { "type": "string" },
      "name": { "type": "string" },
      "tags": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["id", "name"]
  },
  "components.schemas.NewWidget": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "tags": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["name"]
  },
  "components.schemas.WidgetCreated": {
    "type": "object",
    "properties": {
      "widget": { "$ref": "components.schemas.Widget" },
      "createdAt": { "type": "string", "format": "date-time" }
    },
    "required": ["widget", "createdAt"]
  }
}
```
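Note that `WidgetCreated` nests a `$ref` to `Widget`. A minimal sketch of inlining refs against this index—my own helper for illustration, not part of main.py—looks like:

```python
schemas = {
    "components.schemas.Widget": {
        "type": "object",
        "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
        "required": ["id", "name"],
    },
    "components.schemas.WidgetCreated": {
        "type": "object",
        "properties": {
            "widget": {"$ref": "components.schemas.Widget"},
            "createdAt": {"type": "string", "format": "date-time"},
        },
        "required": ["widget", "createdAt"],
    },
}

def inline_refs(schema, index, depth=5):
    """Recursively replace {"$ref": name} nodes with the referenced schema."""
    if depth <= 0 or not isinstance(schema, dict):
        return schema
    if "$ref" in schema:
        return inline_refs(index.get(schema["$ref"], {}), index, depth - 1)
    return {
        k: inline_refs(v, index, depth - 1) if isinstance(v, dict) else v
        for k, v in schema.items()
    }

resolved = inline_refs(schemas["components.schemas.WidgetCreated"], schemas)
print(resolved["properties"]["widget"]["type"])  # object
```

With refs inlined up front, even the deterministic fallback could type `WidgetCreated` fully; in the pipeline as built, that job is delegated to the LLM instead.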
4) The compiler (main.py)
```python
import json
import os
import re
from typing import Any, Dict, List

import yaml
from openai import OpenAI

OPENAPI_BLOCK_RE = re.compile(
    r"<!--openapi-to-terraform-->\s*(.*?)\s*(?=<!--openapi-to-terraform-->|$)",
    re.DOTALL,
)
ENDPOINT_HEADER_RE = re.compile(r"^##\s*endpoint:\s*(.+?)\s*$", re.MULTILINE)


def extract_blocks(markdown: str) -> List[str]:
    return [m.group(1).strip() for m in OPENAPI_BLOCK_RE.finditer(markdown)]


def parse_block(block: str) -> Dict[str, Any]:
    """
    Each block starts with:
        ## endpoint: GET /v1/widgets/{widgetId}
    then a YAML document with keys like resource, request, response, mappingRules.
    """
    header_match = ENDPOINT_HEADER_RE.search(block)
    if not header_match:
        raise ValueError("Missing '## endpoint: ...' header in openapi-to-terraform block")
    endpoint = header_match.group(1).strip()

    # Remove the markdown header line so the rest is pure YAML
    block_wo_header = re.sub(
        r"^##\s*endpoint:\s*.+?\s*$", "", block, flags=re.MULTILINE
    ).strip()
    data = yaml.safe_load(block_wo_header) or {}
    data["endpoint"] = endpoint
    return data


def terraform_type_from_json_schema(schema: Dict[str, Any], max_depth: int = 5) -> str:
    """
    Deterministic fallback mapper. The LLM handles tricky $ref cases,
    but we still want a local, testable mapper.
    """
    if max_depth <= 0:
        return "any"
    if "$ref" in schema:
        # Leave unresolved for the LLM.
        return "any"
    t = schema.get("type")
    if t == "string":
        # Terraform has no datetime type; date-time formats stay strings.
        return "string"
    if t in ("integer", "number"):
        return "number"
    if t == "boolean":
        return "bool"
    if t == "array":
        items = schema.get("items", {})
        return f"list({terraform_type_from_json_schema(items, max_depth=max_depth - 1)})"
    if t == "object":
        props = schema.get("properties", {})
        # Objects without declared properties become a permissive map(string),
        # which avoids brittle plans; objects with properties become a
        # strict object({ ... }).
        if not props:
            return "map(string)"
        fields = [
            f"{k}={terraform_type_from_json_schema(v, max_depth=max_depth - 1)}"
            for k, v in props.items()
        ]
        return "object({" + ", ".join(fields) + "})"
    return "any"


def resolve_schema_ref_with_llm(
    client: OpenAI, schemas_index: Dict[str, Any], schema_ref: str
) -> Dict[str, str]:
    """
    Ask the LLM to produce a Terraform type expression for a given OpenAPI
    schema ref. The LLM sees:
      - the schema_ref name
      - the referenced schema JSON
      - the deterministic fallback and the mapping rules
    Output is structured JSON with:
      - terraform_type
      - explanation (for diagnostics)
    """
    if schema_ref not in schemas_index:
        return {"terraform_type": "any", "explanation": "schemaRef not found in index"}

    schema = schemas_index[schema_ref]
    fallback = terraform_type_from_json_schema(schema)

    system = (
        "You are a domain-specific compiler for OpenAPI schemas into Terraform type expressions. "
        "Return only valid JSON that matches the output schema."
    )
    user = {
        "schemaRef": schema_ref,
        "openapiSchema": schema,
        "deterministicFallback": fallback,
        "rules": [
            "Map OpenAPI primitive types to Terraform: string->string, integer/number->number, boolean->bool.",
            "Map arrays to list(<itemType>).",
            "Map objects to object({key=type,...}). If schema has no properties, use map(string).",
            "If schema contains $ref, resolve it conceptually to the referenced schema type if possible; otherwise use any.",
            "Prefer correctness over strictness; using 'any' is acceptable when uncertain.",
        ],
        "outputFormat": {"terraform_type": "string", "explanation": "string"},
    }

    # NOTE: replace `gpt-4.1-mini` with your preferred model
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": json.dumps(user)},
        ],
        temperature=0.0,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)


def compile_markdown_to_terraform_plan(
    markdown_path: str, schemas_path: str, out_path: str, api_key: str
) -> None:
    client = OpenAI(api_key=api_key)
    with open(markdown_path, "r", encoding="utf-8") as f:
        markdown = f.read()
    with open(schemas_path, "r", encoding="utf-8") as f:
        schemas_index = json.load(f)

    blocks = extract_blocks(markdown)
    if not blocks:
        raise ValueError("No <!--openapi-to-terraform--> blocks found")

    compiled: Dict[str, Any] = {"version": 1, "mappings": []}
    diagnostics: List[Dict[str, str]] = []

    for block in blocks:
        data = parse_block(block)
        endpoint = data["endpoint"]
        request = data.get("request") or {}
        response = data.get("response") or {}

        mapping_entry: Dict[str, Any] = {
            "endpoint": endpoint,
            "resource": data.get("resource"),
            "request": {},
            "response": {},
            "mappingRules": data.get("mappingRules", []),
        }

        # Resolve schemaRef for request body if present
        req_body = request.get("body") if isinstance(request, dict) else None
        if isinstance(req_body, dict) and "schemaRef" in req_body:
            schema_ref = req_body["schemaRef"]
            resolved = resolve_schema_ref_with_llm(client, schemas_index, schema_ref)
            mapping_entry["request"]["bodyTerraformType"] = resolved["terraform_type"]
            diagnostics.append({
                "endpoint": endpoint,
                "schemaRef": schema_ref,
                "resolution": resolved["terraform_type"],
                "explanation": resolved["explanation"],
            })

        # Path params are already simple primitives in this DSL
        if isinstance(request, dict) and "path" in request:
            mapping_entry["request"]["path"] = request["path"]

        # Resolve schemaRef for response body if present
        resp_body = response.get("body") if isinstance(response, dict) else None
        if isinstance(resp_body, dict) and "schemaRef" in resp_body:
            schema_ref = resp_body["schemaRef"]
            resolved = resolve_schema_ref_with_llm(client, schemas_index, schema_ref)
            mapping_entry["response"]["bodyTerraformType"] = resolved["terraform_type"]
            diagnostics.append({
                "endpoint": endpoint,
                "schemaRef": schema_ref,
                "resolution": resolved["terraform_type"],
                "explanation": resolved["explanation"],
            })

        compiled["mappings"].append(mapping_entry)

    plan = {
        "terraform": {
            # Lightweight representation. In real life you might generate
            # HCL or provider-specific resources.
            "mappingPlan": compiled
        },
        "diagnostics": diagnostics,
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(plan, f, indent=2)
    print(f"Wrote {out_path}")


if __name__ == "__main__":
    # Usage:
    #   export OPENAI_API_KEY="..."
    #   python main.py
    compile_markdown_to_terraform_plan(
        markdown_path="sample.md",
        schemas_path="schemas.json",
        out_path="plan.tf.json",
        api_key=os.environ["OPENAI_API_KEY"],
    )
```
What happens when I run it
```
export OPENAI_API_KEY="YOUR_KEY"
python main.py
```
It reads `sample.md`, finds both `<!--openapi-to-terraform-->` blocks, and parses each YAML payload. Then it calls the model for each `schemaRef` it sees:

- `components.schemas.Widget` → generates a Terraform type for an object with `id`, `name`, and `tags`
- `components.schemas.NewWidget` → generates a Terraform type for the request body
- `components.schemas.WidgetCreated` → generates a type for the response body, possibly resolving a `$ref` inside
Finally, it writes `plan.tf.json`.
A typical output (shape) looks like:
```json
{
  "terraform": {
    "mappingPlan": {
      "version": 1,
      "mappings": [
        {
          "endpoint": "GET /v1/widgets/{widgetId}",
          "resource": "aws_lambda_function.widgets_get_widget",
          "request": {
            "path": { "widgetId": "string" }
          },
          "response": {
            "bodyTerraformType": "object({id=string, name=string, tags=list(string)})"
          },
          "mappingRules": [
            { "name": "widgetId", "from": "path.widgetId", "as": "var.widget_id" }
          ]
        }
      ]
    }
  },
  "diagnostics": [
    {
      "endpoint": "GET /v1/widgets/{widgetId}",
      "schemaRef": "components.schemas.Widget",
      "resolution": "object({id=string, name=string, tags=list(string)})",
      "explanation": "Mapped OpenAPI object properties to Terraform object attributes; array tags to list(string)."
    }
  ]
}
```
Why this is “domain-specific” (and not generic codegen)
The LLM isn’t free-form. It’s constrained to answer one question:
“Given this specific `schemaRef`, what Terraform type expression should I use?”
Meanwhile, the compiler logic is deterministic:
- Markdown extraction
- YAML parsing
- plan structure
- stable JSON output
That combination matters. In my experiments, letting the model “write the plan” directly led to unstable key ordering, inconsistent schemas, and occasional hallucinated fields. Keeping the compiler strict made the system reliable even when the LLM was uncertain (it falls back to `any`).
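One concrete trick for stable output—my addition here, not in main.py as shown—is to serialize with sorted keys so the emitted plan is byte-identical across runs regardless of insertion order, which keeps diffs clean in code review:

```python
import json

# Same logical plan, two different key-insertion orders.
plan_a = {"diagnostics": [], "terraform": {"mappingPlan": {"version": 1, "mappings": []}}}
plan_b = {"terraform": {"mappingPlan": {"mappings": [], "version": 1}}, "diagnostics": []}

# sort_keys makes both serialize to the exact same bytes on disk.
out_a = json.dumps(plan_a, indent=2, sort_keys=True)
out_b = json.dumps(plan_b, indent=2, sort_keys=True)
print(out_a == out_b)  # True
```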
A small reliability upgrade I added: deterministic fallback
Notice the `terraform_type_from_json_schema` function. It generates a best-effort Terraform type without calling the LLM. I used it as:

- a baseline shown to the model (`deterministicFallback`)
- a guardrail for cases where the `schemaRef` is missing from the index
This pattern is a practical way to keep a domain-specific LLM grounded in local, testable logic.
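For instance, the fallback mapper can be pinned down with plain unit assertions—no API key, no network. This snippet repeats a trimmed copy of the function so it runs standalone:

```python
from typing import Any, Dict

def terraform_type(schema: Dict[str, Any], depth: int = 5) -> str:
    # Trimmed copy of the deterministic fallback, for standalone testing.
    if depth <= 0 or "$ref" in schema:
        return "any"
    t = schema.get("type")
    if t == "string":
        return "string"
    if t in ("integer", "number"):
        return "number"
    if t == "boolean":
        return "bool"
    if t == "array":
        return f"list({terraform_type(schema.get('items', {}), depth - 1)})"
    if t == "object":
        props = schema.get("properties", {})
        if not props:
            return "map(string)"
        fields = ", ".join(f"{k}={terraform_type(v, depth - 1)}" for k, v in props.items())
        return "object({" + fields + "})"
    return "any"

widget = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
print(terraform_type(widget))  # object({id=string, name=string, tags=list(string)})
print(terraform_type({"$ref": "components.schemas.Widget"}))  # any
```

Because this logic is pure and deterministic, it can sit in a normal test suite and catch regressions that would otherwise hide behind the LLM call.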
Conclusion
I built a niche domain-specific LLM system that compiles a very specific Markdown DSL (`<!--openapi-to-terraform-->` blocks) into a Terraform mapping plan, using the LLM only for the schema-ref-to-typed-expression step. The big win from my tinkering was reliability: deterministic parsing and a strict plan structure prevented drift, while the LLM handled the messy schema translation work.