Vibe coding was hard to ignore in 2025. The idea was simple and seductive. You describe intent, and autonomous coding agents translate that intent into working software. No setup overhead, no boilerplate, no ceremony. Just momentum. Watching people spin up applications by talking to an AI felt less like a productivity improvement and more like a shift that challenged some very old assumptions about how software gets built.
As someone who works in product, that kind of shift is difficult to observe from the sidelines. It forces you to engage, if only to recalibrate your own mental models. So I did what most builders did. I tried it.
I started with the obvious use cases. Frontend mockups. Small prototypes. Short-lived experiments where ideas turned into clickable interfaces in minutes. The feedback loop was immediate and visually satisfying. It was easy to see why the excitement was real. For a while, it genuinely felt like friction had disappeared from the act of building.
But as the novelty wore off and I started pushing hard to get to the next level, a question kept resurfacing. Could this approach hold up when the problem was not a demo or a prototype, but a system that had to run continuously, evolve over time, and survive failure? The real question wasn’t whether AI agents could generate code. It was whether they could help build systems that survive contact with reality. Production systems do not fail because syntax is wrong. They fail because judgment is wrong. I wanted to understand whether AI meaningfully expands who can build such systems, without eliminating the need for engineering expertise itself.
In other words, could vibe coding and AI agents help build serious infrastructure, or was vibe coding fundamentally optimized for disposable software? Can AI-generated code be trusted in production? And does this mean we no longer need engineers, that a business person can now ship working systems?
These questions eventually pushed me to stop building toy apps and mockups altogether and to attempt something serious over the Thanksgiving break last year (2025).
Choosing a problem that would not forgive shortcuts
I come from a computer science background, but I have not written serious production code in more than a decade. My work since then has been centered on the business and strategy side, thinking about systems, constraints, incentives, scale, and product–market fit. I spend far more time thinking about why systems should exist than about how individual functions are written.
My interest in vibe coding was never about becoming an engineer. It was about leverage. I wanted to understand whether someone who still understands systems and products, but no longer lives in code every day, could build something real with AI agents.
To answer that honestly, I needed a project where shortcuts would be exposed immediately. Infrastructure fit that requirement perfectly. Apps are forgiving. You can hide a lot of bad decisions behind a UI. Infrastructure is not. If permissions are wrong, things break. If storage assumptions are flawed, data disappears. If networking is misunderstood, nothing talks to anything. An app demo can be thrown together in an afternoon; something like an auth system demands real time and energy.
Instead of building another app, I decided to build a private cloud.
Introducing ‘MyNodeOne’ – the repo I built
What I ended up building was MyNodeOne, a personal infrastructure cluster that I actually use.
This was not a hello-world Kubernetes setup. The goal was to build a system that could host applications, expose them securely to the internet, and evolve over time without collapsing under its own complexity. Conceptually, it sits closer to a personal platform-as-a-service than a homelab experiment.
At its core, MyNodeOne is a Kubernetes-based private cloud running on my own hardware. It handles node provisioning, cluster joining, internal networking, storage orchestration, and secure service exposure. On top of that sits an internal AppStore model, a curated set of installable services such as LLM chat interfaces, APIs, and infrastructure components that can be deployed with a single command.
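To make the AppStore idea concrete, here is a minimal sketch of what a single-command install flow can look like. This is illustrative, not the actual MyNodeOne implementation; the catalog layout, app names, and helper function are hypothetical.

```python
# Hypothetical sketch of a single-command AppStore installer.
# Not the actual MyNodeOne code; names and paths are illustrative.
import subprocess
import sys
from pathlib import Path

CATALOG = Path("appstore")  # assumed layout: appstore/<app>/manifests/*.yaml

def install(app: str) -> None:
    manifests = CATALOG / app / "manifests"
    if not manifests.is_dir():
        sys.exit(f"unknown app: {app}")
    # kubectl apply is idempotent, so re-running an install is safe
    subprocess.run(
        ["kubectl", "apply", "--recursive", "-f", str(manifests)],
        check=True,
    )
    print(f"{app} deployed; check: kubectl get pods -n {app}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: install.py <app>")
    install(sys.argv[1])  # e.g. python install.py llm-chat
```

The design choice that matters here is curation: a fixed catalog of known-good manifests keeps installs to one command while the cluster underneath stays free to evolve.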
What makes this meaningful is not that it works on the happy path. It is designed to survive change. Adding nodes, evolving storage layers, shifting workloads, and running continuously without manual babysitting are first-order concerns. That is the line between a demo and infrastructure.
The full repository, including documentation and commit history, lives here:
https://github.com/vinsac/MyNodeOne
To make the experiment real, I deliberately removed cloud safety nets.
I built two physical machines at home with 256 GB of RAM, roughly 80 TB of storage, and dual RTX 3090 GPUs. This was not about benchmarks or hardware flexing. It was about forcing myself to deal with the entire stack, from metal to application, and about seeing whether AI agents could turn bare metal into infrastructure in a repeatable way, so that anyone could use this repo to set up their own.
When something broke, there was no managed service to blame and no abstraction to hide behind. The system either worked or it didn’t. That constraint shaped every design decision and surfaced a simple truth early on. Infrastructure engineering has physics. AI assistance does not remove those constraints. It just helps you encounter them faster.
Working with AI agents as a system, not a tool
Over the course of the project, I did not commit to a single AI tool. I used Windsurf, Cursor, Claude Code, and Antigravity, and cycled through models from Anthropic, OpenAI, Google, and xAI, along with IDE-tuned proprietary models.
What stood out was how different their strengths were. Some were excellent at producing large volumes of code quickly. Others were far better at reasoning through architectural tradeoffs and explaining why a system should be designed a certain way. ChatGPT became particularly valuable for system-level thinking and design discussions. Claude’s models were consistently effective at translating intent into working code.
What emerged over time was a very different workflow. I was not “using AI” as a single capability. I was composing different cognitive strengths and applying them deliberately. Simply saying “Hey AI assistant, go write the code for feature A” did not produce good outcomes unless I was already deep in the problem space and had shared that context. That distinction matters, especially as a product person; it is not about replacing engineers, but about amplifying judgment and increasing speed.
When speed starts to lie
Early progress felt intoxicating. I was writing more code in weeks than I had in years. Services came online quickly. The system worked, at least on the surface.
That speed hid a trap.
Whenever I trusted the AI blindly, I paid for it later. Not because the code was sloppy, but because it optimized locally without understanding the larger system. Hardcoded assumptions crept in where configuration should have existed. Shortcuts limited extensibility. Decisions were made for what worked immediately, not for what would survive change.
The issue was not intelligence. It was context. The AI could not infer architectural intent unless I explicitly explained it. I am not against the speed of code generation, but someone still has to judge what system needs to be built and which optimizations actually matter.
Speed hides architectural decisions. A foundation designed for a two-story building can look perfectly correct until you decide to add eight more floors. Code generation can be fast, but someone still has to decide what kind of structure is being built. Generation speed is genuinely useful, and has great use cases, but the architectural decisions matter far more.
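A trivial sketch of the pattern, with hypothetical variable and environment names. The first block is the kind of draft the agents produced when left alone; the second is the same decision pushed into configuration so it survives a second node.

```python
# Illustrative only: the local optimization AI drafts kept making.
# Correct on the first machine, silently wrong on the next.
STORAGE_PATH = "/mnt/disk1/data"   # hardcoded to one node's disk layout
NODE_IP = "192.168.1.10"           # breaks the moment a second node joins

# The same decisions pushed into configuration (names are hypothetical).
import os

STORAGE_PATH = os.environ.get("MYNODE_STORAGE_PATH", "/var/lib/mynode/data")
NODE_IP = os.environ.get("MYNODE_ADVERTISE_IP")
if NODE_IP is None:
    # Fail loudly at startup instead of quietly binding the wrong address
    raise RuntimeError("MYNODE_ADVERTISE_IP must be set per node")
```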
You cannot prompt what you do not understand
This became the most important lesson of the project.
If I did not understand horizontal scaling, I could not ask the AI to design for it. If I did not understand Unix permissions, the system would fail in ways that were invisible until runtime. If I did not understand storage semantics, I could not reason about data safety or recovery.
When I stayed at the level of business requirements, the resulting system was fragile. When I went deeper and explained why constraints existed, the quality of the system improved dramatically. Vibe coding does not replace engineering fundamentals. It amplifies them. The better you understand the system you are trying to build, the more powerful the AI becomes. Without that understanding, it produces something that looks right and fails quietly. You cannot build a dam without understanding the physics and engineering behind it; otherwise it remains a model on paper. Automation can speed up the construction, but the real engineering decisions make all the difference.
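The Unix permissions point is a good example. Below is a short sketch, with a hypothetical path, of the kind of explicit check I learned to ask the agents for, so failures surfaced at setup time with context instead of invisibly at runtime.

```python
# Sketch of an explicit permission check (path is hypothetical).
# The goal: fail at setup time with context, not silently at runtime.
import getpass
import os
import pwd
import stat

def assert_writable(path: str) -> None:
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    mode = stat.filemode(st.st_mode)  # e.g. 'drwxr-xr-x'
    print(f"{path}: owner={owner} mode={mode}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{getpass.getuser()} cannot write to {path}")

assert_writable("/var/lib/longhorn")  # hypothetical storage mount
```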
Git as institutional memory
One unexpected insight came from how Git functioned in this setup.
I committed everything. Successful changes, failed experiments, broken approaches, and explicit notes on what did not work and why. In AI-assisted development, deleting failed code is often a mistake. Agents struggle with negative context, with knowing what not to try.
By preserving that history and asking agents to read it before proposing changes, the quality of suggestions improved noticeably. In this model, failed attempts became institutional memory. Dead ends were not garbage. They were documentation.
Much of the advice online suggests letting AI agents generate multiple branches and discarding anything that does not work. I found this approach deeply flawed. Humans accumulate judgment through failed attempts. AI agents need those failures preserved explicitly. Git history became the closest thing to institutional memory.
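In practice this turned into a small convention: failed attempts get committed with explicit negative context instead of being deleted. A minimal sketch, with a hypothetical helper and an invented example failure:

```python
# Sketch of committing failures as institutional memory.
# The helper and the example failure below are hypothetical.
import subprocess

def commit_failure(summary: str, tried: str, why: str) -> None:
    message = (
        f"FAILED: {summary}\n\n"
        f"What was tried: {tried}\n"
        f"Why it did not work: {why}\n"
        "Do not retry this approach without addressing the cause above."
    )
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

commit_failure(
    "expose service via hostPort",
    "hostPort on the pod to bypass the ingress layer",
    "port conflicts once a second node joined the cluster",
)
```

Asking an agent to read something like git log --grep=FAILED before proposing a change is what turned dead ends into documentation.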
Making system state visible
Logging played a similar role.
AI agents are blind to runtime state unless you show it to them. I logged aggressively during development. Variable values, file paths, permissions, ownership, and state transitions were all made explicit. This grounded the AI in reality and reduced hallucinated assumptions.
Once the system stabilized, much of that logging was removed. During convergence, however, it dramatically increased velocity. It also highlighted how much room there is for tooling that treats observability as a first-class input for AI agents.
This made one thing obvious. Most observability tooling is designed for humans. As AI agents become first-class collaborators, we will need logging and debugging systems designed for machines to reason over, not just dashboards for people to read.
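A minimal sketch of what that machine-oriented logging looked like in spirit: structured, one event per line, with the runtime state spelled out. Event and field names here are illustrative, not a real schema.

```python
# Sketch: log runtime state explicitly enough for an agent to parse.
# Event and field names are illustrative; the path is hypothetical.
import json
import logging
import os

logging.basicConfig(level=logging.DEBUG, format="%(message)s")

def log_state(event: str, **fields) -> None:
    logging.debug(json.dumps({"event": event, **fields}))

path = "/var/lib/mynode/data"  # hypothetical mount point
st = os.stat(path)
log_state(
    "storage.mount.checked",
    path=path,
    uid=st.st_uid,
    gid=st.st_gid,
    mode=oct(st.st_mode & 0o777),
)
# Output is one JSON line an agent can reason over instead of guessing:
# {"event": "storage.mount.checked", "path": "/var/lib/mynode/data", ...}
```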
The illusion of being “almost done”
At one point, after getting the cluster running with several services, I told an engineering acquaintance I felt about eighty percent done.
Without even looking at the system, he said I was probably closer to thirty percent.
He was right. I had confused the happy path with production readiness. AI excels at getting you from zero to one. It struggles with everything between one and scale. Edge cases, regressions, drift, and failure modes still demand judgment.
This wasn’t an AI failure. It was a reminder that steering still matters, that understanding what you actually need, not just what is “best in class,” is critical. To use the construction analogy: if you don’t know whether every room needs a window, what kind of windows they should be (single pane or double pane), or even whether a window belongs in a particular room at all, you cannot outsource those decisions to the contractor. You have to know, and decide, first.
Technical debt as a deliberate choice
MyNodeOne carries plenty of technical debt. Early designs assumed a single machine. Adding nodes exposed those assumptions. Storage and networking decisions had to be revisited. At one point, MinIO was layered on top of Longhorn in a way that was clearly suboptimal for the long term. I knew it was wrong and took the debt anyway.
That was a conscious tradeoff. I needed to reach a form of product fit, even if the product was only for myself. Once the value was clear, it became worth paying down the debt. Good infrastructure is not about theoretical purity. It is about constraints, timing, and tradeoffs.
What this changed for me
This project changed how I think about building products in the AI era.
PRDs still matter, but their role is shifting. They need to be much better at defining business problems rather than prescribing solutions. Experiments matter more. Feedback loops matter most. In the AI era, prototype outcomes matter more than spec documents, and rapid feedback loops replace slow upfront plans, but only if you retain architectural context and don’t confuse speed with stability.
AI agents dramatically reduce the cost of iteration, but they do not replace judgment. They accelerate learning when you are willing to engage with reality instead of abstraction. What became clear very quickly is that AI does not eliminate the need for engineering thinking. It lowers the barrier to execution, not the bar for judgment. More people can now build, but only those who understand systems can build things that last.
At this point, I have been able to build serious infrastructure with AI agents.
It is imperfect. It is evolving. It is real. I do not remember the last time I enjoyed building this much. The next system will take a fraction of the time, not because AI improved, but because I did.
The agents are here. The hardware is here. What still matters is curiosity and the willingness to understand how things actually work. That part cannot be automated. And it turns out that is also the part I enjoyed the most.
The full repository, including the commit history that tells this story, is here:
https://github.com/vinsac/MyNodeOne
Disclaimer: https://vinaysachdeva.com/disclaimer/. The opinions expressed in the blog post are my own and do not reflect the view(s) of my employer.