I am building a cloud (crawshaw.io)
928 points by bumbledraven 16 hours ago | hide | past | favorite | 458 comments



> Making Kubernetes good is inherently impossible, a project in putting (admittedly high quality) lipstick on a pig.

So well put, my good sir, this describes exactly my feelings with k8s. It always starts off all good with just managing a couple of containers to run your web app. Then before you know it, the devops folks have decided that they need to put a gazillion other services and an entire software-defined networking layer on top of it.

After spending a lot of time "optimizing" or "hardening" the cluster, cloud spend has doubled or tripled. Incidents have also doubled or tripled, as has downtime. Debugging effort has doubled or tripled as well.

I ended up saying goodbye to those devops folks, nuking the cluster, booting up a single Debian VM, enabling the firewall, and using Kamal to deploy the app with Docker. Despite having only a single VM rather than a cluster, things have never been more stable and reliable from an infrastructure point of view. Costs have plummeted as well; it's so much cheaper to run. It's also so much easier and more fun to debug.

And yes, a single VM really is fine, you can get REALLY big VMs which is fine for most business applications like we run. Most business applications only have hundreds to thousands of users. The cloud provider (Google in our case) manages hardware failures. In case we need to upgrade with downtime, we spin up a second VM next to it, provision it, and update the IP address in Cloudflare. Not even any need for a load balancer.
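For reference, a single-server Kamal setup along these lines fits in a few lines of config. This is a sketch; the service name, image, host, and IP are all made up:

```yaml
# config/deploy.yml (illustrative values throughout)
service: myapp
image: myorg/myapp

servers:
  web:
    - 203.0.113.10   # the single Debian VM

proxy:
  ssl: true
  host: app.example.com

registry:
  username: myorg
  password:
    - KAMAL_REGISTRY_PASSWORD   # read from the deployer's environment
```

From there, `kamal deploy` builds the image, pushes it, and swaps the running container on the VM.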


If you spin up Kubernetes for "a couple of containers to run your web app", I think you're doing something wrong in the first place, also coupled with your comment about adding SDN to Kubernetes.

People use Kubernetes for way too small things, and it sounds like you don't have the scale for actually running Kubernetes.


It depends what you're doing.

My app is fairly simple node process with some side car worker processes. k8s enables me to deploy it 30 times for 30 PRs, trivially, in a standard way, with standard cleanup.

Can I do that without k8s? Yes. To the same standard with the same amount of effort? Probably not. Here, I'd argue the k8s APIs and interfaces are better than trying to do this on AWS ( or your preferred cloud provider ).
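One common way to get that per-PR behaviour is a namespace per pull request. A minimal sketch, where the registry, manifest layout, and deployment name are assumptions:

```shell
# Spin up an isolated preview environment for a PR (illustrative names)
PR=1234
kubectl create namespace "pr-${PR}"
kubectl apply -n "pr-${PR}" -f k8s/                  # shared base manifests
kubectl set image -n "pr-${PR}" deployment/web \
    web="registry.example.com/app:pr-${PR}"          # PR-specific image tag

# Standard cleanup: deleting the namespace tears down everything in it
kubectl delete namespace "pr-${PR}"
```

The cleanup step is the part that's genuinely hard to standardize without k8s: one `delete namespace` removes every resource the PR created.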

Where things get complicated is that k8s itself is borderline cloud-provider software. So teams that were previously good at using a managed service now own more of the stack, and these random devops heroes aren't necessarily making good decisions everywhere.

So you really have three obvious use cases:

a) You're doing something interesting with the k8s APIs that isn't easy to do on a cloud provider. Essentially, you're a power user.

b) You want a cloud abstraction layer because you're multi-cloud, or you want a lock-in bargaining chip.

c) You want cloud semantics without being on a cloud provider.

However, if you're a single developer with a single machine, or a very small team that's happy working through contended static environments, you can pretty much just put a process on a box and call it done. k8s is overkill here, though not as much as people claim until the devops heroes start their work.


Call me old-fashioned, but I prefer tools like Dokploy that make deployment across different VPSes extremely easy. Dokploy lets me put my home media server to great use, with local Forgejo instances deploying the code.

k8s appears to be a corporate welfare jobs program that only trillion-dollar multinational monopolies can afford, collectively spending hundreds of millions to sustain it. Since most companies aren't trillion-dollar monopolies, adopting it seems like an extremely poor move.

All it signals to me is that we have to stop letting SV + VC dictate the direction of tech in our industry, because their solutions are unsustainable and borderline useless for the vast majority of use cases.

I'll never forget the insurance companies I worked at that orchestrated every single repo with a k8s deployment, with cloud spend easily in the high six figures a month, to handle a workload of 100k MAU where the concurrent peak never exceeded 5,000 users (something the company knew for certain from 40 years of records). They literally had a 20-person team whose entire job was managing the company's k8s setup. The only reason the company could sustain this was that it's an insurance company (insurance companies are highly profitable, don't let them convince you otherwise; so profitable that the government has to regulate how much profit they're legally allowed to make).

Absolute insanity, unsustainable, and a tremendous waste of limited human resources.

Glad you like it for your node app tho, happy for you.


K8s is just a standardized API for running "programs" on hardware, which is a really difficult problem that it solves fairly well.

Is it complex? Yes, but so is the problem it's trying to solve. Is its complexity still nicer and easier to use than the previous generation of multimachine deployment systems? Also yes.
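As a concrete illustration of that standardized API: the same few lines describe a replicated program on any conformant cluster, whoever runs it. Names and image are illustrative:

```yaml
# A minimal Deployment: "run two copies of this program, keep them running"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0
          ports:
            - containerPort: 8080
```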


I wrote a scheduler for VMs a long time ago. k8s is basically just the same thing but for containers.

It really confuses me how someone can argue for cloud providers over a decent open solution without realising their argument is simply that they don't want to manage the thing themselves.

And that's fine, most teams shouldn't be neck deep in managing a platform. But that doesn't make the solution bad.


I took over tech for a POS company some years ago. They were a .NET shop with about 80 developers, fewer than 200 concurrent connections, six-figure cloud spend, and 0 nines uptime with a super traditional setup.

Point being, it's not the tools that cause the problem.


Just as a quick aside, I tried Coolify, Dokploy, Dockge, and Komodo, and if you're trying to do a Heroku-style PaaS, Dokploy is really good. Hands down the best UX for delivering apps & databases. It's too bad about the licensing. (e.g. OIDC + audit logs behind a paid enterprise license.)

Coolify is full of features, but the UX suffers and they had a nasty breaking bug at one point (related to Traefik if you want to search it.) Dockge is just a simple interface into your running Docker containers and Komodo is a bit harder to understand/come up with a viable deployment model, and has no built-in support for things like databases.


If you're open to it, I'd love to get your thoughts on https://miren.dev. We're doing similar things, but leaning into the small-team aspects of these systems, along with giving folks an optional cloud tie-in to help with auth, etc.

I use Cosmos Cloud on a free 24 GB Oracle VM. Nice UI, solid system.

Cosmos Cloud looks neat! At a first glance from looking at the web page, it looks more focused on delivering a "personal cloud" or "1-click deploy apps."

Dokploy is more Heroku-styled: while you can deploy third-party apps (it's just Docker after all), it seems really geared towards and intended for you to be deploying your own apps that you developed, alongside a "managed" database (meaning, the DB is exposed in the UI, includes backup functionality, and can even be temporarily exposed publicly on the internet for debugging.)

Coolify feels a bit like a mix of the two deployment models, while Dockge is "bring your own deployment" and Komodo offers to replace Terraform/Ansible/docker-compose through its own declarative GitOps-style file-based config but lacks features like managed databases, or built-in subdomain provisioning.


Totally, it's all about the primitives. I'm curious whether exe.dev is gonna build on the base, or just leave it up to folks to add all their own bespoke stuff for containers, logs, etc.

The last 20 years have given us a lot of great primitives for folks to plug in. I think lots of people don't want to wrangle those primitives, though; they just want to use them.


> I'd argue the k8s APIs and interfaces are better than trying to do this on AWS

I think Amazon ECS is within striking distance, at least. It does less than K8S, but if it fits your needs, I find it an easier deployment target than K8S. There's just a lot less going on.


I ran renderapp in ECS before I ran it in k8s.

The deployment files/structure were mostly equivalent, with the main differences being that I can't shell into ECS and I lose kubectl in favour of the AWS GUI (which for me is a loss, for others maybe not).

The main difference is that k8s has a lot of optionality, and folks get analysis paralysis with all the potential there. You hit this quickly in k8s when, say, you actually need an addon just to get CloudWatch logs.

This is also where k8s has sharp edges. Since Amazon takes care of the rest of the infrastructure for you in ECS, you don't really need to worry about contention starving node resources and killing your logging daemon, which you technically could do in k8s.
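The usual guardrails for that failure mode are per-container requests/limits in the pod spec, plus kubelet-side reservations (`--system-reserved` / `--kube-reserved`) so node daemons keep headroom. A fragment, with illustrative values rather than recommendations:

```yaml
# Pod spec fragment: bound the workload so it can't starve node daemons
resources:
  requests:
    cpu: 250m        # scheduler places the pod based on this
    memory: 256Mi
  limits:
    memory: 512Mi    # exceeded -> the pod is OOM-killed, not the logging daemon
```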

However, you'll note that this is a vendor choice. EKS Auto Mode does away with most of the addons you need to run yourself, simplifying k8s, moving it significantly closer to a vendor supported solution.


That, or Fargate if you're just running a few containers.

> a) You're doing something interesting with the k8s APIs that isn't easy to do on a cloud provider. Essentially, you're a power user. b) You want a cloud abstraction layer because you're multi-cloud or you want a lock-in bargaining chip. c) You want cloud semantics without being on a cloud provider.

This is well put and it's very similar to the arguments made when comparing programming languages. At the end of the day you can accomplish the same tasks no matter which interface you choose.

Personally, I've never found Kubernetes that difficult to use[1]. It has some weird, unpredictable bits, but so do sysvinit and Docker; it just ends up being whatever you're used to.

[1] except for having to install your own network mesh plugin. That part sucked.


Depends. For personal projects, yeah definitely. But at work? Typically the "Platform" team can only afford to support one (maybe two) ways of deployment, and k8s is quite versatile, so even if you need one small service, you'll go with the self-service k8s approach your Platform team offers. The alternative is for you (or your team) to own the whole infrastructure stack for your new deployment model (ECS? Lambda? whatever): you'd need to set up service accounts, secret paths, firewalls, security, pipelines, registries, and a large etc. And most likely, no one will give you access rights for all of that, and your PM won't accept the overhead either.

So having everyone use the same deployment model (and that's typically k8s) saves effort. I don't like it, for sure.


This is where I'm at. We use Podman daily to run Python scripts and apps, and it's been going great! However, building things like monitoring, secure secret injection, centralized inventory, remote logging, etc. has fallen on us. That has led to some shadow IT (running our own container image registry, HashiCorp Vault instance, etc.), which makes me hesitant to share with others in the company how we're operating.

I like to think if we had a K8s environment a lot of this would be built out within it. Having that functionality abstracted away from the developer would be a huge win in my opinion.


Are you doing that across a fleet of machines or just one?

We have 4 servers we run containers on. Calling that a fleet feels too generous. Not much rhyme or reason as to what containers run on which server

I totally agree, but that's not what happens in reality: the average devops knows k8s and will slap it onto anything they see (if only so they can put it on their resume). The average manager hears about k8s, gets convinced they need it, and hires the aforementioned devops to build it.

> the average devops knows k8s and will slap it onto anything they see

This is certainly the case in all the third-person accounts I hear. Online. I've never actually met a single one who is like that; if anything, those same people are the first to tell me about their Hetzner setups.


To be fair, I have k8s on my hetzner :p

DevOps here.

The trouble is that we are literally expected to do this everywhere we go. I've personally advocated for approaches that use, say, a pair of dedicated servers, or VMs as in the GP's example. If you want it outside of AWS/GCP/Azure, you're regarded as a crazy person. If you don't adopt "best practices" (as defined by vendors), then management gets scared. Management very often trusts the sales and marketing departments of big vendors more than their own staff. Many of us have given up fighting this, because what it comes down to is a massive asymmetry of information and trust.


There is a kernel of validity lurking in the heart of all this, which is that immutable images you have the ability to throw away and refresh regularly are genuinely better than long-running VMs with an OS you've got to maintain, with the scope for vulnerabilities unrelated to the app you actually want to run. Management has absorbed this one good thing and slapped layer after layer of pointless rubbish on it, like a sort of inverse pearl. Being able to say "we've minimised our attack surface with a scratch image" (or alpine, or something from one of the secure image vendors) is a genuinely valuable thing. It's just the all of the everything that goes along with it...
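The minimal-attack-surface idea in practice usually means a multi-stage build: compile in a full toolchain image, ship only the binary. A sketch for a Go service, where the module path is made up:

```dockerfile
# Build stage: compile a static binary with the full toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: nothing but the binary -- no shell, no package manager,
# nothing to patch and almost nothing to exploit
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```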

Sure.

The challenge is convincing people that "golden images" and containers share a history, and that kubernetes didn't invent containers: they just solved load balancing and storage abstraction for stateless message architectures in a nice way.

If you're doing something highly stateful, or something that requires a heavy deployment (game servers are typically tens of GB with rich dynamic configuration, in my experience), then kubernetes starts to become round peg, square hole. But people buy into it because the surrounding tooling is just so nice; and like GP says: those cloud sales guys are really good at their jobs, and kubernetes is so difficult to run reliably yourself that it gets you hooked on cloud.

There's a literal army of highly charismatic, charming people who are economically incentivised to push this technology, and it can be made to work, so the odds, as they say, are against you.


> If you want it outside of AWS/GCP/Azure, you're regarded as a crazy person. If you don't adopt "best practices" (as defined by vendors) then management are scared. Management very often trust the sales and marketing departments of big vendors more than their own staff. Many of us have given up fighting this, because what it comes down to is a massive asymmetry of information and trust.

I think this is the crux of the matter. Also, "everybody is doing it, so they must be right" is also a very common way of thinking amongst this population.


The following happened to a friend.

Around the time of the pandemic, a company wanted to run some Javascript code that did a kind of transformation over a large number of web pages (a billion or so, fetched as WARC files from the web archive). Their engineers suggested setting up SmartOS VMs and deploying Manta (which would have allowed the use of the Javascript code in a totally unmodified way: map-reduce from the command line, scaling with the number of storage/processing nodes), which should have taken a few weeks at most.

After a bit of googling and meeting, the higher ups decided to use AWS Lambdas and Google Cloud Functions, because that's what everyone else was doing, and they figured that this was a sensible business move because the job-market must be full of people who know how to modify/maintain Lambda/GCF code.

Needless to say, Lambda/GCF were not built for this kind of workload, and they could not scale. In fact, the workload was so out-of-distribution, that the GCP folks moved the instances (if you can call them that) to a completely different data-center, because the workload was causing performance problems, for _other_ customers in the original data-center.

Once it became clear that this approach cannot scale to a billion or so web-pages, it was decided to -- no, not to deploy Manta or an equivalent -- but to build a custom "pipeline" from scratch, that would do this. This system was in development for 6 months or so, and never really worked correctly/reliably.

This is the kind of thing that happens when non-engineers can override or veto engineering decisions -- and the only reason they can do that, is because the non-engineers sign the paychecks (it does not matter how big the paycheck is, because market will find a way to extract all of it).

One of the fallacies of the tech-industry (I do not mean to paint with too broad a brush, there are obviously companies out there that know what they are doing) is that there are trade-offs to be made between business-decisions and engineering-decisions. I think this is more a kind of psychological distortion or a false-choice (forcing an engineering decision on the basis of what the job market will be like some day in the future -- during a pandemic no less -- is practically delusional). Also, if such trade-offs are true trade-offs, then maybe the company is not really an engineering company (which is fine, but that is kind of like a shoe-store having a few podiatrists on staff -- it is wasteful, but they can now walk around in white lab-coats, and pretend to be a healthcare institution instead of a shoe-store).

Personally, I believe that the tech industry sustains itself via technical debt, much like the real economy sustains itself on real debt. In some sense, everyone is trying to gaslight everyone else into incurring as much technical debt as possible, so that a way to service the debt can be sold. Most of the technical debt is not necessary, and if people were empowered to just not incur it, I suspect it would orient tech companies towards making things that actually push the state of the art forward.


> Personally, I believe that the tech industry sustains itself via technical debt, much like the real economy sustains itself on real debt. In some sense, everyone is trying to gaslight everyone else into incurring as much technical debt as possible, so that a way to service the debt can be sold.

This feels like a reminder that everything "Cloud" is still basically the same as IBM's ancient business model. We've always just been renting time on someone else's computers, and those someone else people are always trying to rent more time. The landlords shift, but the game stays the same.


There was a moment ca. 2020 when everyone was losing their minds over Lambda and other cloud services like SQS and S3 because they're "so cheap!!11". Innumeracy is a hell of a drug.

Still is, just details change.

A lot of criticism of k8s is centered on some imagined perfect PaaS, or comes from being in the very narrow goldilocks zone where the costs of "serverless" are easier to bear...


> Management very often trust the sales and marketing departments of big vendors more than their own staff.

They're getting kickbacks from cloud vendors. Prove me wrong.


Not sure if this is a thing with cloud vendors, but e.g. in finance, you'll definitely get the opportunity to call your rep over for free fancy dinners or whatever you want, because those are "customer meetings."

Better than nothing; I don't blame 'em.


And the average developer doesn't even know where to start to deploy things in prod. Once the feature the product team asked for passes QA... on to the next sprint! We're done!

Whose responsibility is it to establish the prerequisite CI/CD pipelines, HITL workflows, and observability infra so that devs can shepherd changes to prod (and track their impact)? Hint: it's not the developer's.

This was the point of "devops" (the concept, not the job title): the team should be responsible for development and operations, so one isn't prioritised hugely over the other.

But those things all require more pods on the cluster! We've looped back around to the beginning.

Exactly my point. But then developers: "I just want to go to my Heroku days again!" but then with a sufficient big company there are maaany developers doing things their slightly different way, and then other effects start compounding, and then costs go up because 15 different teams are using 27 different solutions and and and...

But yeah, let's just spin-up a shadow IT VM with Debian like GP said, it's easy!


> But yeah, let's just spin-up a shadow IT VM with Debian like GP said, it's easy!

That’s literally how they sold AWS in the beginning.

Cloud won not because of costs or flexibility but because it allowed teams to provision their own machines from their budget instead of going through all the red tape with their IT departments creating… a bunch of shadow IT VMs!

Everything old is new again, except it works on an accelerated ten year cycle in the IT industry.


Indeed. And it stems from the illusion that what works in solo/small teams/scrappy startup works the same when you are bigger, and that a developer can take over all the corollary work to the actual product development.

And yes, a dev who's able to do all that properly (stress on properly) is indeed a signal of a better overall developer, but they are a minority, and anyway, as orgs scale up, there is just too much "side salad"; it becomes a separate dish.


> the average devops knows k8s

If you know Kubernetes, you know not to use it. I say that as someone who used to do consulting for it.

The reality is that yet again "making money" completely collides with efficient, quality, sane productive work.

For me, one of the main reasons to leave that space was that I couldn't really deal with the fact that my work conflicted with a client's success. That said, I have helped clients get off that stuff, and off other things they thought they needed that just wasted time and money. It feels odd going into a company that hired you to consult on a topic only to end up telling them "the best approach for you is not doing that at all." Often never. Some people would think "well, what if we have hundreds of thousands or even millions of users," and the reality was that even in those scenarios, once you moved away from the abstract thought and discussed a hypothetical based on their actual product, they realized they'd still be better off without it. Besides, that hypothetical often sat far enough in the future that they admitted they'd likely have a completely different setup by then, so preparing for it didn't even make sense.

I think a big thing related to that was/is the microservice craze, where people move to a complex architecture for not many good reasons and then increase complexity way faster than what they actually deliver in terms of product, because it somehow feels good. I know it does; I've been there. In reality the outcome is often just a complex mess out of what could have been a relatively simple monolith. And these monoliths do work. In the vast majority of cases they are easy to scale, because your problem switches from "how do we best allocate a huge number of very different services across our infrastructure" to (for the most part) "how do we spin up our monolith on one more server," which tends to be a far easier problem to tackle.

And nothing stops you from still using everything else if you want. Just because it's a monolith doesn't mean you need to skip on any of the cloud offerings, etc. For some reason there seems to be that idea that if you write a monolith you are somehow barred from using modern tooling, infrastructure, services, etc. Not sure where that comes from.


I think one big problem is that using microservice architecture doesn't mean literally everything has to be a "microservice." If you don't truly need granular scaling (i.e. your "app" doesn't get a bunch of asymmetric load across different paths), then you can just have more monolithic "microservices" until they need to be split up.

imo this should achieve a nice balance?


In some sense, Kubernetes is just a portable platform for running Linux services, even on a single node using something like K3s. I almost see it as being an extension of the Linux OS layer.

This is what I do for small stuff, debian vm, k3s on it for a nicer http based deployment api.

Then why can't we put a wrapper on systemd and make that into a lightweight k8s?

This may be familiarity bias, but I often find `kubectl` and related tools like `k9s` more ergonomic than `systemctl`/`journalctl`, even for managing simple single-replica processes that are bound to the host network.
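For what it's worth, the day-to-day operations do map fairly directly. Rough equivalents (not exact semantics; `myapp` is a placeholder):

```shell
# Inspect state
systemctl status myapp       # ~ kubectl get pods -l app=myapp
# Tail logs
journalctl -u myapp -f       # ~ kubectl logs -f deployment/myapp
# Restart
systemctl restart myapp      # ~ kubectl rollout restart deployment/myapp
# Declare desired state
systemctl enable myapp       # ~ kubectl apply -f myapp.yaml
```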

See Podman quadlets.
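Quadlets are roughly the "wrapper on systemd" the parent asks about: drop a `.container` unit file in place and systemd generates a native service that runs the container via Podman. A sketch, with an illustrative image and port:

```ini
# ~/.config/containers/systemd/myapp.container
[Unit]
Description=My app in a container

[Container]
Image=docker.io/library/nginx:alpine
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, this shows up as a regular `myapp.service` you can start, stop, and journalctl like anything else.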

Systemd is on the wrong layer here. You need something that can set your machine up, like docker.

Systemd seems to be moving in that direction, the features are coming together to actually enable this.

Though imagine the unholy existence of an init system whose only job is to spin up containers, which can contain other inits, OS images, or whatever... turtles all the way down.


I don't see why not. Maybe it should be on the same layer going forward - for true cloud compute (including on-premise cloud)

Okay it sets the machine up, but not the underlying host machine though.

Remember fleet?

Yep, this is the way. Linux is just a platform for running services on one or more computers without needing to know about those computers individually, and even if your scale is 1, it's often easier to install k3s and manage your services with it rather than memorizing a bunch of disparate tools with their own configuration languages, filepath conventions, etc. It's just a lot easier to use k3s than it is to cobble together stuff with traditional linux tools. It's a standard, scalable pane of glass and as much as I may dislike kubectl, it's worlds better than systemctl and journalctl and the like.

> People use Kubernetes for way too small things, and it sounds like you don't have the scale for actually running Kubernetes.

This is a problem I've run into with enterprise deployments. K8s is often the lowest common denominator that semi-small platform engineering teams arrive at. At my current employer, a platform-managed K8s namespace is the only PaaS offering we got, so it is what we use. Is it overpowered? Yes. Is it overly complex for our use case? Definitely. Could we basically get by hosting our services on a few cheap mini computers with no performance penalty? Also yes.


I know that "resume-driven development" exists, where the tradeoffs between approaches aren't about the technical fit of the solution but about career trajectory. I've seen people write plain workstation-preparation scripts in Rust, only to have something to flex about in interviews.

I'm not surprised even in the slightest that DevOps workers will slap k8s on everything, to show "real industry experience" in a job market where the resume matches the tools.


Your first example sounds very sensible to me?

Using new technology in something small and unimportant like a setup script is a perfect way to experiment and learn. It would be irresponsible to build something important as the first thing you do in a new language.


For your own use, yes.

But if you're working with others, you should default to using standard industry tools (absent a compelling reason not to) because your work will be handed off to others and passed on to new team members. It's unreasonable to expect that a new Windows or Linux sysadmin or desktop support tech must learn Rust to maintain a workstation setup workflow.


Agreed. I think if we all went with this HN mindset of "HTML4 and PHP work just fine," we wouldn't have gone anywhere with regards to all the technical advancements we enjoy today in the software space.

We are building a religion, we are building it bigger
We are widening the corridors and adding more lanes
We are building a religion, a limited edition
We are now accepting coders linking new AI brains

(Apologies to Cake. And coders.)


There are also people with a devops title who don't know anything other than the hammer, and then everything is a hammer problem.

I mean, I worked with people who were surprised that you can run more than just one application inside an EC2 VM.


> there are also people with a devops title who don't know anything other than the hammer, and then everything is a hammer problem.

To be fair though, that's true for every profession or skill.

> I mean, I worked with people who were surprised that you can run more than just one application inside an EC2 VM.

I've seen something similar, where people were surprised that you can use object storage (so, effectively, "make HTTP requests") from any server.


Conversely, we had millions of server huggers before, who each knew their company's stuff in a way that wasn't really applicable if they went somewhere else.

Every company used to have a bespoke collection of build, deployment, monitoring, scaling, etc concerns. Everyone had their own practices, their own wikis to try to make sense of what they had.

I think we critically under-appreciate that k8s is a social technology that is broadly applicable. Not just for hosting containers, but as a cloud-native form of thinking, where it becomes much easier to ask: what do we have here, and is it running well, and to have systems that are helping you keep that all on track (autonomic behavior/control loops).

I see such rebellion & disdain for where we are now, but so few people who seem able to recognize and grapple with what absolute muck we so recently have crawled out of.


Doing Kubernetes, like doing Agile, is mandatory nowadays. I've been asked to package a 20-line bash script as a Docker image so it could be delivered via a CI/CD pipeline to Kubernetes pods in the cloud.

The value is not that I got the job done at a day's notice. The black mark is that I didn't package it per industry best practices.

Not doing so would mean being out of work. Whether it is happening correctly is not something decision makers care about, as long as it is getting done somehow.


There are many organizations which still ship software without Kubernetes. Perhaps even the vast majority.

Of course. For a long time I thought I was working for one such organization. Until leadership decided "modernization" was the top priority for IT teams, because we were lagging far behind.

I don't think there are any other industry best practices you could have followed.

That's basically why k8s is so compelling. Its tech is fine, but more importantly it's a social technology that is widely known and can be rallied behind, with consistent patterns that apply to anything you might dream of making "cloud native." What you did to get this script available for use will closely mirror how anyone else would get any piece of software available.

Meanwhile conventional sys-op stuff was cobbling together "right sized" solutions that work well for the company, maybe. These threads are overrun with "you might not need k8s" and "use the solution that fits your needs", but man, I pity the companies doing their own frontiers-ing to explore their own bespoke "simple" paths.

I do think you are on to something, though, about there not being good taste-making, and not always good oversight.


> it sounds like you don't have the scale for actually running Kubernetes.

You don't set up k8s because your current load can't be handled; you do it for future growth. Sometimes that growth doesn't pan out, and now you're left with complex infrastructure that is expensive to maintain while you're not getting any of the benefit.


We have a hobby web based app that consists of multiple containers. It runs in docker compose. Serves 1000 users right now (runs 24/7). Single VM.

No Kubernetes whatsoever.

I agree with you.


Docker compose is brilliant while your stack remains on a single box, and will scale quite nicely for some time this way for most applications with minimum maintenance overhead.

My personal strategy has always been to start off in docker compose, and break out to a k8s configuration later if I have to scale beyond a single box.
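As a sketch of that starting point (service names, images, and ports here are invented, not from the comment):

```yaml
# docker-compose.yml -- hypothetical single-box stack
services:
  web:
    image: ghcr.io/example/app:latest   # placeholder image
    ports:
      - "80:8080"
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me      # use an env file or secrets in practice
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
volumes:
  pgdata:
```

`docker compose up -d` on the box and you're running; the same file keeps working until you genuinely outgrow one machine.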


k8s is useful when you have services that must spin up and down together, and you want to swap out services and deploy all/some/one.

and then also package this so that you and other developers can get the infrastructure running locally or on other machines.
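A minimal sketch of what that packaging looks like in manifest form (names and image are placeholders):

```yaml
# deployment.yaml -- hypothetical minimal service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: ghcr.io/example/app:1.2.3   # placeholder
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports:
    - port: 80
      targetPort: 8080
```

`kubectl apply -f` the file, and the same manifest works on a teammate's kind or k3s cluster.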


Even if using just one VM, I'll probably slap k3s on it and manage my application using manifests. It's just so much easier than dealing with puppet or chef or vanilla cloud-init. Docker compose works too, but at that point it's just easier to stick with k3s and then I can have nice things like background jobs, a straightforward path to HA, access to an ecosystem of existing software, and a nicer CLI.

That's what I don't get when people bring up this idea that k8s is complicated.

All of those other tools are complicated and fragile


I think the things that trip people up are:

1. People expect k8s to be an opinionated platform and it's very happy to let you make a mess

2. People think k8s is supposed to be a cross platform portability layer and ... it maybe can be if you're very careful, but it's mostly not that

3. People compare k8s/cloud/etc to some monolithic application with admin permissions to everything and they compare that to the "difficulty" of dealing with RBAC/IAM/networking/secrets management

4. People don't realize how much more complicated vanilla Linux tooling is and how much accidental complexity is involved


They use it for inflating their resume for career progression rather than actually evaluating if they need it in the first place.

This is why you get many folks over-thinking the solution and picking the most hyped technologies and using them to solve the wrong problems without thinking about what they are selling.

You don't need K8s + AWS EC2 + S3 just to host a web app. That tells me they like lighting money on fire and bankrupting the company and moving to the next one.


Often the alternatives presented to me in discussions as cheaper are actually burning money.

But given how often I see "you don't need k8s because you're not going to scale that fast", I feel like even professional k8s operators have missed its fundamental design goal :/ (maximizing utilization of finite compute)


yeah, it's like wanting to drive to the mall in the Space Shuttle and then complaining that it's too complicated

The problem with Kubernetes is that it doesn't scale down to small deployments very well, but it sure as shit doesn't scale up to large ones either. Large shared multi-tenant clusters have massive problems even when running parts of the same application with the same incentives, it falls apart completely when the tenants are diverse.

Nomad has neither of these problems.


I have no doubt that there are legit use cases for something like k8s at Google or other multi-billion-dollar companies.

But if its use were confined to that, pretty much nobody would be using it (except as a customer of such an organization's infra) and hardly anyone would be talking about it (like how there isn't much talk about Borg).

The reason k8s is a thing in the first place is that it's being used by way too many people for their own good. (Most of us who have worked in startups have met too many architecture astronauts.)

If I had to bet, I'd wager that 99% of k8s users are in the “spin a few containers to run your web app” category (for the simple reason that for one billion-dollar tech business using it for legit reasons, there's many thousands early startups who do not).


The legit use case for companies like Google/Amazon etc is only to sell it to customers. None of these companies use K8s internally for real critical workloads.

Ehm, that is simply not true. Google built it for themselves first. It is essentially the open source version of the internal architecture. It gets used.

I worked at google. k8s does not really look at all like what they used internally when I was there, aside from sharing some similar looking building blocks.

Yeah, but is the internal tool simpler? I'd be surprised.

Simpler to use? yes. Simpler under the hood? No.

Also Amazon definitely uses k8s for stuff.

Teams are free to use EKS internally.


Google uses Kubernetes' grandpa, called Borg, for everything.

But to quote someone: "you are not Google".


I said “something like k8s” above, and Google for sure uses something like k8s called Borg.

Yes, it is Borg. Not k8s. Granted it is similar

Nomad is substantially more like Borg than Kubernetes is. And, funnily enough, scales better in both directions than Kubernetes does!

And those devops folks just let your single debian VM be? It sounds like you have, like many of us, an organizational/people problem, not a k8s problem.

Maybe those devops folks only pay attention to k8s clusters and you're flying under their radar with your single debian VM + Kamal. But the same thinking that results in an overly complex, impossible-to-debug, expensive-to-run k8s cluster can absolutely produce the same with regular VMs -- unless, again, you are just left to your own devices because their policies don't apply to VMs, yet.

The problem usually is you're one mistake away from someone shoving their nose in it. "What are you doing again? What about HA and redundancy? slow rollout and rollback? You must have at least 3 VMs (ideally 5) and can't expose all VMs to the internet of course. You must define a virtual network with policies that we can control and no wireguard isn't approved. You must split the internet facing load balancer from the backend resources and assign different identities with proper scoping to them. Install these 4 different security scanners, these 2 log processors, this watchdog and this network monitor. Are you doing mtls between the VMs on the private network? what if there is an attacker that gains access to your network? What if your proxy is compromised? do you have visibility into all traffic on the network? everything must flow through this appliance"


I mean, it's pretty clear the only reason they even got to swap to a single VM and take the glory is because they fired the devops in question. As in, they're the actual boss of a small operation. That's what saying goodbye and nuking the cluster implies here.

A single VM is indeed the most pragmatic setup that most apps really need. However, I still prefer to have at least two for a little redundancy and peace of mind. It's just less stressful to do any upgrades or changes knowing there is another replica in case of a failure.

And I’m building and happily using Uncloud (https://github.com/psviderski/uncloud) for this (inspired by Kamal). It makes multi-machine setups as simple as a single VM. Creates a zero-config WireGuard overlay network and uses the standard Docker Compose spec to deploy to multiple VMs. There is no orchestrator or control plane complexity. Start with one VM, then add another when needed, can even mix cloud VMs and on-prem.


People have it backwards.

If you have an app and you want to run a single app yeah silly to look for K8s.

If you have a beefy server or two you want to utilize fully and put as many apps on it without clashing dependencies you want to use K8s or docker or other containers. Where K8s enables you to go further.


That looks pretty interesting. Is it being used in production yet (I mean serious installs)?

Yes but at small scale. Myself and a handful of others from our Discord run it in production. The core build/push/deploy workflows are stable and most of the heavy lifting at runtime is done by battle-tested projects: Docker, Caddy, WireGuard, Corrosion from Fly.io.

Radboud University recently announced they're rolling it out for managing containers across the faculty, which is the most "serious install" I know about, but there could be others: https://cncz.science.ru.nl/en/news/2026-04-15_uncloud/


Did you address the security concerns? E.g. the `curl | bash`-style install. I was a bit concerned about that.

TBF, the documentation says you can download and review the script, then run it. Or use other methods like a homebrew or (unofficial) Debian package, or you can just install the binary where you want it, which is all the install.sh script (107 lines, 407 words) does.

https://uncloud.run/docs/getting-started/install-cli/#instal...


I mean how commands are run on the servers - directly or indirectly. It's likely a code quality issue?

this is dope work.

I don't get it, I think that k8s is the best software written since win95. It redefines computing in the same way IMHO. I have some experience in working with k8s on prod and I loved every moment of it. I'm definitely missing something.

Took a while to find this. K8s is great, IMO most of the people with alternative setups are just rebuilding (usually worse) or compressing (specific to their use case) k8s features that have been GA for a long time.

Spend some time learning it, using it to deploy simple apps, and you won't go back to deploying in a VM again imo.

This only gets better with AI-assisted development: any model is going to produce much better results for k8s, given the huge training set, vs someone's bespoke Rube Goldberg machine of a build.


I deploy prod by running a shell script I wrote that rsyncs the latest version of the codebase to my server, then sshs into the server and restarts the relevant services

how could k8s improve my deployment process?
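For context, a sketch of that kind of script (server address, paths, and service name here are invented placeholders, not the poster's real values):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the rsync-and-restart deploy described above.
# SERVER, APP_DIR, and SERVICE are invented placeholders.
set -euo pipefail

SERVER="${SERVER:-deploy@prod.example.com}"
APP_DIR="${APP_DIR:-/srv/myapp}"
SERVICE="${SERVICE:-myapp.service}"

# Build the two commands as strings so the flow is easy to inspect or dry-run.
sync_cmd()    { echo "rsync -az --delete --exclude=.git ./ ${SERVER}:${APP_DIR}/"; }
restart_cmd() { echo "ssh ${SERVER} sudo systemctl restart ${SERVICE}"; }

# Only execute when explicitly asked, so reading/sourcing this is side-effect free.
if [ "${DO_DEPLOY:-0}" = "1" ]; then
  eval "$(sync_cmd)"
  eval "$(restart_cmd)"
fi
```

Rollback here is "rsync the old tree again and hope"; that gap is most of what the replies below are about.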


What happens when your new version is broken? Kubernetes would roll back to the old version. You'd have to rerun the deployment script and hope you still have the old version available. Kubernetes will even deploy the new version to some copies, test it, and then roll out the whole thing once it works.

Also, Kubernetes uses immutable images and containers so you don't have to worry about dependencies or partial deploys.
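That staged rollout is just the Deployment's update strategy plus a readiness check — roughly (names and values illustrative):

```yaml
# Fragment of a Deployment spec: replace pods a few at a time,
# and only send traffic to a new pod once its readiness probe passes.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # never take down more than one old pod at a time
      maxSurge: 1         # allow one extra pod during the rollout
  template:
    spec:
      containers:
        - name: web
          image: ghcr.io/example/app:1.2.4   # placeholder
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
```

And `kubectl rollout undo deployment/web` is the one-liner rollback.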


You know your app better than me, but here are some practical reasons for the typical B2C app:

split deployments -- perhaps you want to see how an update impacts something: if error rates change, if conversion rates change, w/e. K8s makes this pretty easy to do via something like a canary or blue green deployment. Likewise, if you need to rollback, you can do this easily as well from a known good image.

Perhaps you need multiple servers -- not for scale -- but to be closer to your users geographically. 1 server in each of 5-10 AZs makes updates a bit more complicated, especially if you need to do something like a db schema update.

Perhaps your traffic is lumpy and peaks during specific times of the year. Instead of provisioning a bigger VM during these times, you would prefer to scale horizontally, automatically. Likewise, depending on how predictable the distribution of traffic is, running a larger machine all the time might be very expensive for only the occasional burst of traffic.
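The lumpy-traffic case maps to a HorizontalPodAutoscaler, e.g. (names and thresholds invented):

```yaml
# Hypothetical autoscaler for a Deployment named "web"
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2          # baseline capacity
  maxReplicas: 10         # burst ceiling for peak season
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods above ~70% average CPU
```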

To be very clear, you can do all of this without k8s. The question is, is it easier to do it with or without? IMO, it is a personal decision, and k8s makes a lot of sense to me. If it doesn't make a ton of sense for your app, don't use it.


I think it's just that k8s allows you to shoot yourself in the foot, thus it gets all the blame.

when in reality, you can go very bare-bones with k8s, but people pretend like only the most extreme complexity is what's possible, because it's not easy to admit that k8s is actually quite practical in a lot of ways, especially for avoiding drift and for automation

that's my take on it


Can you expand how it redefined computing for you personally?

I noticed in his article he said something like "and then the devops team puts a ton of complexity...", which doesn't seem like a k8s problem.

You're not missing anything. There are legions of amateurs who dislike k8s because they don't understand the value.

> the best software written since win95

This feels like what us Brits would call "damning with faint praise".

Windows 95 was terrible. Really bad. If you really mean to say that Kubernetes is revolutionary and well-engineered, Windows 2000 would be a much better example.


I thought we collectively learned this from Stack Overflow's engineering blog years ago.

Scale vertically until you can't because you're unlikely to hit a limit and if you do you'll have enough money to pay someone else to solve it.

Docker is amazing development tooling but it makes for horrible production infrastructure.


Docker is great development tooling (still some rough edges, of course).

Docker Compose is good for running things on a single server as well.

Docker Swarm and Hashicorp Nomad are good for multi-server setups.

Kubernetes is... enterprise and I guess there's a scale where it makes sense. K3s and similar sort of fill the gap, but I guess it's a matter of what you know and prefer at that point.

Throw on Portainer on a server and the DX is pretty casual (when it works and doesn't have weird networking issues).

Of course, there's also other options for OCI containers, like Podman.


> Docker Swarm

Is that a thing still?

> Kubernetes is... enterprise

I would contest that. It's complex, but not enterprise.

Nomad is a great tool for running processes on things. The problem is attaching loadbalancers/reverse proxies to those processes requires engineering. It comes for "free" with k8s with ingress controllers.


This is why there's an endless cycle of shitty SaaS with slow APIs and high downtime. People keep thinking that scale is something you can just add later.

What's a more reasonable general approach then?

Let's say you're a team of 1-3 technical people building something as an MVP, but don't necessarily want to throw everything away and rewrite or re-architect if it gets traction.

What are your day 1 decisions that let you scale later without over-engineering early?

I'm not disagreeing with you btw. I genuinely don't know a "right" answer here.


I'd argue on the contrary that it's the last decades' over-engineering bender that's coming home to roost. Now too many things have too many moving parts to keep stable.

Clearly, Kubernetes wasn’t the right solution for your case, and I also agree that using it for smaller architectures is overkill. That said, it’s the standard for large-scale production platforms that need reproducibility and high availability. As of today I don’t see many *truly* viable alternatives and honestly I haven't even seen them.

As the strongest engineer I ever worked with commented: "Across multiple FAANG-adjacent companies, I've never seen a k8s migration go well and not require a complete reimplementation of k8s behind the APIs."

Is that because kubernetes was the right fit from the beginning, or because the initial implementation was designed around kubernetes, which caused the migration to eventually end up taking that same shape?

I have designed a backend with exactly the same underlying philosophy you ended up with: a load balancer? That's a problem. So: client-side hashing instead, and get rid of the discovery service via a couple of DNS tricks already handled robustly elsewhere.

I took it to its maximum: every service is a piece that can break ---> fewer pieces, fewer potential breakages.

When I can (which is 95% of the time), I embed those other services inside the server processes themselves and make them activatable at startup (and since I don't want my infra to drift, I use the same set of subservices in each).

But the idea is -- the fewer services, the fewer problems. I just think, even with the trade-offs, it is operationally much more manageable and robust in the end.


Yes, I mean, I'm an engineer on a cloud Kubernetes service, and I don't run Kubernetes for my home services. I just run podman quadlets (systemd units). But that is entirely different from an enterprise-scale setup with monitoring, alerting, and scale in mind…

Similar deal here. My $dayjob title is "Cloud Engineer" and I spend a lot of my time working with AKS and Istio. But for some recent personal projects at home, I've just been running Docker Swarm on a single server. It's just lighter and less complicated, and for what I'm doing it more than satisfies my needs. Now if this was going to production at mass scale, I might consider switching to K8S, but for experimentation and initial development, it would be way overkill.

> But that is entirely different from an enterprise scale setup with monitoring, alerting, and scale in mind

Do you have experience with Kubernetes solving these issues? Would love to hear more if so.

Currently running podman containers at work and trying to figure out better solutions for monitoring, alerting, etc. Not so worried about scale (my simple python scripts don't need it) but abstracting away the monitoring, alerting, secure secret injection, etc. seems like it'd be a huge win.


At a previous job, our build pipeline

* Built the app into a self-contained .jar (it was a JVM shop)

* Put the app into a Ubuntu Docker image. This step was arguably unnecessary, but the same way Maven is used to isolate JVM dependencies ("it works on my machine"), the purpose of the Docker image was to isolate dependencies on the OS environment.

* Put the Docker image onto an AWS AMI that only had Docker on it, the sole purpose of which was to run the Docker image.

* Combined the AMI with an appropriately sized EC2 instance.

* Spun up the EC2s and flipped the AWS ELBs to point to the new ones, blue green style.

The beauty of this was the stupidly simple process and complete isolation of all the apps. No cluster that ran multiple diverse CPU and memory requirement apps simultaneously. No K8s complexity. Still had all the horizontal scaling benefits etc.


I started using GKE at a seed stage company in 2017. It's still going fine today. I had zero ops experience and I found it rather intuitive. We brought in istio for mtls and outbound traffic policies and that worked pretty well too. I can only remember one fairly stressful outage caused by the control plane but it ended up remedying itself. I would certainly only use a managed k8s.

So I guess I'm a fan. I use a monolith for most of my stuff if I have the choice, but if I'm working somewhere or on something where I have to manage a bunch of services I'm most certainly going to reach for k8s.


Cloud providers have put a lot of time and effort into making you believe every web app needs 99.9999% availability. Making you pay for auto scaled compute, load balancers, shared storage, HA databases, etc, etc.

All of this just adds so much extra complexity. If I'm running Amazon.com then sure, but your average app is just fine on a single VM.


And funnily, recently many of the Big Serious Cloud Websites have been aggressively shitting the bed on availability.

Marketing has such a gigantic influence in our field. It is absolutely insane. It feels unavoidable, since IT is (was?) constantly filled with new blood that picks up where people left off.

That thought crossed my mind recently as well. Not to mention the huge software stacks and the potential supply chain vulnerabilities that entails.

Well, you used a tank to plow a field then complained about maintenance and fuel usage.

If you have an actual need to deploy a few dozen services all talking with each other, k8s isn't a bad way to do it. It has its problems, but it allows your devs to mostly self-service their infrastructure needs vs having to file a ticket for each VM and firewall rule they need. That's speaking from the perspective of migrating from the "old way" to a 14-node bare-metal k8s cluster.

It does make debugging harder, as you pretty much need a central logging solution, but at that scale you want a central logging solution anyway, so it isn't a big jump, and developers like it.

Main problem with k8s is frankly nothing technical, just the "ooh shiny" problem developers have where they see tech and want to use tech regardless of anything


I dunno, the more people dig into this approach, the more they'll probably end up just reinventing Kubernetes.

I use k3s/Rancher with Ansible and use dedicated VMs on various providers. Using Flannel with wireguard connects them all together.

This I think is reasonable solution as the main problem with cloud providers is they are just price gouging.


I always feel like I am taking crazy pills when I read these threads. The k8s API and manifest config feel like a great, standardized way to deploy containers. I wouldn't want to run a k8s cluster from scratch, but EKS has been pretty straightforward to work with. Being able to use kind locally for testing is amazing, and k9s is my new favourite infra monitoring tool.

Even if you just run on 2 nodes with k3s it seems worth it to me for the standardized tooling. Yes, it is not a $5 a month setup but frankly if what you host can be served by a single $5 a month VM I don't particularly care about your insights, they are irrelevant in a work context.


Not advocating for complexity or k8s, but if your workload can be served by a single VM, then you are orders of magnitude away from the volume and complexity that would push you to a k8s setup, and there isn't even a debate to be had.

And there are situations where a single VM, no matter how powerful, can't do the job.


>> Then before you know it, the devops folks have decided that they need to put a gazillion other services and an entire software-defined networking layer on top of it.

I don't work that closely with k8s, but have toyed with a cluster in my homelab, etc. Way back before it really got going, I observed some OpenStack folks make the jump to k8s.

Knowing what I knew about OpenStack, that gave me an inkling that what you describe would happen and we'd end up in this place where a reasonable thing exists but it has all of this crud layered on top. There are places where k8s makes sense and works well, but the people surrounding any project are the most important factor in the end result.

Today we have an industry around k8s. It keeps a lot of people busy and employed. These same folks will repeat k8s the next time, so the best thing people who feel they have superior taste can do is press forward with their own ideas, as the behavior won't change.


I'm very happy with my k8s setup for my small startup. I believe it would have been much harder for me to get it off the ground, manage it etc. without it.

> nuking the cluster, booted up a single VM with debian, enabled the firewall and used Kamal to deploy the app with docker.

Absolutely brilliant. Love it.


> I ended up saying goodbye to those devops folks,

The irony is that "DevOps" was supposed to be a culture and a set of practices, not a job title. The tools that came with it (=Kubernetes) turned out to be so complex that most developers didn't want to deal with them and the DevOps became a siloed role that the movement was trying to eliminate.

That's why I get an ick when someone uses DevOps as a job title. Just say "System Admin" or "Infrastructure Engineer". Admit that you failed to eliminate the silos.


Yep, "Cloud Infrastructure Engineer" is what I prefer.

I am primarily a backend developer but I do a lot of ops / infra work because nobody else wants to do it. I stay as far away from k8s as possible.


And if you need a cluster, Hashicorp Nomad seems like a more reasonable option than full blown kubernetes. I've never actually used it in prod, only a lab, but I enjoyed it.

We run nomad at work. I’m very happy with it from an administrative standpoint.

> Then before you know it, the devops folks have decided that they need to put a gazillion other services and an entire software-defined networking layer on top of it.

I'm not familiar with kubernetes, but doesn't it already do SDN out of the box?


> doesn't it already do SDN out of the box

Yes and no. Kubernetes defines specification about network behavior (in form of CNI), but it contains no actual implementation. You have to install the network plugin basically as the first setup step.


That is good but at bigger orgs with massive workloads and the teams to build it out k8s makes sense. It is a standard and brilliant tech.

Kubernetes is not bad, it's just low level. Most applications share the exact same needs (proof: you could run any web app on a simple platform like Heroku).

That's why some years ago I built an open source tool (with 0 dependencies) that simplifies Kubernetes deployments with a compact syntax that works well for 99% of web apps (instead of allowing any configuration, it makes many "opinionated" choices): https://github.com/cuber-cloud/cuber-gem

I have been using it for all the company's web apps and web services for years and everything works nicely. It can also auto-scale easily, which lets us manage huge spikes of traffic for web push (Pushpad) at a reasonable price (good luck if you used a VM -- no scaling -- or a PaaS -- very high costs).

It's not just low level, in most cases, it's also overkill.

Most companies aren't "web scale" ™ and don't need an orchestrator built for google level elasticity, they need a vm autoscaling group if anything.

Most apps don't need such granular control over fs access, network policies, root access, etc, they need `ufw allow 80 && ufw enable`

Most apps don't need a 15 stage, docker layer caching optimized, archive promotion build pipeline that takes 30 minutes to get a copy change shipped to prod, they need a `git clone me@github.com:me/mine.git release_01 && ln -s release_01 /var/www/me/mine/current`

This is coming from someone who has had roles both as a backend product engineer and as a devops/platform engineer, and who has been around long enough to remember when "deploy" to prod meant Eclipse FTPing PHP files straight to the prod server on file save. I manage clusters for a living for companies that went full k8s and never should have gone full k8s. ECS would have worked for 99% of these apps, if they even needed that.

Just like the js ecosystem went bat shit insane until things started to swing back towards sanity and people started to trim the needless bloat, the same is coming or due for the overcomplexity of devops/backend deployments


If this works `git clone me@github.com:me/mine.git release_01 && ln -s release_01 /var/www/me/mine/current`, then your Docker builds should also be extremely quick. Where I have seen extremely slow Docker builds is with Python services using ML libraries. But those I really don't want to be building on the production servers.

"ECS would have worked for 99% of these apps, if they even needed that."

I used to agree with that but is EKS really that much more complicated? Yes you pay for the k8s control plane but you gain tooling that is imho much easier to work with than IaC.


Wait a minute, if this is on AWS then what are we talking about? On-prem k8s sounds fine to me but you don't have the ECS option.

We've reduced our costs on Hetzner to about 10% on what we've paid on Heroku, for 10x performance. Kamal really kicks ass, and you can have a pretty complicated infrastructure up in no time. We're using terraform, ansible + kamal for deploys, no issues whatsoever.

Can you elaborate a bit on what terraform and ansible are doing for you in your setup?

We've configured our Hetzner servers with terraform, so we can easily spin up a new one in case we notice that we need another slave to handle extra work (1-2 mins). Ansible is responsible for configuring the server, installing all the required packages and software (not all our infrastructure is deployed with Kamal; for instance, we have clickhouse instances, DBs, redis, etc., and normal app slaves). TLDR; it helps us have a new instance up and running in minutes, or recreate our infrastructure for a new client environment.

So... if you're at the point where you're using a single VM, I have to ask why bother with docker at all? You're paying a context switch overhead, memory overhead, and disk overhead that you do not need to. Just make an image of the VM in case you need to drop it behind an LB.

There's one extra process that takes up a tiny bit of CPU and memory. For that, you get an immutable host, simple configuration, a minimal SBOM, a distributable set of your dependencies, x-platform for dev, etc.

Yes but NixOS does all of these things already, without the process overhead

Even the minimal SBOM part? It's hard to be more minimal than a busybox binary.

That’s fair, NixOS avoids the direct stuff from Docker itself but if you’re basing on an Alpine image or something that would probably be more minimal / smaller

How is docker a context switch overhead? It's the same processes running on the same kernel.

You're adding all of the other supporting processes within the container that needn't be replicated.

It depends, you could have an application with something like

  FROM scratch
  COPY my-static-binary /my-static-binary
  ENTRYPOINT ["/my-static-binary"]

Having multiple processes inside one container is a bit of an anti-pattern imo


Sidecars? Not in a simple app.

If you've ever had the displeasure of seeing the sorry state of VM tooling you would have known that building custom VM images is a very complicated endeavour compared to podman build or docker build.

I once tried to build a simple setup using VM images and the complexity exploded to the point where I'm not sure why anyone should bother.

When building a container you can just throw everything into it and keep the mess isolated from other containers. If you use a VM, you can't use the OCI format, you need to build custom packages for the OS in question. The easiest way to build a custom package is to use docker. After that you need to build the VM images which requires a convoluted QEMU and libvirt setup and a distro specific script and a way to integrate your custom packages. Then after all of this is done you still need to test it, which means you need to have a VM and you need to make it set itself up upon booting, meaning you need to learn how to use cloud-init.

Just because something is "mature" doesn't mean it is usable.

The overhead of docker is basically insignificant and imperceptible (especially if you use host networking) compared to the day-to-day annoyances you've invited into your life by using VM images. Starting a VM for testing purposes is much slower than starting a container.


This comment chain is probably talking about AWS images (AMIs), which is just an API call that snapshots the VM for you. Or use Packer.

What scale is this story operating at? My experience managing a fleet of services is that my job would take 10x as long without k8s. It's hard, not bad.

My first and really only experience with Kubernetes was a project I did about six years ago. I was tasked with building a thing that did some lightly distributed compute using Python + Dask. I was able to cobble together a functioning (internal) product, and we went to production.

Not long after, I found that the pods were CONSTANTLY getting into some weird state where K8s couldn't rebuild, so I had to forcibly delete the pods and rebuild. I blamed myself, not knowing much about K8s, but it also was extremely frustrating because, as I understood/understand it, the entire purpose of Kubernetes is to ensure a reliable deployment of some combination of pods. If it couldn't do that and instead I had to manually rebuild my cluster, then what was the point?

In the end, I ended up nuking the entire project -- K8s, Docker containers, Python, and Dask -- and instead went with a single Rust binary deployed to an Azure Function. The result was faster (by probably an order of magnitude), less memory, cheaper (maybe -80% cost), and much more reliable (I think around four nines).


Yes, I've had similar experiences. My life has been much easier since I migrated to ECS Fargate - the service just works great. No more 2AM calls (at least not because of infra incidents), no more cost concerns from my boss.

First time I’ve heard of Kamal. Looks ideal!

Do you pair it with some orchestration (to spin up the necessary VM)?


DevOps lost the plot with the Operator model. When it was being widely introduced as THE pattern, I was dismayed. These operators abstract entirely complex services like databases behind YAML and custom Go services. At KubeCon one guy told me he collects operators like candy. Questions about lifecycle management, and the inevitable large architectural changes in an ever-changing operator landscape, were handwaved away with a series of staging and development clusters. This adds so much cost. Fundamentally the issue is the abstractions being too much, and sitting entirely on the DevOps side of the "shared responsibility model". Taking an RDBMS from AWS or Azure is so vastly superior to taking on all that responsibility yourself in the cluster. Meanwhile (being a bit of an infrastructure snob) I run NixOS with systemd OCI containers at home. With AI this is the easiest thing to maintain ever.

Those managed databases from the big cloud providers have even more machinery and operator patterns behind them to keep them up and running. The fact that it's hidden away is what you like. So the comparison makes no sense.

I think this comment and replies capture the problem with Kubernetes. Nobody gets fired for choosing Kubernetes now.

It's obvious to you, me and the other 2 presumably techie people who've responded within 15 mins that you shouldn't have been using Kubernetes. But you probably work in a company full of techie people who ended up using Kubernetes.

Here on HN, an environment full of techie people, we immediately recognise not to use k8s in 99% of cases; yet in actual paid professional environments, in 99% of cases, the same techie people will tolerate, support and converge on the idea that they should use k8s.

I feel like there's an element of the emperor's new clothes here.


> and an entire software-defined networking layer on top of it.

This is one of the main fuckups of k8s, the networking is batshit.

The other problems is that secrets management is still an afterthought.

The thing that really winds me up is that it doesn't even scale up that much. 2k nodes and it starts to really fall apart.


If you replaced k8s with a single app on a single VM then you’ve taken a hype fuelled circuitous route to where you should have been anyway.

Not so surprised that the architecture approach pushed by cloud vendors are... increasing cloud spend!

Your use case is very small and simple. Of course a single VM works. The fact that you're changing a literal A record at CF to deploy confirms this.

That is not what kube is designed for.


This feels like the microservices versus monolith problem. You can use cloud services or not, and that's orthogonal to running your app in Kubernetes or in a VM.

Similarly, I suspect (based on your "hardening" grievance) that a lot of your tedium is just that cloud APIs generally push you toward least-privileges with IAM, which is tedious but more secure. And if you implement a comparably secure system on your single VM (isolating different processes and ensuring they each have minimal permissions, firewall rules, etc) then you will probably have strictly more incidents and debugging effort. But you could go the other way and make a god role for all of your services to share and you will spend much less time debugging or dealing with incidents.
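For what it's worth, a good chunk of that per-process isolation on a single VM can come from systemd's sandboxing directives rather than hand-rolled users and firewall rules. A minimal sketch (the service name and binary path are made up):

```shell
# A hardened unit file: DynamicUser gives the service a throwaway UID,
# and the Protect*/Private* options lock down its filesystem view.
cat > myapp.service <<'EOF'
[Unit]
Description=Example app with least-privilege sandboxing

[Service]
ExecStart=/usr/local/bin/myapp
DynamicUser=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes

[Install]
WantedBy=multi-user.target
EOF
# Install with:
#   cp myapp.service /etc/systemd/system/ && systemctl enable --now myapp
```

It's not IAM, but it gets each process a minimal-privilege sandbox with a few lines per service.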

Even with a single VM, you could throw k3s on it and get many of the benefits of Kubernetes (a single, unified, standardized, extensible control plane that lots of software already supports) rather than having to memorize dozens of different CLI utilities, their configuration file formats, their path preferences, their logging locations, etc. And as a nice bonus, you have a pretty easy path toward high availability if you decide you ever want your software to run when Google decides to upgrade the underlying hardware.
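To make the k3s route concrete: the installer really is a documented one-liner, and after that, deployments are just declarative manifests instead of a pile of per-tool CLIs. The image and names below are hypothetical:

```shell
# Single-node k3s install (commented out here; it needs root and network):
#   curl -sfL https://get.k3s.io | sh -
# After that, a service is one manifest:
cat > web.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:latest
EOF
# k3s kubectl apply -f web.yaml
```

The same manifest then works unchanged if you later add nodes for high availability.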


There exists a sweet spot between docker swarm and docker, not quite portainer, but a bit more.

The tools in this space can really help get a few containers in dev/staging/production much more manageable.


And nowadays with Claude you can spin up clusters of VPS machines in a few hours. All bare Debian without anything except nginx and the apps. Mass configuration without any tools, using only Claude. Works perfectly. The costs saved without all the overhead are massive.

> It always starts off all good with just managing a couple of containers to run your web app. Then before you know it, the devops folks have decided that they need to put a gazillion other services and an entire software-defined networking layer on top of it.

As a devops/cloud engineer coming from a pure sysadmin background (you've got a cluster of n machines running RHEL and that's it) i feel this.

The issues i see however are of different nature:

1. résumé-driven development (people get a higher-paying job if they have the buzzwords in their CV)

2. a general lack of core Linux skills. people don't actually understand how Linux and Kubernetes work, so they can't build the things they need, so they install off-the-shelf products that do 1000 things including the single one they need.

3. marketing, trendy stuff and FOMO... that tell you that you absolutely can't live without product X or that you must absolutely be doing Y

to give you an example of 3: fluxcd/argocd. they're large and clunky, and we're getting pushed to adopt that for managing the services that we run inside the cluster (not developer workloads, but mostly-static stuff like the LGTM stack and a few more things - core services, basically). they're messy, they add another layer of complexity, other software to run and troubleshoot, more cognitive load.

i'm pushing back on that, and frankly for our needs i'm fairly sure we're better off using terraform to manage kubernetes stuff via the kubernetes and helm provider. i've done some tests and frankly it works beautifully.

it's also the same tool we use to manage infrastructure, so we get to reuse a lot of skills we already have.

also it's fairly easy to inspect... I'm doing some tests using https://pkg.go.dev/github.com/hashicorp/hcl/v2/hclparse and i'm building some internal tooling to do static analysis of our terraform code and automated refactoring.

i still think kubernetes is worth the hassle, though (i mostly run EKS, which by the way has been working very well for me)


Potentially useful context: OP is one of the cofounders of Tailscale.

> Traditional Cloud 1.0 companies sell you a VM with a default of 3000 IOPS, while your laptop has 500k. Getting the defaults right (and the cost of those defaults right) requires careful thinking through the stack.

I wish them a lot of luck! I admire the vision and am definitely a target customer, I'm just afraid this goes the way things always go: start with great ideals, but as success grows, so must profit.

Cloud vendor pricing often isn't based on cost. Some services they lose money on, others they profit heavily from. These things are often carefully chosen: the type of costs that only go up when customers are heavily committed—bandwidth, NAT gateway, etc.

But I'm fairly certain OP knows this.


i was just curious so i tested this actually.

Using fio

Hetzner (CX23, 2 vCPU, 4 GB): ~3900 IOPS (read/write), ~15.3 MB/s, avg latency ~2.1 ms, 99.9th percentile ≈ 5 ms, max ≈ 7 ms

DigitalOcean (SFO1, 2 GB RAM, 30 GB disk): ~3900 IOPS (same!), ~15.7 MB/s (same!), avg latency ~2.1 ms (same!), 99.9th percentile ≈ 18 ms, max ≈ 85 ms (!!)

Using sequential dd:

Hetzner: 1.9 GB/s, DO: 850 MB/s

Using the low-end plan on both, but the Hetzner is 4 euro and the DO instance is $18.


I love Hetzner so much. I'm not affiliated I'm a really happy customer these guys just do everything right.

As long as you never have to interact with them. If you run into issues they have caused themselves, you'll find yourself dealing with a unique mix of arrogance and incompetence.

Are you sure they're not just being German and you are misinterpreting what they are saying?

I've been using Hetzner for ~20 years and every single support interaction I've ever had with them has been top tier. Never AI bots, always humans who are helpful, courteous and prompt. I can't think of a single company, let alone hosting company, whose customer service has been so consistently good.

It certainly helps the service never does anything wonky that requires a support interaction in the first place.

Just for comparison I use the cheapest netcup root server:

RS 1000 G12: AMD EPYC™ 9645, 8 GB DDR5 RAM (ECC), 4 dedicated cores, 256 GB NVMe

Costs 12,79 €

Results with the following command:

  fio --name=randreadwrite \
    --filename=testfile \
    --size=5G \
    --bs=4k \
    --rw=randrw \
    --rwmixread=70 \
    --iodepth=32 \
    --ioengine=libaio \
    --direct=1 \
    --numjobs=4 \
    --runtime=60 \
    --time_based \
    --group_reporting

IOPS: read 70.1k, write 30.1k (~100k total)

Throughput: read 274 MiB/s, write 117 MiB/s

Latency: read avg 1.66 ms, P99.9 2.61 ms, max 5.644 ms; write avg 0.39 ms, P99.9 2.97 ms, max 15.307 ms


That is a bit of an unfair comparison. The Hetzner and DO instances are shared hosting; you are using dedicated resources.

Using a Netcup VPS 1000 G12 is more comparable.

read: IOPS=18.7k, BW=73.1MiB/s

write: IOPS=8053, BW=31.5MiB/s

Latency Read avg: 5.39 ms, P99.9: 85.4 ms, max 482.6 ms

Write avg: 3.36 ms, P99.9: 86.5 ms, max 488.7 ms


Hetzner has dedicated resources too, but they also have 2 levels of shared resources, "Cost-Optimized" and "Regular Performance". The 3900 IOPS CX23 above is "Cost-Optimized".

Here are some "Regular Performance" shared resource stats

Hetzner CPX11 (Ashburn, 2 CPUs, 2GB, 5.49€ or $6.99/month before VAT)

read: IOPS=36.7k, BW=144MiB/s, avg/p99.9/max 2.4/6.1/19.5ms

write: IOPS=15.8k, BW=61.7MiB/s, avg/p99.9/max 2.4/6.1/18.7ms

Hetzner CPX22 (Helsinki, 2 CPUs, 4GB, 7.99€ or $9.49/month before VAT)

read: IOPS=48.2k, BW=188MiB/s, avg/p99.9/max 1.9/5.7/10.8ms

write: IOPS=20.7k, BW=80.8MiB/s, avg/p99.9/max 1.8/5.8/10.9ms

Hetzner CPX32 (Helsinki, 4 CPUs, 8GB, 13.99€ or $16.49/month before VAT)

read: IOPS=48.3k, BW=189MiB/s, avg/p99.9/max 1.9/6.2/36.1ms

write: IOPS=20.7k, BW=81.0MiB/s, avg/p99.9/max 1.8/6.3/36.1ms


Storage performance is practically always a shared resource, and that's what y'all are talking about here...

Nice, on Hetzner AX41-nvme (~50 eur, from 2020) non-raid I get:

IOPS: read 325k, write 139k

Throughput: read 1271MB/s, write 545MB/s

Latency: read avg 0.3ms, P99.9 2.7ms, max 20ms; write: 0.14ms, P99.9 0.35ms max 3.3ms

so roughly 100 times iops and throughput of the cloud VMs


I've run an OpenStack cloud. Host-local NVMes directly attached to VMs are unbeatable. All clouds offer this. But that storage is ephemeral, and it was when I implemented it in OpenStack too.

There's not enough redundancy. You could RAID1 those NVMes before they get attached to a VM, and that helps with hardware failures, but you get fewer of them to attach. Even if you RAID them, there's not a good way to move that VM to another host if there's a RAM or CPU or other hardware issue on that host.

These VMs with directly attached NVMes basically have to be treated as bare metal servers, and you have to do redundancy at the application layer (like database replication).

But again, all of the major cloud services offer these types of machines if you NEED NVMe IO speed. There are quirks though. For example, in Azure it seems you have to expect the VM to be moved whenever Azure feels like it, and expect that ephemeral data to be wiped. Whereas in OpenStack, we would do local block-level migrations if we HAD to move the VM to another host. That block-level migration required the VM to be turned off, but it did copy the local NVMe data to another host. If this happened it was all planned, and the particular application had app-level redundancy built in, so it was not a problem. If the host crashed, that particular VM would just be down till the host was fixed and came back online.


Many cloud vendors have you pay through the nose for IOPS and bandwidth.

Edit: I posted this before reading, and these two are the same he points out.


Yes, but you can’t directly compare SAN-style storage with a local NVMe. But I agree that it’s too expensive, but not nearly as insane as the bandwidth pricing. If you go to a vendor and ask for a petabyte of storage, and it needs to be fully redundant, and you need the ability to take PIT-consistent multi-volume snapshots, be ready to pay up. And this is what’s being offered here.

And yes, IO typically happens in 4kb blocks, so you need a decent amount of IOPS to get the full bandwidth.
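The arithmetic is just IOPS times block size. Taking the ~70k read IOPS figure from the netcup benchmark upthread at 4 KiB per operation:

```shell
# 70,100 ops/s * 4 KiB per op, converted to MiB/s (integer arithmetic)
echo "$(( 70100 * 4 / 1024 )) MiB/s"   # prints "273 MiB/s"
```

which, allowing for integer rounding, matches the ~274 MiB/s read throughput that benchmark reported.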


Sure, but a petabyte of block storage with redundancy and PIT backups is a poor abstraction to build on, in large part because it's not a thing that can be built without paying a wild amount of money, taking a huge performance hit, or both. If you do your PIT recovery at a higher layer, you have to work a bit harder but you get far better cost, perf and recovery.

That latter part is a big deal, too. If I buy 1PB of block storage, I’m decently likely to be running a fancy journaled or WAL-ed or rollback-logged thing on top, and that thing might be completely unable to read from a read only snapshot. So actually reading from a PIT snapshot is a pain regardless of what I paid for it. Even using EBS or similar snapshots is far from being an amazing experience.


>3000 IOPS

If that's true, I wonder if this is a deliberate decision by cloud providers to push users towards microservice architectures with proprietary cloud storage like S3, so you can't do on-machine dbs even for simple servers.


It's probably a combination of high density storage nodes getting I/O bound and SSDs having finite write endurance. Anything that improves the first problem costs them money to improve it and then makes the second problem worse, and the second one costs them money again, so why would they want to make the default something that costs then more twice if most people don't need it?

Instead they make the default "meager IOPS" and then charge more to the people who need more.


I'm not sure about this, but I remember that a lot of servers at my old company stuck with hard disks as late as 2018, exactly for this reason: HDDs, for all their faults, don't have write endurance issues. This was quite surprising to me back then.

How often is the storage in cloud providers even local vs how often are laptops doing anything other than raw access to a single local disk with a basic FS?

I remember my work laptop's IOPS beating a single VM on the first SSD-based SAN I deployed as well. Of course, the SAN scaled well beyond it, with 1,000 VMs.


> Cloud vendor pricing often isn't based on cost.

Business 101 teaches us that pricing isn't based on cost. Call it top-down vs bottom-up pricing, but the first-principles formula "it costs me $X to make a widget, so sell the product for $Y = 1.y * $X" is not how pricing works in practice.


Just to spell this out more clearly for the back row of the classroom:

The price is what the customer will pay, regardless of your costs.


Economics teaches us that a big difference between cost and price attracts competition which should make the price trend towards the cost.

Practice taught me that that "should" is doing a lot of heavy lifting here and it's often not the case, even across long time periods (years) that should allow competitors to emerge.

For example I calculated the cost of a solar install to be approximately: Material + Labour + Generous overhead + Very tidy profit = 10,000€

In practice I keep getting offers for ~14,000€, which will be reduced to 10,000€ with a government subsidy and my request for an itemized invoice is always met with radio silence.


Only if the barrier of entry is low.

Which it won't be, if at every turn you choose the hyperscaler.


If this is the case, cheap bandwidth for AWS, when?

A big difference between cost and price is often won at the expense of many years of concerted R&D, though

Economics has a lot of other lessons teaching us why prices of major clouds have remained somewhat expensive relative to cost

Exactly.

That's not a business 101.

> That's not a business 101.

It kinda is, but obscured by GP's formula.

More simply; if it costs you $X to produce a product and the market is willing to pay $Y (which has no relation to $X), why would you price it as a function of $X?

If it costs me $10 to make a widget and the market is happy to pay $100, why would I base my pricing on $10 * 1.$MARGIN?


Exactly. The mechanism by which the price ends up as X plus margin is just competition. Others enter the market and compete with you until the returns are driven down to the rental rate of capital. Any barriers to entry result in higher margins.

But that is an equilibrium result, and famously does not apply to monopolies, where elasticity of substitution will determine the premium over the rental rate of capital.


There's a common conversation that goes on around AI: some people swear it's a complete waste of time and a total boondoggle, some that it's a good tool when used correctly, and others that it's the future and nothing else matters.

I see the same thing happen with Kubernetes. I've run clusters of various sizes for about half a decade now. I've never once had an incident that wasn't caused by the deployed product itself rather than Kubernetes. I recall one particular incident where we had a complete blackout for about an hour. The people predisposed to hating Kubernetes did everything they could to blame it all on that "shitty k8s system." Turns out the service in question had simply DOS'd itself by opening up tens of thousands of ports in a matter of seconds when a particular scenario occurred.

I'm neither in the "k8s is the future" camp nor the "k8s is total trash" camp. It's a good system for when you genuinely need it. I've never understood the other two sides of the equation.


The complaints I see about Kubernetes are typically more about one of two things: (a) this looks complex to learn, and I don't have a need for it - existing deployment patterns solve my use case, or (b) Kubernetes is much less inefficient than running software on bare-metal (energy or cost.)

Usually they go hand in hand.


Which is an interesting perspective, considering I've led a platform based on Kubernetes running on company-owned bare-metal. I was actually hired because developers were basically revolting at leaving the cloud because of all the "niceties" they add (in exchange for that hefty cloud tax) which essentially go away on bare-metal. The existing DevOps team was baffled why the developers didn't like when they were handed a plain Ubuntu VM and told to deploy their stack on it.

By the time I left, the developers didn't really know anything about how the underlying infrastructure worked. They wrote their Dockerfiles, a tiny little file to declare their deployment needs, and then they opened a platform webpage to watch the full lifecycle.

If you're a single service shop, then yeah, put Docker Compose on it and run an Ansible playbook via GitHub Actions. Done. But for a larger org moving off cloud to bare-metal, I really couldn't see not having k8s there to help buffer some of the pain.
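For that single-service case, the whole deployment artifact can be one file. A sketch, with a hypothetical image name:

```shell
# Minimal compose file; an Ansible play or GitHub Actions SSH step
# just copies it to the VM and brings it up.
cat > compose.yaml <<'EOF'
services:
  web:
    image: example/web:latest
    restart: unless-stopped
    ports:
      - "80:8080"
EOF
# On the target VM:
#   docker compose up -d
```

The restart policy gives you crash recovery and survive-a-reboot behavior without any orchestrator.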


For many shops, even Docker Compose is not necessary. It is still possible to deploy software directly on a VM/LXC container.

I agree that Kubernetes can help simplify the deployment model for large organizations with a mature DevOps team. It is also a model that many organizations share, and so you can hire for talent already familiar with it. But it's not the only viable deployment model, and it's very possible to build a deployment system that behaves similarly without bringing in Kubernetes. Yes, including automatic preview deployments. This doesn't mean I'm provided a VM and told to figure it out. There are still paved-path deployment patterns.

As a developer, I do need to understand the environment my code runs in, whether it is bare-metal, Kubernetes, Docker Swarm, or a single-node Docker host. It impacts how config is deployed and how services communicate with each other. The fact that developers wrote Dockerfiles is proof that they needed to understand the environment. This is purely a tradeoff (abstracting one system, but now you need to learn a new one.)


It can be inefficient because controllers (typically ~40 per cluster) can maintain big caches of resource metadata, and kubelet and kube-proxy usually operate pretty tight while-loops. But such things can be tuned, and I don't really consider those issues. The main issue I've actually encountered is that etcd doesn't scale.

> (b) Kubernetes is much less inefficient than running software on bare-metal (energy or cost.)

You surely meant "much less efficient than"


I did, thanks for the correction.

There also seems to be confusion about what I meant by "bare-metal." I wasn't intending to refer to the server ownership model, but rather the deployment model where you deploy software directly onto an operating system.


The funniest thing is that kubernetes was designed for bare metal running, not cloud...

Yeah if someone says that k8s is costing them energy they are either using it very, very incorrectly, or they just don't know what they are talking about.

Running a Kubernetes deployment requires running many additional orchestration services that bare-metal deployments (whether running on-prem or in the cloud) do not.

"bare metal" "cloud" - pick one.

Also, those simpler deployments usually burn more money per utilized compute, or involve reinventing 80% of k8s, often badly.


Everything is about trading convenience for knowledge/know how.

It's up to the individual to choose how much knowledge they want to trade away for convenience. All the containers are just forms of that trade.


Seems like this can be applied to an increasingly large pool of subjects, where things are polarized by default and having a moderate/indifferent opinion is unusual. For example, I thought of US politics while reading your comment

Good insight. It's always easy to blame that which you don't understand. I know nothing about k8s, and my eyes kinda glaze over when our staff engineer talks about pods and clusters. But it works for our team, even if not everyone understands it.

When all you have is a hammer, every problem starts to look like a nail. And the people with axes are wondering how (or indeed even why) so many people are trying to chop wood with a hammer. Further, some axewielders are wondering why they are losing their jobs to people with hammers when an axe is the right tool for the job. Easy to hate the hammer in this case.


Yeah, I would attribute that to tribalism. There's an intense amount of dogma in the Kubernetes community, likely stemming from the billions of dollars that get fed into the ecosystem by Big Tech. I genuinely think people adopt it as part of their identity and then become hostile to anyone who "doesn't understand the excellence of Kubernetes." I only say this because I've had many lunch time conversations with random strangers at the various KubeCon conferences I've attended - and let's just say some were pretty eye opening.

I would also say that a lot of people, even people who are professional k8s operators, don't understand enough of the "theory" behind it. The "why and how", to put it shortly.

And the end result is often that you have two tribes with a totally incorrect idea of even what tools they themselves are using and how, as if someone swapped in an intentionally wrong dictionary like in a Monty Python sketch.


At the end of the day it's all different levels of abstractions and whether or not you're using the abstraction correctly. With k8s, the best practices are mostly set in a lot of use cases. For LLMs, we still have no idea what the best practices are.

Funnily enough, the post isn't shitting on k8s; it's shitting on cloud, saying that k8s (the lipstick) can't fix the pig (the cloud).

That part was really surprising to me because for the kind of compute lake he’s talking about building, k8s seems like a pretty good fit for the layer that sits just above it.

We run k8s with several VMs in a couple different cloud providers. I’d love it if I could forget about the VMs entirely.

Is there a simpler thing than k8s that gets you all that? Probably. But if you don’t use k8s, aren’t you doomed to reimplement half of it?

Like these things:

- Service discovery or ingress/routing (“what port was the auth service deployed on again?”)

- Declarative configuration across the board, including for scale-out

- Each service gets its own service account for interacting with external systems

- Blue/green deployments, readiness checks, health checks

- Strong auditing of what was deployed and mutated, when, and by whom
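A couple of items from that list, as they'd appear in a single Deployment manifest. All names and the /healthz endpoint are illustrative:

```shell
cat > auth-deploy.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
spec:
  replicas: 2
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
    spec:
      serviceAccountName: auth   # per-service identity
      containers:
      - name: auth
        image: example/auth:1.2.3
        readinessProbe:          # don't route traffic until ready
          httpGet:
            path: /healthz
            port: 8080
        livenessProbe:           # restart the container if it wedges
          httpGet:
            path: /healthz
            port: 8080
EOF
```

Reimplementing even just readiness gating plus restart-on-wedge outside k8s is the "doomed to reimplement half of it" part.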


Yeah, I meant to respond to another thread (the top one currently) that was talking more specifically about k8s-hate.

The point about VMs being the wrong shape because they’re tied to CPU/memory resonates hard. The abstraction forces you to pay for time, not work.

I ended up buying a cheap auctioned Hetzner server and using my self-hostable Firecracker orchestrator on top of it (https://github.com/sahil-shubham/bhatti, https://bhatti.sh) specifically because I wanted the thing he’s describing — buy some hardware, carve it into as many VMs as I want, and not think about provisioning or their lifecycle. Idle VMs snapshot to disk and free all RAM automatically. The hardware is mine, the VMs are disposable, and idle costs nothing.

The thing that, although obvious, surprised me most is that once you have memory-state snapshots, everything becomes resumable. I make a browser sandbox, get Chromium to a logged-in state, snapshot it, and resume copies of that session on demand. My agents work inside sandboxes, I run docker compose in them for preview environments, and when nothing’s active the server is basically idle. One $100/month box does all of it.
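A hedged sketch of what that snapshot/resume flow looks like against Firecracker's API socket, based on its documented snapshot endpoints. The socket and file paths are illustrative, and the calls are wrapped in functions here rather than executed:

```shell
API=/tmp/firecracker.socket   # hypothetical per-VM API socket

pause_and_snapshot() {
  # Pause the microVM, then write memory + device state to disk.
  curl --unix-socket "$API" -X PATCH 'http://localhost/vm' \
    -H 'Content-Type: application/json' -d '{"state": "Paused"}'
  curl --unix-socket "$API" -X PUT 'http://localhost/snapshot/create' \
    -H 'Content-Type: application/json' \
    -d '{"snapshot_path": "snap.json", "mem_file_path": "mem.bin"}'
}

resume_from_snapshot() {
  # A freshly started Firecracker process can load the snapshot and
  # resume execution, RAM state included.
  curl --unix-socket "$API" -X PUT 'http://localhost/snapshot/load' \
    -H 'Content-Type: application/json' \
    -d '{"snapshot_path": "snap.json",
         "mem_backend": {"backend_type": "File", "backend_path": "mem.bin"},
         "resume_vm": true}'
}
```

Once pause/create/load is that cheap, "idle VM" really can mean two files on disk and zero RAM.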


VMs hosted on Hetzner auction instances are exactly how shellbox works. I published more details here: https://shellbox.dev/blog/race-to-the-bottom.html

This is pretty cool, I turned a NUC at home into this, and would probably rather use you guys instead. However, is there a way for me to keep a session open without being connected? Sometimes I want the session to be there so I can connect/disconnect to check up on it, so I want "just disconnecting for a bit" to be different from "I don't care about this any more, destroy it".

At home, I've done that with a Zellij session (everything is tied to the session, and quitting Zellij completely means "I'm done with this". Merely disconnecting keeps it running).


Love the website and how you have implemented payments! Also giving each box an email is a nice touch.

Thank you for sharing!


This looks interesting at first blush.

My only feedback so far is that a lot of the documentation, though thorough and useful, looks clearly AI-written. That's not bad in and of itself, but it could be more concise. I especially love the "design decisions" section as I learned something new already.

Have you posted it on "Show HN" already? If not, you should.


Thank you for the feedback! I appreciate it, looking forward to you trying it out and logging any issues.

I am aware of the documentation, it’s what I have been focusing on before I can post on HN. I want to make it a delight to read for other people!

As for the design decisions, I have tried keeping all the plans I made in the repo too. I wouldn’t have been able to make bhatti in a month without LLMs.


> My agents work inside sandboxes

Out of interest, what sandboxing solution do you use?


Not sure what you mean. I use the above linked personal project, bhatti, which internally uses Firecracker microVMs.

Ah, for some reason I didn't make the connection between your VM setup and your agent sandbox setup and thought those were two separate things. Sorry about that!

No worries! I too have been using machines/sandboxes/VMs/microVMs very interchangeably these days. Former is understood by a broader group, but latter is more precise. Trying to find a balance.

OT - but Bhatti looks really cool! Well done!

Thank you :)

Bhatti is a great name!

> Agents, by making it easiest to write code, means there will be a lot more software. Economists would call this an instance of Jevons paradox. Each of us will write more programs, for fun and for work.

There is already so much software out there that isn't used by anyone. Just take a look at any app store. I don't understand why we are so obsessed with cranking out even more, whereas the obvious use case for LLMs should be to write better software. Let's hope the focus shifts from code generation to something else. There are many ways LLMs can assist in writing better code.


I think we, as engineers, are a bit stuck on what “software” has traditionally been. We think of systems that we carefully build, maintain, and update. Deterministic systems for interacting with computers. I think these “traditional” systems will still be around. But AI has already changed the way users interact with computers. This new interaction will give rise to another type of software. A more disposable type of software.

I believe right now we are still in the phase of “how can AI help engineers write better software”, but are slowly shifting to “how can engineers help AI write better software.” This will bring in a new herd of engineers with completely different views on what software is, and how to best go about building computer interactions.


Sometimes “better” means “customized for my specific use case.” I expect that there will be a lot of custom software that never appears in any app store.

The amount of single purpose scripts in my ~/playground/ folder has increased dramatically over the past year. Super useful, wouldn’t have had the time for it otherwise, but not in any way shareable. Eg “parse this excel sheet I got from my obscure bank and upload it to my budgeting app’s REST API”. Wouldn’t have had the time or energy to do this before, now I have it and it scratches an itch.

This. Just today I added a full on shopping list system to our internal dashboard at work (small business) simply because it was slightly annoying and could be solved in 3 prompts and 15 minutes.

If we take it a step further, in a few years, why would anyone purchase generic software anymore? If we can perfectly customise software for our needs and preferences for almost free, why would anyone purchase generic software from an App Store? I genuinely think Apple's business model is in jeopardy.

Most apps aren’t standalone and the services they depend on are nontrivial to build. For example, maybe you could vibe code a guitar tuner app, but not a ride share app.

I agree. The services which will be left standing will be those with a competitive moat: critical mass (Tinder, Facebook), content (YouTube, AppleTV), and scale (frontier AI models requiring expensive hardware), etc.

That said, if you look at the apps on your phone, I wager a large proportion don't have these moats. Translation, passwords, budget, reminders, email, to do, project management, messaging, browser, calendar, fitness, games, game tracking, etc.


Customization often turns out to be a long term liability. Funnily enough, my employer learned this 20 years ago with our ERP and we are still paying the price.

That's not what Jevons paradox means though. He's just name dropping some concept.

Jevons paradox would be if despite software becoming cheaper to produce the total spend on producing software would increase because the increase in production outruns the savings

Jevons paradox applies when demand is very elastic, i.e. small changes in price cause large changes in quantity demanded. It's a property of the market.


> Agents, by making it easiest to write code, means there will be a lot more software.

He's saying that agents make code much cheaper, therefore there will be a large increase in demand for code. This appears to be exactly what you're describing.


> I don't understand why we are so obsessed with cranking out even more... the obvious usecase for LLMs should be to write better software

I honestly think this is ideal. Video games aside, I think one day we'll look back and realize just how insane it was that we built software for millions or even billions of users to use. People can now finally build the software that does exactly what they've wanted their software to do without competing priorities and misaligned revenue models working against them. One could argue this kind of software, by definition, is higher quality.


I don't think this will be true for average consumers. Perhaps for nerds like us, who enjoy a bit of tinkering and can put up with weird behaviors. I mean, are you envisioning that everyone would have their own custom messaging app, for example? Or email? Or banking app? I mean, I think most people's demands for those things are all extremely homogenous. I want messages to arrive, I want emails to get spam filtered a little but not too much, and I want my bank to only allow me to log in and see my balances, etc.

I could see maybe more customization of said software, but not totally fresh. I do agree that people will invent more one-off throwaway software, though.


I think you’re glossing over a lot of use cases. For example, I want my email’s spam controls much tighter.

maybe it will be something like excel where people have their custom workflows

> Let's hope the focus shifts from code generation to something else. There are many ways LLMs can assist in writing better code.

My view is actually the opposite. Software now belongs to cattle, not pet. We should use one-offs. We should use micro-scale snippets. Speaking language should be equivalent to programming. (I know, it's a bit of pipe dream)

In that sense, exe.dev (and tailscale) is a bit like pet-driven projects.


The most recent software paradigm has been SaaS - software as a service. Capex is distributed among all customers and opex is paid for through the subscription. This avoids the large upfront capex and provides easy cost and revenue projections for both sides of the transaction. The key to SaaS is that the software is maximally generic. Meaning is works well for the largest number of people. This necessitates making tough cuts on UX and functionality when they only benefit small parts of the userbase.

Vibe coding or LLM accelerated development is going to turn this on its head. Everyone will be able to afford custom software to fit their specific needs and preferences. Where Salesforce currently has 150,000 customers, imagine 150,000 customers all using their own customised CRM. The scope for software expansion is unbelievably large right now.


SaaS is not a new idea and has been renamed multiple times.

In the 70s, it was called "time-sharing". Instead of buying a mainframe, you got a CICS application instance on a mainframe and used that. (tangentially, spare time on these built-out nation-wide dialup-supported networks is what gave birth to CompuServe and GEnie).

In the dot-com era, it was called "application service providers". Salesforce actually started in this era (1999). So did NetSuite. This was the first attempt to be browser-based, but bandwidth and browsers sucked then.

I think PaaS is a more recent software paradigm, albeit a far less successful one.


Both will likely happen to some degree.

As for the average quality: it’s unclear.

My intuition is that agents lift up the floor to some degree, but at the same time will lead to more software being produced that’s of mediocre quality, with outliers of higher quality emerging at a higher rate than before.


Big agree. I would love the focus to be on contributing, improving, and consolidating around existing open-source solutions. Unfortunately, most AI-enabled contributions have been slop and the maintenance burden of open source has increased

Alas, we shifted from quality to quantity somewhere in the mid 19th century.

Humans have been making quality versus quantity decisions since the time we first grew these big giant brains of ours a million or two years ago, maybe longer.

If you wanted to, you could make an argument about the principal-agent problem - that as hunter-gatherers or subsistence farmers, our quality versus quantity decisions only affected us, whereas in a market economy, one person's quality versus quantity decision affects someone else.

But dismantling capitalism will not solve this problem. It just moves the decision-making to a different group of people. Those people will face the same trade-offs and the same incentives. After the Revolution, even the most loyal comrade will have to contend with the fact that they can choose to provide the honourable working class with more of a thing if they drop the quality.


For software?


What does that have to do with the mid 19th century?

In the California gold rush, the people who got rich were the ones selling shovelware.

There will be only 1 Microsoft® Excel, 1 Google Sheets and 1 LibreOffice and the rest are billions of dead vibe-coded "Excel killers" that no-one uses.

Democratization of software through SaaS & new engineers brought Airtable, Smartsheet, Baserow, Monday, and many more that I can't remember.

Except that list originally had one item, and that item was Visicalc. Times change, but that list is going to stop being relevant before Excel gets knocked off the list.

If you're doing anything complicated, Excel just doesn't make sense anymore. It'll still be the data exchange format (at least, something more advanced than CSV), but it's no longer the only frontend.

"No one uses" is no longer the insult it once was. I don't need or want to make software for every last person on the world to use. I have a very very small list of users (aka me) that I serve very well with most of the software that I generate these days outside of work.


> "No one uses" is no longer the insult it once was.

It certainly is for lots of businesses, otherwise they go out of business.

There is something called 'revenue' which they need to make from customers which are their 'users', and that revenue pays for the 'operating costs' which includes payroll, office rent, infrastructure etc.

This just means that it is more important than ever to know what to build, not just how it is built. It is unrealistic for a business to disregard that, build anything they want, and end up with zero users.

No users, No revenue. No revenue, No business.


Yes, and most applications still have GUIs, where we could be just talking to an LLM instead.

Numerous people are denigrating DevOps people - resume padding, over-complexity, etc.

I think that's startup-thinking, at least in my experience. Maybe in a small company the DevOps guy does all infra.

In my experience, especially in financial services, the ones who run the show are platform engineering MDs: these people want maximum control for their software engineers, whom they split up into a thousand little groups that all want to manage their own repos, their own deployments, their own everything. It's believed that microservices gives them that power.

I guarantee you devops people hate complexity, they're the ones getting called at night and on the weekend, because it's supposedly always an "infrastructure issue" until proven otherwise.

Also the deployment logs end up in a log aggregation system, and god forbid software developers troubleshoot their own deployments by checking logs. It's an Incident.

Are microservices a past fad yet?


Nice post. exe.dev is a cool service that I enjoyed.

I agree there is opportunity in making LLM development flows smooth, paired with the flexibility of root-on-a-Linux-machine.

> Time and again I have said “this is the one” only to be betrayed by some half-assed, half-implemented, or half-thought-through abstraction. No thank you.

The irony is that this is my experience of Tailscale.

Finally, networking made easy. Oh god, why is my battery doing so poorly. Oh god, it's modified my firewall rules in a way that's incompatible with some other tool, and the bug tracker is silent. Now I have to understand their implementation, oh dear.

No thank you.


> No thank you.

I hope this wasn't interpreted towards exe.dev. That really is a cool service!


I find it difficult to configure Tailscale for my use case because they seem to completely not support making ACL rules based on the identity of the device rather than a part of the address space. I'm not configuring a router here, I'm configuring a peer-to-peer networking layer... or at least I'm supposed to be...

I remember from the docs you can use node names. At the very least you can use tags for sure. Assign tags to nodes and define the ACL based on those.

Last I read the docs while troubleshooting this very problem, you cannot specify node names as the source or destination of a grant. You can specify direct IP address ranges, node groups (including autogenerated ones) or tags, but not names.

Tags permanently erase the user identity from a device, and disable things like Taildrop. When I tried to assign a tag for ACLs, I found that I then could not remove it and had to endure a very laborious process to re-register a Tailscale device that I added to Tailscale for the express purpose of remotely accessing


You can ACL based on groups, and you can put users into groups. So if you auth a node, it's now your node and the ACL for your user/group will apply.

But yes, I don't think you can ACL based on the hostname.


Hi there, I work at Tailscale.

Part of the reason that we don't (currently) let you do this is that a hostname is a user-reported field, and can change over time; it's not a durable form of identity that you can write ACLs on. One could imagine, for example:

1. Creating an ACL rule that allows hostname "webserver" to hostname "db".

2. (time passes)

3. Hostname "webserver" is deleted/changed to "web"/etc.

4. Someone can now register a user device with the system hostname set to "webserver"

Should they be allowed to inherit the pre-existing ACL rule?

However, you can accomplish something very close to what you're asking for, I think, by defining a "host" in the policy file (https://tailscale.com/docs/reference/syntax/policy-file#host...) that points to a single Tailscale IP. Since we don't allow non-admins to change their Tailscale IP, this uniquely identifies a single device even if the hostname changes, and thus you can write a policy similar to:

  "hosts": {
    "myhost": "100.64.1.2",
  },
  "grants": [
    {
      "src": ["myhost"],
      "dst": ["tag:db"],
    },
  ]

> because they seem to completely not support making ACL rules based on the identity of the device rather than a part of the address space

Could you rephrase that / elaborate on that? Isn't Tailscale's selling point precisely that they do identity-based networking?

EDIT: Never mind, now I see the sibling comment to which you also responded – I should have reloaded the page. Let's continue there!


i just use Hetzner.

Everything which cloud companies provide just costs so much. My own Postgres, running with an HA setup and backups, costs me 1/10th the price of RDS or CloudSQL and has been running in production for 10 years with no downtime.

I autoscale instances directly off the metrics harvested from Grafana, with the autoscaler configured via webhooks. Very simple and it has never failed us.

I don't know why I would ever use GCP or AWS anymore.

All my services are fully HA and the backups work like a charm every day.


I founded a hosting company 25 years ago when User-Mode Linux was the hot new virtualisation tech. We aspired to just replicate the dedicated server experience because that was obviously how you deploy services with the most flexibility, and UML made it so cheap! Through the 2010s I (extremely wrongly) assumed that being metered on each little part of their stack was not something most developers would choose, for the sake of a little convenience.

Does a regular 20-something software engineer still know how to turn some eBay servers & routers into a platform for hosting a high-traffic web application? Because that is still a thing you can do! (I did it last year to build a 50PiB+ data store.) I'm genuinely curious how popular it is for medium-to-big projects.

And Hetzner gives you almost all of that economic upside while taking away much of the physical hassle! Why are they not kings of the hosting world, rather than turning over a modest €367M (2021)?

I find it hard to believe that the knowledge to manage a bunch of dedicated servers is that arcane that people wouldn't choose it for this kind of gigantic saving.


> I find it hard to believe that the knowledge to manage a bunch of dedicated servers is that arcane that people wouldn't choose it for this kind of gigantic saving.

Managing servers is fine. Managing servers well is hard for the average person. Many hand-rolled hosting setups I've encountered includes fun gems such as:

- undocumented config drift.

- one unit of availability (downtime required for offline upgrades, resizing or maintenance)

- very out of date OS/libraries (usually due to the first two issues)

- generally awful security configurations. The easiest configuration being open ports for SSH and/or database connections, which probably have passwords (if they didn't you'd immediately be pwned)

Cloud architecture might be annoying and complex for many use-cases, but if you've ever been the person who had to pick up someone else's "pet" and start making changes or just maintaining it, you'll know why it can be nice to have cloud arch put some of its constraints on how infra is provisioned, and be willing to pay for it.


> And Hetzner gives you almost all of that economic upside while taking away much of the physical hassle! Why are they not kings of the hosting world, rather than turning over a modest €367M (2021).

Hetzner is an oldschool German company, it is not surprising to see them act this way. They are very profitable (165M Euro in 2024) and have very little debt. They also seem to be mostly bootstrapped and are not VC funded

https://www.northdata.com/Hetzner%20Online%20GmbH,%20Gunzenh...


Companies buy cloud services because they want to reduce in-house server management and operations; for them it's a trade-off with hiring the right people. But you are right: when you can find the right people, doing it yourself can be a lot cheaper.

In some sense I'm starting to think it has more to do with accounting. Hardware, datacenters and software licenses (unless it's a subscription, which it probably is these days) are capital expenses; cloud is an operating expense. Management in a lot of companies hates capital expenditures, presumably because it forces long-term thinking, i.e. three to five years for server hardware. Better to go the cloud route and have "room for manoeuvrability". I worked for a company that would hire consultants because "you can fire those at two weeks' notice, with no severance". Sure, but they've been here for five years now, at twice the cost of actual staff. Companies like that also love the cloud.

Whether or not cloud is viable for a company is very individual. It's very hard to pinpoint a size or a use case that will always make cloud the "correct" choice.


Another point (a common observation of mine) is responsibility. By going SaaS or using the cloud, any kind of data-protection rules and responsibility is moved away, and in many ways that is better: Google, Dropbox or OneDrive will have better PR to take the pain if something goes crazy. Tickbox compliance is easy.

Something I know nothing about is whether the depreciation on server hardware outpaces the value it creates for a business, creating a tax incentive to own your own metal.

Right... That's why they hire an "AWS Certified specialist ninja"

I get the feeling that with LLMs in the mix, in-house server management can do a lot more than it used to.

The internet of 20 years ago was awash with info for running dedicated servers, fragmented and badly-written in places but it was all there. I can absolutely believe LLMs would enable more people to find that knowledge more easily.

Perhaps it saves some time looking through the docs, but do you really trust an LLM to do the actual work?

Yes, and an LLM checks it as well. I have yet to find a sysadmin task that an LLM couldn't solve neatly.

A nice bonus is that sysadmin tasks tend to be light in terms of token usage, that’s very convenient given the increasingly strict usage limits these days.

Yes, with a lot of reviewing what its doing/asking questions, 100%

By this point? Absolutely. They still get stuck in rabbit holes and go down the wrong path sometimes, so it's not fully fire and forget, but if you aren't taking advantage of LLMs to perform generic sysadmin drudgery, you're wasting your time that could be better spent elsewhere.

Also using Hetzner.

But I came across Mythic Beasts (https://www.mythic-beasts.com/) yesterday, similar idea, UK based. Not used them yet but made the account for the next VPS.


This is way way more expensive than hetzner. Not even comparable?

Agree. I used to always use Heroku- or Render-style platforms for my own software, but nowadays I just have a Linux server with Docker Compose and a cron job. The cron job runs docker pull every minute (downloads the latest image) and docker up -d (switches to the new version only if there is one), with Caddy in front for HTTPS. This has been very cheap and reliable for years now.
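If it helps picture it, the whole loop can be a single crontab line; the /srv/app path and the compose-based commands here are my own assumptions, not necessarily the exact setup described above:

```shell
# crontab entry (hypothetical path /srv/app): every minute, fetch any newer
# image layers, then recreate only containers whose image actually changed.
* * * * * cd /srv/app && docker compose pull --quiet && docker compose up -d
```

`docker compose up -d` is effectively a no-op for services whose image and config haven't changed, so running this every minute is cheap.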

What images are you running that you'd need the latest version up after just a minute?

I'm not the OP, but to clarify: the cron check for new versions runs every minute, so when new images are pushed they're picked up quickly.

OP is not saying they push new versions at such a high frequency that they need checks every minute.

The choice of one minute vs. 15 minutes is an implementation detail and, when architected like this, costs nothing.

I hope that helps. Again, this is my own take.


When I push new images via CI, I want it to go in production immediately. Like Heroku/Render/Dokku

One annoyance (I don't know if they've since fixed it) was that Docker Hub would count pulls that don't contain an update towards the rate limit. That ultimately prompted me to switch to alternate repositories.

One way is to host a manifest file (you can host one on R2) and update it on each deploy; when the manifest changes, the new container image is pulled.
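A minimal sketch of that manifest comparison, with the actual pull/restart stubbed out (the function name, digest values, and state-file handling are all invented for illustration):

```shell
# deploy_if_changed NEW_DIGEST STATE_FILE
# Compares a freshly fetched manifest digest against the last one deployed;
# on a mismatch it would pull and restart (stubbed to an echo here) and
# record the new digest in the state file.
deploy_if_changed() {
  new=$1
  state=$2
  old=$(cat "$state" 2>/dev/null || true)
  if [ "$new" != "$old" ]; then
    echo "deploy"   # real version: docker compose pull && docker compose up -d
    printf '%s\n' "$new" > "$state"
  fi
}
```

In the cron job you'd feed it the digest fetched from the manifest URL; an unchanged digest falls through without touching Docker at all.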

Especially these days you can SSH to a baremetal server and just tell Claude to set up Postgres. Job done. You don't need autoscaling because you can afford a server that's 5X faster from the start.

You just use Docker.

It is like 4 lines of config for Postgres; the only line you need to change is the path where Postgres should store the data.
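Something like this, presumably, as a `docker run` sketch (the image tag, password handling, and host path are my assumptions; a compose file version is the same few lines):

```shell
# Minimal Postgres-in-Docker sketch. The -v host path (/data/postgres here)
# is the one line you actually change to pick where the data lives.
docker run -d --name postgres \
  --restart unless-stopped \
  -e POSTGRES_PASSWORD=change-me \
  -v /data/postgres:/var/lib/postgresql/data \
  postgres:17
```

The official `postgres` image stores its data in `/var/lib/postgresql/data` by default, so the bind mount is all that's needed to keep data on the host across container upgrades.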


You also probably want the Postgres storage on a different (set) of disks.

Maybe change the filesystem?


I find it interesting that Hetzner was never a consideration, until... LLMs started recommending them.

Hetzner was raved about before AI was cool. I know since based on those good reviews I moved half of my apps from DigitalOcean to Hetzner. My DigitalOcean droplet was lacking in RAM and it was more expensive for me to grow it than move some stuff to another small VPS on Hetzner.

Do you run containers? What orchestrator or deploy tool do you use?

Honestly I like Hetzner a lot, but lately it has been very unstable for us. https://status.hetzner.com/ always has a couple of incidents happening at the same time. I really appreciate the services they provide, but I wish they were more stable.

There are several things going on even now, 1 hour after your comment. But I appreciate that they list them. That hopefully means that they have a good culture of honesty, and they can improve.

I looked through the issues and basically the only ongoing thing is that backup power is not working in one of the data centers (which could be a problem). The rest are warnings about planned shutdowns of some services and speed limitation of object storage in one location.

I am sure it's luck, but we have a few Hetzner VPSes in both German locations, and in the last 5 years afaik they've never been down. On our HTTP monitoring service they show hundreds of days of uptime, and only that little because we restarted them ourselves.


We've done both. Hetzner dedicated was genuinely fine, until a disk started throwing SMART warnings on a Sunday morning and we remembered why we pay 10x elsewhere for some things. It's probably less about the raw cost and more about which weekends you want back.

Well, you gotta take all that into consideration before your build out.

You can use block storage if data matters to you.

Many services do not need to care about data reliability or can use multiple nodes, network storage or many other HA setups.


Isn't this the nature of every dedicated server? You also take on the hardware management burden - that's why they can be insanely cheap.

But there is a middle ground in the form of a VPS, where the hardware is managed by the provider. It's still way, way cheaper than some cloud magic service.


VPS comes at the cost of potential for oversubscription - even from more reputable vendors. You never really know if you're actually getting what you're paying for.

They also offer dedicated VPS with guaranteed resource allocation.

Because if I have a government service with millions of users, I don’t want the cheap shitter servers to crap out on me.

An employee is going to cost anywhere between 8k and 50k per month. Hiring an employee to save 200/month on servers by using a shitty VPS provider is not saving you any money.


If you have millions of users, you absolutely need to have someone whose whole job is managing infrastructure. Expecting servers or cloud services to not crap out on you without someone with the skills and time to keep things running seems foolish.

There are plenty of alternatives out there. I built https://shellbox.dev, which gives you instant vms via ssh where unlike exe you pay only for what you use-- scale to zero. It is also regular linux, supporting vscode and zed remote, Nested virtualization, etc.

If you're looking to invest im fine with only $5M :)


The other day I vibed a very stable codeserver (vscode in browser) instance with zellij browser mode (console in browser), syncthing (filesyncing), ssh, pi agent and wireguard. No exposed ports, every web frontend is password secured.

I don't want to make that public, it's my way of an isolated dev environment and it runs on my private raspberry behind my tv. Costs me nothing.

I hope you have a good success with your service.


Neat service. Website doesn't provide enough information for me to trust any workloads to it. Not clear where the underlying infrastructure is, what security guarantees I get, etc.

Thanks for the feedback, I'll improve it. In the meantime, there are more technical details in this blog post: https://shellbox.dev/blog/race-to-the-bottom.html

This is the problem for me with the cloud:

> Finally, clouds have painful APIs. This is where projects like K8S come in, papering over the pain so engineers suffer a bit less from using the cloud. But VMs are hard with Kubernetes because the cloud makes you do it all yourself with lumpy nested virtualization. Disk is hard because back when they were designing K8S Google didn’t really even do usable remote block devices, and even if you can find a common pattern among clouds today to paper over, it will be slow. Networking is hard because if it were easy you would private link in a few systems from a neighboring open DC and drop a zero from your cloud spend. It is tempting to dismiss Kubernetes as a scam, artificial make work designed to avoid doing real product work, but the truth is worse: it is a product attempting to solve an impossible problem: make clouds portable and usable. It cannot be done.

Please learn from Unix's mistakes. Learn from Nix. Support create-before-destroy patterns everywhere. Forego all global namespaces you can. Support rollbacks everywhere.

If any cloud provider can do that, cloud IaC will finally stop feeling so fake/empty compared to a sane system like NixOS.


Virtual machines are the wrong abstraction. Anyone who has worked with startups knows that average developers cannot produce secure code. If average developers are incapable of producing secure code, why would average non-technical vibe-coders be able to? They don't know what questions to ask.

There's no way vibe coders can produce secure backend software, with or without AI. The average software that AI is trained on is insecure. If the LLM sees a massive pile of fugly vibe-coded spaghetti and you tell it "Make it secure please", it will turn into a game of Whac-a-Mole: patch a vulnerability and two new ones appear.

IMO, the right solution is to not allow vibe-coders to access the backend. It is beyond their capabilities to keep it secure, reliable and scalable, so don't make it their responsibility. I refuse to operate a platform where a non-technical user is "empowered" to build their own backend from scratch. It's too easy to blame the user for building insecure software. But as a platform provider, if you know that your target users don't have the capability to produce secure software, it's your fault; you're selling them footguns.

Shameless plug: https://clawk.work/

`ssh you/repo/branch@box.clawk.work` → jump directly into Claude Code (or Codex) with your repo cloned and credentials injected. Firecracker VMs, 19€/mo.

POC, please be kind.


I’m curious about it do you have a page with more details on specs configs and what else goes on in there?

This looks nice, when did you launch this? Do you have validation / paying users?

honestly sounds interesting

At 19€/mo, are you subsidizing it given the sharp rise of LLM costs lately?

Or are you heavily restricting model access? Surely there is no Opus?


The 19€/mo is infra only. Claude Code inside the VM signs in via OAuth to the user's own Anthropic account. I'd love to explore bundling open models (Qwen, etc..) into the subscription down the line, but that needs product validation first, not going to ship something I'm not sure people actually want.

I'm not sure if this is the direction the OP is going, but I would love to see a world where local small-time investors can get a bank loan, rent a facility, set up a bunch of computers, and run open-source cloud software on them that provides 95% of the features that most businesses need.

Running a cloud data center could be a business like operating a self-storage facility or a car wash. Small investors love this kind of operation.


These are called co-los (co-location facilities). Probably any medium to large city has a few.

That's insane funding so congrats.

Just shows I'm the Dropbox commentator. I have what exe provides on my own and am shocked by the value these abstractions provide everyone else! One-off containers on my own hardware that spin up, spin down, run async agents, etc., Tailscale auth, and the team can share or connect easily by name.


Investment is done by relationships, belief in a future vision and team, and growth metrics like number of paying customers.

The technology itself in its current form is not valuable


Sobering comment for all the little people like myself who dream of owning a business based on a vision of cool tech that just does what it promises (as opposed to all the corporate shovelware out there)

Author here.

Almost every VC rejected us when we went to get seed funding for Tailscale, we knew none of them. Friends of friends of acquaintances got us meetings. Fundraising is very possible for you if you are committed to building a business. Most important thing is don't think of fundraising as the goal, it is just a tool for building a business. (And some businesses don't need VC funding to work. Some do.)

The biggest challenge is personal: do you want to build a business or do you want to work with cool tech? Sometimes those goals are aligned, but usually they are not. Threading the needle and doing both is difficult, and you always have to prioritize the business because you have to make payroll.


How did you eventually get funding after those initial rejections? What changed?

One VC (well, two) understood it. Despite what you hear, there is a lot of variation. Speak to a lot of people.

You can still do that. Not every business needs to be a hyperscaling startup.

Agreed! Over at https://pico.sh we are chugging along and having a blast. Profitable but at a scale that is manageable for us. Cheers

I really like exe.dev's pricing model where I pay a fixed monthly fee for compute and then can split it up into as many VMs as I want. I use exe.dev to run little vibe-coded apps and it's nice to just leave them running without a spend meter ticking up.

We're thinking about switching to this pricing model for our own startup[1] (we run sandboxed coding agents for dev teams). We run on Daytona right now for sandboxes. Sometimes I spin up a sandboxed agent to make changes to an app, and then I leave it running so my teammate can poke around and test the running app in the VM, but each second it's running we (and our users) incur costs.

We can either build a bunch of complicated tech to hibernate running sandboxes (there's a lot of tricky edge cases for detecting when a sandbox is active vs. should be hibernated) or we can just provision fixed blocks of compute. I think I prefer the latter.

[1] https://github.com/gofixpoint/amika


Europe is crying out for sovereign clouds. If this is to be a viable alt cloud, US jurisdiction is a no.

Not sure we can move away from CPU/memory/IO budgeting towards total metal saturation, because code isn't what it used to be: no one handles malloc failure any more, we just crash OOM.


Europe is already moving onto EU clouds: Hetzner, OVHcloud and so on, as well as local data centers where partner companies set up their own clouds with various offerings to rival Office 365. So far it's mainly the public sector; my own city cut its IT budget by 70% by switching away from Microsoft.

The key point is the partner companies. Almost nobody is actually running their own cloud the way they would with the various 365 products, AWS or Azure; they buy the cloud from partners, similar to how they used to (and still do) buy solutions from Microsoft partners. So if you want to "sell cloud" you're probably going to struggle unless you get some of these on board, which in turn is hard because a lot of what they sell is a package that basically runs on VMs set up as part of an offering they already have.


For anybody interested, the meat of 'EU sovereign' is EU companies, not US or UK companies with EU servers (because of the CLOUD Act and the UK-US bilateral arrangement connected to it).

International visitors might tell us more about benefits of non EU, US or UK nexus companies/legal/rights.


Ok, what am I missing? exe.dev says: "$20/month for your VMs One price, no surprises. You get 2 CPUs, 8 GB of RAM, and 25 GB of disk".

Fine, their UI is different, but I don't see any real difference from other providers.


You get one machine.

On that machine you can (easily) make an arbitrary number of VMs.

Each VM has their own URL that you can share (or make private).

See features: https://exe.dev/docs/customization


Comparing laptop SSD to cloud network drive is misleading.

EC2 provides the *d VMs that have SSDs with high IOPS at much lower cost than network SSDs. They are ephemeral, but so is a laptop and its SSD; it can lose the data. From the AWS docs: "If you stop, hibernate, or terminate an instance, data on instance store volumes is lost."


I'm excited to see what they put together, because this raises a number of similar gripes I have with public cloud in its current state:

* Insistence on adding costly abstractions to overcome the limitations of non-fungible resources

* Deliberate creation of over or under-sized resource "pieces" instead of letting folks consume what they need

* Deliberate incompatibility with other vendors to enforce lock-in

I pitched a "Universal Cloud" abstraction layer years ago that never got any traction, and honestly this sounds like a much better solution anyhow. When modern virtualization is baked into OS kernels, it doesn't make a whole lot of sense to enforce arbitrary resource sizes or limits other than to inflate consumption.

Kubernetes without all the stuff that makes it a bugbear to administrate, in other words. Let me buy/rent a pool of stuff and use it how I see fit, be it containers or VMs or what-have-you.


Hahaha! Have fun! I'm doing the same, together with Claude Code. Since August. With HTTPS (mTLS 1.3) everywhere, because I can. Just my money, just my servers, just for me. Just for fun. And what fun it is!

Me too. I already moved our products to it and it is getting fairly robust. I guess many smaller companies got tired of the big guys asking a lot of money for things that should be cheap.

Yeah i feel like it's getting cloudy

> The standard price for a GB of egress from a cloud provider is 10x what you pay racking a server in a normal data center.

Oh, that’s too kind. More like 100x to 1000x. Raw bandwidth is cheap.


It was a weird point to make in the post given that exe.dev charges $0.07/GB for transfer. That's arguably worse than the major clouds, who charge about the same for egress but give you free ingress.

Author here.

I need to fix our transfer pricing. (In fact I'm going to go look at it now.) I set that number when we launched in December, and we were still considering building on top of AWS, so we put a conservative limit based on what wouldn't break the bank on AWS. Now that we are doing our own thing, we can be far more reasonable.


Lots of negativity towards k8s in here. It's always funny to me when $WILDLY_POPULAR_TECH gets ripped apart like this, as though no one has ever had a positive experience with it. I've seen similar pile-ons for React, microservices, git, PHP, JavaScript, cloud services, really anything that's been adopted at scale.

It’s only natural that seeing frequent complaints mostly happens for tech that has high adoption. Stuff that nobody uses doesn’t get many complaints.

HN has had a hate boner for K8s for as long as I can remember.

In my experience, K8s is a million times better than legacy shit it is usually replacing. The Herokus, the Ansible soup, the Chef/Puppet soup before that etc. The legacy infra that was held together by glue and sweat that everybody was afraid to touch.


As an SRE, I totally agree. Most companies I've been at where we implement K8s (which is around 30-50 VMs) end up building their own, shittier Kubernetes. This blog post: https://www.macchaffee.com/blog/2024/you-have-built-a-kubern... is a favorite of mine.

I have had an eye on this for a while (found via pi.dev), but I don't really have a solid use case for it. The idea/concept is appealing; the price is not. I can buy a £100-150 mini-PC with better hardware to run 24/7 for my own VMs, extending my homelab (provided my ISP doesn't put any restrictions on me; I know many others can't say the same).

You can see their base docker image here - https://github.com/boldsoftware/exeuntu


This makes me wonder if I could get a few million in funding to rent out some Oxide racks. I'd love to touch some Oxide hardware and this seems like a good way to do it.

I have trouble seeing how this is different from Linode. If I invest time in a new VM API, it has to work transparently for the cloud or my own machines. Lastly, as much as I share the disappointment in the k8s promise, this seems a bit too simple; there is a reason homelabs mostly standardised on compose files.

Linode came to mind as well, but OP seems to be much more focused on building the cloud as a hardware service rather than a VM service.

It seems really cool, but the entry-level tier just seems too expensive. I can get a single pain-in-the-ass OVH VPS for $7. I just need something better than that for the same price.

Half the work of pricing a solo SaaS is modeling what a bad month looks like if egress or IOPS spike. You end up pricing defensively to protect against your own infra bill instead of pricing to the value you provide. A fixed bucket of compute with clear limits is way easier to build a business on than a meter that could run anywhere.

I have mixed feelings about this concept. I agree that the way clouds work now is far from great and that stronger abstractions are possible. But this article offers nothing of the sort; it just handwaves "we solve some problem and that saves you tokens"???

Checking the current offering, it's just prepaid cloud capacity with rather low flexibility. It's cheap though, so that's nice, I guess. But does this solve anything new? Anything fly.io or the like doesn't solve?

What is the new idea here? Or is it just the vibes?


Hilariously, they have this linked (“That must be worst website ever made.”: https://news.ycombinator.com/item?id=46399903) under What people are saying.

The 2.0 website they never wanted:

https://exe-muttha-fukken-dev.exe.xyz/


I think I am interested in this? I run a bunch of small web apps, currently as fly.io machines. I love fly, but it adds up when I have a bunch of small things that I want isolated — I wish I could have even smaller Fly instances. Exe.dev seems like a good middleground where I can allocate the compute from tiny to large. (?)

I use both fly and exe. Exe isn't really "docker image as the app"-focused like fly, but if you want to sort of mimic the fly deploy process you kinda sorta could make it work for you I would think. This might help:

https://exe.dev/docs/customization


I can't see why I would want this, but I do love Tailscale so I'm excited to see what new stuff he comes up with here.

I don't get it, what is this, how is it different?

You choose a region. Then you pay for some compute size (vCPU and memory), and then you can create a lot of VMs within those limits. If some VMs don't consume all their resources, others can use the spare capacity as burst.

VMs have a built-in gateway to cloud providers with a fixed URL and no auth; you can top that up via the service itself. No need for your own keys.

So likely a good tool for managing AI agents. And "cloud" is a bit of a stretch, the service is very narrow.

The complete lack of any detailed description of the regions beyond a city name makes it really only suitable for ephemeral/temporary deployments. We don't know what the datacenters are or what redundancy is in place; no backups or anything like that.


As I understand, a cloud provider where instead of paying for each VM (with a set of resources), you pay for the resources, and can get as many VMs as you can fit on these resources.
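The pooled model described above (rent resources, pack VMs into them) can be sketched in a few lines. Everything here is hypothetical and illustrative, not exe.dev's actual scheduler:

```python
# Sketch of a shared resource pool: each VM declares a baseline
# reservation, and unreserved capacity is available as burst.
# Hypothetical model, not exe.dev's real implementation.

class Pool:
    def __init__(self, vcpus):
        self.vcpus = vcpus
        self.reserved = {}  # VM name -> baseline vCPUs

    def add_vm(self, name, baseline):
        if sum(self.reserved.values()) + baseline > self.vcpus:
            raise ValueError("pool exhausted")
        self.reserved[name] = baseline

    def burst_headroom(self):
        # Idle capacity any VM in the pool may borrow.
        return self.vcpus - sum(self.reserved.values())

pool = Pool(vcpus=2)
pool.add_vm("web", baseline=1)
pool.add_vm("worker", baseline=0.5)
print(pool.burst_headroom())  # 0.5
```

The point of the model is that you size the pool once, then slicing it into VMs is free.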

I just want to say: this has been the greatest experience I have ever had signing up for a new service. I really, really loved the entire experience. It has truly inspired me!

I wish you great luck! I want this to succeed.

The author seems to have no clue what is a cloud problem and what is a k8s problem, and is blaming everything on k8s. The whole post reeks of ignorance. I have no love for k8s, but he is just flat-out putting out false information.

> Finally, clouds have painful APIs. This is where projects like K8S come in, papering over the pain so engineers suffer a bit less from using the cloud.

K8s's main function isn't to paper over existing cloud APIs; that is just a necessity when you deploy it in the cloud. On normal hardware it's just an orchestration layer, and often just a way to pass config from one app to another in a structured format.

> But VMs are hard with Kubernetes because the cloud makes you do it all yourself with lumpy nested virtualization.

Man discovered system designed for containers is good with containers, not VMs. More news at 10

> Disk is hard because back when they were designing K8S Google didn’t really even do usable remote block devices, and even if you can find a common pattern among clouds today to paper over, it will be slow.

Ignorance. K8s has abstractions over a bunch of storage types; for example, using Ceph as a backend will just use KVM's Ceph backend, with no extra overhead. It also supports "oldschool" protocols used for VM storage like NFS or iSCSI. It might be slow in some cases in the cloud, if the cloud doesn't provide enough control, but that's not k8s's fault.

> Networking is hard because if it were easy you would private link in a few systems from a neighboring open DC and drop a zero from your cloud spend.

He mistakes cloud problems for k8s problems (again). All k8s needs is visibility between nodes. There are multiple providers to achieve that, some with zero tunnelling, just routing. It's still complex, but no more than "run a routing daemon".

I expect his project to slowly reinvent cloud APIs, copying what k8s and other projects did, once he starts hitting the problems those solutions solved. And to do it worse, because instead of researching the whys and why-nots, he seems to want to throw everything out, learning no lessons.

Do not give him money


You should do it in Europe, so much demand for European clouds and very weak offerings.

US company doing cloud in Europe changes nothing because of the CLOUD Act: https://en.wikipedia.org/wiki/CLOUD_Act

Hetzner, OVH? In terms of price, and just having a VPS that works, European clouds are better than the American ones. For me it's easier to understand a VPS that is just *Linux* than whatever AWS or GCP are doing.

This looks like an excellent platform for running a "homelab" in the cloud (no, the irony is not lost on me) for lighter stuff like Readeck, Calibre-web, Immich. Maybe even Home Assistant too if we can find a way (Tailscale?) to get the mDNS/multicast traffic tunnelled.

With storage priced at $8 per 100 GB, Immich would be wildly uneconomical. Better to wait for the upcoming Immich hosting to support the project, or use ente.io: those are $10 per TB.

That's a good tip, thanks. What I meant to say was that there's probably at least a handful of self-hosted services you could run to offset that $20/mo.

Another one could be Bitwarden, although I don't host my own password manager personally. Or netbird. You get the point


That's really cool!

One thing I'm confused about is how to create a shared resource, e.g. a Redis server, and connect to it from other VMs. It looks quite cumbersome right now to set up Tailscale or connect via SSH between VMs. Also, what about egress? My guess is that all traffic is billed at $0.07 per GB. It looks like this cloud is made to run stateful agents and personal isolated projects, and distributed systems or horizontal scaling aren't a good fit for it?

Also, I'm curious: why not a Railway-like billing-per-resource-utilization pricing model? It's very convenient and, I would argue, made for the agent era.

I set up for my friends and family a Railway project that spawns a VM with disk (a stateful service) via a Telegram bot and runs an OpenClaw-like agent; it costs me something like $2 to run 9 VMs like this.


This is being accurately called a "cloud for developers". If it were for enterprise, it would cost 1000x, creating thousands of positions, multiple VPs, executives, etc., with a bill in the hundreds of millions of dollars. Execs want high capex/opex and a massive headcount. BIGG numbers mean bigger titles and compensation.

The "one price" is oddly small for a cloud company. I'm sure it's nice and fast but the $20/mo seems smaller than some companies' free tiers, especially for disk.

The main reason clouds offer network block devices is abstraction.


Don’t worry - that will certainly change in the future if they have any kind of success :)

I really want an open source version of Firebase with feature parity.

I don't care about how the backend works. Supabase requires magical luck to self-host.

A lot of cloud providers have very generous free tiers to hook you, and then the moment things take off, it's a small fortune to keep the servers on.


Convex's open source version is OK as long as you don't expect huge load.

I'm still new to cloud computing. I've only ever used Linode. What is this supposed to be? I couldn't really figure out the design from the article. Pls help

as an exe customer i'm really happy to see this. i don't even use half of their features (such as the https proxy, or the LLM agent) but it's just a reliable computer that i can ssh into from my laptop or phone. i use hetzner too in the same way for a bit of redundancy but exe seems less likely to delete all my machines and data.

every time i've had an issue or question, it's been the same sympathetic people helping me out. over email, in plain text.


I think clouds pay a huge abstraction penalty to allow tiny VMs. I guess it helps with onboarding and $10 personal VPNs. But I have never needed a fraction of a computer. I want to rent some number of full computers of various sizes, consisting of CPU, memory, and flash disk. Hetzner is closer than AWS, and I think/hope that’s what Crawshaw is aiming for.

Allow? I understood tiny VMs to be something (at least AWS) added to try to squeeze more utilization out of idle hardware.

I understand the appeal from AWS's perspective. Customer A pays for a 32 vCPU VM, which they run on 32-core hardware. Then they can also squeeze in customer B's 1 vCPU instance running a blog, and no one notices. Free money!

But I don't want to be either of those customers. It means the whole system has an extra layer of abstraction, so they can juggle VMs around. It's why you need slow EBS instead of just getting a flash drive in the same case as the CPU, with 0.01x the latency.


The key to renting a fraction of a computer is scaling up. If I can rent 1/8th of a computer, I can also rent 3/8ths and 1/2 and then go to a full computer, if that capacity is necessary.

The key to scaling up is to have big-enough hardware on the backend. If Hetzner is renting out bare metal instances then they can only rent out the sizes that they have. If a cloud provider invests in really big single systems, they can offer fractions of those systems to multiple tenants, some of whom scale up to use the entire system, and some who don't. I think that is a win-win.

A fractional VM is also a fungible VM. If the tenant calls to spin up a certain size VM, then the backend can find suitable hardware for it from a menu of sizes. Smaller VMs can slot in anywhere there is room, not just on a designated bare-metal system.

A cloud provider is always going to want to maximize their rack space, wattage/heat, and resource usage. So they will invest in high-density systems at every chance. On the other hand, cloud tenants will have diverse needs, including some fraction of those big computers.
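The fungibility argument above is essentially a bin-packing problem: any VM slots into any host with room. A toy first-fit sketch (illustrative only, not any provider's real placement logic):

```python
# Toy first-fit placement of fractional VMs onto big hosts, the
# fungibility argument above. Sizes are in vCPUs. Illustrative only.

def place(vms, hosts):
    """vms: list of vCPU sizes; hosts: list of host capacities.
    Returns a mapping of VM index -> host index."""
    free = list(hosts)
    placement = {}
    for i, size in enumerate(vms):
        for h, cap in enumerate(free):
            if cap >= size:
                free[h] -= size
                placement[i] = h
                break
        else:
            raise RuntimeError(f"no host fits VM of size {size}")
    return placement

# Two 64-core hosts; tenants ask for fractions of them.
print(place([32, 8, 1, 48], [64, 64]))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

The 48-core request spills to the second host because the first has only 23 cores left, which is exactly the "slot in anywhere there is room" property described above.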


I will follow this one for sure. There are a few more companies with the extremely ambitious goal of "a better AWS", and I am interested in the various strategies they take to approach that goal incrementally.

A service offering VMs for $20 is a long way from AWS, but I see how it makes sense as a first step. AWS also started with EC2, but in a completely different environment with no competition.


With LLMs there is no real dev-velocity penalty for using high-performance languages like, say, Rust. A pair of 192-core AMD EPYC boxes will have enough headroom for 99.9% of projects.

That'll be true for the 0.1% of projects that were limited by the speed of their programming language. For the other 99.9% of projects, their vibe-coded Rust can fly and their database, network, or raw computation will still be the bottleneck.

(Percentages cited above are tongue-in-cheek, actual numbers are probably different)


Just take a look at Hetzner Cloud. It's everything 99% of people need, with good pricing. Convert that UX to the terminal and you're done.

AWS. Months of complex dev work to build using their CDK. Terrible disk speed. Frustrating permissions systems. Tiny deployments that take 30 minutes. Rollbacks that get stuck for hours. What you end up with is about 4 CPUs and 16 GB of RAM for $1000+ per month. No wonder Bezos could send his wife and Katy Perry on a jolly into space. The world's richest man, 1 IOP at a time.

For that money I can get 5 big bare metal boxes on OVH with fast SSDs, put k0s on them, fast deploy with kluctl, cloudflare tunnels for egress. Backups to a cheap S3 bucket somewhere. I'll never look at another cloud provider.


If you're using cloudflare tunnels, you don't even need to be on OVH. You could seriously host anywhere, like your own basement.

exe.dev. 111 IN A 52.35.87.134

52.35.87.134 <- Amazon Technologies Inc. (AT-88-Z)


Hello, author here.

Our exe.dev web UI still runs on AWS. We also have a few users left on our VM hosts there, as when we launched in December we were considering building on AWS. Now almost all customer VMs are on other bare metal providers or machines we are racking ourselves. We built our own GLB with the help of another vendor's anycast network. You can see that if you try any of the exe.xyz names generated for user VMs.

We would move exe.dev too, but we have a few customers who are compliance sensitive going through it, so we need to get the compliance story right with our own hardware before we can. It is a little annoying being tied to AWS just for that, but very little of our traffic goes through them, so in practice it works.


Their first location (PDX) is on Amazon I believe and not accepting new customers. They’ve said it’s much more expensive for them than the others. Their other locations are listed here:

https://exe.dev/docs/regions


Well yes, because they needed high availability and flexibility and tons of features…

Hey wait a minute!


"I am white labeling a cloud"

FTA “Hence the Series A: we have some computers to buy.”

The article doesn't really say which fundamental problems will be solved, except fancy VM allocation. Nothing about hardware, networking, reliability, tooling, and such. Well, nice; good luck.

Much respect for the ambitious plan; I wish I could do such bold thinking. I have been running a small PHP PaaS (fortrabbit) for more than 10 years. For me, it's not only "scratch your own itch" but also "know your audience". So a limited feature set with a high level of abstraction can also be useful for some users > clear path.

finally a cloud 'vendor' that understands that modern computers are fast.

if we go back to the principle that modern computers are really fast, SSDs are crazy fast

and we remove the extra cruft of abstractions - software will be easier to develop - and we wouldn't have people shilling 'agents' as a way for faster development.

ultimately the bottleneck is our own thinking.

simple primitives, simpler thinking.


Have we already forgotten about the NSA's "SSL added and removed here! :)" slide that Snowden showed us?

https://news.ycombinator.com/item?id=6641378


I don’t understand the point you’re trying to make.

Cloud is bad?


Nevermind, I misread their HTTPS proxy documentation. Cloud is fine.

“Everything is shit. Believe me. We will do something better, just believe me.”

Jokes aside: - k8s is an insane piece of software. The right tool for a big problem, not for your toys. Yes, it is crazy difficult to set up and manage. Then what?

- cloud has bad and slow disk. BS. They have perfectly fast NVMe.

Something else? That’s it.

Why am I so confident? I set up and managed Kubernetes for 2 years, so I have some experience. Do I still use it? Nope. Not the right tool for me. Ansible with some custom Linux tools fits me better.

I am also building my own cloud, though if I say it less loudly: hosting to serve websites for https://playcode.io. Yes, it is hard and full of compromises. Like networking: yes, I want VMs to communicate across any region. Or disks and reliability. What about snapshots? And many bare-metal renters give only 1 Gbit/s, which is not fine, or ask way more for a 10 Gbit uplink. So it is easy to build something limited and unreliable, or non-scalable.


Wondering what runtime the infra uses under the hood. Firecracker? Traditional VMs? Docker containers?

Author here. Most of our infra is custom, the VMM is based on cloud-hypervisor (a project spiritually similar to Firecracker). We have a lot of work to do, including on the VMM, but right now there is more value for users if we spend our time on the VM management layer and GLB.

How difficult is it to build a second startup on the side?

I welcome the initiative, but it's pretty costly compared to bare metal cloud providers. So the value has to be in the platform-as-a-service too.

You can run several VMs or containers with isolation on your phone hardware; why even use the cloud when you just want to show your friends?

For me it’s so my coding agent keeps running when I close my laptop lid and it goes to sleep. VM in the cloud because I’m too lazy to set up a computer to be running as a server all the time.

Congrats. Just checked your homepage. I love the fact you also show this comment

"That must be worst website ever made"

Made me love the site and style even more


What will happen to my "Grandfathered Plan"? I signed up to test it; I don't recall if I gave you my credit card.

Why is an imperative SSH interface a better way of setting up cloud resources than something like OpenTofu? In my experience, humans and agents work better in declarative environments. If an OpenTofu integration is offered in the future, will exe.dev offer any value over existing cost-effective VPS providers like Hetzner? Technically, Hetzner, for example, also allows you to set up shared disk volumes:

https://github.com/hetzneronline/community-content/blob/mast...

It also has a CLI, hcloud. Am I getting any value with exe.dev I couldn't get with an 80 line hcloud wrapper?


I don't think SSH vs OpenTofu is the core issue here.

For agents, declarative plans are still valuable because they are reviewable. The interesting question is whether exe.dev changes the primitive: resource pools for many isolated VM-like processes, or just nicer VPS provisioning.
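A "reviewable plan" in this sense is just a diff between desired and current state, rendered as an action list before anything is applied. A minimal sketch with made-up resource shapes, not a real exe.dev or OpenTofu API:

```python
# Minimal "plan" step: diff desired vs. current resources into a
# reviewable action list, the property argued for above. The resource
# shapes here are hypothetical, not any real provider's schema.

def plan(current, desired):
    actions = []
    for name in desired.keys() - current.keys():
        actions.append(("create", name, desired[name]))
    for name in current.keys() - desired.keys():
        actions.append(("delete", name, current[name]))
    for name in desired.keys() & current.keys():
        if desired[name] != current[name]:
            actions.append(("update", name, desired[name]))
    return sorted(actions, key=lambda a: (a[0], a[1]))

current = {"web": {"cpus": 2}, "old-worker": {"cpus": 1}}
desired = {"web": {"cpus": 4}, "db": {"cpus": 2}}
for action in plan(current, desired):
    print(action)
```

A human or agent approves the printed action list, then an apply step executes it; that review gate is what a purely imperative SSH session lacks.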


It doesn't do either at competitive rates by the looks of it.

Let me see if I understand. The TL;DR is that instead of asking for VMs and fitting things into them, you reserve the CPU and RAM and do with that whatever you want? Number of microVMs, etc.?

From the linked blog post:

> The standard price for a GB of egress from a cloud provider is 10x what you pay racking a server in a normal data center.

From the exe.dev pricing page:

> additional data transfer $0.07/GB/month

So at least on the network price promise they don't seem to deliver, still costs an arm and a leg like your neighbourhood hyperscaler.
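For scale, some back-of-envelope arithmetic on that figure. The $0.07/GB comes from the pricing page; the 1 Gbps commit is an assumed colo baseline for comparison, not a real quote:

```python
# Rough egress-cost comparison. $0.07/GB is exe.dev's listed transfer
# price; the saturated-link figure shows what raw bandwidth can move.
# The colo framing is an assumption, not a quoted rate.

per_gb = 0.07
tb = 1000  # GB per TB (decimal)
print(f"1 TB egress at $0.07/GB: ${per_gb * tb:.0f}")  # $70

# A saturated 1 Gbps commit moves ~324 TB/month:
bytes_per_sec = 1e9 / 8
month = 30 * 86400  # seconds
tb_per_month = bytes_per_sec * month / 1e12
print(f"1 Gbps saturated: {tb_per_month:.0f} TB/month")
```

At metered rates, moving what one saturated gigabit link carries in a month would cost tens of thousands of dollars, which is the gap the blog post's "10x" claim is gesturing at.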

Overall service looks interesting, I like simplicity with convenience, something which packet.net deliberately decided not to offer at the time.


> $20/month for your VMs

>One price, no surprises. You get 2 CPUs, 8 GB of RAM, and 25 GB of disk—shared across up to 25 VMs.

This might sound like a good thing compared to the current state of clouds, but what's better than that is having your own. The other day I got a used OptiPlex for $20; it had a 2 TB HDD, a 256 GB SSD, 16 GB of RAM, and a Core i7. That is a one-time payment, not monthly. You can set up Proxmox, have dozens of LXCs and VMs, and even nest more LXCs inside them: your hardware, physically with you, backed up by you, monitored by you, and accessed only by you. If you have stable internet and electricity, there's really no excuse not to invest in your own hardware. A small business can invest in that as well, not just an individual. Go to rackrat.net and grab a used server if you are a business, or a good workstation for personal use.


HeavyBit is absolutely gross. I've heard lots of horrible things about them from multiple founders.

One of my friends was told to come to a sex party that was all male and he is straight. It soured his relationship with the firm so much he ended up winding down the business.


Does that have anything to do with exe.dev?


Thank you! <3

exe.dev landing page is sublime. The call to action is "ssh exe.dev" and you can bet it works.

I love the line on the landing page with a link back to hn:

> That must be worst website ever made.

the level of confidence (this is a second time founder after all) to put that on their website gives me confidence that they can make this work


Thank you, but no thanks

Very cool signup!

https://orbit-disk.exe.xyz:8000/

I like the way you can tell it what you want and it makes it. Very cool.


Gotta set it to public if you want it public:

https://exe.dev/docs/proxy


So much good stuff is happening at https://exe.dev, keep it up guys!

This statement is so off:

"In some tech circles, that is an unusual statement. (“In this house, we curse computers!”) I get it, computers can be really frustrating. But I like computers. I always have. It is really fun getting computers to do things. Painful, sure, but the results are worth it. Small microcontrollers are fun, desktops are fun, phones are fun, and servers are fun, whether racked in your basement or in a data center across the world. I like them all."

The reality: Everyone reading his blog or this HN entry loves computers.


Did... did you just scare Microsoft? They now announced a similar thing https://x.com/satyanadella/status/2047033636923568440

I know it's a personal blog, but the writing style is really full of himself. What a martyr, starting a second company.

It's hard to see the scale of what he's doing. Could be:

- I'm building a server farm in my homelab.

- I'm doing a small startup to see if this idea works.

- We're taking on AWS by being more cost effective. Funding secured.


Not an answer, but this provides some illumination on the question: https://github.com/tailscale/tailscale/commit/d539a950ca4a66...

If you click the first link in the post, about funding, you’ll see they just raised $35mil.

If someone is building a new cloud, it's worth learning a few lessons from Cloudflare.

Perhaps the VM idea is old. The unit is a worker encapsulated in some deployable container.

In the world of Cloudflare workers - especially durable objects that are guaranteed to have one of them running in the world with a tightly bound database.

The way I think of apps has changed.

My take is devs want a way to say “run this code, persist this info, microsecond latency, never go down, scale within this $ budget”

It’s crazy how good a deal $5/mo cloudflare standard plan is.

Obviously many startups raise millions and they gotta spend millions.

However the new age of scale to zero, wake up in millisecond, process the request and go back to sleep is a new paradigm.

Vs old school of over provision for max capacity you will ever need.

Google has a similar, scale to zero container story but their cold startup time is in seconds. Too slow.


You should log the journey and open-source it!

I mean, the whole EBS complaint is invalid: you are complaining about a SAN disk vs. a local disk. If you want high-speed local storage, use a d-series instance with NVMe storage.

Now that we're talking about clouds... what happened to the word 'webhosting'?

Isn't it high time to figure out a distributed physical layer / swarm internet or whatever the buzzword is? Would be perfect for distributed AI too..

As someone who has built and managed clouds, good luck to them, you'll need it :)

Tangential.

Is there a name for this style of writing? I come across it regularly.

I'd describe it as forcefully modest, "I'm just a simple guy" kind of thing. With a dash of "still a child on the inside". I always picture it as if the guy from the King of Queens meme wrote it.

"I guess I'm just really into books, heh" - Bezos (obviously non-real, hypothetical quote, meant to illustrate the concept)

This style is also very prevalent in Twitter bios.

Since it's a "literary" style that is quite common, I'm sure it has been characterized and named.

GPT says it's "aw-shucks", but I think that's a different thing.


Ilike to working with reap name

How is this different from getting a dedicated server from any other provider? Typically you need to pay a bit more, $40-$50, but you get more RAM and cores.

And what does it have to do with the "cloud"? Cloud means using cloud-provided services (security, queues, managed databases, etc.), and that's their selling point. exe.dev is a bare server where I can install what I want. This is fine, but it is not a cloud and, frankly speaking, nothing new.


I appreciate the confidence that comes with a clear vision, but please make the docs useful from day 1. You know what's in your mind; the user does not.

These are nice declarative statements but have almost no meaningful substance.

> Setup scripts have a maximum size. Use indirection. [What's the maximum size?]

> Shelley is a coding agent. It is web-based, works on mobile. [Cool model bro. Any details you want to share?]


> The standard price for a GB of egress from a cloud provider is 10x what you pay racking a server in a normal data center.

> $160/month

  50 VM
  25 GB disk+
  100 GB data transfer+
100 GB/mo is <1 Mbps sustained lmao
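Checking that arithmetic:

```python
# 100 GB/month expressed as a sustained bitrate.
bits = 100 * 1e9 * 8      # 100 GB in bits
seconds = 30 * 86400      # seconds in a 30-day month
mbps = bits / seconds / 1e6
print(f"{mbps:.2f} Mbps sustained")  # ~0.31 Mbps
```

So the included transfer works out to roughly a third of a megabit per second if spread evenly over the month.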

Hi David, thanks for trying to fix the cloud. There is a persistent problem with all cloud providers that none of them has fixed yet (and I don't expect any ever will). I imagine users will not care about this issue, so this might not be worth solving. But if you'd like to have the only cloud provider (or technology in general) that can solve this problem, it would make cloud computers less annoying.

If you want to run a website in the cloud, you start with an API, right? A CRUD API with commands like "make me a VPC with subnet 1.2.3.4/24", "make me a VM with 2GB RAM and 1 vCPU", "allow tcp port 80 and 443 to my VM", etc. Over time you create and change more things; things work, everybody's happy. At some point, one of the things changes, and now the website is broken. You could use Terraform or Ansible to try to fix this, by first creating all the configs to hopefully be in the right state, then re-running the IaC to re-apply the right set of parameters. But your website is already down and you don't really want to maintain a complex config and tool.

You can't avoid this problem because the cloud's design is bad. The CRUD method works at first to get things going. But eventually VMs stop, things get deleted, parameters of resources get changed. K8s was (partly) made to address this, with a declarative config and server which constantly "fixes" the resources back to the declared state. But K8s is hell because it uses a million abstractions to do a simple thing: ensure my stuff stays working. I should be able to point and click to set it up, and the cloud should remember it. Then if I try to change something like the security group, it should error saying "my dude, if you remove port 443 from the security group, your website will go down". Of course the cloud can't really know what will break what, unless the user defines their application's architecture. So the cloud should let the user define that architecture, have a server component that keeps ensuring everything's there and works, and stops people from footgunning themselves.

Everything that affects the user is a distributed system with mutable state. When that state changes, it can break something. So the system should continuously manage itself to fix issues that could break it. Part of that requires tracking dependencies, with guardrails to determine if a change might break something. Another part requires versioning the changes, so the user (or system) can easily roll back the whole system state to before it broke. This abstraction is complicated, but it's a solution to a complex problem: keeping the system working.

No cloud deals with this because it's too hard. But your cloud is extremely simple, so it might work. Ideally, every resource in your cloud (exe.dev) should work this way. From your team membership settings, to whether a proxy is public, the state of your VM, your DNS settings, the ssh keys allowed, email settings, http proxy integration / repo integration settings / their attachments, VM tags & disk sizes, etc. Over time your system will add more pieces and get more complex, to the point that implementing these system protections will be too complex and you won't even consider it. But your system is small right now, so you might be able to get it working. The end result should be less pain for the user because the system protects them from pain (fixing broken things, preventing breaking things), and more money for you because people like systems that don't break. But it's also possible nobody cares about this stuff until the system gets really big, so maybe your users won't care. It would be nice to have a cloud that fixes this tho.


> 100 GB data transfer+

> $20 a month

2025 or 2005, what's the difference?


inflation


