How much money does Lemmy.ml need to temporarily boost their servers?

OsrsNeedsF2P@lemmy.ml · 2 years ago

How much money does Lemmy.ml need to temporarily boost their servers?

Milan@discuss.tchncs.de · 2 years ago

in vservers, it depends on the memory … and storage option for the one starting at 30…

nutomic@lemmy.ml · 2 years ago

It currently has 8 GB of memory and only uses 6 GB or so. CPU is the only limitation.

Milan@discuss.tchncs.de · 2 years ago

It does not sound like OVH's vServers offer dedicated cores, and shared cores commonly become a bottleneck quickly with VPS offerings across hosters. For example, with the initial Mastodon hypes, I had to learn that shared-hardware lesson the hard way. For the price you are currently paying, maybe something like a used dedicated server (or one of the fancy AMD ones) at Hetzner is of interest: https://www.hetzner.com/sb

nutomic@lemmy.ml · 2 years ago

Hetzner is great, but they are very strict about piracy, so it's not an option for lemmy.ml. For now the load has gone down, so I will leave it like this, but a dedicated OVH server might be an option if load increases again.

Leigh@lemmy.ml · 2 years ago

You should use this relatively quiet time to migrate to a larger server, because when the time comes where you need to do it, you’re going to be in for a world of hurt. This is the calm before the storm–take advantage of it.

Ultimately, you need to scale horizontally. You need to shard your database and separate out your different functions (database, front end, whatever back end applications you use, etc) onto different servers, all fronted by load balancers. That’s going to be the only way to even begin to handle increasing load. If you don’t have a small team of experienced engineers with a deep understanding of how to build for scale, and you get a sudden mass exodus of users from Reddit, you’re fucked. So if I were you, here’s what I’d do:

1. Scale up to the largest instance type you can. If possible, switch (at least temporarily) to AWS and use something in the c6i instance family, such as the c6id.32xlarge. Billing for AWS instances is done by the hour, so you wouldn’t need to pay for an entire month up front if you only need that extra horsepower for a few days (such as when the blackouts are planned from the 12th through 14th).
2. Because the above will do nothing but buy you time until you crash–and if you get a huge spike of users, without horizontal scaling, you WILL crash–migrate your DNS to something like Cloudflare. From there, configure workers to respond when health checks to your site fail, so that users attempting to access the site can be shown a static page directing them to something like http://join-lemmy.org or someplace, instead of simply getting 5xx errors.
3. Once the hug of death is over, evaluate where you stand. Reduce your instance size, if you can, and start investigating what it’s going to take to scale horizontally.
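
For anyone wondering what the Cloudflare fallback idea could look like in practice, here is a minimal TypeScript sketch. It is not anything lemmy.ml actually runs, and it swaps in a simpler technique than a separate health-check product: the Worker just tries the origin on each request and serves a static page if the origin errors out or returns a 5xx. The names `FALLBACK_URL`, `fallbackHtml`, `originIsDown` and `worker` are made up for illustration, and the 503 status and 60-second `retry-after` are assumptions too.

```typescript
// Hypothetical target for the fallback page (the link suggested in the post).
const FALLBACK_URL = "http://join-lemmy.org";

// Build the static HTML shown while the origin is unhealthy.
function fallbackHtml(target: string): string {
  return (
    "<!doctype html><title>Temporarily overloaded</title>" +
    "<p>This instance is overloaded right now. " +
    `Try another one from <a href="${target}">${target}</a>.</p>`
  );
}

// A missing response (network error or timeout) or any 5xx counts as "down".
function originIsDown(status: number | null): boolean {
  return status === null || status >= 500;
}

// Handler in the shape Cloudflare Workers expect (module syntax).
// When deploying, this object would be the module's default export.
const worker = {
  async fetch(request: Request): Promise<Response> {
    let originResponse: Response | null = null;
    try {
      // Inside a Worker, fetch(request) forwards the request to the origin.
      originResponse = await fetch(request);
    } catch {
      // Origin unreachable: leave originResponse as null.
    }

    const status = originResponse === null ? null : originResponse.status;
    if (originResponse !== null && !originIsDown(status)) {
      return originResponse; // healthy: pass the origin's answer through
    }

    // Unhealthy: show the static page instead of a bare 5xx error.
    return new Response(fallbackHtml(FALLBACK_URL), {
      status: 503,
      headers: {
        "content-type": "text/html; charset=utf-8",
        "retry-after": "60", // assumed value, tune as needed
      },
    });
  },
};
```

Checking on every request keeps the sketch self-contained, but it still sends each request to the struggling origin first. A setup driven by real health checks, as described above, would avoid that extra load.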

I’m not a SQL expert, but I am a principal network architect, and my day job for the last 15 years has been working on scale and automation for the world’s largest companies, including 7 years spent at AWS. In my world, websites like Reddit, as large as they are, are still considered to be of ‘average’ size. I can’t help you with the database, but I’m happy to provide guidance around networking, DNS, scale, automation, security, etc.

sam_uk@slrpnk.net · 2 years ago

I believe @ernest is just about to do a backend refactor on https://kbin.social/. If you have the time and inclination, a ticket here outlining some optimisations for horizontal scaling might be timely: https://codeberg.org/Kbin/kbin-core

sysgen@lemmy.ml · 2 years ago

Hexbear ran (runs?) on Hetzner, and I don’t recall them ever having an issue.