How do I know which pics on my Lemmy instance are safe to delete

maor@lemmy.org.il · 1 year ago

Nah I thought the same but then I manually checked it. In most of the image posts I see, the image URL starts with lemmy.org.il, which made me wonder whether they’re actually downloaded or it’s some kind of wacky proxy. So I downloaded some of these pics and looked for files of identical size and hash digest, and indeed they were on my disk!

It’s not a bad decision to cache pics, because it does make the experience really smooth, and I’m not complaining about it. Mastodon does this as well.

maor@lemmy.org.il · 1 year ago

Yeah, pretty big storage requirement due to the way pictrs works. Pictrs is the piece of software Lemmy relies upon to manage image storage and uploads, and most importantly to cache pictures from other instances. This takes up a HUGE amount of storage space, and there’s no official way to clear this up, see these posts I recently made: first one, second one. The solution I resorted to is renting a 1TB storage box from Hetzner for 3 euros per month, pretty sweet deal but I was kinda annoyed by it. So the cheapest deal I could find costs me 6 euros per month: 3 for an Alma Linux ARM VPS from Hetzner, and 3 for that storage box. If you’re in for the fun in tinkering (I sure as hell am), then get ready for a good time. Other than that, if your main line of reasoning is to take burden off of lemmy.world, then I think just go ahead and join another instance. Better yet: join the crowdfunding of another instance :)

maor@lemmy.org.il · edit-2 1 year ago

Okay, you’re not gonna like it but I rented a 1TB storage box from Hetzner for 3 euros a month, just to get that foot off my neck. It’s omega cheap and mountable via CIFS so life is good for now. I’m still interested in what I described in the OP, and I even started scribbling some Python, but I’m too scared of fucking anything up as of now.

The annoying part in writing that script was discovering that the filenames on disk don’t match the filenames in the URLs. E.g., given this URL:
https://lemmy.org.il/pictrs/image/e6a0682b-d530-4ce8-9f9e-afa8e1b5f201.png. You’d expect that somewhere inside volumes/pictrs you’d find e6a0682b-d530-4ce8-9f9e-afa8e1b5f201.png, right…? So that’s not how it works, the filenames are of the exact same format but they don’t match.

So my plan was to find non-local posts in the post table, check whether the thumbnail_url column starts with lemmy.org.il (assuming that means my instance cached it), then find the file by downloading it via the URL and scanning the pictrs directory for files that match the exact size in bytes of the downloaded file. Once found, compare their checksums to be sure it’s the same one, then delete it and delete its post entry in the database.
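Here is a minimal Python sketch of just the matching step of that plan: given the bytes downloaded from a pictrs URL, find files on disk with the same size and then the same checksum. The function names (sha256_of, find_cached_copies) and the choice of SHA-256 are mine, not anything pictrs or Lemmy provides. It assumes the stored file is byte-identical to what the URL serves, and it only reports matches; it never deletes anything and doesn’t touch the database.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_cached_copies(downloaded: bytes, pictrs_dir: Path) -> list[Path]:
    """Return files under pictrs_dir whose bytes are identical to `downloaded`.

    Hypothetical helper: `downloaded` is the body fetched from the image URL,
    `pictrs_dir` is wherever the pictrs volume is mounted.
    """
    target_size = len(downloaded)
    target_hash = hashlib.sha256(downloaded).hexdigest()
    matches = []
    for candidate in pictrs_dir.rglob("*"):
        if not candidate.is_file():
            continue
        # Cheap check first: only hash files whose size already matches.
        if candidate.stat().st_size != target_size:
            continue
        if sha256_of(candidate) == target_hash:
            matches.append(candidate)
    return sorted(matches)
```

To try it, you’d fetch the image body (for example with urllib.request.urlopen(url).read()) and pass it in along with Path("volumes/pictrs"). Printing the matches and eyeballing them before deleting anything seems like the safer order.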

When I get close to 1TB I’ll get back here for this idea… :P

maor@lemmy.org.il · 1 year ago

Haha I’m literally on it right now. My instance crashed a couple of hours ago because of it, so I emptied ~/.rustup to get some time, but idk how to go about it from here. LPP didn’t do anything. That seems really curious, does literally everyone use S3?

maor@lemmy.org.il · 1 year ago

Thanks a lot, I was looking for this exact kind of community. Posted there <3

maor@lemmy.org.il · 1 year ago

I should’ve mentioned it in the post, but I already tried deleting pics modified more than X days ago. The catch is that I don’t wanna delete pics uploaded to my server, I just want to delete pics cached from other instances :(

maor@lemmy.org.il · 1 year ago

How do I know which pics on my Lemmy instance are safe to delete

maor@lemmy.org.il · 1 year ago

Yep, I manage my servers and local machine with Ansible so I abstracted it with a role. This is indeed not that bad of a con because it’s still plaintext so automation is easy, but it’s still a minor issue ;)

maor@lemmy.org.il · 1 year ago

Love me some systemd timers. Much more fun than cron.

Sane handling of environment variables with EnvironmentFile=
Out of the box logging. Especially useful is the ability to journalctl -f to watch long-running processes, which I’m not sure is possible with cron
The ability to trigger the service manually rather than setting the timer to * * * * *, then forgetting it’s supposed to run in a minute, getting distracted, and coming back 15 minutes later

My only complaint is it’s a bit verbose. I’d rather have it as an option inside the .service file. The .timer requires some boilerplate like Description= under [Unit] (it… uh… triggers a service. that’s the description), and WantedBy=timers.target. But these are small prices to pay

maor@lemmy.org.il · 1 year ago

Yessss I love how the algorithm here isn’t tailored towards sucking me in

maor@lemmy.org.il · 1 year ago

It took me so much fucking time to realize how it works. There it is:

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

I learned Kubernetes in a hurry in my previous job, so I skimmed over lots of “obvious” things (in my manager’s eye) and this was one of them:(

maor@lemmy.org.il · 1 year ago

I just rented a VPS from Hetzner because that’s the workflow I’m already familiar with. Lowest tier, 5$, and since it’s ARM it’s also beefy enough to never need an upgrade I hope :P

maor@lemmy.org.il · 1 year ago

What’s that?

maor@lemmy.org.il · 1 year ago

Frankly I’m not sure what it does haha. What are the “best communities”? Which communities? Also what does it actually do, subscribes all users on your instance to those “best communities”?

maor@lemmy.org.il · 1 year ago

You nailed it, it only pulls posts from communities that someone on your instance subbed to. It doesn’t even pull retroactively; your instance only starts pulling posts created after the first subscriber on the instance subbed.

I’m more concerned regarding media, because just like Mastodon, the pics themselves are copied from other instances onto yours. I hope it will be enough to just find -mtime -delete once in a while
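A rough Python equivalent of that age-based cleanup, written as a dry run: it only lists files whose modification time is older than a given number of days, so the list can be checked before anything gets removed. The name files_older_than and its parameters are made up for this sketch, and the cutoff is a plain seconds comparison, not the exact day rounding find uses. It also has no way of telling local uploads from cached copies.

```python
import time
from pathlib import Path
from typing import Optional


def files_older_than(root: Path, days: float, now: Optional[float] = None) -> list[Path]:
    """List regular files under root whose mtime is more than `days` days ago.

    `now` is a Unix timestamp; it defaults to the current time and is only
    a parameter so the function can be exercised with fixed timestamps.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling files_older_than(Path("volumes/pictrs"), 30) would list everything untouched for 30 days; deleting is then a separate, deliberate step (p.unlink() on each path).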

maor@lemmy.org.il · 1 year ago

“vanity purposes” lmao I love it

maor@lemmy.org.il · 1 year ago

Also seeing the federation happening live at tail -f /var/log/nginx/access.log is so satisfying. I think I like computers

maor@lemmy.org.il · 1 year ago

Self hosting my own Lemmy instance was so much fun

maor@lemmy.org.il · 1 year ago

Oh there’s an --exec flag as well? That’s great. This seems like a totally viable solution for cases where the crux of the container is a small script, with a handful of decision branches so the surface area to cover is manageable, but it also needs to come in a non-alpine distro because I assume that’s the hefty part that we’d like to remove. But that’s just off the top of my head, I’m sure there’s more. It’s genuinely a good idea and it deserves a respectful README as well :(

maor@lemmy.org.il · edit-2 1 year ago

It ptraces the main container process and cuts off unused files. It also fires some customizable HTTP requests to trigger any dynamically loading libraries. Clever idea. If I understand correctly, the problems I see are:

Undoubtedly some essential files will be omitted. Unless my image consists merely of scratch and an executable, I can’t imagine myself successfully covering all edge cases.
What about files that aren’t loaded by HTTP requests?

I’m not shitting on this program at all. These are two problems that I’m sure they could solve or just tell straight up “we can’t guarantee it’ll work in XYZ scenarios. Don’t use it if that’s your use case”. Then I saw that this is backed by some kinda SaaS with a domain that ends with .ai, and that explains why THAT FUCKING README IS WRITTEN like a FUCJik/INg MIND NUMBING LINKEDIN POST that my CEO could write bro what the fuck do you mean by simplifying the value of my digital assets in a seamless secure cost efficient way??? Who fucking cares??? ?WHat does your program ACTUALLY DO???

10000000s of seemingly AI-generated paragraphs going on and on about how convenient their product is, 1 measly line in a diagram that describes what it actually does. Again not to shit on the programmers at all, this is a great idea and I’m glad that it’s being explored I just hate this industry I can’t read another pile of gibberish like that. That ruined my night. Thanks for listening

maor@lemmy.org.il · edit-2 1 year ago

Same. Specifically I use it as a GUI to organize them; for the actual reading, I wrote a script that compiles an E-mail digest periodically: https://github.com/it-is-wednesday/miniflux-mail-digest