Damn, OneCoin was bad. Ruja Ignatova was the first crypto scammer I’ve seen talked about in national news and she was also made fun of in a news comedy show over here. A true scam pioneer.
Rose here. Also @umbraroze for non-kbin stuff.
Brief history of YAML:
“Oh no! All of these configuration file formats are complicated. I want to make things simpler!”
(Years go by)
“…I have made things more complicated, haven’t I?”
YAML is generally good if it’s used for what it was originally designed for (relatively short data files, e.g. configuration data). Problem is, people use it for so much more. (My personal favourite pain example: i18n stuff in Ruby on Rails. YAML language files work for small apps, but when the app grows, so does the pain.)
Reddit has a user data checkout feature (IIRC, check out the user settings or maybe the Reddit help pages to find it).
It’s a bit crap though.
It takes a long time to process, especially if you happened to post in the era when the Reddit data infrastructure was horribly terrible instead of merely ordinarily terrible, and apparently this involves some handwork in the worst cases on behalf of the staff.
Some data may be missing or truncated. It doesn’t give you data from privated/banned subreddits (which was a fun thing to discover because last time I tried to do this the blackouts were on), and even for legit stuff, long comments/posts may be truncated. Even so, I’m pretty sure that the dumps just straight up didn’t have all of my posts from several years ago, even if those were on public subreddits. So you need to make sure the checked out data is sensible.
In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren’t really magnificently usable on their own. One tool that I found personally invaluable is reddit-user-to-sqlite, which allows you to import Reddit data dumps and available live user data (I think it does this by scraping or something, I’m sure it worked despite the API being shut down) into an SQLite database, and Datasette is a nice frontend for browsing the posts.
As for scrubbing, there are tools for that which are supposed to work. I think.
Yup. The robots.txt file is not only meant to block robots from accessing the site, it’s also meant to block bots from accessing resources that are not interesting for human readers, even indirectly.
For example, MediaWiki installations are pretty clever in that by default, `/w/` is blocked and `/wiki/` is encouraged. Because nobody wants technical pages and wiki histories in search results, they only want the current versions of the pages.
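Here's a rough sketch of how a well-behaved bot would read that kind of setup, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent and the `wiki.example.org` URLs are all made up for illustration; a real MediaWiki site's file will probably look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the spirit of the MediaWiki setup described above:
# the script path /w/ is off limits, everything else (including /wiki/) is fine.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The current version of an article lives under /wiki/ and may be crawled.
can_fetch_article = parser.can_fetch(
    "ExampleBot", "https://wiki.example.org/wiki/Main_Page"
)

# History and other technical views go through /w/ and are blocked.
can_fetch_history = parser.can_fetch(
    "ExampleBot",
    "https://wiki.example.org/w/index.php?title=Main_Page&action=history",
)

print(can_fetch_article, can_fetch_history)
```

Of course, this only helps with bots that bother to ask in the first place.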
Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping web pages for email addresses. Some people developed `wpoison.cgi`, a script whose sole purpose was to generate garbage web pages with bogus email addresses. Real search engines ignored these, thanks to robots.txt. Guess what the spam bots did?
Do the AI bros really want to go there? Are they asking for model collapse?
I’m, like, OK, nuclear power isn’t necessarily a bad thing.
But power plants like that should probably serve wider municipal needs.
Building a private nuclear power plant just to power a data center? Well that’s clearly stupid.
Building a private nuclear power plant just to power a data center focused on a niche application? Well you know how that goes.
Also, look up SL-1. Disturbingly few Americans I’ve talked to have heard about that. Generally a good argument about why not every single thing should be powered by a tiny dedicated nuclear reactor.
“What am I supposed to do if 30-50 feral wolves run into my military base within 3-5 mins while my small soldiers are training?” - Putin
The Walt Disney Company is not ready for the amount of Steamboat Love ❤️🛳️❤️🛳️❤️ that the Rule34 folks will throw at them. Hot Steampipe Action is on the menu, folks.
I love watching Let’s Plays of Telltale games and similar games like Life is Strange. But usually, the first episode is the hardest to watch through, because in these types of games, the first episode also serves as a very drawn-out tutorial and has most of the lore dumps.
Reporter: “Mr. Putin, how is it possible that you got 132% of the vote?”
Putin: “It is merely the byproduct of our superior domestic mathematical sciences. The numbers are simply greater than the ones produced by foreign-made axioms. Do think of all of the great achievements our mathematicians have made over centuries, such as proving the Poincaré conjecture.”
Reporter: (gasp) “Your ballot results were tabulated by Grigori Perelman?”
Putin: “No, we looked at his qualifications but we figured he was out of our reach, unfortunately. We had the results tabulated by some other weird mathematician with a massive case of cabin fever. We saved a lot of taxpayer money this way.”
Probably some other NPC that does some highly specific thing. Like the name rater, or whatever.
Not important in the grand scheme of things, but people all over the world come for that one weird task I can do, and that’s enough for me.
Yeah, the thing is, “a monad is a monoid in the category of endofunctors” is kind of a meme among non-Haskell developers. Personally, I think Haskell is a very interesting language. The mathematical jargon, however, is impenetrable, and this particular expression is kind of the poster child. I’mma go look at Erlang if I want my functional language fix without making my head hurt, thank ye very much.
It’s a thing! Sadly it won’t rewrite Haskell codebases for you, though.
My theoretical answer is this: in an ideal world, there would be no copyright at all. This is an artificial contrivance that was once dreamed up to serve a physical-copy economy, and it was rendered obsolete by the digital age. Shit would be so much easier if we got rid of this shit and everyone could share everything by default without any profit motive. (Caveat: This will not work unless literally every jurisdiction on the planet gets rid of copyright laws all at once, otherwise this is way too exploitable due to power imbalance. So I don’t think this is a practical proposition. *cough* unless we all decide Anarchism is a good idea after all *cough*)
My practical answer is this: Welllllll we’re kinda damned if we do and we’re damned if we don’t. My personal feeling is that AI creations aren’t really copyrightable, and even suggesting they are copyrightable is kind of opening a huge can of worms regarding what exactly counts as “creativity” in the first place. The best we can do under the current copyright regime is to regulate how the AI datasets are curated, because goodness knows the current datasets weren’t exactly ethically obtained.
I was a Slashdot user.
People kept hyping Digg as a Slashdot replacement, but trying to submit posts was actually even more futile in practice than trying to submit articles to Slashdot editors. So much bigger hivemind too. Boring unfunny comment section.
When I first joined Reddit, it seemed like it was mostly populated by Slashdot refugees. Just people posting awesome shit. Great riveting discussions, even before anyone actually read the articles. That sort of stuff.
Funny thing, in ISO 8601 the date isn’t separated by colons. The format is “YYYY-MM-DDTHH:MM:SS+hh:mm”. Date is separated by “-”, time is separated by “:”, date and time are separated by “T” (which is the bit that a lot of people miss). The time zone indicator can also be just “Z” for UTC. Many of these can be omitted if dealing with lesser precision (e.g. HH:MM is a valid timestamp, YYYY-MM is a valid datestamp if referring to just a month). (OK so apparently if you really want to split hairs, timestamps are supposed to be THH:MM etc. Now that’s a thing I’ve never seen anyone use.) Separators can also be omitted, though that’s apparently not recommended if quick human legibility is of concern. There’s also YYYY-Wxx for week numbers.
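For the curious, here's a minimal sketch of those pieces using Python's standard `datetime` module. The date and the +02:00 offset are arbitrary values picked for illustration, and the "Z" suffix is produced by plain string replacement, since `isoformat()` itself writes UTC as "+00:00".

```python
from datetime import datetime, timedelta, timezone

# Arbitrary example moment with a +02:00 offset (an assumption for illustration).
stamp = datetime(2024, 1, 15, 13, 45, 30, tzinfo=timezone(timedelta(hours=2)))

# Full form: date with "-", time with ":", and "T" between the two.
full = stamp.isoformat()

# Lesser precision: just the date, or just the year and month.
date_only = stamp.date().isoformat()
month_only = stamp.strftime("%Y-%m")

# Same moment in UTC, with the "+00:00" offset swapped for the short "Z" form.
utc_z = stamp.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")

# Week number form, built from the ISO calendar year and week.
iso_year, iso_week, _ = stamp.isocalendar()
week = f"{iso_year}-W{iso_week:02d}"

# Parsing the full form back gives the same moment.
parsed = datetime.fromisoformat(full)

print(full, date_only, month_only, utc_z, week)
```

Note that the ISO week-numbering year can differ from the calendar year around New Year, which is why the sketch takes the year from `isocalendar()` rather than from `stamp.year`.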
Microsoft got repeatedly hit over these kinds of shenanigans in MSIE during and after the antitrust lawsuit.
Sadly, that was 20 years ago. I don’t have much faith in the American justice system doing anything about this nowadays.
I was about to say “this reminds me of the Hot Dog Stand”.
…but someone actually made Hot Dog Stand. Shit.
Look, I’m a Linux nerd, and there are very few things that scare me. Linux Kernel programmers, maybe - you don’t meddle with them unless the hour is truly dire and we form a delegation to seek their aid after a complex debate as the world burns around us and we climb their mountain together. …And the other thing that scares me is some particular brands of Microsoft ultra fans, for thereover lies madness like we have not seen before.
Oh you fancy PC people and your fancy `syscall` instruction.
I still don’t know why I could remember `jsr $ab1e`. I didn’t even write that much assembly.
Oh how quaint, someone has discovered that Wikipedia can be vandalised. I’ll have you know that that came to us as a real surprise in 2001. Things are more manageable these days. People usually notice these things.
There’s two kinds of crypto scams: Ones that actually involve crypto and ones that don’t.
Vague, possibly impossible to implement promises about proposed future functionality are an integral part of the crypto sphere!