• 0 Posts
  • 15 Comments
Joined 1 year ago
Cake day: July 5th, 2023





  • Ah, I see. That makes sense, but to be fair, I think that was expected. I suspect they also pull the same data from every page where AdSense is embedded, regardless of browser, and every other company out there is aggregating the same sort of data from every place they can get it (shared sign-ins, etc.).

    Edit: It’s definitely a particularly bad look when there are several things in there that representatives for Google have apparently lied about over the years.




  • This is a thing that is true of all LLMs, but it seems like you’re misunderstanding the core issue. It CAN give outputs like that sometimes. What we CAN’T do is force it to give outputs like that ALL the time.

    It will answer “I don’t know” if its predictive text model guesses that the most common response to this would be “I don’t know”. To simplify a little, you could imagine that it reads your question, compares it to all the text in its training data, finds the conversation that looks most like the question you asked, and then answers whatever the person in the training data answered. But your exact question wasn’t in its training data. So take that mental model and instead have it compare against 1000 similar-looking things in its training data and average them; it would hopefully do a better job of coming up with something at least close to what you actually asked. Now take it to a million, or a billion.

    When we’re asking questions about the real world, we would prefer for it to answer based on knowledge about the real world. But what if it “matches” data from a work of fiction? Or just someone who doesn’t know what they’re talking about? Or true information, but about a different subject?

    It doesn’t know anything. It doesn’t understand anything you say. It just looks at patterns that it learned from the training data and tries to guess what words are most likely to be said in that case. In other words, “here’s one case where it didn’t hallucinate” and “it will never hallucinate” are not the same thing at all.

    Edit: To clarify, it doesn’t search its training data to answer your question, so it has no way to check whether something was in the training data. By the time you interact with it, the data is long gone. It was only used for training.
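
If it helps, here is a rough Python sketch of the simplified compare-and-average mental model from this comment. It is only a toy: a real LLM does not look anything up in its training data, and everything here (the `TRAINING` pairs, `similarity`, `answer`, and the word-overlap scoring) is made up for illustration.

```python
from collections import Counter

# Hypothetical toy "training data": (question, answer) pairs invented for this sketch.
TRAINING = [
    ("what is the capital of france", "Paris"),
    ("what is the capital of france in the novel", "Lyon"),  # a "work of fiction"
    ("what is the capital of spain", "Madrid"),
    ("what is the capital of atlantis", "I don't know"),
    ("who is the king of atlantis", "I don't know"),
]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two strings (an assumed, crude measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def answer(question: str, k: int = 3) -> str:
    """Return the most common answer among the k most similar training questions."""
    ranked = sorted(TRAINING, key=lambda qa: similarity(question, qa[0]), reverse=True)
    votes = Counter(ans for _, ans in ranked[:k])
    return votes.most_common(1)[0][0]
```

With `k=1`, `answer("what is the capital of atlantis")` gives "I don't know", because that is what the closest match said. But `answer("what is the capital of mars", k=1)` gives "Paris": nothing in the code notices that the question has no good match, it just returns whatever the nearest pattern said. One case where it says "I don't know" tells you nothing about the cases where it won't.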






  • There are cultural traditions of using colors as symbols, many of which are harmless – red for anger, blue for sadness, green for envy. Whitelist and blacklist come from the very long-standing theme of using white to represent good and black to represent evil.

    Regardless of how you feel about the origin of those themes, it makes sense to start moving away from them now. Whether intentional or not, they can be harmful and aren’t really necessary.