We don’t know that for sure yet. We saw a lot of emergent intelligent properties appear as we scaled up, and we’re nowhere near done scaling LLMs. I’m not saying it will be solved, just that we don’t know one way or the other yet.
LLMs are fundamentally different from human consciousness. It isn’t a problem of scale, but kind.
They are like your phone’s autocomplete, but very very good. But there’s no level of “very good” for autocomplete that makes it a human, or will give it sentience, or allow it to understand the words it is suggesting. It simply returns the next most-likely word in a response.
If we want computerized intelligence, LLMs are a dead end. They might be a good way for that intelligence to speak pretty sentences to us, but they will never be that themselves.
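For anyone who wants the autocomplete comparison above made concrete, here is a minimal sketch of a "next most-likely word" suggester. It is a toy bigram counter, not how an LLM is actually built (real models use a neural network over tokens rather than a lookup table), and the names `train_bigrams` and `suggest_next` and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text, invented for this example.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)

print(suggest_next(model, "the"))  # the follower seen most often after "the"
print(suggest_next(model, "dog"))  # None: "dog" never appeared in the corpus
```

The suggester has no notion of what a cat or a mat is; it only replays counts. Whether scaling that basic idea up changes anything is exactly what the rest of this thread argues about.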
They are also fundamentally different from a toaster. But that’s completely irrelevant. Consciousness is something you get when you put intelligence in an agent that has to move around in and interact with an environment. A chatbot has no use for that; it’s just there to mush through lots of data and produce some output. It doesn’t need to worry about its own existence, and it shouldn’t.
So does the all-knowing oracle that predicts next week’s lotto numbers. Being autocomplete does not limit its power.
There might be better or faster approaches, but it’s certainly not a dead end. It’s a building block. Add some long-term memory, bigger prompts, a bigger model, interaction with the Web, etc. and you can build a much more powerful bit of software than what we have today, without even any real breakthrough on the AI side. GPT as it is today is already “good enough” for a scary number of things that used to be exclusively done by humans.
It literally can’t worry about its own existence; it can’t worry about anything because it has no thoughts or feelings. Adding computational power will not miraculously change that.
I agree this would be a very useful chatbot. But it is still not a toaster. Nor would it be conscious.
Who cares? This has no real-world practical use case. Its thoughts are what it says; it doesn’t have a hidden layer of thoughts, which is quite frankly a feature to me. Whether it’s conscious or not has nothing to do with its level of functionality.
You seem unfamiliar with the concept of consciousness as an emergent property.
What if we dramatically reduce the cost of training - what if we add realtime feedback mechanisms as part of a perpetual model refinement process?
As far as I’m aware, we don’t know.
How are you so confident that your feelings are not simply a consequence of complexity?
Even if they are a result of complexity, that still doesn’t change the fact that LLMs will never be complex in that manner.
Again, LLMs have no self-awareness. They are not designed to have self-awareness. They do not have feelings or emotions or thoughts; they cannot have those things because all they do is generate words in response to queries. Unless their design fundamentally changes, they are incompatible with consciousness. They are, as I’ve said before, complicated autosuggestion algorithms.
Suggesting that throwing enough hardware at them will change their design is absurd. It’s like saying if you throw enough hardware at a calculator, it will develop sentience. But a calculator will not do that because all it’s programmed to do is add numbers together. There’s no hidden ability to think or feel lurking in its design. So too LLMs.
You’re guessing. You don’t actually know that for sure. It seems intuitively correct, but we simply do not know enough about cognition to make that assumption.
Perhaps our ability to reason comes exclusively from our ability to predict, and by scaling up the ability to predict, we become more and more able to reason.
These are guesses; all we have now are guesses. You can say “it doesn’t reason” and “it’s just autocorrect” all you want, but if that were the case, why did scaling it up eventually enable it to perform basic math? Why did scaling it up significantly improve its ability to solve problems (GPT-3 vs. GPT-4)? There are so many unknowns in this field. Saying “nah, can’t be, it works differently from us” doesn’t mean it can’t do the same things as us given enough scale.
I’m not guessing. When I say it’s a difference of kind, I really did mean that. There is no cognition here; and we know enough about cognition to say that LLMs are not performing anything like it.
Believing LLMs will eventually perform cognition with enough hardware is like saying, “if we throw enough hardware at a calculator, it will eventually become alive.” Even if you throw all the hardware in the world at it, there is no emergent property of a calculator that would create sentience. So too LLMs, which really are just calculators that can speak English. But just like calculators they have no conception of what English is and they do not think in any way, and never will.
We do not know that, and I challenge you to find a source for it. In fact, I’ve seen sources showing the opposite: they seem to reason in tokens. For example, LLMs perform significantly better at tasks when asked to give a step-by-step reasoned explanation. This indicates that they are doing a form of reasoning, and that their reasoning is limited by what I have no better term for than laziness.
https://blog.research.google/2022/05/language-models-perform-reasoning-via.html
It is your responsibility to prove your assertion that if we just throw enough hardware at LLMs they will suddenly become alive in any recognizable sense, not mine to prove you wrong.
You are anthropomorphizing LLMs. They do not reason and they are not lazy. The paper discusses a way to improve their predictive output, not a way to actually make them reason.
But don’t take my word for it. Go talk to ChatGPT. Ask it anything like this:
“If an LLM is provided enough processing power, would it eventually be conscious?”
“Are LLM neural networks like a human brain?”
“Do LLMs have thoughts?”
“Are LLMs similar in any way to human consciousness?”
Just always make sure to check the output of LLMs. Since they are complicated autosuggestion engines, they will sometimes confidently spout bullshit, so must be examined for correctness. (As my initial post discussed.)
You’re assuming I’m saying something that I’m not, and then arguing with that instead of my actual claim.
I’m saying we don’t know for sure what they will be able to do when they’re scaled up. That’s the end of my assertion. I don’t have to prove that they will suddenly come alive; I’m not claiming they will. I’m just claiming we don’t know what will happen when they’re scaled, and they seem to have emergent properties as they scale up. Nobody has devised a way of predicting which emergent properties appear when, and nobody has made any progress whatsoever on knowing what scaling up accomplishes.
Can they reason? Yes, but poorly right now. Will that get better? Who knows.
The end of my claim is that we don’t know what’ll happen when they scale up, and that you can’t just write it off like you are.
If you want proof that they reason, see the research article I linked. If they can do that in their rudimentary form that we’ve created with very little time, we can’t write off the possibility that they will scale.
Whether or not they reason LIKE HUMANS is irrelevant if they can do the job.
And I’m not anthropomorphizing them without reason; there aren’t terms for this already. What would you call this behavior of answering questions significantly better when asked to fully explain the reasoning? I would say it is taking the easiest option that still meets the requirements of the request, following the path of least resistance. I don’t have a better word for this than laziness.
https://www.downtoearth.org.in/news/science-technology/artificial-intelligence-gpt-4-shows-sparks-of-common-sense-human-like-reasoning-finds-microsoft-89429
Furthermore, predictive power is just another way of achieving reasoning. Better predictive power IS better reasoning, because you can’t predict well without reasoning.
It’s your job to prove your assertion that we know enough about cognition to make reasonable comparisons.
May as well ask me to prove that we know enough about calculators to say they won’t develop sentience while I’m at it.
I am picking up a hint of the autocompletion you describe in your writing.
I think I write well :) I am not an LLM though.
I don’t believe in scaling as a way to discover understanding. Doing that is just praying that the machine comes alive… these machines weren’t programmed to come alive in that way. That’s my fundamental argument, the design of LLMs ignores understanding of the content… it doesn’t matter how much content it’s been scaled up to.
If I teach a real AI about fishing, it should be able to reason about fishing and it shouldn’t need to have read a supplementary knowledge of mankind to do it.
What the LLMs seem to be moving towards is more of a search and summary engine (for existing content). That’s a very similar and potentially quite useful thing, but it’s not the same thing as understanding.
It’s the difference between the kid that doesn’t know much but is really good at figuring it out based on what they know vs the kid that’s read all the text books front to back and can’t come up with anything original to save their life but can quickly regurgitate and summarize anything they’ve ever read.
This is a faulty assumption.
In order for you to learn about fishing, you had to learn a shitload about the world. Babies don’t come out of the womb able to do such tasks. There is a shitload of prerequisite knowledge needed in order to fish, so it’s unfair to expect an AI to do this without prerequisite knowledge.
Furthermore, LLMs have been shown to do many things that aren’t in their training data, so the notion that they’re stochastic parrots is also false.
And (from what I’ve seen) they get things wrong with extreme regularity, increasingly so as things diverge from the training data. I wouldn’t say they’re a “stochastic parrot” but they don’t seem to be much better when things need to be correct… and again, based on my (admittedly limited) understanding of their design, I don’t anticipate this technology (at least without some kind of augmented approach that can reason about the substance) overcoming that.
That’s missing the forest for the trees. Of course an AI isn’t going to go fishing. However, I should be able to assert some facts about fishing and it should be able to reason based on those assertions. E.g., a child can work off of facts presented about fishing: “fish are hard to catch in muddy water” -> “the water is muddy, does that impact my chances of catching a bluegill?” -> “yes, it does, bluegill are fish, and fish don’t like muddy water”.
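To spell out the chain of reasoning in the bluegill example, here is a minimal sketch of it as explicit rule-following. It only illustrates the kind of inference being asked for and says nothing about how an LLM or a child actually arrives at the answer; the tables `is_a` and `hard_to_catch_in` and the function `harder_to_catch` are hypothetical names holding just the two asserted facts.

```python
# The two facts asserted in the example (hypothetical encoding).
is_a = {"bluegill": "fish"}
hard_to_catch_in = {"fish": {"muddy water"}}


def harder_to_catch(animal, condition):
    """Walk up the is_a chain and check whether any category the animal
    belongs to is known to be hard to catch under the given condition."""
    kind = animal
    seen = set()
    while kind is not None and kind not in seen:
        if condition in hard_to_catch_in.get(kind, set()):
            return True
        seen.add(kind)
        kind = is_a.get(kind)
    return False


# "The water is muddy, does that impact my chances of catching a bluegill?"
print(harder_to_catch("bluegill", "muddy water"))
```

Nothing here was learned from a corpus: the answer follows from the two stated facts alone, which is the behavior the comment above expects from a "real AI".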
There are also “teachings” brought about by how these are programmed that make the flaws less obvious. E.g., if I try to repeat the experiment in the post here, Google’s Bard outright refuses to continue because it doesn’t have information about Ryan McGee. I’ve also seen Bard get notably better as it’s been scaled up. Early on I tried asking it about RuneScape and it spewed absolute nonsense. Now… It’s reasonable-ish.
I was able to reproduce a nonsense response (once again) by asking about RuneScape. I asked how to get 99 Firemaking, and it invented a mechanic that doesn’t exist: “Using a bonfire in the Charred Stump: The Charred Stump is a bonfire located in the Wilderness. It gives 150% Firemaking experience, but it is also dangerous because you can be attacked by other players.” This is a novel (if not creative) invention of Bard, likely derived from advice for training Prayer (which does have something in the Wilderness which gives 350% experience).
Keep in mind, you’re talking about a rudimentary, introductory version of this. My argument is that we don’t know what will happen when they’ve scaled up. We know for certain that hallucinations become less frequent as the model size increases (see the statistics on GPT-3 vs. GPT-4 on hallucinations). Perhaps they only still occur because the models haven’t reached a critical size yet? We don’t know.
There’s so much we don’t know.
https://blog.research.google/2022/05/language-models-perform-reasoning-via.html
They do this already, albeit imperfectly, but again, this is, like, a baby LLM.
And just to prove it:
https://chat.openai.com/share/54455afb-3eb8-4b7f-8fcc-e144a48b6798
I think we might be. I remember hearing OpenAI was training on so much literary data that they didn’t and couldn’t find enough for testing the model. Though I may be misremembering.
No, that’s definitely the case. However, Microsoft is now working on making LLMs more dependent on several high-quality sources. For example, encyclopedias will be more important sources than random Reddit posts.
Microsoft is also using LinkedIn to help, getting users to correct articles generated by AI.
Cunningham’s Law may be very helpful in this respect.
There are still plenty of videos to watch and games to play. We might be running short on books, but there are many other sources of information that aren’t accessible to LLMs at the moment.
Also, just because the training set contained most of the books doesn’t mean the model itself was large enough to learn from all of them. The more detailed your questions get, the bigger the chance it will get them wrong, even if that knowledge should have been in the training set. For example, ChatGPT as a walkthrough for games is pretty terrible, even though there should be more than enough walkthroughs in the training set to learn from. Same for summarizing movies: it will do the most popular ones, but quickly fall apart with anything a little lesser known.
There is of course also the possibility that using the LLM as a knowledge store by itself is a bad idea. Humans use books for that, not their brains. So an LLM that is very good at looking things up in a library could answer a lot more without the enormous model size and training cost.
Basically, there are still a ton of unexplored areas, even if we have collected all the digital books.
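The "look things up in a library" idea above can be sketched roughly like this: keep the knowledge in plain documents and let a small lookup step pick the passage to answer from. This is only a toy word-overlap search under invented assumptions; the `library` contents, the titles, and the function names `tokenize` and `look_up` are all hypothetical, and a real system would presumably use a much better search and then hand the passage to the model.

```python
def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def look_up(library, question):
    """Return the title of the passage sharing the most words with the
    question, or None if no passage shares any word at all."""
    question_words = tokenize(question)
    best_title, best_score = None, 0
    for title, passage in library.items():
        score = len(question_words & tokenize(passage))
        if score > best_score:
            best_title, best_score = title, score
    return best_title


# Invented walkthrough snippets standing in for the "books".
library = {
    "level-1": "open the red door with the brass key",
    "level-2": "cross the bridge after lighting the torch",
}

print(look_up(library, "how do I open the red door"))
```

The point of the design is that the facts live in the library rather than in the model's weights, so adding a lesser-known walkthrough means adding a document, not retraining anything.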