Noam Chomsky argued that the promise of ChatGPT is a false one
What Are Large Language Models Really Worth? Inconsistencies and Asymmetries in Using Machine Learning Models to Investigate Negative and Positive Responses
The recent release of large language models has come with a new conversation about what these models can do. Fascinated evangelists presented the models as amazing, mind-blowing, self-contained, even bordering on consciousness. Such hype, however, is little more than a distraction from the actual harm perpetuated by these systems. People get hurt by the very practical ways such models fall short in deployment, and those failures are the result of choices made by the builders of these systems, choices we are obliged to critique and hold model builders accountable for.
The failure to handle negation, for example, is a long-known vulnerability in LLMs, so it is little surprise when a search engine built on one commits a public faux pas. Allyson Ettinger demonstrated the problem years ago with a simple study: when asked to complete a short sentence, a model would answer 100% correctly for affirmative statements (e.g. “a robin is…”) and 100% incorrectly for negative statements (e.g. “a robin is not…”), producing the same completion in both cases. The models could not distinguish between the two scenarios. This remains an issue with models today, and it is one of the rare linguistic skills that does not improve as models increase in size and complexity. Widespread concerns have been raised about how such artificial language models operate as a trick mirror, learning the forms of the English language without any of the linguistic powers of real understanding.
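A minimal sketch of this kind of probe, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is named in the study described above), might look like this:

```python
# Probe a masked language model with an affirmative prompt and its negation,
# in the spirit of Ettinger-style diagnostics. The library (transformers) and
# checkpoint (bert-base-uncased) are illustrative assumptions, not details
# taken from the study described above.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    predictions = fill(prompt, top_k=3)
    ranked = ", ".join(f"{p['token_str']} ({p['score']:.2f})" for p in predictions)
    print(f"{prompt:32s} -> {ranked}")

# A model that ignores negation tends to rank "bird" at or near the top for
# both prompts, i.e. the same completion for a statement and its negation.
```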
Additionally, the creators of such models concede the difficulty of addressing inappropriate responses that “do not accurately reflect the contents of authoritative external sources”. These models have produced, for example, a “scientific paper” on the benefits of eating crushed glass and a text on how crushed porcelain added to breast milk can support the infant digestive system. Stack Overflow had to temporarily ban ChatGPT-generated answers because it became clear that the LLM produces convincingly wrong answers to coding questions.
Yet, in response to this work, there are ongoing asymmetries of blame and praise. The model is hailed as a mythically autonomous technological marvel, and its builders are congratulated on an impressive feat. The human decision-making involved in model development is erased, and the model’s accomplishments are treated as independent of the design and implementation choices of its engineers. But without naming and recognizing those engineering choices and how they contribute to the outcomes of these models, it becomes almost impossible to acknowledge the related responsibilities. Functional failures and discriminatory harms are framed as if devoid of engineering choices, blamed instead on society at large or on supposedly “naturally occurring” datasets. Yet the builders undeniably do have control, and none of the models we are seeing now was inevitable. It would have been entirely feasible for different choices to be made, resulting in entirely different models being developed and released.
The programs are incapable of distinguishing the possible from the impossible, because they are unlimited in what they can learn. Unlike humans, for example, who are endowed with a universal grammar that limits the languages we can learn to those with a certain kind of almost mathematical elegance, these programs learn humanly possible and humanly impossible languages with equal facility. And whereas humans are limited in the kinds of explanations we can rationally conjecture, machine learning systems can learn both that the earth is flat and that the earth is round. They trade merely in probabilities that change over time.
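As a toy illustration of that indifference (my own construction, not an experiment from the text), the same character-level bigram model can be fit to an ordinary English sample and to a “humanly impossible” variant, here the same sentence with every word spelled backwards, with equal facility:

```python
# Illustrative only: a character-level bigram model fits an English sample and
# an artificial "impossible" variant (every word spelled backwards) the same
# way, because it only tracks surface statistics. The sentences and the
# reversal transform are invented for this sketch.
from collections import Counter
import math

def avg_bigram_log_likelihood(text: str) -> float:
    """Average per-bigram log-likelihood under a bigram model fit to the text itself."""
    pairs = list(zip(text, text[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(text[:-1])          # how often each char starts a bigram
    total = sum(n * math.log(n / first_counts[a]) for (a, b), n in pair_counts.items())
    return total / len(pairs)

english = "the cat sat on the mat and the dog lay by the door"
impossible = " ".join(word[::-1] for word in english.split())   # "eht tac tas ..."

print("possible language  :", round(avg_bigram_log_likelihood(english), 3))
print("impossible language:", round(avg_bigram_log_likelihood(impossible), 3))
# Nothing in the fitting procedure flags one sample as a language no human
# could acquire; both are just strings with learnable statistics.
```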
For this reason, the predictions of machine learning systems will always be superficial and dubious. Given the sentence “John is too stubborn to talk to”, for example, they might wrongly predict that John is so stubborn that he will not talk to someone or other, rather than that he is too stubborn to be reasoned with. Why would a machine learning program predict something so odd? Because it might analogize the pattern it inferred from sentences such as “John ate an apple” and “John ate,” in which the latter does mean that John ate something or other. The program might well predict that because “John is too stubborn to talk to Bill” is similar to “John ate an apple,” “John is too stubborn to talk to” should be similar to “John ate.” The correct interpretations of language are complicated and cannot be learned just by marinating in big data.
Perversely, some machine learning enthusiasts seem to be proud that their creations can generate correct “scientific” predictions (say, about the motion of physical bodies) without making use of explanations (involving, say, Newton’s laws of motion and universal gravitation). But this kind of prediction, even when successful, is pseudoscience. While scientists certainly seek theories that have a high degree of empirical corroboration, as the philosopher Karl Popper noted, “we do not seek highly probable theories but explanations; that is to say, powerful and highly improbable theories.”
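To make the distinction concrete, here is an illustrative sketch (my own construction, not an example from the text): a deliberately flexible curve fit to noisy observations of a falling object can reproduce the data it has seen, yet it encodes no law, explains nothing, and typically breaks down the moment it is asked to extrapolate, unlike d = ½gt², which both predicts and explains.

```python
# Illustration of "prediction without explanation": a high-degree polynomial
# fit to noisy fall-time data mimics the observations it was shown, but its
# coefficients carry no law of motion, and extrapolation exposes that.
import numpy as np

rng = np.random.default_rng(0)
g = 9.81                                              # m/s^2

t_obs = np.linspace(0.0, 2.0, 25)                     # observation window: 0-2 s
d_obs = 0.5 * g * t_obs**2 + rng.normal(0, 0.1, t_obs.shape)   # noisy fall distances (m)

fit = np.polyfit(t_obs, d_obs, deg=9)                 # blind, over-flexible curve fit

for t in (1.0, 5.0):                                  # inside vs. outside the data
    print(f"t = {t} s   fit: {np.polyval(fit, t):12.2f} m   law: {0.5 * g * t**2:8.2f} m")

# Within the observed window the fit tracks the law; far outside it, the fit
# usually diverges. Either way its coefficients say nothing about why bodies
# fall, whereas d = 0.5 * g * t**2 follows from an explanatory theory of gravitation.
```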
The theory that apples fall to earth because that is their natural place (Aristotle’s view) is possible, but it only invites further questions. (Why is earth their natural place?) The theory that apples fall to earth because mass bends space-time (Einstein’s view) is highly improbable, but it actually tells you why they fall. True intelligence is demonstrated in the ability to think and express improbable but insightful things.
Why was the internet polluted in 2016? Implications of the lack of moral principles for ChatGPT (and many models like it)
In 2016, Microsoft’s Tay chatbot polluted the internet with misogynistic and racist content because online trolls had filled it with offensive training data. How can that problem be avoided in the future? In the absence of a capacity to reason from moral principles, ChatGPT was crudely restricted by its programmers from contributing anything novel to controversial (that is, important) discussions. It sacrificed creativity for a kind of amorality.