Dear Aventine Reader,
Welcome to the online version of our first newsletter! Each month we’ll take a close look at recent technological and scientific advances that will affect the ways we work and live in the decades to come.
We’ve chosen to publish this monthly because it’s often easier to get an accurate sense of what is – or was – important with the benefit of a little time. This is especially true when it comes to science and technology, subjects that tend to be covered with a fair amount of fear and hype.
So with the advantage of hindsight, we hope to deliver a useful digest of meaningful recent developments, along with some context to help you make sense of them. If you'd like to receive subsequent newsletters by email, there's a sign-up button at the very bottom of this page.
Thanks for joining us,
Danielle Mattoon
Executive Director, Aventine
Confronting the Endless Hallucinations of ChatGPT
Jonathan Turley, a law professor at George Washington University, was unnerved. He’d received an email from a lawyer in California who had been informed that Turley had a record of sexual harassment — a claim the lawyer said was substantiated by a March 2018 article from The Washington Post. Yet both the assertion and the Washington Post article were completely made up. They were fictions created by ChatGPT, OpenAI’s artificial intelligence chatbot, when it was asked about sexual harassment at American law schools.
This was a particularly troubling example of what’s become known as an AI hallucination. The term has exploded into the public consciousness since the November 2022 release of ChatGPT and the subsequent release of other large language models (LLMs) that produce highly convincing prose. Experts agree that these hallucinations can be defined pretty straightforwardly as an artificial intelligence model asserting something that is not true. They also say that hallucinations are an all-but-inevitable byproduct of the way large language models are designed and that there’s no straightforward way to eradicate them.
David Ferrucci is the computer scientist who led the team that built IBM’s Watson, the artificial intelligence system that beat Ken Jennings in “Jeopardy!” in 2011, and is now the CEO of an AI startup called Elemental Cognition, which is attempting to combine logical reasoning with large language models. (Listen to Aventine’s podcasts on Watson’s rise and fall here.) He pointed out that the hallucinations produced by LLMs are an inherent byproduct of their architecture. Trained on huge quantities of text, they learn to identify probabilistic patterns of how words are arranged; given a prompt, they construct entirely new sentences, word by word, based on what word the model assesses is most likely to come next.
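To make that word-by-word process concrete, here is a minimal sketch of next-word generation in Python. Everything in it, from the vocabulary to the probabilities, is invented for illustration and bears no resemblance to the scale of a real model; the point is only that each word is chosen because it is statistically plausible, not because it is verified.

```python
import random

# Toy "language model": for each word, a hand-written probability table over
# possible next words. The words and numbers are invented purely for
# illustration; a real LLM learns these patterns from vast amounts of text.
NEXT_WORD_PROBS = {
    "the": {"professor": 0.5, "article": 0.3, "claim": 0.2},
    "professor": {"was": 0.7, "wrote": 0.3},
    "was": {"accused": 0.6, "cleared": 0.4},
}

def generate(prompt_word, max_words=3):
    """Extend the prompt one word at a time by sampling a plausible
    continuation. Nothing in this loop checks whether the result is true."""
    words = [prompt_word]
    for _ in range(max_words):
        dist = NEXT_WORD_PROBS.get(words[-1])
        if not dist:
            break
        candidates, weights = zip(*dist.items())
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the professor was accused": fluent, but unverified
```

Scaled up to billions of parameters and trillions of words of training text, the same basic recipe produces prose that reads confidently whether or not its claims hold up.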
While the technology that fueled Watson's victory was very different from the large language models of today, it possessed a critical capability that the new models lack: it could communicate how confident it was in its answers. This feature was so important to Ferrucci that he programmed Watson to reveal its level of confidence in each “Jeopardy!” answer it gave. He wouldn’t be able to do that with a typical large language model.
“It's not judging facts. It's not selecting things based on confidence that those generated facts are, in fact, justified,” he told us of the new technology. "What's sort of interesting about the concern around hallucinations is that that's what [LLMs] are designed to do. They're designed to generate sequences of words that don't necessarily appear in any particular place,” he said. “In some sense, they're always hallucinating.”
Aleksander Mądry, the director of the MIT Center for Deployable Machine Learning, shared a similar sentiment. The only real difference, he said, lies in how humans perceive those constant hallucinations: if an LLM creates a sentence that contains accurate assertions, we praise its ability to reason; if it creates something incorrect, we brand it a hallucination. As researchers have increased the size of LLMs, the algorithms have become better at predicting which word should go next based on all the data they have been trained on. But there is still no mechanism inside the algorithms to prevent them from getting things wrong.
OpenAI, which did not respond to a request for an interview, freely admits that this is a problem. Writing about its most recent LLM release, GPT-4, the company explained that the system “still is not fully reliable (it ‘hallucinates’ facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts.” Microsoft, which uses OpenAI’s GPT models to power its Bing Chat service, has warned that ChatGPT’s “primary function is to reproduce patterns in text, not to actively consult sources to provide accurate information.” Microsoft declined a request for an interview.
The problem with what LLMs say when they hallucinate is exacerbated by the way they say it, which is with unnerving fluency. The text that they produce is well-written, well-structured and grammatically sound. It gives the impression, in other words, that it comes from an intelligent and informed source.
“Humans, they associate credibility with fluency,” said Ferrucci. “If it sounds good, it must be right. I mean, that's the power of rhetoric, right?"
Rodney Brooks, a robotics and AI expert who was a co-founder of the consumer robotics company iRobot, recently told IEEE Spectrum that humans’ ability to assess a person’s skill in one area as a way to judge their broader competency has long been an evolutionary advantage that is not transferable to LLMs. “We see a person do something, and we know what else they can do, and we can make a judgment quickly,” he said. “But our models for generalizing from a performance to a competence don’t apply to AI systems."
How the LLM is being used obviously affects how much we might care about all of this. If an LLM writes a script for a fantasy role-playing game, it’s fine — desirable, even — if it makes things up; if it’s writing a quarterly memo for your CEO, probably not. And if it’s being used to draft political rhetoric to be distributed on social media? We should probably worry.
Sadly, attempts to eradicate hallucinations are “basically an open research problem,” said David Krueger, an assistant professor at the University of Cambridge who specializes in AI safety. He added that “there hasn't been as much success as people have hoped.”
Fine-tuning the model in different ways can reduce the propensity to hallucinate, but early research from the Hebrew University in Jerusalem suggests that such adjusted models are still open to being tricked into providing spurious outputs. Some generative AI search engines, such as Microsoft's Bing Chat, attempt to provide in-line citations to back up their claims, but early research from Stanford University suggests that those attempts are hit-or-miss: just 58.7 percent of sentences generated by Bing Chat were fully supported by citations, according to the research, and 10.5 percent of citations did not support their associated sentence.
There may be other ways to augment LLMs to overcome the hallucination problem. Mądry, for instance, has started to think about how additional systems could help demonstrate the factual grounding of an LLM’s output, telling users which parts of the model and the training data were most important in creating a particular answer. Ferrucci explained how other artificial intelligence systems could be paired with an LLM system to judge how accurate the output is — say, by studying outputs for geographical or temporal consistency, or other qualifiers and quantifiers, which is akin to what IBM’s Watson did in order to win “Jeopardy!”
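As a hypothetical illustration of the kind of pairing Ferrucci describes, imagine a small secondary checker that audits an LLM’s output for internal temporal consistency before it reaches a user. The sketch below is our own toy example, not a description of Elemental Cognition’s or IBM’s systems, and the single rule it enforces is an assumption chosen purely for illustration.

```python
import re

def check_temporal_consistency(text):
    """Toy verifier: flag an LLM-generated passage if an event it describes
    is dated before the person's stated birth year. Real verification
    systems are far more sophisticated; this only illustrates the idea of
    a second system auditing an LLM's output."""
    born = re.search(r"born in (\d{4})", text)
    events = re.findall(r"in (\d{4}) (?:he|she|they) (\w+)", text)
    if not born:
        return ["no birth year found; cannot check"]
    birth_year = int(born.group(1))
    issues = []
    for year, verb in events:
        if int(year) < birth_year:
            issues.append(f"'{verb}' dated {year}, before birth year {birth_year}")
    return issues or ["no temporal inconsistencies detected"]

claim = "The scientist, born in 1975, in 1962 she published her first paper."
print(check_temporal_consistency(claim))
# ["'published' dated 1962, before birth year 1975"]
```

A real verification layer would need to reason over far richer representations of time, place and quantity than one pattern match, which is part of why these approaches remain research projects.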
But this work is nascent. The only straightforward way to eradicate hallucinations from LLMs altogether, said Krueger, would be to create a list of outputs that have been verified as true by humans — in other words, a database — which would seem to undermine the point of building an AI in the first place. For now, then, the problem of hallucinations remains a reality with which we must contend — and, over time, seek to tame.
“I view [the hallucination situation] mostly as emblematic of where we're at with our ability to control AI systems and ensure that they are safe and trustworthy and reliable,” said Krueger. “And we don't really know how to make AI safe.”
Advances That Matter
Apple finally unveiled its VR headset. The company would rather you called its much-anticipated new Vision Pro a “spatial computer,” but really it’s a virtual and augmented reality headset that uses 4K displays, 12 cameras and depth sensors to blend virtual reality with the real world. By all accounts, it is deeply impressive: Nilay Patel of The Verge describes how, during a trial, the headset “automatically detected my hands and overlaid them on the screen, then noticed I was talking to someone and had them appear as well. Reader, I gasped.” The downside? It’s still not clear what this technology is for. So on the one hand, this piece of equipment is going to cost at least $3,500 and is still looking for a useful application. On the other, it is built by the company that turned the iPhone, iPad and Apple Watch — expensive yet seductive products that nobody really asked for — into consumer staples. This seems to be the most advanced immersive computing experience ever to be put on sale, and its success or failure could prove to be a pivotal moment in the future of virtual and augmented reality.
Brain signals can be decoded into sentences with AI. Researchers from the University of Texas at Austin published a study in Nature Neuroscience describing what they call a “non-invasive language decoder.” The system is based on OpenAI’s first GPT large language model, released in 2018, and is trained on data acquired from fMRI scans taken as subjects listened to 16 hours of audio stories. It is then able to generate word sequences that describe, with reasonable accuracy, the brain activity of the same subjects as they listen to a podcast or watch a silent film. As STAT reports, it doesn’t make sense to call this a mind-reading system, but it’s a huge advance over the current state-of-the-art for reading thoughts, which requires invasive surgery to place sensors on the brain.
Robot-enabled IVF produces its first babies. A rash of startups is racing to shake up the IVF industry to make in vitro fertilization cheaper and more accessible. MIT Technology Review’s Antonio Regalado reports that one company, Overture Life, has tested a sperm-injecting robot that successfully fertilized a number of eggs, resulting in the birth of two baby girls. The long-term goal of Overture and rivals such as Conceivable Life Sciences and Fertilis is to automate the IVF process so that it doesn’t require hours of dedicated work by expensive embryologists. Instead, the startups hope, a machine could fertilize, freeze and even nurture embryos, significantly reducing costs and widening access to the treatment. As it stands now, IVF is a complex process and a start-to-finish automated system is still a long way off. But Overture’s success shows that it’s possible to bite off a small part of the challenge; if the resulting technologies can be stitched together, automating large parts of the process could one day become reality.
Cooling the Planet Using Aerosols
In the wake of a volcanic eruption, sulfurous gases create clouds that scatter solar radiation and can reduce the heating effect of the sun on the planet. Humans could create the same effect through a process known as stratospheric aerosol injection, which entails the release of sulfur compounds into the stratosphere. In fact, the Intergovernmental Panel on Climate Change has written that this "is the most-researched [solar geoengineering] method, with high agreement that it could limit warming to below 1.5 degrees centigrade."
Yet it is also controversial. The list of objections is long and most are rooted in the fact that both the short- and long-term effects of such an experiment are difficult or impossible to gauge. There is also the concern that creating this sulfur-based reflector will discourage ground-level efforts to reduce carbon emissions.
The pushback is real. A planned trial to test some elements of the technology as part of the Stratospheric Controlled Perturbation Experiment (SCoPEx) — perhaps the best-known project of this kind, based at Harvard University — was due to take place in Sweden in 2021, but was postponed after backlash from local residents and environmental groups. Meanwhile, a startup called Make Sunsets has recently angered many researchers in the geoengineering community by forging ahead and undertaking its own ultra-small-scale experiments without warning or consultation. While the scale of both experiments is negligible compared with the impact of a volcanic eruption, there is concern that they set a precedent and normalize carrying out the practice without due consultation.
Given the complexity of the topic, we sought the opinions of a swath of experts to understand how they felt about the implementation of the technology. Here are edited extracts from what they told us.
“No statement about [solar geoengineering] can be meaningful without some choices. And I often divide this into three basic choices: how much we're doing, what method we're using, and the spatial distribution. Given that, I think, a fair read of the literature … is that if you did it pretty evenly east to west to north to south, if you did it with stratospheric aerosols, and if you did it as a complement to emissions cuts … then there's really a lot of evidence from a whole host of climate models — and there’s no strong counter evidence — that solar geoengineering would reduce many of the key climate hazards.”
— David Keith, a professor and leader of the Climate Systems Engineering Initiative at the University of Chicago
“I think there's consensus that if you increase the albedo [or reflectivity] of the Earth, you would produce cooling. Where there's not consensus is how it's going to work, if it's going to work, and whether it's a good idea … And I'm skeptical [that this is anything more than a theoretical issue right now]. You're going to have to have something like the Manhattan Project to get this thing done quickly. Some government has to decide, ‘we really want to do this. We're gonna dump a whole lot of money into this.’ And then maybe it will work, maybe it won't … There’s just a ton of uncertainty.”
— Karen Rosenlof, Senior Scientist for Climate and Climate Change, National Oceanic and Atmospheric Administration
“We've been trying to mitigate and adapt for decades, and we haven't been able to do so. And so at least considering this as an option is something we should think about, right? My perspective is that we should do the research so we can understand the potential benefits and harms of this technology, and then make an informed decision about next steps — whether that be to shut it down entirely, to fund research more, scale up or whatever. I don't know what the answer is to that, and anyone who says they do is being disingenuous because the science just isn't there yet.”
— Sikina Jinnah, Professor of Environmental Studies at the University of California, Santa Cruz
“There's a mortality cost to carbon, [affecting] not only people but also [driving] extinctions. One might argue, and I'm increasingly arguing [this], that it is morally wrong for us not to prevent extinctions and deaths where we can. And if you accept that, and you accept the scientific consensus and evidence from nature that putting sulfur dioxide into the stratosphere creates cooling, then it’s not a big leap to argue that there is a moral imperative for us to do this.”
— Luke Iseman, founder of Make Sunsets
“We have to quantify the potential benefits and risks and decide which is riskier — doing it or not doing it — and we don't have enough information yet about that. […] Let's make [this] clear. Global warming is real. It's caused by humans. And it's bad, we're sure. [Stratospheric aerosol injection] is not a solution to global warming. The solution is to leave the fossil fuels in the ground and not continue to dump carbon dioxide, methane and other gases into the atmosphere.”
— Alan Robock, a professor in the Atmospheric Science Group at Rutgers University
Technology’s Impact Around the Globe
1. New Delhi, India: Digital payments may have taken off in India, but there was a problem when it came to many mom-and-pop-style stores: the process of confirming payment could be insufferably slow. Sometimes that was because illiterate sellers couldn’t read the confirmation text messages and had to seek help; other times it was because the SMS limits of feature phones meant the messages didn’t arrive in a timely manner. Now tech companies have built out infrastructure using SIM-enabled smart speakers to help solve the problem, Rest of World reports from New Delhi. The speakers, which audibly confirm sales to vendors and customers by reading out alerts, are opening up modern payments across the entire economy — and creating a new revenue stream for fintech companies in the process.
2. Nanyuki, Kenya: For years, U.S. students who were too lazy to complete their college assignments have been able to farm them out to essay writers in other countries who were willing to provide cheating as a service in return for payment. This practice dates back many decades to informal agreements between friends, but the rise of the internet saw so-called contract cheating become a globalized business, and the majority of people writing essays in return for pay appear to be based in Kenya. Prices can vary dramatically, from tens of dollars per 1,000 words to just a couple of dollars for the same amount. The industry has been estimated to be worth as much as $1 billion globally. But Martin Siele of Rest of World reports that the market may be turning: One essay writer based in Nanyuki, Kenya, says he now earns less than half what he did before ChatGPT launched.
3. New York City, U.S.: When a parking garage collapsed in the Financial District of Lower Manhattan this past spring, an unusual first responder was on the scene: a robotic dog built by Boston Dynamics. Along with several aerial drones, it was tasked with using its thermal imaging cameras to help operators identify survivors in areas that weren’t safe for humans to traverse. There’s been some opposition to its use on privacy grounds, but officials argue that it doesn’t record any video data.
A Guide to Quantum Cryptography
At the start of the year, a non-peer-reviewed research paper posted to the preprint server arXiv suggested that quantum computers might finally be ready to break some of the strongest encryption used on the internet today. Many researchers pointed out that some of the assumptions made in the research meant that it still wasn’t really time to worry, but it raises a big question: How concerned should we be about the fact that quantum computers could one day undermine the systems that currently keep the internet secure? Not only that, but why is that even a potential problem anyway? And can quantum computers provide a solution to the problem they create?
To get a grounding in these issues, we asked Artur Ekert, a professor at Oxford University specializing in quantum cryptography, and Wenmiao Yu, a co-founder of quantum cryptography startup Quantum Dice, for a recommended reading, listening and watching list for anyone interested in learning more on the subject.
“The Quantum Revolution: Q-Day,” by The Financial Times. This 26-minute podcast — one in a six-part series about quantum computing — is a great start for beginners. It quickly covers the basics: how quantum computers threaten classical encryption, the new algorithms that may mitigate that risk, and why quantum computers stand to transform the way we communicate.
“Quantum Cryptography: The Ultimate Physical Limits of Privacy,” by Artur Ekert. For a deeper introductory dive, try Ekert’s hour-long lecture, which looks at the history of cryptography and quantum theory.
“Cryptography’s Future Will Be Quantum-Safe. Here’s How It Will Work,” by Quanta. Given the power of quantum computing, will classical encryption algorithms remain safe forever? Many researchers believe it’s perfectly possible to build systems that can stand up to them for some time; this 1,200-word article explains how.
“The Quantum Internet,” by Stephanie Wehner. Wehner is professor of quantum information at Delft University of Technology, and this 15-minute TEDx talk does a fantastic job of explaining the quantum communication infrastructure required to enable a future where quantum computers help us communicate securely.
“Cryptographers Achieve Perfect Secrecy With Imperfect Devices,” by Quanta. A particularly tricky concept to get your head around is that quantum cryptography doesn’t even require that the devices used for communication be secure. This 2,700-word article describes how scientists have proven that to be physically possible.
“Are You Ready for Quantum Communications?” by Boston Consulting Group. This 2,000-word article is good for executives and investors who want to make sure they’re equipped for the future. It presents the best predictions for when and how these technologies will actually roll out.
Still want to learn more? Then you could always dig into this full lecture series hosted by Delft University of Technology. Ekert calls them “the ultimate lectures on quantum crypto.”
Magazine and Journal Articles Worth Your Time
Chemists are performing molecular surgery, from Nature
3,000 words, 12 minutes
For decades, chemists tasked with building complex molecules, such as pharmaceutical drugs, have had to develop complicated step-by-step chemical reactions to assemble atoms into the necessary structures. Every time they attempt to create a new drug, the painstaking process must begin from scratch. Now, however, a shortcut is on the horizon. An emerging field of chemistry called skeletal editing is allowing scientists to test out new molecular structures by adding, removing or swapping individual atoms buried inside already existing structures. It’s still early days for the technique — it doesn’t work for all molecules, and can so far only perform specific kinds of edits — but it could allow chemists to build previously inaccessible molecules and speed up the creation of new drugs.
The reinvention of the grid, from The Economist
10,000 words, 45 minutes
As nations around the world race to add renewables capacity, they come up against a stubborn issue: the electricity grid simply wasn’t designed for the job. In its latest Technology Quarterly, The Economist takes a long, hard look at the problems with existing grid infrastructure, the new technologies that will help solve them, and the complex political and economic reality that makes all this even harder to put right than you might expect. Buckle up: it’s a long read, but well worth it.
A long road to longevity drugs, from proto.life
4,500 words, 20 minutes
How do you test a pill that could help you live longer? That’s not a joke, but one answer to the question — “slowly!” — sure makes it sound like one. The obvious way of testing longevity medications takes a lot of time, because you have to wait for people to age, and as a result the trials are expensive. This story takes a look at how running clinical trials for specific diseases is probably a better route to establishing which drugs could one day extend all of our lives.