LLMs are reasoning engines that mimic expert responses in nearly any domain. However, the plausible-sounding output sometimes turns out to be nonsense on closer inspection.
How can we overcome this issue?
Constrain output. Constraining output puts guardrails on the answer space from which an LLM can sample. ReLLM and ParserLLM are two examples of this strategy, constraining LLM output to match a regular expression or a context-free grammar, respectively.
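A minimal sketch of the idea behind regex-constrained decoding (not ReLLM's actual API): at each step, only accept a token if the text so far remains a viable partial match of the pattern. It assumes the Hugging Face `transformers` package and the third-party `regex` module, which supports partial matching.

```python
# Regex-constrained greedy decoding: mask out any token that would break the pattern.
import regex
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def complete_with_pattern(prompt: str, pattern: str, max_new_tokens: int = 20) -> str:
    """Greedily pick the most likely token whose addition keeps the
    generated text a valid (partial) match of `pattern`."""
    generated = ""
    for _ in range(max_new_tokens):
        input_ids = tokenizer(prompt + generated, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        # Try candidate tokens from most to least likely.
        for token_id in torch.argsort(logits, descending=True).tolist():
            candidate = generated + tokenizer.decode([token_id])
            if regex.fullmatch(pattern, candidate, partial=True):
                generated = candidate
                break
        else:
            break  # no token can extend the pattern; stop
        if regex.fullmatch(pattern, generated):
            return generated  # pattern fully matched; stop early
    return generated

# Force the completion to be a month name (optionally preceded by a space).
months = r" ?(January|February|March|April|May|June|July|August|September|October|November|December)"
print(complete_with_pattern("Thanksgiving in the US is in", months))
```

The same masking trick generalizes from regular expressions to context-free grammars by checking candidate tokens against a parser instead of a regex.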
Use for search. This delegates the ultimate test of truthfulness to the underlying dataset. An example would be using an LLM with a vector database to do smarter semantic search over a dataset.
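A minimal sketch of that setup, assuming the `sentence-transformers` package for embeddings; a production system would swap the in-memory index for a vector database.

```python
# Semantic search: embed documents once, then rank them against a query embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "The 2023 budget allocated $1.2M to infrastructure.",
    "Employee onboarding takes roughly two weeks.",
    "The API rate limit is 100 requests per minute.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

def search(query: str, k: int = 2) -> list[str]:
    """Return the k documents closest to the query in embedding space."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity (vectors are normalized)
    return [documents[i] for i in np.argsort(-scores)[:k]]

print(search("How fast can I call the API?"))
```

The retrieved passages, not the model's parametric memory, become the source of truth: feed them into the prompt and ask the LLM to answer only from the provided context.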
Use for ‘Hard to Compute, Simple to Verify’ problems. Adding a formal proof verifier, a reproducible build system, or functional programming to an LLM pipeline makes it possible to retry output until it passes a specific test or threshold. See more on stochastic vs. deterministic.
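A minimal sketch of the generate-then-verify loop, assuming a hypothetical `generate(prompt)` helper that wraps whatever LLM client you use. Producing the answer is left to the stochastic model; accepting it is a cheap, deterministic check.

```python
# Retry LLM output until it passes a deterministic verifier.
import json

def generate(prompt: str) -> str:
    """Hypothetical wrapper around an LLM completion endpoint."""
    raise NotImplementedError("plug in your LLM client here")

def verify(answer: str, n: int) -> bool:
    """Deterministic check: do the proposed factors actually multiply to n?"""
    try:
        factors = json.loads(answer)  # e.g. "[3, 5, 7]"
        product = 1
        for f in factors:
            product *= int(f)
        return product == n and all(int(f) > 1 for f in factors)
    except (ValueError, TypeError):
        return False

def factor_with_retries(n: int, max_attempts: int = 5) -> list[int] | None:
    prompt = f"Return the prime factors of {n} as a JSON list of integers."
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if verify(candidate, n):  # keep only output that passes the check
            return json.loads(candidate)
    return None  # give up after max_attempts
```

Factoring here is a stand-in for any task where verification is far cheaper than generation; the same loop works with a proof checker, a test suite, or a reproducible build as the verifier.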
Hallucinations as a feature, not a bug. This is illustrated in media (e.g., images, fiction writing, music), where we expressly do not want to reproduce an existing work. Hallucination helps answer “what if” questions and remix disparate ideas.