As a daily user of LLMs, I’ve been missing this feature: qualified judgements, by the LLM itself, on its own output.
The LLM seems to be programmed to assume that whatever it tells you is the case; there are very few probability shadings. By contrast, the output of capable humans is routinely qualified by probabilities: X is certain, Y is doubtful, Z is very uncertain.
This could be useful for semi-automatic processes, where a human reviews (for example) the classification of expenses done by an LLM. If the LLM could put the classifications in different bins, one for “near certain”, one for “somewhat sure”, and another for “wild guesses”, it’d be much easier to review errors. Even if the LLM is not great at assigning probabilities, it surely must be better (for many use cases) than the blanket assumption that whatever it tells you is correct.
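Here’s a minimal sketch of what that workflow could look like. The `call_llm` function is a placeholder (a stub that returns a canned answer so the example runs), standing in for whatever model API you actually use, and the three bin labels are the ones from above; the idea is simply to ask the model for a category plus a confidence label and route the results into bins for human review.

```python
import json
from collections import defaultdict

def call_llm(prompt: str) -> str:
    # Stand-in so the sketch runs end to end; replace with a real
    # call to whatever model API you use.
    return '{"category": "travel", "confidence": "somewhat sure"}'

CONFIDENCE_BINS = ("near certain", "somewhat sure", "wild guess")

PROMPT_TEMPLATE = """Classify the following expense into one of these categories:
{categories}

Expense: "{expense}"

Reply with JSON only, e.g. {{"category": "...", "confidence": "..."}},
where "confidence" is one of: near certain, somewhat sure, wild guess.
"""

def classify_expense(expense: str, categories: list[str]) -> dict:
    prompt = PROMPT_TEMPLATE.format(
        categories=", ".join(categories), expense=expense
    )
    result = json.loads(call_llm(prompt))
    # Anything malformed or off-list is treated as a wild guess,
    # so it lands in the bin a human will look at most carefully.
    if result.get("confidence") not in CONFIDENCE_BINS:
        result["confidence"] = "wild guess"
    return result

def bin_classifications(expenses: list[str], categories: list[str]) -> dict:
    bins = defaultdict(list)
    for expense in expenses:
        result = classify_expense(expense, categories)
        bins[result["confidence"]].append((expense, result.get("category")))
    return bins

if __name__ == "__main__":
    expenses = ["UBER *TRIP 4821", "AWS EMEA SARL", "ACME COFFEE 0231"]
    categories = ["travel", "cloud services", "meals", "office supplies"]
    bins = bin_classifications(expenses, categories)
    # A reviewer can skim "near certain" and spend their time
    # on "somewhat sure" and "wild guess".
    for label in CONFIDENCE_BINS:
        print(label, "->", bins.get(label, []))
```

Whether the bins are self-reported labels like these or something derived from token probabilities, the point is the same: the reviewer’s attention goes where the model is least sure.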