
AI for Evaluation: a way out of the bullshit?

  • Writer: Sonja Wiencke
  • Sep 5
  • 7 min read

Updated: Sep 6

Amid all the excitement and panic about how ChatGPT could take evaluators’ jobs, here is why I’m excited: AI can separate bullshit from valuable work. And I’m hopeful that with AI making it cheap and easy to produce bullshit, we can finally focus on doing valuable work instead — that is, if that is indeed what clients and funders want.

Let me explain that.

The below is based on a conversation with the ever-brilliant Rosie Frost. We used to be “the techies” of the MEAL team, and we are both focused on using better digital tools to help NGOs do better work.

Quick disclaimer because you know how readers on the internet operate — no, we have not reviewed every single AI tool out there that could be useful for MEAL. We are both curious and watching how the field develops. We have not had the opportunity to test some of the data analysis-oriented tools, which might be an easier place to start than the design-stage work we’re discussing in this article. Just take the following points for what they are — thoughts based on early experiments.


There are four sections to this post: 1) what generative AI can (and can’t) do for MEAL, 2) what bullshit is and how AI perpetuates it, 3) why that’s potentially a good thing and 4) why it’s also a problem. We’re closing with some high-level thoughts on how to get out of the bullshit, and would love to hear more views on those.

1) What MEAL work can generative AI do at the moment?

Some of my smart evaluation colleagues have figured out that tools like ChatGPT can produce fairly convincing MEAL content. Andi Pawelke looked at generative AIs in project design and concluded that at the moment, AIs could enhance the design process through efficiency gains and synthesis of more information, but not produce good project designs by themselves. Arbie Baguios (from Aid Re-imagined) tested ChatGPT for project and logframe development, and the results look remarkably similar to actual humanitarian projects. The impressed and jaded comments on this post echo an argument made by Silva Ferretti in this AEA blog series:

ChatGPT can make such convincing “project proposals” because the aid sector conventionally works with jargon, lofty concepts and artificial “streamlining” of messy real world facts into smooth frameworks.

I agree with all of their points and decided to play around with ChatGPT on actual projects that I’m working on with actual NGOs in Dominica. The results are not bad at all — when I included a summary of activities of my client’s project in the prompt, it produced a logframe that had very similar outputs and outcomes to the human-produced logframe my client is using. I dare say I have seen worse logframes get approved for grant funding. :)

So that’s great, right? No more time wasted thinking up result statements to please a donor. No more debating with colleagues whether skills building is genuinely an outcome. No long workshops that make everyone’s brain hurt as we try to work out how this project will actually make change.

But look at those AI-generated logframes with an evaluator’s eye. Our job as evaluators is to look at a project, or a report, or a proposal, and take out the magnifying glass to say: “Really though?”, or more professionally: “Why do we think this, where is the evidence?”

And then you realise there is only one thing that generative AIs are extremely good at right now, and that is generating bullshit.

2) About bullshit

I don’t mean that in an insulting way at all, I am talking about the academically established definition of bullshit, as developed by Harry G. Frankfurt, a philosophy professor at Princeton University. He speaks of bullshit as “speech intended to persuade without regard for truth”. It differs from lying because it’s not necessarily intended to deceive, but to suit the purpose of communication irrespective of whether it’s true. I have often encountered bullshit as a way to obfuscate or avoid talking about the truth.

Ironic case in point: here is an article on LinkedIn that is written by an AI, about how to use AI in evaluation. And it is, very clearly, bullshit — the content of the article is not untrue, but it also doesn’t contain any points that would be beyond common sense, or even anything concrete enough for readers to apply in their work.

Probing my client example above, I asked ChatGPT about specific risks associated with this NGO project given the Dominican context, and well:

[Screenshot in the original post: ChatGPT’s list of potential risks.]

Similarly here, what ChatGPT produced is not a lie. All of the above are indeed potential risks. But they’re also so generic that they could apply to almost any project in almost any country. What ChatGPT produced fulfils the communication need of a typical grant application template — a paragraph about risks. The risks are written in a tone and style that make them sound reliable. In other words, the text above can “persuade without regard for truth” — it’s bullshit.

In Rosie’s words, what differentiates bullshit from a quality project proposal is the thinking behind it.

Bullshit = a text that makes sense, grammatically and semantically, about the topic. A project proposal = a text that accurately expresses the reasoning and logic of this project based on real-world data.

An AI can’t reason. The language-based AIs that are easily accessible at the moment essentially regurgitate information they have been trained on, in whichever format you prompt them to. (That’s why my latest pet peeve is people saying things like “oh wow, this AI knows/understands xyz” — AIs don’t know anything, they repeat.)

AIs also cannot generate logic, in the sense of the ideas and causal relationships behind logical frameworks or theories of change. They can’t do that because they don’t “know” what factors make a project work in a given context. The ChatGPT-generated logframe is really just a synthesis of all the text it’s been able to ‘read’ from the internet — the documentation from similar projects that other NGOs have put online. (And it isn’t giving you the project evaluations along with those ideas, so they may well be bad logframes.)

Lastly, the AI will not have access to the real-world data that your quality NGO proposal requires. All of the posts by my evaluator colleagues above already highlighted a version of this point: the main limitation of AI is that it has “limited context knowledge”. I just want to stress here that this is a potentially disastrous “limitation” — the AI that just designed my risk management framework has literally no clue whether my country is likely to experience political unrest or not. It’s just taken the content from many general notes about “developing countries”, synthesised it and dropped it into my proposal. And given how much of the information available to ChatGPT & Co about my country is written by white men in consulting roles, that is quite limited context knowledge indeed.

AI has no regard for the truth; its imperative is to produce content that looks and sounds like everything else it has processed. As you ask it to generate more and more specific risks without accurate source material to draw on, there is every chance that it will start making things up.

More on this in a future blog post from us, probably.

3) What’s good about AI generating bullshit

I’ve been enjoying using AI for the past few months. There are parts of my job that still involve bullshit, as much as I try to avoid it — ChatGPT is surprisingly good at drafting my narrative reports, for example. And my favourite bit is the part of every grant application where the funder wants you to justify how your project fits within their strategic priorities — AI has much more patience for explaining this than I do.

4) The problem with bullshit

As a sector, though, I think this is an opportunity to look really closely at the amount of bullshit that we produce. I shouldn’t be able to give an AI a two-sentence prompt and receive a project report so congruent with industry standards that funders would accept it. I shouldn’t be able to automatically generate a fully regurgitated logframe that looks so similar to real projects that it could probably fool people.

What does that say about us as professionals? We’ve been copying project designs and producing meaningless sentences for so long that computers can imitate what we do for a living.

AI will be “coming for your job”, as this clickbaity article suggests, if your job is to produce documents.

But as evaluators, our jobs should not be to produce documents. Our jobs should be to be the extra brain on the project, to critically assess what’s going on, and to gather the right information from the real world. AI can’t do any of that.

We’re humanitarians and development professionals; we should be busy figuring out how to make the world fairer, less poor, less violent, less climate-destroying. Instead, we’re busy producing documents that an AI can now produce — we’re busy producing bullshit.

That isn’t what I want to do for a living. What I want to do is generate new insight, look at new data, and find new ways to approach a problem. What I want to do is contribute to improving the world, by proving what works and what doesn’t.

5) A way out of the bullshit

The good thing about AI, I argue, is that it can expose the extent to which we have made bullshit our full-time jobs. At the moment, I’m using AI to save time on tasks that are bullshit-based, so that I can spend more time on tasks that require genuine critical thinking. Given that everyone is now aware of how easy it is to produce text-based documents that sound smart but have no insightful content, I’m hoping that the sector will move away from relying too much on those. Just like teachers currently have to find new ways to assess whether their students have learnt something because the students can just generate essays with ChatGPT, the development sector will have to find new ways to assess its work.

We may write a whole post about this, but speculatively it could mean: getting rid of traditional project proposals and instead requiring systems maps with real-world evidence for project design choices; getting rid of narrative reports (because who wants to read those, really!) in favour of “hard” data, maybe annotated dashboards, or ethnographic information.

Better ideas? Drop them in the comments :)

I think AI is an invitation for us to get out of the bullshit. For that, we first have to get better at recognising bullshit (I just tried to probe ChatGPT for tips on that, but I suggest you build your own list of criteria). And we need a sector-wide understanding that bullshit is — well, bullshit, in the sense that it will keep us from making progress on any of the big issues the sector is trying to tackle. We can’t edit away climate change with yet another document that regurgitates the risks it poses to humankind; we actually have to get things done.

We don’t have time for bullshit.

Thank you for reading to the end of this very much not AI-generated blog post. If you’re a human, please leave a comment underneath with your very own original thoughts on this.
