A software engineering view of AI from ICSE 2024
ICSE is the premier international conference on software engineering research, attracting attendees from academia and industry. It includes several tracks, such as Software Engineering in Society, Software Engineering in Practice, and Software Engineering Education, as well as various co-located workshops and conferences such as CHASE, Technical Debt, and Mining Software Repositories. Many people attend several of these events, and the observations here are informed by both formal sessions and informal interactions with the wider group of attendees. In 2024, ICSE attracted over 2,200 participants from 63 countries.
Key theme
This horizon scan focuses on one strong theme that ran through much of the discussion and many of the presentations at the conference: the role of artificial intelligence, and of large language models (LLMs – think of ChatGPT and Copilot) in particular, in the software engineering field.
Many presentations and discussions focused on artificial intelligence1 – both the development of AI systems and their use in software engineering. Many questions were raised, some of which have been asked of AI systems before, such as how we know the basis for a decision made by an AI. As AI technology evolves, the answers to these questions change, and the details become more complex.
A key thread in the discussions was that improved technology for developing software doesn't necessarily mean that our products will be improved. How the tools are deployed and how humans interact with them will affect their usefulness and effectiveness. For example, should AI tools support developers in their work or make decisions for them? What kinds of decisions could be automated? How can people trust an AI system? What needs to be taken into account so that trust can be embedded into an AI system?
These issues have been raised before about software systems. For example, studies have shown that the impact of automation2 will be widespread, and its associated “ironies”3 apply in many situations, such as navy warships4, industrial processes5 and software test automation6. The role of trust in software is a key concern7 – think of the role trust played in the recent Post Office scandal8. But with AI, the issue seems much more complex.
Trust is difficult to measure, and the level of trust a user expects varies with context: the trust required for a car navigation system, for example, is different from that required for a banking application. One speaker stated that we shouldn’t aim for a system to have 100% human trust, as humans need to remain part of the decision-making process. One panel asked about the role of humans, asserting that research should focus on what the human can do best and what AI would be best at performing. This is a key point. We have some answers for this, but there are still open questions. AI systems are dynamic and can learn continuously, changing and evolving their behaviour, so adding AI to the mix adds risk and means that AI systems are definitely not suited to urgent decision-making. In these contexts, the human should remain responsible for decisions, and significant decisions need to be explainable, able to be overridden, and reversible. “Human-machine teaming” is a better approach to designing AI systems, i.e. seeing the tools as another partner rather than as simply automation.
Two keynote talks focusing on LLMs, presented back-to-back, produced some insightful discussions and food for thought. One was from an academic and one from a practitioner, and both had interesting perspectives to share. A key point raised in the first was whether LLMs can “understand the meaning” of a domain. An experiment was run using a simple artificial world made up of blocks, in which the AI system could move blocks around, e.g. stack them, place them side by side or behind each other, and so on. The LLM was challenged to solve different problems: one simple (step-by-step) and one complex (requiring non-linear processing). The LLM in this experiment was able to solve the more complex problem, and so the researchers concluded that understanding was present in some limited sense. Personally, I remain unconvinced, and in any case a world made up of blocks has little in common with the real world.
The second keynote struck a different note. This presenter focused on where LLMs work and how to make them work better. LLMs deal well with ambiguity because they fill in any knowledge gaps based on their training data. They summarise information across many sources, demonstrating perhaps limited intelligence but a lot of knowledge. For something like programming, it is cognitively more efficient to ask an LLM to provide a first draft and then correct the answer. But LLMs can't make long-range decisions, and acquiring context is hard.
This keynote concluded that humans won't be replaced by LLMs any time soon. Artificial intelligence, at least in its LLM form, is just another tool. It can't solve social problems or reliably identify human emotions. Humans are still better at exposing bias, identifying downstream impacts, responding to change, recognising socio-political nuance, and taking context into account. This is particularly relevant for applications using machine learning because all data has biases, and we should continue to be aware of bias in these applications.
While AI systems pose some threats to creative activities, they also offer great potential to enhance what we can do. In contrast to some of the media coverage around AI systems, the consensus that I heard in this community was that AI presents many opportunities, but that the technology is still limited in application and that we need a better understanding of how to embed AI systems in human-machine teams.
Why this matters for business agilists
This is of interest to business agilists because of the continuing debates in professional and popular media about the use of AI and its possibilities — in particular, about the role of humans in an AI world. Although artificial intelligence has been around for many years, its latest incarnation in machine learning and large language models will impact business in various ways. The debates at ICSE emphasise how these impacts need to be embraced by businesses, but with a clear recognition of their limitations.
AI systems such as LLMs are here to stay and are evolving rapidly. Copilot, for example, is already used by many software developers, and ChatGPT has sparked a range of reactions in the creative industries and in radio and television programmes. Their presence will affect how work is done, what capabilities are needed, and how products are designed and deployed. Apart from the big questions of trust and automation raised above, there are several other areas that agilists need to be prepared to address if AI systems become part of everyday practices, both technical and organisational, including:
- What would be the impact on our everyday processes (scrums, retrospectives), job roles (Scrum Masters), and so on?
- What would it mean for the financial liability of companies if systems fail, for example? If code was written, debugged, or revised by an LLM, who is accountable? What if you fire a human software developer for LLM mistakes and are taken to court?
- The keynotes highlight the role of the human-in-the-loop (HITL). This could become crucial to an organisation’s strategies for recruiting and retaining staff, requiring a significant cultural change in existing agile practices across organisational functions (not just IT or software development).
- How do business agilists assess themselves as the humans in the loop for this LLM-driven future? Are you ready for this change? What do you and your organisations need to do to be ready for it?