Supercharged, Not Superseded: The Unexpected Future of Developers in an LLM World
Software development feels like it has existed since time immemorial - at least, that’s how it seems in this accelerated, high-speed age. In reality, programming has its roots in the ’60s and was only established as a career as recently as the ’80s. That makes it a really young discipline, and as such it is still growing and new practices arise very rapidly. In its few decades of existence, software engineering has gifted humanity a lot of really useful and seemingly impossible things - from airplane autopilots, to missile guidance systems, to autotune and autocomplete, and let’s not forget social media. In recent years, however, a new technology has arrived - LLMs.
Large Language Models take megawatts of power, but what they produce resembles intelligence, and the mode of communication with an LLM is, well, language. Since they can accept and produce arbitrary “tokens”, those tokens can represent symbols, characters, pixels or even code. We essentially have automated systems producing code at our whims - which makes it look as if software development as a career is transient and might even become obsolete.
Some LLMs, like Claude, are really good at “coding”, which leads a lot of really educated individuals to believe that software engineering as a career is in danger. There are some good arguments for this - LLMs don’t need breaks or holidays, they can work all day and are available at a moment’s notice, and their performance can be predicted and improved iteratively - perhaps even exponentially. All of these factors lead us to believe that in a few years we might not need human software engineers at all, since a computer can take a small specification defined in English and spew out valid code that works.
Programmers - same fate as Computers?
Historically speaking, “computer” was a career: the job of computers was to compute and crunch numbers, and they used specialised devices and processes to calculate extremely efficiently. So, superficially, it looks as if programmers might also be replaced by an automated system that might even keep the name. We did it for computers - we took the career and applied it to a machine - so why can’t we do it for programmers or software engineers?
However, I see a few major problems with this:
- Programming is not a deterministic activity. A huge part of software engineering is getting extremely specific and dealing with complexities that are almost impossible for a layperson to comprehend. The same problem statement might be interpreted differently, and especially in product development, the problem statement itself is often not clear.
- Software engineering requires judgement. Engineers need to understand the domain and the application very deeply before they can even begin applying their software engineering skills. For example, building a stock trading platform requires understanding both the functional and non-functional requirements to an extremely specific extent. Until that point, we spend our time in a “requirement gathering” phase: meeting multiple stakeholders and holding very lengthy discussions about what to build. A significant reason senior engineers are preferred is that they ask the right questions at the beginning and need fewer iterations to reach a satisfactory outcome.
- Software engineering is complex. We use data structures and algorithms to manage the complexity; if you’ve heard the term “spaghetti code” you know about this phenomenon. If a nontrivial piece of code is not structured extremely well and documented properly, it is extremely hard to comprehend.
- Software engineers need to practice for almost a decade before they can fluently read code; over that time they develop the cognitive skill to read a piece of code and build a mental model of the software that is mostly correct.
- We also use abstractions and repeat predefined patterns to manage that mental model, which is why the most productive senior software engineers are always learning - about the domain, about computers and new technologies, tricks and random stuff in general. One of the core tenets of software engineering is learning a lot of content very fast and understanding it accurately - something LLMs have been demonstrated to be quite weak at.
All of the above points lead me to believe that LLMs, while good at coding, don’t really encapsulate everything a software engineer does. They might follow instructions somewhat reliably and produce code of acceptable quality, but their skills at debugging, understanding, requirement gathering, monitoring and maintaining are far behind what a human can do.
English as a Programming language
While my last argument focused on the depth at which engineers operate, my current argument focuses on information theory and programming languages. In essence, a programming language is a formal expression of intent (I just chose to define it as such). It is a way for us to formally express what we want the computer to do - and in the case of most higher-level languages, we don’t really care how the computer carries out the instruction.
To be a good programming language, it has to be
- Easily Readable
- Easily Writable (using a standard keyboard)
- Unambiguous (the same expression must mean the same thing always)
- Easy for a computer to interpret/compile
As such, most high-level programming languages are designed to be as close to English as possible, while removing most of the language’s unnecessary ambiguities and staying easy to type on a keyboard.
Formatting rules such as indentation and symbols create visual contrast and interest, making code easy to scan for what we need.
All of this, combined with the fact that computers can also interpret that language efficiently, makes for a good programming language.
English does not make a good programming language: it is ambiguous, hard to scan, unformatted and quite hard to interpret, even by a SOTA LLM. English is a fairly good description language, but a poor language for expressing intent.
Example English Program
Let’s take the following example:
Create a program that takes two inputs, sums them and outputs that result.
On the surface, it might look fine, but let’s break it down:
- Takes two inputs: from where, and in what format? Is each an integer or a floating point number, or does the software have to infer the type? When asking for the inputs, do we present a message? If yes, what message?
- Sums them: how is the sum performed? If we receive inputs that are non-numbers, how do we proceed with the summation? Do we interpret the sum instruction as concatenation? Do we ignore the input? Do we show an error?
- Outputs the result: where? How? In which format?
As you can see, English is not really a good language for expressing intent, because it depends heavily on context and hides the specifics.
Example Python Program
On the other hand, take this Python example:
a = int(input("Enter number a: "))
b = int(input("Enter number b: "))
print(f"The sum of a and b is {a + b}")
Here, we clearly communicate that we expect two integers, and that if the user fails to supply an integer, we will raise an error and exit the program. We also communicate that we expect the numbers from stdin, i.e. the terminal, and that we want to display the output on the same terminal via stdout. This explicit specification is what gives a programming language the qualities it needs to be a good programming language.
It is also easily syntax highlighted and, to a trained engineer, very easily scannable.
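If we wanted a different behaviour, we would have to state it explicitly in code. Here is a minimal sketch, assuming we would rather re-prompt on invalid input than crash; the read_int helper is my own illustrative addition, not part of the original example:
# Keep asking until the user supplies a valid integer,
# instead of raising ValueError and exiting like the version above.
def read_int(prompt):
    while True:
        try:
            return int(input(prompt))
        except ValueError:
            print("Please enter a whole number.")

a = read_int("Enter number a: ")
b = read_int("Enter number b: ")
print(f"The sum of a and b is {a + b}")
Either way, the decision is visible in the code itself rather than left to interpretation.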
Let’s see the program that Claude generates based on the English description:
# Get input from the user
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
# Calculate the sum
sum_result = num1 + num2
# Print the result
print(f"The sum of {num1} and {num2} is: {sum_result}")
Here, Claude has made certain assumptions that were not present in the instruction it was given:
- It chose to interpret the inputs as floats, but at no point is this assumption recorded.
- It chose to allocate a variable to hold the sum (this allocates memory).
- It chose to output the number at arbitrary precision: it will print however many digits come after the decimal point.
It had to make those assumptions, because those details were missing and clarifying them would have been pedantic and would have required interrupting the user. This indicates that English is not a language designed to encode intent. The arbitrary-precision choice, in particular, can surprise users, as the short example below shows.
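A quick illustration of the point (my own example, not part of the prompt or Claude’s output):
# Python prints floats with as many digits as needed to round-trip the value,
# so an innocent-looking sum can surface binary floating point artifacts.
print(0.1 + 0.2)           # 0.30000000000000004
print(f"{0.1 + 0.2:.2f}")  # 0.30, once the precision is constrained explicitly
Nothing in the English prompt said which of these behaviours we wanted.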
Improving the English program to be more specific
Let’s improve our prompt and see how Claude does.
Create a program that accepts two floating point numbers, sums them and outputs the result up to 2 decimal points and handle rounding
def sum_floats():
    # Input two floating-point numbers
    num1 = float(input("Enter the first number: "))
    num2 = float(input("Enter the second number: "))
    # Calculate the sum
    sum_result = num1 + num2
    # Round the result to 2 decimal places
    rounded_result = round(sum_result, 2)
    # Output the result
    print(f"The sum of {num1} and {num2} is: {rounded_result:.2f}")

# Run the program
sum_floats()
Compare that to the code I have written:
a = float(input("Enter number a: "))
b = float(input("Enter number b: "))
print(f"The sum of a and b is {(a + b):.2f}")
Here, Claude made some rather interesting choices:
- It decided to wrap the logic in a function and immediately invoke it, as our instruction failed to encode this.
- It decided to allocate yet another intermediate variable, spending memory that the garbage collector has to clean up.
- It decided to round the sum into that variable, and then chose to use a format specifier to enforce the same precision again.
However, most of this comes down to the fact that English fails to encode intent precisely and its rules are lax, resulting in ambiguity. Ambiguity requires judgement, and since LLMs are prone to hallucinations, trusting them with judgement is, unfortunately, a bit inadvisable.
The human factor
In this segment, I want to talk about a few psychological phenomena: the Dunning-Kruger effect, cognitive biases and cognitive dissonance, among others.
When we learn software engineering, there are two key aspects to it: the technical skills and the non-technical skills. Some of them are trained; a lot of them are acquired through years of experience.
If we make the argument that people should write English to broadly describe features and let the LLM do its job of producing code, there is a hidden danger lurking in the corner: the Dunning-Kruger effect.
As LLMs produce more and more code, we will be writing and debugging less, and hence trusting the LLM-generated code more. This gives us a sense of confidence and control that can be dangerous and unsafe: an LLM might not fetch the correct context or make the right connections, and it might fail to ask the right questions of the right people.
Meanwhile, the individual using the LLM might feel overconfident about their pace of delivery, fail to understand the low-level details of the code and the assumptions made by the LLM, and fail to take those into account when asking questions of relevant stakeholders. In my honest opinion, the real bottleneck is discovering what to build, which assumptions can be made safely, who to ask which question, and how to coordinate with different teams. The time a senior software engineer in particular dedicates to actually writing code is quite small, and even if we make that part near-instantaneous, it is not enough to replace a software engineer completely.
There have been many scenarios where I have asked a seemingly irrelevant question such as “do you think we can expect Chinese input for this field?” or “is there a future scope that expands this beyond the US?”. Questions like these let us make informed decisions and assumptions, and they leave a paper trail to refer back to in the event the scope changes.
In essence, it is a running joke among engineers that customers and stakeholders don’t know what they’re asking for, and a huge part of the job is to align them on a specific set of functionalities that covers their area of expertise and then seek a sign-off.
Dead Reckoning and Cumulative Drift
As an aviation fan and a hobbyist simmer, I know about a concept called dead reckoning in VFR flight: an aircraft positions itself over a specific landmark, flies specific headings at specified airspeeds for predetermined amounts of time, and then switches to a new heading and speed for the next leg.
However, this method suffers from something known as cumulative drift. The direction and speed of the wind can push the aircraft away from its intended flight path, and because there are no references, the small errors keep adding up. To counter this, pilots use other landmarks en route to correct for the drift.
LLMs suffer a similar fate: the model makes a small error, then uses that output to generate further code, documentation or specifications, and the errors compound. Understanding this is crucial. If you have 100 independent steps, each at 99.9% accuracy, the combined accuracy is only about 90.5%. Unless there is a way to “calibrate” the LLM, it might be impossible to detect and correct for the drift, and the errors will keep compounding.
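A quick back-of-the-envelope check of that compounding effect (my own illustration):
# Accuracy compounds multiplicatively across independent steps.
per_step_accuracy = 0.999
for steps in (10, 100, 1000):
    print(steps, round(per_step_accuracy ** steps, 3))
# 10 -> 0.99, 100 -> 0.905, 1000 -> 0.368
After a thousand uncorrected steps, the chain is more likely wrong than right.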
How do we calibrate an LLM coding agent?
This might be possible via human-in-the-loop systems or verification agents: when the measured accuracy drops below a certain threshold, the agent’s recent work needs to be reviewed and fixed by a human, after which it is allowed to run independently again for a while.
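Here is a minimal sketch of what such a calibration loop could look like, assuming we have some way to score the agent’s recent output; agent, score_recent_output and request_human_review are hypothetical placeholders, not an existing API:
# Hypothetical human-in-the-loop calibration: pause autonomous runs when
# the measured accuracy of recent output drifts below a threshold.
ACCURACY_THRESHOLD = 0.95

def run_with_calibration(agent, tasks, score_recent_output, request_human_review):
    for task in tasks:
        result = agent.run(task)                 # assumed agent interface
        accuracy = score_recent_output(result)   # e.g. share of tests or review checks passed
        if accuracy < ACCURACY_THRESHOLD:
            # Drift detected: hand the work back to a human before continuing.
            result = request_human_review(task, result)
        yield result
The exact scoring mechanism is the hard part; the loop itself is simple.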