Why I Don’t Worship at the Altar of Neural Nets

There’s a certain group of people who – regardless of the context of your problem – will tell you to throw a neural net at it.

Not that you even asked them for advice in the first place. They asked you what problem you’re working on, you gave a quick summary, and now they’re telling you to try the hottest new neural net architecture.

Moreover, they speak with an air of confidence that could only be properly justified by having grappled firsthand with the intricacies of your problem (they haven’t) [1]. They might not even be in the same field of study as you.

It’s as if they worship at the altar of neural nets.

They cast away the constraints of the material world like resource allocation and system maintainability. All that matters is the theoretical universality of the model, as though it were a divine sacrament that would guarantee success to all those who partake in it.

Sure, it’s cool that neural nets are universal function approximators. If you train a sufficiently large neural net on a sufficiently large (and comprehensive) data set for sufficiently long, then the neural net can (in theory) learn arbitrarily complex patterns in the data. That’s amazing!
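To make the "universal approximator" claim concrete, here's a minimal sketch of the classic constructive intuition: a single hidden layer of two steep sigmoids, with hand-picked (not learned) weights, already approximates an indicator function on an interval – and sums of such scaled "bumps" can approximate any continuous function. The function name `bump` and the parameters `a`, `b`, `k` are illustrative choices, not from any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a=0.0, b=1.0, k=50.0):
    # One hidden layer with two units: the difference of two steep
    # sigmoids approximates the indicator function of the interval [a, b].
    # Larger k makes the edges sharper.
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

# Near 1 inside the interval, near 0 outside it:
print(bump(0.5))   # inside [0, 1], close to 1
print(bump(-1.0))  # outside, close to 0
print(bump(2.0))   # outside, close to 0
```

A weighted sum of many narrow bumps like this one can trace out an arbitrary continuous target function, Riemann-sum style – which is the intuition behind the universal approximation theorem. Note that the theorem says such weights *exist*; it says nothing about whether training will find them, or at what cost.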

But it’s also costly – not just the compute resources, but also the amount of complexity it adds to your system. In most real-world contexts, modeling problems are not one-and-done. Every time you build a model, you have to integrate it into a larger software system that uses it, and you have to maintain it indefinitely. The more complex the model is, the harder it is to integrate and test, and the harder it is to resolve issues when they inevitably appear.

In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity.

Increases in performance are not guaranteed. Increasing a model’s complexity can only lead to increased performance if there is useful information in the underlying data that is not being leveraged by the original model. [2]

And even if they do occur, increases in performance are not always valuable in the broader context of the overarching software system of which the model is a component. There often comes a point of “good enough” beyond which further increases in model performance do not translate into additional value for the overarching system. The model just needs to perform “good enough” that the system can achieve its goals. The model doesn’t have to be perfect – it just can’t be a weak link in the chain.


[1] Perhaps ironically, this type of behavior is a common criticism of ChatGPT, which is itself a neural net.

[2] Incidentally, if you know exactly what that information is, and you know exactly how the model ought to be leveraging it, then you might as well just build that logic into your model explicitly instead of trying to get the model to learn it from scratch – which might not even be possible if you don’t have fully comprehensive data coverage in that crevice of the overall space.