
The discipline of inputs

[Image: Visual comparison of clean versus dirty data, showing the impact of data quality on AI accuracy.]

Why artificial intelligence succeeds or fails long before it produces an answer

Artificial intelligence is often presented as a triumph of computation. In reality, it is a test of organisational clarity.

When AI systems underperform, the cause is rarely a lack of sophistication in the model. More often, it is confusion upstream: data that is incomplete, inconsistently defined, legally ambiguous, or poorly understood by the organisation using it. The result is a familiar pattern: outputs that sound convincing but cannot be fully trusted.

Why inputs matter more than models

Modern AI systems do not reason in the human sense. They estimate probabilities based on patterns learned from data. Large language models, for example, generate outputs by predicting what is most likely to come next given their training and inputs.

This makes them exquisitely sensitive to input quality. When data is biased, outdated, or poorly governed, the system does not compensate. It reproduces those weaknesses at speed and scale.
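
A toy sketch (the groups, outcomes, and numbers below are invented) makes the mechanism concrete: a predictor that simply returns the most frequent historical outcome will faithfully reproduce whatever skew its training data contains.

```python
# Toy illustration, not a real model: a frequency-based predictor that
# returns the most common historical outcome for each group. The records
# below are invented; the point is that skewed inputs yield skewed outputs.
from collections import Counter

training_data = [  # hypothetical, historically skewed decisions
    ("group_a", "hired"), ("group_a", "hired"), ("group_a", "hired"),
    ("group_b", "hired"), ("group_b", "rejected"), ("group_b", "rejected"),
]

def predict(group: str) -> str:
    """Return the most frequent historical outcome for this group."""
    outcomes = Counter(label for g, label in training_data if g == group)
    return outcomes.most_common(1)[0][0]

print(predict("group_a"))  # hired    -- the skew in the data...
print(predict("group_b"))  # rejected -- ...becomes the skew in the output
```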

The OECD has repeatedly shown that many real-world AI failures—particularly in hiring, lending, and public services—stem from weaknesses in data quality, documentation, and accountability rather than from the models themselves (https://www.oecd.org/ai/principles/).

In short: AI outputs can only be as clear as the inputs that shape them.

The persistent myth of “more data”

When AI results disappoint, organisations often reach for the same solution: add more data.

This instinct is understandable—and frequently wrong. More data does not automatically mean better data. In practice, it often increases noise, entrenches historical bias, and complicates legal and compliance obligations.

Research from MIT shows that once data quality falls below a certain threshold, increasing volume produces diminishing—and sometimes negative—returns on model performance (https://mitsloan.mit.edu/ideas-made-to-matter/why-more-data-isnt-always-better-ai).

Clarity, not quantity, is the binding constraint.

What “clear inputs” actually mean

Clear inputs are not just clean spreadsheets or well-formatted tables. They are data that an organisation can explain and defend.

High-quality inputs typically share five characteristics:

  • Accuracy – They correctly reflect real-world conditions

  • Consistency – Definitions and formats are stable across systems

  • Completeness – Critical fields are not systematically missing

  • Timeliness – Data reflects current realities, not outdated ones

  • Relevance – Inputs directly relate to the decision being made

Just as important, these attributes are documented, not assumed.
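
A minimal sketch of what checking these attributes can look like in practice, assuming a pandas workflow; the column names, thresholds, and reference values below are hypothetical stand-ins for an organisation's own data contracts:

```python
# A minimal sketch of automated data-quality checks, assuming a pandas
# workflow. Column names, thresholds, and reference values are hypothetical
# stand-ins for an organisation's own data contracts.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "country": ["DE", "DE", "XX"],  # "XX" is not in the agreed code list
    "annual_revenue": [120_000, None, 95_000],
    "last_updated": ["2025-11-01", "2025-11-03", "2023-01-09"],
})

VALID_COUNTRIES = {"DE", "FR", "US"}  # consistency: one shared reference list
ages = pd.Timestamp.today() - pd.to_datetime(df["last_updated"])

checks = {
    # Accuracy: values stay inside plausible real-world bounds.
    "accuracy": bool((df["annual_revenue"].dropna() >= 0).all()),
    # Consistency: codes match the shared reference data.
    "consistency": bool(df["country"].isin(VALID_COUNTRIES).all()),
    # Completeness: the critical field is not systematically missing.
    "completeness": bool(df["annual_revenue"].notna().mean() >= 0.95),
    # Timeliness: every record was refreshed within the last year.
    "timeliness": bool(ages.max() <= pd.Timedelta(days=365)),
}

for name, passed in checks.items():
    print(f"{name:>12}: {'PASS' if passed else 'FAIL'}")
```

Relevance is deliberately absent from the checks: whether an input relates to the decision being made is a judgement about context, which is why it belongs in documentation rather than in code.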

As Andrew Ng has observed, most of the work in AI happens before modelling even begins:

“If 80 percent of our work is data preparation, then ensuring data quality is the most critical task for a machine learning team.”

Why organisations struggle to get this right

Maintaining clear inputs is difficult because it is as much an organisational challenge as a technical one.

Common obstacles include:

  • Data collected across fragmented systems and vendors

  • Inconsistent labelling and classification

  • Weak ownership and accountability

  • Security and integrity risks

  • Exposure to data poisoning or manipulation

  • Feedback loops where AI-generated data degrades future models

These issues tend to accumulate quietly, revealing themselves only after AI systems are embedded into decision-making processes.

Governance has entered the picture

Regulators have taken note of this pattern.

Rather than focusing narrowly on algorithms, modern frameworks emphasise data governance as the foundation of trustworthy AI. The National Institute of Standards and Technology AI Risk Management Framework places data quality, lineage, and documentation ahead of model optimisation (https://www.nist.gov/itl/ai-risk-management-framework).

The EU’s AI Act follows a similar logic, tying system risk to the governance of training data rather than to technical sophistication alone (https://artificialintelligenceact.eu/).

The message is consistent: accuracy is not just a technical outcome—it is an institutional one.

The quiet economics of clean inputs

For CIOs and executives, the financial implications are material. McKinsey estimates that poor data quality costs organisations 15–25% of operating revenue, largely through downstream decision errors (https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-data-quality-problem).

By contrast, Gartner projects that organisations with mature data governance will achieve more than twice the AI return on investment of their peers by 2026 (https://www.gartner.com/en/articles/ai-ready-data).

These gains do not come from headline-grabbing AI projects. They come from quieter work: metadata, ownership models, validation controls, and continuous monitoring.
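
As an illustration of that quieter work, here is a minimal sketch of a data catalogue entry that makes ownership, lineage, and permitted use explicit; the schema and field values are invented for the example, and a real catalogue or data contract would define its own:

```python
# A minimal sketch of a data catalogue entry, using only the standard
# library. The schema and field values are invented for illustration.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetRecord:
    name: str
    owner: str                   # an accountable person, not a team alias
    source_systems: list[str]    # lineage: where the data actually comes from
    definition: str              # what one row means, in plain language
    last_validated: date         # when quality checks last passed
    permitted_uses: list[str] = field(default_factory=list)  # compliance scope

customers = DatasetRecord(
    name="customer_master",
    owner="jane.doe@example.com",
    source_systems=["crm_prod", "billing_v2"],
    definition="One row per active contracted customer, deduplicated by tax ID.",
    last_validated=date(2025, 11, 3),
    permitted_uses=["churn_modelling", "service_reporting"],
)
```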

A simple conclusion

AI does not reward speed alone. It rewards precision.

Organisations that treat inputs as an afterthought will continue to produce impressive demonstrations and fragile results. Those that invest in clarity—what their data means, where it comes from, and how it may be used—will extract durable value from even modest AI systems.

The future of artificial intelligence will not be decided by the size of models, but by the discipline applied to the data that feeds them.