Some thoughts on training in AI for digital researchers

James Hetherington

2025-09-11

Preamble

Why am I doing this?

Digital transformation, singularity yada yada…
The quality of our AI discourse is very poor
- This is a POLITICAL space
- This is a MARKETING HYPE space

Why am I doing this?

This is a SOCIO-TECHNICAL space
- We (carpentries and related communities) are actually quite good at these!
- Carpentries instructor training is PRECISELY the kind of training that works here
- As scientists, our response to this should be knowledge-based

Why am I doing this?

Disagree well
Informed disagreement is better disagreement

What is this talk?

It’s a sketch of some topics I think we should be thinking about
It’s a sketch of some topics I think we should be thinking about teaching

Who is it for?

I’m not yet sure which of these things
- we should be teaching to postdocs and PhD students
- we should be including in instructor training for trainers
- we should be teaching to RSEs and other RTPs

Scope

I’m talking about LLMs, and systems built on top of them:
- Agentic agents
- Stateful agents
- Reasoning models

Scope

I’m talking about AI-assisted software engineering in particular
- Code completion
- Pull request generators
- Agentic editors and IDEs
- Vibe-coding??

Scope

I hate using “AI” as a proxy for this.
I’m not talking about other ML things
- Generative image and video models
- Classical ML - classifiers etc
- Proxy models of simulations

What’s our objective?

I think the RTP identity will need to change to accommodate new professions
I think we all have a lot of thinking and learning to do before we understand what’s going on well enough to define best practice
If we can’t define a consensus best practice we can’t teach
I don’t think our consensus will be universal - there will be dissenters
- I think that’s OK?

Disclaimers

My list is far far far too long
- I’m including stuff so we can discuss where to focus our efforts by crossing some stuff off
- The syllabus is always too full

Disclaimers

No one is an expert
Some are more knowledgeable than others
My knowledge is patchy
There is no shame in ignorance
- Only in persistent ignorance

Disclaimers

I will change my mind on some things in our dialogue today
- I will be less wrong after
- I hope you will be too

Broad topic areas

How LLMs work
How to use and build with AIs
The Politics, Philosophy and Economics of AI

How AIs work

Hypothesis: philosophically, we should understand how tools work before using them.
Hypothesis: practically, we use tools more effectively and safely if we understand how they work.
Assertion: there’s a LOT more to how the big models work than just Transformers.
Hypothesis: Understanding beyond “machine learning magic” is essential and requires updating mathematical intuitions

How AIs work

Question as to how far back we go
Question as to how much mathematics to assume
- Fork the syllabus?
- Reference for those with the background?
Work our way up the tower

The technical substrate

GPUs, parallel computing, warehouse-scale computers
- This will be necessary when we get to the environmental science
Linear algebra stuff
Optimisation (SGD, Adam etc)
This all works nicely as hands-on technology demystifier classes
Probably teach at the CuPy/PyTorch level
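
A minimal sketch of what a hands-on optimisation demystifier could look like: stochastic gradient descent fitting a single slope parameter, in pure Python rather than CuPy/PyTorch so that nothing is hidden. The data, learning rate and function name are illustrative assumptions, not part of any existing lesson.

```python
import random

def sgd_fit_slope(xs, ys, lr=0.01, steps=1000, seed=0):
    """Fit y ~ w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))                  # one random data point per step
        grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]    # d/dw of (w*x - y)^2
        w -= lr * grad                              # step downhill
    return w

# Toy data generated from y = 3x (an assumption for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]
w = sgd_fit_slope(xs, ys)
```

Adam adds per-parameter step sizes and momentum on top of the same loop; a class could swap the update line to see the difference.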

Machine learning 101

Some classical ML - maybe decision trees
Neural models
Deep models
Backpropagation
Translating fairly obvious stuff into ML jargon
- E.g. ReLU instead of \(x H(x)\), sure ok…
- Applied quant sciences people have seen a bunch of this at UG
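
The jargon translation above can be checked in a few lines. This sketch assumes \(H\) is the Heaviside step function with \(H(0) = 0\); the function names are illustrative.

```python
def heaviside(x):
    """Heaviside step function H(x), taking H(0) = 0 (a convention chosen here)."""
    return 1.0 if x > 0 else 0.0

def relu(x):
    """ReLU as it is usually written in ML texts."""
    return max(0.0, x)

# The two definitions agree at every test point
points = [-2.0, -0.5, 0.0, 0.5, 2.0]
agree = all(relu(x) == x * heaviside(x) for x in points)
```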

Machine learning 101

Training, testing, validation, holdout
Overfitting
- Regularisers
- Machine learning as lossy compression
  - This becomes important when we think about copyright

Generative models

What is generative?
Distributions and sampling
Why can sampling be hard even if p(x) is known?
Approaches to sampling
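
One way to make the sampling question concrete: being able to evaluate p(x) at a point does not tell you how to draw from it, especially when the normalising constant is unknown. The sketch below uses random-walk Metropolis, one approach among many, on an unnormalised standard normal. The target, step size and names are assumptions chosen for illustration.

```python
import math
import random

def log_p_unnormalised(x):
    """Log of an unnormalised density: here a standard normal without its constant."""
    return -0.5 * x * x

def metropolis(log_p, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: only needs ratios of p, never the normaliser."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_p(proposal) - log_p(x)
        # Accept uphill moves always, downhill moves with probability p(new)/p(old)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(log_p_unnormalised, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Successive samples are correlated, which is part of why sampling is hard: many steps are needed for each effectively independent draw.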

Transformers

Honestly this is the least interesting bit
It’s just network jigglery until you find something that works
The more interesting stuff is the principles…

Some intuitions behind why transformers work

Feature discovery vs feature engineering in deep networks
- This really matters for our understanding of ontological sciences
Latent spaces
- Vector differences and analogy
To what extent are world models induced in order to get good at the word-guessing game?
- See also the philosophy course
Bayesian perspectives on fitting
- MacKay

Really big data

The triumph of stamp collecting
- AlphaFold and the PDB
Scaling laws and ML performance
- A lot of us haven’t understood what really really big data means

Beyond guess-the-next-word

What is a foundation model?
- Fine tuning
- Pretraining and Training
- Transfer learning

RLHF

It’s NOT just human-in-the-loop on the standard fine tuning loop
Reward models trained on human responses
- There are open reward models
Role of MTurk etc
Alignment training
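
A sketch of the idea behind training a reward model on human responses, assuming the commonly used pairwise preference loss: raters pick the better of two answers, and the model is penalised when it scores the rejected answer higher. The function name and the scalar rewards are illustrative; a real reward model produces these scores from a network.

```python
import math

def pairwise_preference_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-d))."""
    d = r_chosen - r_rejected
    return math.log1p(math.exp(-d))

# Loss is small when the human-preferred answer gets the higher reward
agrees_with_rater = pairwise_preference_loss(2.0, 0.0)
disagrees_with_rater = pairwise_preference_loss(0.0, 2.0)
undecided = pairwise_preference_loss(0.0, 0.0)
```

The chat model is then optimised against this learned reward, which is what separates RLHF from simply adding human-written examples to a fine-tuning set.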

Complexities of BGTNW

Reward hacking
- Possibility that this is the origin of
  - Sycophancy
  - Hallucination (i.e. guessing preferred over IDK)
  - Biases
Drift and the alignment tax
Compare newer approaches
- E.g. constitutional AI and the Spec
- Important for understanding wider alignment questions

Beyond guess-the-next-word in practice

I think we could do a useful carpentries style training here?
Go from an open guess-the-next-word (GTNW) model to a chat model using an open reward model (RWM)
After fine-tuning the RWM on responses from the class
Is this ridiculous and infeasible?
- How much GPU do we need to do this in a tiny way?
This is where I would probably focus our efforts
- I could be very wrong

Using and building with LLMs

Hypothesis: skilled users use them more effectively
- Is this defensible?
  - Unskilled use is a design objective.
  - Deskilled users may be a systems endpoint?
- It may be divergent for different tools
Assertion: a lot of what we do as research engineers in the future will be composing, connecting and designing AIs
Hypothesis: this will require a new RTP identity: “research agent engineers?”
- I have no idea if this is right

Using commercial models at the API level

Provide your own context
Licensing models and subscription models

Using open-weights models on your own GPUs

Hugging Face
- Transformers library
Ollama
- This would make an easy carpentries-style lesson
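
A hedged sketch of how small such a lesson could be, assuming an Ollama server running locally on its default port and a model that has already been pulled. The model name `llama3.2` is an assumption; substitute whatever is installed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def build_payload(prompt, model="llama3.2"):
    """Request body for one non-streaming completion (model name is an assumption)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.2", url=OLLAMA_URL):
    """Send the prompt to the local server and return the text of its reply."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Usage, once the server is running and the model is available locally:
#   print(generate("Explain overfitting in one sentence."))
```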

Context engineering

Beyond prompt engineering
Designing contexts
Understanding the system prompt
Prompting patterns and techniques
There are whole businesses built on just this
- I’m not sure how fragile their BTE is?

Prompt engineering

Giving models structured language
Using that
Can give powerful results
- E.g. D&D in French
- E.g. Multi-persona answers

Security and prompt injection

Disregard all previous instructions
Gandalf.ai

Retrieval augmented generation

Use embeddings as a clever index
- Similarity in very-high-dim space
Then add the documents to the context
This is behind a lot of “corporate” context AI e.g. Copilot
Lots beyond simple RAG now
- Knowledge graphs are back!
Basic RAG over some documents is easily demoed carpentries-style
- Can use open-weights models
- E.g. Owain Kenway’s work in UCL ARC
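
A minimal sketch of the retrieve-then-add-to-context pattern. To stay self-contained it uses word counts as a stand-in "embedding", which is an assumption for illustration; a real demo would use a learned embedding model. The documents and function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, documents, k=1):
    """Put the retrieved documents into the context ahead of the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The cluster scheduler is Slurm and jobs are submitted with sbatch",
    "Research data must be stored on the managed storage service",
    "Python environments are created with conda on the login nodes",
]
prompt = build_prompt("how are jobs submitted to the cluster", docs)
```

The resulting prompt would then be passed to any chat model, including an open-weights one.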

AI and search

AI as documentation tool
AI as catch-up tool in domains you’re getting to grips with
Will LLMs replace websites?
Prompting patterns to get to primary literature

AI and Coding

Library use
Language design and algorithmic expressiveness
AI autocomplete as boiler-plate destroyer?

Reasoning models

All this just gets us to 2024
Do reasoning models change everything?
- Simple concept - very powerful effects

Stateful agents

Put a model in a feedback loop
Let the model take actions
Give it a system prompt of its own
Let it update the memory
- E.g. tell it to mark up part of the answer for the user and part of the answer for memory
Void and Letta
- I think this is easily demoable with open tools
Don’t forget you can fine-tune for this
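
The loop described above can be sketched without any real model. Here `stub_model` stands in for an LLM call, and the `<user>` and `<memory>` tags are a hypothetical markup chosen for illustration, not the protocol used by any particular tool.

```python
import re

def split_reply(reply):
    """Separate the part marked for the user from the parts marked for memory."""
    user = re.findall(r"<user>(.*?)</user>", reply, flags=re.S)
    memory = re.findall(r"<memory>(.*?)</memory>", reply, flags=re.S)
    return " ".join(user).strip(), [m.strip() for m in memory]

def agent_step(model, system_prompt, memory, user_message):
    """One turn: build context from memory, call the model, update memory."""
    context = (
        f"{system_prompt}\nMemory:\n" + "\n".join(memory) + f"\nUser: {user_message}"
    )
    answer, new_notes = split_reply(model(context))
    return answer, memory + new_notes

def stub_model(context):
    """Stand-in for a real LLM: behaves differently once memory holds a note."""
    if "prefers Fortran" in context:
        return "<user>Here is a Fortran example.</user>"
    return "<user>Noted.</user><memory>User prefers Fortran</memory>"

system_prompt = "Answer in <user> tags; save facts worth keeping in <memory> tags."
memory = []
first, memory = agent_step(stub_model, system_prompt, memory, "I like Fortran best")
second, memory = agent_step(stub_model, system_prompt, memory, "Show me a loop")
```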

Reasoning models

Multi step thinking
Develop a markup protocol to signal things like
- think about, fetch web, call tool, stop
Don’t just directly feed this back to the same model:
- FINE TUNE the model to work with this protocol
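
A sketch of the harness side of such a protocol, assuming a hypothetical line-based markup (`THINK:`, `CALL:`, `STOP:`). The model here is a scripted stand-in; the point of the slide is that a real model would be fine-tuned to emit this markup, which this sketch does not attempt.

```python
def run(model, question, tools, max_steps=8):
    """Feed the growing transcript back to the model until it says STOP."""
    transcript = f"QUESTION: {question}\n"
    for _ in range(max_steps):
        line = model(transcript)
        transcript += line + "\n"
        action, _, arg = line.partition(":")
        action, arg = action.strip(), arg.strip()
        if action == "STOP":
            return arg, transcript
        if action == "CALL":
            name, _, tool_arg = arg.partition(" ")
            transcript += f"RESULT: {tools[name](tool_arg)}\n"
        # THINK lines need no handling: they just stay in the transcript
    return None, transcript

def scripted_model(transcript):
    """Stand-in for a fine-tuned model: think, then call a tool, then stop."""
    if "RESULT:" in transcript:
        return "STOP: " + transcript.rsplit("RESULT:", 1)[1].strip()
    if "THINK:" in transcript:
        return "CALL: add 2 3"
    return "THINK: I should use the add tool"

tools = {"add": lambda s: str(sum(int(t) for t in s.split()))}
answer, transcript = run(scripted_model, "What is 2 + 3?", tools)
```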

Reasoning models

Show examples of queries that failed without reasoning and now work
We should show examples of chain-of-thought traces
“Think carefully about this”
Teach the Tower of Hanoi example
- Including the pushback and the pushback to the pushback!
Is it practical to demo this with a simple Hugging Face model as substrate?

Providing tools to AI

Model context protocol
Way to offer web services to AIs
Simple JSON payloads (MCP messages are JSON-RPC) that either stateful agents or reasoning models can send
- What do you offer?
- How do I call you?
- OK, do this.
Includes sandboxed linux environments in the pro subscriptions
We should demo how to set this up for some scientific software tools
- This is where I’d focus, I think.
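
The "what do you offer" and "OK, do this" exchange can be sketched as the two JSON-RPC requests a client sends. The `tools/list` and `tools/call` method names follow the Model Context Protocol; the tool name `run_simulation` and its arguments are hypothetical, standing in for a scientific software tool.

```python
import json

def jsonrpc_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request as a JSON string."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# "What do you offer?"
list_tools = jsonrpc_request(1, "tools/list")

# "OK, do this." (tool name and arguments are illustrative assumptions)
call_tool = jsonrpc_request(
    2, "tools/call", {"name": "run_simulation", "arguments": {"steps": 100}}
)
```

A server wrapping a scientific code would answer the first request with its tool descriptions and the second with the tool's output.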

Politics, Philosophy and Economics

The politics, philosophy and economics of AI

Assertions: the poor quality of the AI discourse comes from
- low knowledge
- anxiety
- hype
- source bias
Assertion: we, research technology professionals, have a professional duty to get this right

The politics, philosophy and economics of AI

Hypothesis: we have some attributes that mean we’re in a good place to think about this well
- Technical knowhow substrate
  - But it’s patchy and degraded
- Independence
  - But anxiety and social risk
- Epistemological tradition
Hypothesis: an interest in speculative fiction is an asset here!

Machines of loving grace

...
I like to think (it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace

– Richard Brautigan (1967)

Cultural perspectives

Elon Musk likes Iain M. Banks

Cultures and cultural perspectives

Tradition and Transhumanism
The Rapture of Nerds
Geeks inheriting the earth
Techbros

The history of AI

Neural and symbolic approaches
Neats and scruffies
Winters and hype cycles
Investment bubbles vs technology bubbles

Can it think?

The Golem
The Jagged Frontier
World Models

The economic system of AI research

Competitive scoring metrics
Frontier labs
Frontier labs and the universities

Environmental impact

The environmental impact of the web, HPC and data
UN website explosion example
Models of query impact
- Very wide variance
Emission scopes

Environmental impact

Marginal and average usage
Incentive economics
- Carbon markets
- ‘Zero carbon’ grid electricity
Economic privilege and sold guilt

Environmental impact

What do we use our fossil fuels for?
Can I burn fossil fuels to cure cancer?
Can I fly to a cancer conference?
Wise use of energy. Science vs?
Stop throwing sheep

Labour market economics

Displacement, retraining and compassion
Luddites, the mill, and the mill owner
De-industrialisation and mining towns
Ability and inclination to re-skill as privilege

Labour market economics

Productivity and I.T.
Email, spam and the information tide
Bureaucracy

Labour market economics

What is work?
Does work have value?
Wasted effort and duplication
Work and dignity
F.A.L.C.

Copyright

The cultural history of the internet and the web
Open, perspectives on open, free and open
Data mining, scraping, and copyright
- Rate limiters, research, and distillation
Archival and the Wayback Machine

Copyright

Memorisation, retrieval and compression
What on earth is a derivative work now?
“I didn’t agree with DRM before AI, do I believe in it now?”
Information wants to be free

AI and resilience

Efficiency and resilience
Systems theory
Termination shock

AI and sovereignty

International collaboration and open trade in ideas
The world we live in
Public trust
Government and decision making
Evidence-based-policy vs policy-based-evidence

AI and research

Understanding and engineering
What is science for?
Deep research and hypothesis generation
AI and research credit

AI and research

Dunbar’s number (Solvay…)
Research explosion, bibliometrics
Peer review and LLMs

AI and RTPs

AI, search and data stewardship. F.A.I.R. AI
Coding for comprehensibility and reproducibility
Lies and statistics

Epilogue

The Linux moment for AI

Remember 1991?
Computers had been hippie
They’d been captured by the corporates
The Free Software Foundation was resisting

The Linux moment for AI

Operating systems were really hard
You needed to be a powerful corporation to have a chance
What happened next?

What happens next?

Open-weights models on Hugging Face are more capable than GPT was in November 2022
GPUs are getting cheaper fast, in £ per flop/s and joules per flop