Simon Willison on llm

597 posts tagged “llm”

LLM is my command-line tool for running prompts against Large Language Models.

2026

Using Claude Code: The Unreasonable Effectiveness of HTML. Thought-provoking piece by Thariq Shihipar (on the Claude Code team at Anthropic) advocating for HTML over Markdown as an output format to request from Claude.

The article is crammed with interesting examples (collected on this site) and prompt suggestions like this one:

Help me review this PR by creating an HTML artifact that describes it. I'm not very familiar with the streaming/backpressure logic so focus on that. Render the actual diff with inline margin annotations, color-code findings by severity and whatever else might be needed to convey the concept well.

I've been defaulting to asking for most things in Markdown since the GPT-4 days, when the 8,192 token limit meant that Markdown's token-efficiency over HTML was extremely worthwhile.

Thariq's piece here has caused me to reconsider that, especially for output. Asking Claude for an explanation in HTML means it can drop in SVG diagrams, interactive widgets, in-page navigation and all sorts of other neat ways of making the information more pleasant to navigate.

I wrote about Useful patterns for building HTML tools last December, but that was focused very much on interactive utilities like the ones on my tools.simonwillison.net site. I'm excited to start experimenting more with rich HTML explanations in response to ad-hoc prompts.

Trying this out on copy.fail

copy.fail describes a recently discovered Linux security exploit, including a proof of concept distributed as obfuscated Python.

I tried having GPT-5.5 create an HTML explanation of the exploit like this:

curl https://copy.fail/exp | llm -m gpt-5.5 -s 'Explain this code in detail. Reformat it, expand out any confusing bits and go deep into what it does and how it works. Output HTML, neatly styled and using capabilities of HTML and CSS and JavaScript to make the explanation rich and interactive and as clear as possible'

Here's the resulting HTML page. It's pretty good, though I should have emphasized explaining the exploit over the Python harness around it.

# 8th May 2026, 9 pm / generative-ai, prompt-engineering, claude-code, markdown, ai, html, llms, security, llm

Release llm-gemini 0.31

gemini-3.1-flash-lite is no longer a preview.

Here's my write-up of the Gemini 3.1 Flash-Lite Preview model back in March. I don't believe this new non-preview model has changed since then.

7th May 2026, 7:57 pm · llm-release, gemini, llm, google, generative-ai, ai, llms

Release datasette-llm 0.1a7

Mechanism for configuring default options for specific models.

Part of Datasette's evolving support mechanism for plugins that use LLMs. It's now possible to configure a model with default options, e.g. to say all enrichment operations should use a specific model with temperature set to 0.5.

5th May 2026, 1:56 am · llm, datasette

Release llm-echo 0.5a0

New -o thinking 1 option to help test against LLM 0.32a0 and higher.

This plugin provides a fake model called "echo" for LLM which doesn't run an LLM at all - it's useful for writing automated tests. You can now do this:

uvx --with llm==0.32a1 --with llm-echo==0.5a0 llm -m echo hi -o thinking 1

This will fake a reasoning block to standard error before returning JSON echoing the prompt.

5th May 2026, 1:31 am · llm

Release llm 0.32a1

Fixed a bug in 0.32a0 where tool-calling conversations were not correctly reinflated from SQLite. #1426

29th Apr 2026, 11:52 pm · llm

LLM 0.32a0 is a major backwards-compatible refactor

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I’ve been working towards for quite a while.

[... 1,874 words]

7:01 pm / 29th April 2026 / llm, projects, python, generative-ai, annotated-release-notes, ai, llms

Release llm 0.32a0

See the annotated release notes.

29th Apr 2026, 6:57 pm · llm

Release llm 0.31

New GPT-5.5 OpenAI model: llm -m gpt-5.5. #1418

New option to set the text verbosity level for GPT-5+ OpenAI models: -o verbosity low. Values are low, medium, high.

New option for setting the image detail level used for image attachments to OpenAI models: -o image_detail low - values are low, high and auto, and GPT-5.4 and 5.5 also accept original.

Models listed in extra-openai-models.yaml are now also registered as asynchronous. #1395

24th Apr 2026, 11:35 pm · gpt, openai, llm

DeepSeek V4—almost on the frontier, a fraction of the price

$Visit DeepSeek V4 - almost on the frontier, a fraction of the price$

Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.

[... 703 words]

6:01 am / 24th April 2026 / llm, pelican-riding-a-bicycle, llm-pricing, deepseek, ai, ai-in-china, llms, llm-release, generative-ai, openrouter

A pelican for GPT-5.5 via the semi-official Codex backdoor API

GPT-5.5 is out. It’s available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I’ve had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it’s hard to put into words what’s good about it—I ask it to build things and it builds exactly what I ask for!

[... 884 words]

7:59 pm / 23rd April 2026 / llm, openai, pelican-riding-a-bicycle, llm-reasoning, ai, llms, llm-release, codex-cli, generative-ai, chatgpt, gpt, llm-pricing

Release llm-openai-via-codex 0.1a0

Hijacks your Codex CLI credentials to make API calls with LLM, as described in my post about GPT-5.5.

23rd Apr 2026, 7:22 pm · openai, llm, codex-cli

Release llm-openrouter 0.6

llm openrouter refresh command for refreshing the list of available models without waiting for the cache to expire.

I added this feature so I could try Kimi 2.6 on OpenRouter as soon as it became available there.

Here's its pelican - this time as an HTML page because Kimi chose to include an HTML and JavaScript UI to control the animation. Transcript here.

The bicycle is about right. The pelican is OK. It is pedaling furiously and flapping its wings a bit. Controls below the animation provide a pause button and sliders for controlling the speed and the wing flap.

20th Apr 2026, 6 pm · openrouter, llm, llm-release, pelican-riding-a-bicycle, kimi, ai-in-china, llms, ai, generative-ai

Release llm-anthropic 0.25

New model: claude-opus-4.7, which supports thinking_effort: xhigh. #66

New thinking_display and thinking_adaptive boolean options. thinking_display summarized output is currently only available in JSON output or JSON logs.

Increased default max_tokens to the maximum allowed for each model.

No longer uses obsolete structured-outputs-2025-11-13 beta header for older models.

16th Apr 2026, 8:37 pm · llm, anthropic, claude

Release research-llm-apis 2026-04-04

I'm working on a major change to my LLM Python library and CLI tool. LLM provides an abstraction layer over hundreds of different LLMs from dozens of different vendors thanks to its plugin system, and some of those vendors have grown new features over the past year which LLM's abstraction layer can't handle, such as server-side tool execution.

To help design that new abstraction layer I had Claude Code read through the Python client libraries for Anthropic, OpenAI, Gemini and Mistral and use those to help craft curl commands to access the raw JSON for both streaming and non-streaming modes across a range of different scenarios. Both the scripts and the captured outputs now live in this new repo.

5th Apr 2026, 12:32 am · llm, apis, json, llms

Gemma 4: Byte for byte, the most capable open models. Four new vision-capable Apache 2.0 licensed reasoning LLMs from Google DeepMind, sized at 2B, 4B, 31B, plus a 26B-A4B Mixture-of-Experts.

Google emphasize "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small useful models is one of the hottest areas of research right now.

They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains:

The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.

I don't entirely understand that, but apparently that's what the "E" in E2B means!

One particularly exciting feature of these models is that they are multi-modal beyond just images:

Vision and audio: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

I've not figured out a way to run audio input locally - I don't think that feature is in LM Studio or Ollama yet.

I tried them out using the GGUFs for LM Studio. The 2B (4.41GB), 4B (6.33GB) and 26B-A4B (17.99GB) models all worked perfectly, but the 31B (19.89GB) model was broken and spat out "---\n" in a loop for every prompt I tried.

The succession of pelican quality from 2B to 4B to 26B-A4B is notable:

E2B:

Two blue circles on a brown rectangle and a weird mess of orange blob and yellow triangle for the pelican

E4B:

Two black wheels joined by a sort of grey surfboard, the pelican is semicircles and a blue blob floating above it

26B-A4B:

Bicycle has the right pieces although the frame is wonky. Pelican is genuinely good, has a big triangle beak and a nice curved neck and is clearly a bird that is sitting on the bicycle

(This one actually had an SVG error - "error on line 18 at column 88: Attribute x1 redefined" - but after fixing that I got probably the best pelican I've seen yet from a model that runs on my laptop.)

Google are providing API access to the two larger Gemma models via their AI Studio. I added support to llm-gemini and then ran a pelican through the 31B model using that:

llm -m gemini/gemma-4-31b-it 'Generate an SVG of a pelican riding a bicycle'

Pretty good, though it is missing the front part of the bicycle frame:

Motion blur lines, a mostly great bicycle albeit missing the front part of the frame. Pelican is decent.

# 2nd April 2026, 6:28 pm / vision-llms, llm, llm-reasoning, ai, local-llms, llms, gemma, llm-release, google, generative-ai, lm-studio, pelican-riding-a-bicycle

Release llm-gemini 0.30

New models gemini-3.1-flash-lite-preview, gemma-4-26b-a4b-it and gemma-4-31b-it. See my notes on Gemma 4.

2nd Apr 2026, 6:25 pm · gemini, llm, gemma

Release datasette-llm 0.1a6

The same model ID no longer needs to be repeated in both the default model and allowed models lists - setting it as a default model automatically adds it to the allowed models list. #6

Improved documentation for Python API usage.

1st Apr 2026, 11:01 pm · llm, datasette

Release datasette-enrichments-llm 0.2a1

The actor who triggers an enrichment is now passed to the llm.mode(... actor=actor) method. #3

1st Apr 2026, 10 pm · enrichments, llm, datasette

Release datasette-extract 0.3a0

Now uses datasette-llm to manage model configuration, which means you can control which models are available for extraction tasks using the extract purpose and LLM model configuration. #38

1st Apr 2026, 3:32 am · llm, datasette

Release datasette-enrichments-llm 0.2a0

This plugin now uses datasette-llm to configure and manage models. This means it's possible to specify which models should be made available for enrichments, using the new enrichments purpose.

1st Apr 2026, 3:28 am · llm, datasette

Release datasette-llm-usage 0.2a0

Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant.

Now depends on datasette-llm for model configuration. #3

Full prompts and responses and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting.

Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simple-prompt permission.

1st Apr 2026, 3:24 am · llm, datasette

Release datasette-llm 0.1a5

The llm_prompt_context() plugin hook wrapper mechanism now tracks prompts executed within a chain as well as one-off prompts, which means it can be used to track tool call loops. #5

1st Apr 2026, 3:11 am · llm, datasette

Release datasette-llm 0.1a4

Ability to configure different API keys for models based on their purpose - for example, set it up so enrichments always use gpt-5.4-mini with an API key dedicated to that purpose. #4

I released llm-echo 0.3 to provide an API key testing utility I needed for the tests for this new feature.

31st Mar 2026, 9:17 pm · llm, datasette

Release llm-all-models-async 0.1

LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin.

My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models.

So... I had Claude spin up this plugin that turns sync models into async models using a thread pool. This ended up needing an extra plugin hook mechanism in LLM itself, which I shipped just now in LLM 0.30.

31st Mar 2026, 8:52 pm · llm, async, python

Release llm 0.30

The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account. #1389

Added docstrings to public classes and methods and included those directly in the documentation.

31st Mar 2026, 8:35 pm · llm

Release llm-echo 0.4

Prompts now have the input_tokens and output_tokens fields populated on the response.

31st Mar 2026, 4:48 pm · llm

Release llm-echo 0.3

Mechanisms for testing tool calls. #3

Mechanism for testing raw responses. #4

New echo-needs-key model for testing model key logic. #7

31st Mar 2026, 3:43 pm · llm

Release datasette-llm 0.1a3

Adds the ability to configure which LLMs are available for which purpose, which means you can restrict the list of models that can be used with a specific plugin. #3

30th Mar 2026, 7:48 pm · llm, datasette

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here’s how he describes it in the model card:

[... 952 words]

2:28 pm / 30th March 2026 / llm, training-data, ai, local-llms, llms, ai-ethics, claude-code, andrej-karpathy, ai-assisted-programming, generative-ai, hugging-face, uv

Release llm-mrchatterbox 0.1

See Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer.

30th Mar 2026, 2:20 am · llm

page 1 / 20 next » last »»

Simon Willison’s Weblog