I propose refactoring `main.cpp` into a library (`llama.cpp`, compiled to `llama.so`/`llama.a`/whatever) and making `main.cpp` a simple driver program. A simple C API should be exposed to access the model, so that bindings can more easily be written for Python, Node.js, or whatever other language.
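To make that concrete, here is a minimal sketch of what such a C API header could look like. Every name in it (`llama_model`, `llama_load_model`, `llama_new_context`, and so on) is purely illustrative, not an existing interface:

```c
/* llama.h -- hypothetical sketch of the proposed C API; all names are
 * illustrative only. */
#ifndef LLAMA_H
#define LLAMA_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Opaque handles keep the ABI stable and hide the C++ internals. */
typedef struct llama_model   llama_model;
typedef struct llama_context llama_context;

/* Load model weights from a ggml file; returns NULL on failure. */
llama_model *llama_load_model(const char *path);
void         llama_free_model(llama_model *model);

/* Per-inference state (KV cache, sampling state, ...), created per caller. */
llama_context *llama_new_context(llama_model *model, int n_threads);
void           llama_free_context(llama_context *ctx);

/* Tokenize a prompt into the caller-provided buffer; returns the token count. */
int llama_tokenize(llama_context *ctx, const char *text,
                   int *tokens, size_t max_tokens);

/* Feed tokens through the model, then sample the next token id. */
int llama_eval(llama_context *ctx, const int *tokens, size_t n_tokens);
int llama_sample_next(llama_context *ctx);

/* Convert a token id back to its string piece. */
const char *llama_token_to_str(const llama_context *ctx, int token);

#ifdef __cplusplus
}
#endif

#endif /* LLAMA_H */
```

Keeping the handles opaque and the functions plain C would keep the ABI stable across releases and make FFI bindings from Python, Node.js, etc. straightforward.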
This would partially solve #82 and #162.
Edit: on that note, is it possible to run inference on two or more prompts from different threads? If so, multiple users could be served without keeping multiple copies of the model weights in RAM.
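If the weights really are read-only during inference, one way this could look is a single shared model handle with a separate context (KV cache, sampling state) per thread. The sketch below is speculative: it reuses the hypothetical names from the header above, and the model path is just the conventional example location:

```c
/* Hypothetical sketch: weights are loaded once into a shared llama_model,
 * and each thread gets its own llama_context, so no mutable state is shared. */
#include <pthread.h>
#include <stdio.h>

#include "llama.h"  /* the hypothetical header sketched above */

static llama_model *g_model;  /* read-only weights, shared across threads */

static void *serve_prompt(void *arg) {
    const char *prompt = arg;
    /* Each request owns its own context (KV cache, sampling state). */
    llama_context *ctx = llama_new_context(g_model, /*n_threads=*/1);

    int tokens[512];
    int n = llama_tokenize(ctx, prompt, tokens, 512);
    llama_eval(ctx, tokens, n);

    for (int i = 0; i < 16; i++) {  /* generate a few tokens */
        int tok = llama_sample_next(ctx);
        printf("%s", llama_token_to_str(ctx, tok));
        llama_eval(ctx, &tok, 1);
    }

    llama_free_context(ctx);
    return NULL;
}

int main(void) {
    g_model = llama_load_model("models/7B/ggml-model-q4_0.bin");

    pthread_t t1, t2;
    pthread_create(&t1, NULL, serve_prompt, "First user prompt");
    pthread_create(&t2, NULL, serve_prompt, "Second user prompt");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    llama_free_model(g_model);
    return 0;
}
```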