[Proposal] "Stable" C API #171

Closed
@Ronsor


I propose refactoring main.cpp into a library (llama.cpp, compiled to llama.so/llama.a/whatever) and making main.cpp a simple driver program. A simple C API should be exposed to access the model, and then bindings can more easily be written for Python, node.js, or whatever other language.
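To make the shape of such an API concrete, here is a minimal sketch of an opaque-handle C interface. Every name in it (`sketch_model`, `sketch_model_load`, `sketch_model_eval`, `sketch_model_free`) is hypothetical and not taken from llama.cpp, and the eval function only echoes the prompt as a placeholder for real inference.

```c
#include <stdlib.h>
#include <string.h>

/* ---- public interface (would live in a header; all names hypothetical) ---- */
typedef struct sketch_model sketch_model; /* opaque to callers and bindings */

sketch_model *sketch_model_load(const char *path);
int sketch_model_eval(sketch_model *m, const char *prompt,
                      char *out, size_t out_len);
void sketch_model_free(sketch_model *m);

/* ---- implementation (would live in the library) ---- */
struct sketch_model {
    char *path; /* stand-in for the real weights and hyperparameters */
};

/* Returns NULL on failure so callers in any language can check one value. */
sketch_model *sketch_model_load(const char *path) {
    if (path == NULL) return NULL;
    sketch_model *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->path = malloc(strlen(path) + 1);
    if (m->path == NULL) { free(m); return NULL; }
    strcpy(m->path, path);
    return m;
}

/* Placeholder: copies the prompt into out (truncating if needed) instead of
 * running inference. Returns the number of bytes written, or -1 on bad args. */
int sketch_model_eval(sketch_model *m, const char *prompt,
                      char *out, size_t out_len) {
    if (m == NULL || prompt == NULL || out == NULL || out_len == 0) return -1;
    size_t n = strlen(prompt);
    if (n >= out_len) n = out_len - 1;
    memcpy(out, prompt, n);
    out[n] = '\0';
    return (int)n;
}

void sketch_model_free(sketch_model *m) {
    if (m == NULL) return;
    free(m->path);
    free(m);
}
```

Keeping the struct opaque and using only plain C types in the signatures is what makes it practical to call from Python's `ctypes`, node.js FFI layers, and similar binding mechanisms.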

This would partially solve #82 and #162.

Edit: on that note, is it possible to do inference from two or more prompts on different threads? If so, serving multiple people would be possible without multiple copies of model weights in RAM.
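The general pattern behind this question is one read-only copy of the weights shared by several threads, each with its own private state. The sketch below illustrates that pattern with POSIX threads; the types `shared_model` and `job` and the function `run_two_jobs` are hypothetical, and a dot product stands in for real inference. It does not show whether llama.cpp's own evaluation code is safe to use this way.

```c
#include <pthread.h>
#include <stddef.h>

/* Loaded once and never written to afterwards, so threads can share it. */
typedef struct {
    const float *weights;
    size_t n;
} shared_model;

/* Per-thread state: each job has its own input and result slot. */
typedef struct {
    const shared_model *model;
    const float *input;
    float result;
} job;

/* Thread entry point: a dot product stands in for a forward pass. */
static void *run_job(void *arg) {
    job *j = arg;
    float acc = 0.0f;
    for (size_t i = 0; i < j->model->n; i++) {
        acc += j->model->weights[i] * j->input[i];
    }
    j->result = acc;
    return NULL;
}

/* Runs two jobs concurrently against one model. Returns 0 on success. */
int run_two_jobs(const shared_model *m,
                 const float *in_a, const float *in_b,
                 float *out_a, float *out_b) {
    job a = { m, in_a, 0.0f };
    job b = { m, in_b, 0.0f };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, run_job, &a) != 0) return -1;
    if (pthread_create(&tb, NULL, run_job, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *out_a = a.result;
    *out_b = b.result;
    return 0;
}
```

Because the threads only read `shared_model` and write only to their own `job`, no locking is needed; any scratch buffers used during evaluation would have to be per-thread in the same way.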


Labels: duplicate, enhancement
