
JetInfero
Chat on Discord Follow on Bluesky

🌟 Fast, Flexible Local LLM Inference for Developers 🚀

JetInfero is a nimble and high-performance library that enables developers to integrate local Large Language Models (LLMs) effortlessly into their applications. Powered by llama.cpp 🕊️, JetInfero prioritizes speed, flexibility, and ease of use 🌐. It’s compatible with any language supporting Win64, Unicode, and dynamic-link libraries (DLLs).

💡 Why Choose JetInfero?

  • Optimized for Speed ⚡️: Built on llama.cpp, JetInfero offers lightning-fast inference capabilities with minimal overhead.
  • Cross-Language Support 🌐: Seamlessly integrates with Delphi, C++, C#, Java, and other Win64-compatible environments.
  • Intuitive API 🔬: A clean procedural API simplifies model management, inference execution, and callback handling.
  • Customizable Templates 🖋️: Tailor input prompts to suit different use cases with ease.
  • Scalable Performance 🚀: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.

🛠️ Key Features

🤖 Advanced AI Integration

JetInfero expands your toolkit with capabilities such as:

  • Dynamic chatbot creation 🗣️.
  • Automated text generation 🔄 and summarization 🕻.
  • Context-aware content creation 🌐.
  • Real-time token streaming for adaptive applications ⌚.

🔒 Privacy-Centric Local Execution

  • Operates entirely offline 🔐, ensuring sensitive data remains secure.
  • GPU acceleration supported via Vulkan for enhanced performance 🚒.

⚙️ Performance Optimization

  • Configure GPU utilization with AGPULayers 🔄.
  • Allocate threads dynamically using AMaxThreads 🌐.
  • Access performance metrics to monitor throughput and efficiency 📊.

🔀 Flexible Prompt Templates

JetInfero’s template system simplifies input customization. Templates include placeholders such as:

  • {role}: Denotes the sender’s role (e.g., user, assistant).
  • {content}: Represents the message content.

For example:

  jiDefineModel(
    // Model Filename
    'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Refname
    'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Template
    '<|im_start|>{role}\n{content}<|im_end|>',

    // Model Template End
    '<|im_start|>assistant',

    // Capitalize Role
    False,

    // Max Context
    8192,

    // Main GPU, -1 for best, 0..N for a specific GPU
    -1,

    // GPU Layers, -1 for max, 0 for CPU only, 1..N for a specific layer count
    -1,

    // Max Threads, default 4, capped at the physical CPU count
    4
  );
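
With the template above, a user message such as 'What is AI?' would (roughly) be rendered as the prompt below; the trailing template-end line cues the model to reply as the assistant:

  <|im_start|>user
  What is AI?<|im_end|>
  <|im_start|>assistant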

Template Benefits

  • Adaptability 🌐: Customize prompts for various LLMs and use cases.
  • Consistency 🔄: Ensure predictable inputs for reliable results.
  • Flexibility 🌈: Modify prompt formats for tasks like JSON or markdown generation.

🍂 Streamlined Model Management

  • Define models with jiDefineModel 🔨.
  • Load/unload models dynamically using jiLoadModel and jiUnloadModel 🔀.
  • Save/load model configurations with jiSaveModelDefines and jiLoadModelDefines 🗃️.
  • Clear all model definitions using jiClearModelDefines 🧹 (a minimal lifecycle sketch follows this list).
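
The sketch below ties these calls together. The 'models.json' filename passed to jiSaveModelDefines/jiLoadModelDefines is hypothetical, and the exact parameter and return types are assumptions; check the JetInfero unit for the actual declarations.

uses
  JetInfero;

begin
  if jiInit() then
  begin
    // Register a model definition (see the template example above for the full argument list).
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    // Persist the current definitions, clear them, then restore them from disk.
    // 'models.json' is a hypothetical filename used for illustration only.
    jiSaveModelDefines('models.json');
    jiClearModelDefines();
    jiLoadModelDefines('models.json');

    // Load a defined model by its refname, use it, then unload and shut down.
    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');
    // ... run inference here ...
    jiUnloadModel();
    jiQuit();
  end;
end.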

🔁 Inference Execution

  • Perform inference tasks with jiRunInference ⚙️.
  • Stream real-time tokens via InferenceTokenCallback ⌚.
  • Retrieve responses using jiGetInferenceResponse 🖊️ (see the sketch after this list).
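
As a rough sketch of a single inference turn against an already-loaded model (assuming jiGetInferenceResponse returns the response text in a form assignable to a Delphi string; check the JetInfero unit for the exact declaration):

var
  LResponse: string;
begin
  // Queue a user message for the conversation.
  jiAddMessage('user', 'What is AI?');

  // Run inference against the loaded model, referenced by its refname.
  if jiRunInference(PWideChar('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf')) then
  begin
    // Retrieve the complete response once generation finishes
    // (tokens can also be streamed live via InferenceTokenCallback).
    LResponse := jiGetInferenceResponse();
    WriteLn(LResponse);
  end
  else
    WriteLn('Error: ', jiGetLastError());
end;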

📊 Performance Monitoring

  • Retrieve detailed metrics like tokens/second, input/output token counts, and execution time via jiGetPerformanceResult 📊.

🛠️ Installation

  1. Download the Repository 📦

    • Download here and extract the files to your preferred directory 📂.

    Ensure JetInfero.dll is accessible in your project directory.

  2. Acquire a GGUF Model 🧠

    • Obtain a model from Hugging Face, such as Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF, a good general-purpose model. You can download it directly from our Hugging Face account; see the model card for more information.
    • Save it to a directory accessible to your application (e.g., C:/LLM/GGUF) 💾.
  3. Add JetInfero to Your Project 🔨

    • Include the JetInfero unit in your Delphi project.
  4. Ensure GPU Compatibility 🎮

    • Verify Vulkan compatibility for enhanced performance ⚡. Adjust AGPULayers as needed to accommodate VRAM limitations 📉.
  5. Building JetInfero DLL 🛠️

    • Open and compile the JetInfero.dproj project 📂. This process will generate the 64-bit JetInfero.dll in the lib folder 🗂️.
    • The project was created and tested using Delphi 12.2 on Windows 11 24H2 🖥️.
  6. Using JetInfero 🚀

    • JetInfero can be used with any programming language that supports Win64 and Unicode bindings 💻.
    • Ensure the JetInfero.dll is included in your distribution and accessible at runtime 📦.

Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.

📈 Quick Start

⚙️ Basic Setup

Integrate JetInfero into your Delphi project:

uses
  JetInfero;

var
  LModelRef: string;
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    // Load the defined model by its refname.
    LModelRef := 'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf';
    jiLoadModel(PWideChar(LModelRef));

    jiAddMessage('user', 'What is AI?');

    if jiRunInference(PWideChar(LModelRef)) then
      begin
        jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
        WriteLn('Input Tokens : ', LTotalInputTokens);
        WriteLn('Output Tokens: ', LTotalOutputTokens);
        WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
      end
    else
      begin
        WriteLn('Error: ', jiGetLastError());
      end;

    jiUnloadModel();
    jiQuit();
  end;

end.

🔁 Using Callbacks

Define a custom callback to handle token streaming:

procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  Write(Token);
end;

jiSetInferenceTokenCallback(@InferenceCallback, nil);
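
Register the callback before calling jiRunInference; each generated token is then passed to your procedure as it is produced, so output can be streamed to the console or a UI in real time. Passing nil for UserData is fine when no extra state is needed.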

📊 Retrieve Performance Metrics

Access performance results to monitor efficiency:

var
  Metrics: TPerformanceResult;
begin
  Metrics := jiGetPerformanceResult();
  WriteLn('Tokens/Sec: ', Metrics.TokensPerSecond);
  WriteLn('Input Tokens: ', Metrics.TotalInputTokens);
  WriteLn('Output Tokens: ', Metrics.TotalOutputTokens);
end;

🛠️ Support and Resources

🤝 Contributing

Contributions to ✨ JetInfero are highly encouraged! 🌟

  • 🐛 Report Issues: Submit issues if you encounter bugs or need help.
  • 💡 Suggest Features: Share your ideas to make JetInfero even better.
  • 🔧 Create Pull Requests: Help expand the capabilities and robustness of the library.

Your contributions make a difference! 🙌✨

Contributors 👥🤝


📜 Licensing

JetInfero is distributed under the 🆓 BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.


Elevate your Delphi projects with JetInfero 🚀 – your bridge to seamless local generative AI integration 🤖.


Made with ❤️ in Delphi
