Commit 1372e4f

Update CHANGELOG
1 parent 8e13520 commit 1372e4f
1 file changed: 32 additions, 61 deletions

CHANGELOG.md

@@ -9,26 +9,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [0.2.2]
 
-- Fix bug in pip install of v0.2.1 due to scikit-build-core removing all `.metal` files in the source distribution
+- Fix bug in pip install of v0.2.1 due to scikit-build-core removing all `.metal` files in the source distribution (see #701)
 
 ## [0.2.1]
 
-- Fix bug in pip install of v0.2.0 due to .git folder being included in the source distribution
+- Fix bug in pip install of v0.2.0 due to .git folder being included in the source distribution (see #701)
 
 ## [0.2.0]
 
-- Migrated to scikit-build-core for building llama.cpp from source
+- Migrated to scikit-build-core build system by @abetlen in #499
+- Use `numpy` views for `LogitsProcessor` and `StoppingCriteria` instead of python lists by @abetlen in #499
+- Drop support for end-of-life Python3.7 by @abetlen in #499
+- Convert low level `llama.cpp` constants to use basic python types instead of `ctypes` types by @abetlen in #499
 
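With the 0.2.0 change above, a custom `LogitsProcessor` receives `numpy` arrays rather than python lists. A minimal sketch, assuming the `(input_ids, scores) -> scores` callable signature, the `llama_cpp.LogitsProcessorList` helper, and the `logits_processor` keyword on `create_completion`; the model path and banned token id are placeholders:

```python
import numpy as np
from llama_cpp import Llama, LogitsProcessorList

def ban_token(input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # `scores` is a numpy view over the model logits, so work on a copy
    scores = scores.copy()
    scores[1234] = -np.inf  # placeholder token id to suppress
    return scores

llm = Llama(model_path="./models/model.gguf")  # placeholder path
out = llm.create_completion(
    "Hello",
    max_tokens=16,
    logits_processor=LogitsProcessorList([ban_token]),
)
```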

-## [0.1.79]
+## [0.1.85]
+
+- Add `llama_cpp.__version__` attribute by @janvdp in #684
+- Fix low level api examples by @jbochi in #680
+
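The new `llama_cpp.__version__` attribute can be used to check the installed version at runtime, for example:

```python
import llama_cpp

print(llama_cpp.__version__)  # e.g. "0.1.85"
```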
+## [0.1.84]
+
+- Update llama.cpp
+
+## [0.1.83]
+
+- Update llama.cpp
+
+## [0.1.82]
+
+- Update llama.cpp
+
+## [0.1.81]
 
-### Added
+- Update llama.cpp
+
+## [0.1.80]
+
+- Update llama.cpp
+
+## [0.1.79]
 
 - GGUF Support (breaking change requiring new model format)
 
 ## [0.1.78]
 
-### Added
-
 - Grammar based sampling via LlamaGrammar which can be passed to completions
 - Make n_gpu_layers == -1 offload all layers
 
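A minimal sketch of the two 0.1.78 features above, grammar-constrained sampling and full GPU offload, assuming `LlamaGrammar.from_string` and the `grammar` keyword on `create_completion`; the model path is a placeholder:

```python
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar that only allows the answer "yes" or "no"
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,                   # -1 offloads all layers to the GPU
)
out = llm.create_completion("Is water wet? Answer: ", grammar=grammar)
print(out["choices"][0]["text"])
```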

@@ -47,152 +71,99 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [0.1.74]
 
-### Added
-
 - (server) OpenAI style error responses
 
 ## [0.1.73]
 
-### Added
-
 - (server) Add rope parameters to server settings
 
 ## [0.1.72]
 
-### Added
-
 - (llama.cpp) Update llama.cpp: added custom_rope for extended context lengths
 
 ## [0.1.71]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 
-### Fixed
-
 - (server) Fix several pydantic v2 migration bugs
 
 ## [0.1.70]
 
-### Fixed
-
 - (Llama.create_completion) Revert change so that `max_tokens` is not truncated to `context_size` in `create_completion`
 - (server) Fixed changed settings field names from pydantic v2 migration
 
 ## [0.1.69]
 
-### Added
-
 - (server) Streaming requests can now be interrupted prematurely when a concurrent request is made. This can be controlled with the `interrupt_requests` setting.
 - (server) Moved to fastapi v0.100.0 and pydantic v2
 - (docker) Added a new "simple" image that builds llama.cpp from source when started.
-
-## Fixed
-
 - (server) performance improvements by avoiding unnecessary memory allocations during sampling
 
 ## [0.1.68]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 
 ## [0.1.67]
 
-### Fixed
-
 - Fix performance bug in Llama model by pre-allocating memory tokens and logits.
 - Fix bug in Llama model where the model was not freed after use.
 
 ## [0.1.66]
 
-### Added
-
 - (llama.cpp) New model API
 
-### Fixed
-
 - Performance issue during eval caused by looped np.concatenate call
 - State pickling issue when saving cache to disk
 
 ## [0.1.65]
 
-### Added
-
 - (llama.cpp) Fix struct misalignment bug
 
 ## [0.1.64]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 - Fix docs for seed. Set -1 for random.
 
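Per the seed docs fix above, a seed of -1 requests a random seed. A minimal sketch; the model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", seed=-1)  # -1 -> random seed
```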
 ## [0.1.63]
 
-### Added
-
 - (llama.cpp) Add full gpu utilisation in CUDA
 - (llama.cpp) Add get_vocab
 - (llama.cpp) Add low_vram parameter
 - (server) Add logit_bias parameter
 
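A sketch of the server's new `logit_bias` parameter on the OpenAI-compatible completions route, assuming a server listening on localhost:8000 and a `{token id: bias}` mapping as in the OpenAI API; the token id is a placeholder:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "The sky is",
        "max_tokens": 8,
        "logit_bias": {"15339": -100},  # placeholder token id to suppress
    },
)
print(resp.json()["choices"][0]["text"])
```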
 ## [0.1.62]
 
-### Fixed
-
 - Metal support working
 - Cache re-enabled
 
 ## [0.1.61]
 
-### Fixed
-
 - Fix broken pip installation
 
 ## [0.1.60]
 
-### NOTE
-
-- This release was deleted due to a bug with the packaging system that caused pip installations to fail.
-
-### Fixed
+NOTE: This release was deleted due to a bug with the packaging system that caused pip installations to fail.
 
 - Truncate max_tokens in create_completion so requested tokens don't exceed context size.
 - Temporarily disable cache for completion requests
 
 ## [v0.1.59]
 
-### Added
-
 - (llama.cpp) k-quants support
 - (server) mirostat sampling parameters to server
-
-### Fixed
-
 - Support both `.so` and `.dylib` for `libllama` on MacOS
 
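A sketch of the mirostat sampling parameters exposed by the server in v0.1.59, assuming the request fields follow llama.cpp's mirostat options (`mirostat_mode`, `mirostat_tau`, `mirostat_eta`) and a server on localhost:8000:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Once upon a time",
        "max_tokens": 32,
        "mirostat_mode": 2,   # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
        "mirostat_tau": 5.0,  # target entropy
        "mirostat_eta": 0.1,  # learning rate
    },
)
print(resp.json()["choices"][0]["text"])
```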
 ## [v0.1.58]
 
-### Added
-
 - (llama.cpp) Metal Silicon support
 
 ## [v0.1.57]
 
-### Added
-
 - (llama.cpp) OpenLlama 3B support
 
 ## [v0.1.56]
 
-### Added
-
 - (misc) Added first version of the changelog
 - (server) Use async routes
 - (python-api) Use numpy for internal buffers to reduce memory usage and improve performance.
-
-### Fixed
-
 - (python-api) Performance bug in stop sequence check slowing down streaming.
