Open
Description
- Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- Add support for streaming / infinite completion
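
To make the batched inference item more concrete, here is a minimal pure-Python sketch of how several tokenized prompts could be flattened into one batch before a single decode call. It is an illustration only, not the library's implementation: `PackedBatch` and `pack_prompts` are hypothetical names, and the field layout (token, position, sequence id, logits flag) is only loosely modeled on llama.cpp's `llama_batch`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackedBatch:
    """Hypothetical flat batch: one entry per token across all sequences."""
    tokens: list[int] = field(default_factory=list)
    positions: list[int] = field(default_factory=list)   # position within its own sequence
    seq_ids: list[int] = field(default_factory=list)     # which prompt the token belongs to
    want_logits: list[bool] = field(default_factory=list)  # True where logits are needed


def pack_prompts(prompts: list[list[int]]) -> PackedBatch:
    """Flatten several tokenized prompts into a single batch.

    Logits are requested only for the last token of each prompt, since
    that is the position the next token would be sampled from.
    """
    batch = PackedBatch()
    for seq_id, prompt in enumerate(prompts):
        last = len(prompt) - 1
        for pos, token in enumerate(prompt):
            batch.tokens.append(token)
            batch.positions.append(pos)
            batch.seq_ids.append(seq_id)
            batch.want_logits.append(pos == last)
    return batch


if __name__ == "__main__":
    # Token ids here are made up for illustration.
    batch = pack_prompts([[1, 15, 27], [1, 42]])
    print(batch.seq_ids)      # [0, 0, 0, 1, 1]
    print(batch.want_logits)  # [False, False, True, False, True]
```

In a design along these lines, `generate` and `create_completion` would accept several prompts, evaluate them in one decode step, and sample one next token per sequence id.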