Create a chat completion for the given chat messages.
OAI-compatible chat completion options
OAI-compatible chat completion response (only the final result when stream=false) or an async iterator of completion chunks (when stream=true)
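Because the return value is either a single response or an async iterator of chunks, calling code usually has to branch on which one it received. The following is a minimal sketch of that branching. The interfaces `ChatCompletion` and `ChatCompletionChunk` and the helpers `isAsyncIterable` and `collectText` are hypothetical names introduced here for illustration; their shapes follow the general OpenAI chat completion format and are not taken from the Wllama type definitions.

```typescript
// Hypothetical minimal shapes, modelled on the OpenAI chat completion format.
// They are assumptions for this sketch, not the library's own types.
interface ChatCompletion {
  choices: { message: { role: string; content: string | null } }[];
}

interface ChatCompletionChunk {
  choices: { delta: { content?: string | null } }[];
}

type ChatResult = ChatCompletion | AsyncIterable<ChatCompletionChunk>;

// Type guard: true when the value can be consumed with `for await`.
function isAsyncIterable<T>(value: unknown): value is AsyncIterable<T> {
  return (
    typeof value === "object" &&
    value !== null &&
    Symbol.asyncIterator in value
  );
}

// Returns the full generated text, whether the result was streamed or not.
async function collectText(result: ChatResult): Promise<string> {
  if (isAsyncIterable<ChatCompletionChunk>(result)) {
    // stream=true case: concatenate the delta of every chunk.
    let text = "";
    for await (const chunk of result) {
      text += chunk.choices[0]?.delta.content ?? "";
    }
    return text;
  }
  // stream=false case: the final message is already complete.
  return result.choices[0]?.message.content ?? "";
}
```

In a UI, the streaming branch would typically append each delta to the display as it arrives rather than waiting for the full string.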
Create a (raw) completion for the given text.
OAI-compatible completion options
OAI-compatible completion response (only the final result when stream=false) or an async iterator of completion chunks (when stream=true)
Calculate the embedding vector for a given text. By default, BOS and EOS tokens are added automatically. Use the "skipBOS" and "skipEOS" options to disable this.
OAI-compatible embedding creation options
OAI-compatible embedding response
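A common use of embedding vectors is comparing two texts by cosine similarity. The sketch below assumes the response follows the general OpenAI embedding format, with the vector under `data[0].embedding`. The `EmbeddingResponse` interface and both helper functions are hypothetical names written for this example, not part of the library.

```typescript
// Hypothetical minimal shape, modelled on the OpenAI embedding response.
interface EmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

// Extracts the first embedding vector from a response.
function firstEmbedding(res: EmbeddingResponse): number[] {
  const item = res.data[0];
  if (!item) throw new Error("Embedding response contains no data");
  return item.embedding;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Both texts should be embedded with the same model and the same BOS/EOS settings, otherwise the vectors are not comparable.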
Get the Jinja chat template that comes with the model. It is only available if the original model (before conversion to GGUF) has the template in tokenizer_config.json.
NOTE: This can only be used after loadModel is called.
The Jinja template, or null if the GGUF file contains no template.
Get the ID of the token the decoder uses to start generating the output sequence (only usable for encoder-decoder architectures). In other words, the encoder uses the normal BOS token and the decoder uses this token.
NOTE: This can only be used after loadModel is called.
-1 if the model is not loaded.
Get the model hyper-parameters and metadata.
NOTE: This can only be used after loadModel is called.
ModelMetadata
Load a model from a given list of Blobs.
You can pass multiple buffers to the function (in case the model consists of multiple shards).
Can be either a list of Blobs (if you use local files) or a Model object (if you use ModelManager).
LoadModelParams
Load a model from a given Hugging Face model ID and file path.
Load a model from a given URL (or a list of URLs, in case the model is split into smaller files).
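When a model is split into several files, the list of URLs can be derived from the first shard's URL if the files follow the `-00001-of-0000N.gguf` suffix convention produced by llama.cpp's split tooling. The sketch below builds such a list. `expandShardUrls` is a hypothetical helper written for this example, and this section does not say whether the library already performs this expansion on its own, so treat it as an illustration of the naming scheme only.

```typescript
// Expands the URL of the first shard into the URLs of all shards.
// Assumes the suffix pattern "-<index>-of-<total>.gguf" with 5-digit,
// zero-padded numbers. URLs that do not match are returned unchanged
// as a single-element list.
function expandShardUrls(firstShardUrl: string): string[] {
  const match = firstShardUrl.match(/^(.*)-(\d{5})-of-(\d{5})\.gguf$/);
  if (!match) return [firstShardUrl];

  const prefix = match[1];
  const totalStr = match[3];
  const total = parseInt(totalStr, 10);

  return Array.from({ length: total }, (_, i) => {
    const index = String(i + 1).padStart(5, "0");
    return `${prefix}-${index}-of-${totalStr}.gguf`;
  });
}

// Example with a placeholder host (not a real model location):
const urls = expandShardUrls(
  "https://example.com/models/demo-00001-of-00003.gguf"
);
console.log(urls);
```

The resulting array lists the shards in order and could then be supplied wherever a list of URLs is accepted. URLs carrying a query string after `.gguf` would not match this pattern and would need separate handling.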
downloadModel()), then we will use the cached model
Set compatibility options for Wllama.
Set to null to disable compatibility, or 'default' to use the default compat resources from CDN.
'safari' by default. If set to 'firefox_safari', compat mode is also enabled on Firefox, which significantly degrades performance but allows using WebGPU on Firefox.
Get debug info (static).