Facebook's [[ki|AI]] model LLaMA was leaked and can be run locally, offline and on the CPU, with [[https://github.com/ggerganov/llama.cpp]]
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
main.exe -m models/ggml-model.bin -p "Your prompt here"
(model path and prompt are placeholders; point -m at a downloaded GGML model file)
Get models from [[https://huggingface.co/models?sort=modified&search=ggml|HuggingFace]] ([[https://huggingface.co]]). Most GGML models work. GGML is named after its creator **G**eorgi **G**erganov plus **M**achine **L**earning.
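A model file can also be fetched from Python with huggingface_hub; a minimal sketch, where repo_id and filename are placeholders rather than a real repository:

from huggingface_hub import hf_hub_download
# repo_id and filename are placeholders; pick a real GGML repo on HuggingFace
path = hf_hub_download(repo_id="someuser/some-ggml-model", filename="ggml-model-q4_0.bin")
print(path)  # local cache path, usable as the -m / --model argument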
See also [[https://github.com/abetlen/llama-cpp-python|llama-cpp-python]] (Python bindings for llama.cpp).
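The bindings can also run a model directly from Python (after a plain pip install llama-cpp-python); a minimal sketch, with the model path as a placeholder:

from llama_cpp import Llama
# model_path is a placeholder; point it at a downloaded GGML file
llm = Llama(model_path="models/ggml-model.bin")
out = llm("\n\n### Instructions:\nWhat is general relativity?\n\n### Response:", max_tokens=256, stop=["###"])
print(out["choices"][0]["text"])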
pip install "llama-cpp-python[server]"
python -m llama_cpp.server --model models/ggml-model.bin --host 0.0.0.0 --port 8000
(the model path is a placeholder for your GGML file)
The server exposes an OpenAI-compatible API; query it with curl:
curl -X POST -H "accept:application/json" -H "Content-Type:application/json" -d "{ \"prompt\": \"\n\n### Instructions:\nWhat is general relativity?\n\n### Response:\", \"stop\": [ \"###\" ], \"max_tokens\": 4096 }" http://192.168.0.51:8000/v1/completions
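The same completion request from Python, using the server address from the curl example:

import requests
resp = requests.post(
    "http://192.168.0.51:8000/v1/completions",
    json={
        "prompt": "\n\n### Instructions:\nWhat is general relativity?\n\n### Response:",
        "stop": ["###"],
        "max_tokens": 4096,
    },
)
print(resp.json()["choices"][0]["text"])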