Facebook's [[ki|AI]] model LLaMA was leaked and can be run locally, offline and on the CPU, with [[https://github.com/ggerganov/llama.cpp]]
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
main.exe -m models/ggml-model.bin -p "Your prompt here"
(model path and prompt are placeholders; point -m at a downloaded GGML model file)
Get models from [[https://huggingface.co/models?sort=modified&search=ggml|HuggingFace]] ([[https://huggingface.co]]). Most GGML models work. GGML is named after its creator **G**eorgi **G**erganov plus **M**achine **L**earning.
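A model file can also be fetched from Python with huggingface_hub; a minimal sketch, where repo_id and filename are placeholders rather than a real repository:

from huggingface_hub import hf_hub_download
# repo_id and filename are placeholders; pick a real GGML repo on HuggingFace
path = hf_hub_download(repo_id="someuser/some-ggml-model", filename="ggml-model-q4_0.bin")
print(path)  # local cache path, usable as the -m / --model argument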
See also [[https://github.com/abetlen/llama-cpp-python|llama-cpp-python]] (Python bindings for llama.cpp).
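The bindings can also run a model directly from Python (after a plain pip install llama-cpp-python); a minimal sketch, with the model path as a placeholder:

from llama_cpp import Llama
# model_path is a placeholder; point it at a downloaded GGML file
llm = Llama(model_path="models/ggml-model.bin")
out = llm("\n\n### Instructions:\nWhat is general relativity?\n\n### Response:", max_tokens=256, stop=["###"])
print(out["choices"][0]["text"])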
pip install "llama-cpp-python[server]"
python -m llama_cpp.server --model models/ggml-model.bin --host 0.0.0.0 --port 8000
(the model path is a placeholder for your GGML file)
The server exposes an OpenAI-compatible API; query it with curl:
curl -X POST -H "accept:application/json" -H "Content-Type:application/json" -d "{ \"prompt\": \"\n\n### Instructions:\nWhat is general relativity?\n\n### Response:\", \"stop\": [ \"###\" ], \"max_tokens\": 4096 }" http://192.168.0.51:8000/v1/completions
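The same completion request from Python, using the server address from the curl example:

import requests
resp = requests.post(
    "http://192.168.0.51:8000/v1/completions",
    json={
        "prompt": "\n\n### Instructions:\nWhat is general relativity?\n\n### Response:",
        "stop": ["###"],
        "max_tokens": 4096,
    },
)
print(resp.json()["choices"][0]["text"])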