This is a good illustration of why it's worth testing multiple cases, so that you can catch situations where there are coding mistakes. The above code illustrates a major challenge: how could we make ...
I'm using llama-cpp-python==0.2.60, installed with `CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python`. I'm able to load a model using `type_k=8` and `type_v=8` (for a q8_0 KV cache).
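As a minimal sketch of what that configuration might look like, the snippet below passes `type_k` and `type_v` to the `Llama` constructor; the value 8 corresponds to the ggml q8_0 type, matching the poster's setup. The model path is hypothetical.

```python
from llama_cpp import Llama

# Load a GGUF model with a q8_0-quantized KV cache.
# type_k / type_v take ggml type enum values; 8 is q8_0.
llm = Llama(
    model_path="./models/model.gguf",  # hypothetical path, replace with your model
    type_k=8,  # quantize the key cache to q8_0
    type_v=8,  # quantize the value cache to q8_0
)

# Quick smoke test that generation still works with the quantized cache.
output = llm("Q: What is the capital of France? A:", max_tokens=16)
print(output["choices"][0]["text"])
```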