Kuzco is a distributed GPU cluster built on the Solana blockchain, designed for efficient, cost-effective inference of large language models (LLMs) such as Llama3, Mistral, and Phi3. By pooling idle compute resources contributed by network participants, Kuzco lets users access these models through an OpenAI-compatible API.

This documentation is a work in progress. Join the Discord for more information.

Key Features

  • Distributed GPU Cluster: Kuzco harnesses the collective power of GPUs across the network, allowing for scalable and efficient LLM inference.
  • Solana Integration: Built on the Solana blockchain, Kuzco benefits from its high-performance, low-latency, and cost-effective infrastructure.
  • Idle Compute Utilization: Network participants can contribute their idle compute power and earn rewards for their contributions.
  • OpenAI-Compatible API: Kuzco exposes an API compatible with the OpenAI format, so developers can query models like Llama3 and Mistral using their existing OpenAI client code.
  • Cost-Effective: By drawing on idle compute resources, Kuzco offers LLM inference at lower cost than traditional centralized providers.
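
Because the API follows the OpenAI request format, existing tooling can target a Kuzco endpoint by changing the base URL and API key. The sketch below builds an OpenAI-style `/chat/completions` request using only the Python standard library; the endpoint URL, API key, and model name are placeholders, not real Kuzco values:

```python
import json
from urllib import request

# Placeholder values -- substitute your actual Kuzco endpoint and API key.
KUZCO_BASE_URL = "https://example-kuzco-endpoint.invalid/v1"
KUZCO_API_KEY = "your-api-key"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-compatible chat-completions request for a Kuzco endpoint."""
    payload = {
        "model": model,  # e.g. a Llama3 or Mistral model served by the network
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        url=f"{KUZCO_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {KUZCO_API_KEY}",
        },
        method="POST",
    )

# Inspect the payload without sending it over the network.
req = build_chat_request("llama3", "Hello!")
body = json.loads(req.data)
print(body["messages"][0]["content"])  # Hello!
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return a JSON response in the standard OpenAI chat-completions shape, which is exactly what makes drop-in integration with existing OpenAI clients possible.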