
Running Inference Server

This guide shows you how to set up and deploy inference servers that can process requests on your private data in the LazAI ecosystem. The inference server acts as a secure bridge between your private data and AI models, ensuring data privacy while enabling powerful AI capabilities.

Important: The LAZAI_IDAO_ADDRESS is the public address of the private key you expose to the inference server. Once the server is running, its URL must be registered with the add_inference_node function in Alith; only LazAI admins can do this.
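
If you are unsure which address to register, you can derive it from the private key locally. Below is a minimal sketch using the eth_account library (an assumption; any EVM wallet tooling that derives an address from a private key will do):

import os

from eth_account import Account

# Assumes PRIVATE_KEY holds a standard hex-encoded EVM private key.
account = Account.from_key(os.environ["PRIVATE_KEY"])

# The derived public address is your LAZAI_IDAO_ADDRESS.
print("LAZAI_IDAO_ADDRESS:", account.address)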


Prerequisites

Before setting up your inference server, ensure you have:

  1. Wallet Setup: A Web3 wallet with private key for authentication
  2. API Keys: Credentials for your chosen model provider (OpenAI, DeepSeek, etc.)
  3. Network Access: Ability to expose your server to the internet (for production)

Environment Setup

Best Practice: Use a Python Virtual Environment

To avoid dependency conflicts and keep your environment clean, create and activate a Python virtual environment before installing any packages:

python3 -m venv venv
source venv/bin/activate

Install Alith

python3 -m pip install alith -U

Install Dependencies

pip install openai llama-cpp-python pymilvus "pymilvus[model]"

Set Environment Variables

For OpenAI/ChatGPT API:

export PRIVATE_KEY=<your wallet private key>
export OPENAI_API_KEY=<your openai api key>
export RSA_PRIVATE_KEY_BASE64=<your rsa private key base64>

For other OpenAI-compatible APIs (DeepSeek, Gemini, etc.):

export PRIVATE_KEY=<your wallet private key>
export LLM_API_KEY=<your api key>
export LLM_BASE_URL=<your api base url>
export RSA_PRIVATE_KEY_BASE64=<your rsa private key base64>
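
Before starting the server, it helps to fail fast if a required variable is missing. A minimal sketch (the variable names match the exports above; trim the list to the ones your provider needs):

import os

# Required in all setups; use OPENAI_API_KEY or LLM_API_KEY/LLM_BASE_URL as appropriate.
required = ["PRIVATE_KEY", "RSA_PRIVATE_KEY_BASE64", "LLM_API_KEY", "LLM_BASE_URL"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")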

Server Deployment Options

Local Development

Perfect for testing and development. Your inference server runs on your local machine.

Python Implementation

For OpenAI/ChatGPT API:

from alith.inference import run

server = run(model="gpt-3.5-turbo", settlement=True, engine_type="openai")

For other OpenAI-compatible APIs (DeepSeek, Gemini, etc.):

from alith.inference import run

# Example: using the DeepSeek R1 model from OpenRouter
server = run(settlement=True, engine_type="openai", model="deepseek/deepseek-r1-0528")

Testing Your Local Server

Once your server is running, you can test it using curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-LazAI-User: 0xc3e98E8A9aACFc9ff7578C2F3BA48CA4477Ecf49" \
  -H "X-LazAI-Nonce: 123456" \
  -H "X-LazAI-Signature: HSDGYUSDOWP123" \
  -H "X-LazAI-Token-ID: 1" \
  -d '{
    "model": "deepseek/deepseek-r1-0528",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
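
The same request can be issued from Python with the requests library, which is handy for scripted smoke tests (the header values below are the same placeholders as in the curl example):

import requests

headers = {
    "Content-Type": "application/json",
    "X-LazAI-User": "0xc3e98E8A9aACFc9ff7578C2F3BA48CA4477Ecf49",
    "X-LazAI-Nonce": "123456",
    "X-LazAI-Signature": "HSDGYUSDOWP123",
    "X-LazAI-Token-ID": "1",
}
payload = {
    "model": "deepseek/deepseek-r1-0528",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "max_tokens": 100,
}

resp = requests.post("http://localhost:8000/v1/chat/completions", headers=headers, json=payload)
print(resp.status_code, resp.json())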

Production Deployment on Phala TEE Cloud

For production-ready applications, deploy your inference server on Phala TEE Cloud for enhanced security and privacy. This provides:

  • Trusted Execution Environment (TEE): Hardware-level security isolation
  • Privacy-Preserving Computation: Your data and models remain encrypted during processing
  • Scalability: Cloud infrastructure for handling production workloads
  • Reliability: High availability and fault tolerance

Deployment Steps

Follow the deployment guide in our inference server repository for detailed instructions on deploying your server to Phala TEE Cloud.

Register Your Server URL: Once deployed, you will receive an inference URL, which LazAI admins must register using the add_inference_node function.
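
For context, the admin-side registration might look roughly like the following. This is a hypothetical sketch: this guide only names the add_inference_node function, so the client class, import path, and parameters here are assumptions; consult the Alith reference for the actual API.

# Hypothetical admin-side sketch; the real Alith client API may differ.
from alith.lazai import Client  # assumed import path

client = Client()  # assumed to authenticate with an admin key
client.add_inference_node(
    "0xYourLazaiIdaoAddress",                     # the LAZAI_IDAO_ADDRESS you provide
    "https://your-inference-server.example.com",  # the deployed server URL
)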

Security Benefits

  • Hardware Security: TEE provides hardware-level isolation
  • Encrypted Processing: Data remains encrypted during computation
  • Verifiable Execution: Proof of correct execution without revealing inputs
  • Audit Trail: Complete transparency of computation steps

Server Configuration Options

Model Selection

You can configure your inference server to use various models:

  • OpenAI Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo
  • OpenAI-Compatible Models: DeepSeek, Anthropic Claude, Google Gemini
  • Local Models: Llama, Mistral, and other open-source models

Settlement Configuration

The settlement=True parameter enables:

  • Cryptographic Settlement: Secure payment processing
  • Access Control: Verification of user permissions
  • Audit Trail: Complete transaction logging

Engine Types

  • openai: For OpenAI and OpenAI-compatible APIs
  • local: For locally hosted models (see the sketch below)
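
A minimal sketch of the local path, assuming engine_type="local" accepts a path to a GGUF model served via llama-cpp-python (installed in the dependencies step); the model path and exact parameters are assumptions, so check the Alith reference before relying on them:

from alith.inference import run

# Hypothetical local setup: the GGUF path and engine_type="local" follow the
# engine list above; consult the Alith docs for the exact parameters.
server = run(model="/path/to/model.gguf", settlement=True, engine_type="local")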

Server Registration

After your inference server is running and accessible, it must be registered with the LazAI network:

  1. Contact LazAI Admins: Reach out to the LazAI team with your server URL
  2. Provide Node Address: Share your LAZAI_IDAO_ADDRESS (wallet public key)
  3. Verification: Admins will verify your server’s security and compliance
  4. Registration: Your server will be added to the network using add_inference_node

Security Best Practices

Private Key Management: Store private keys securely and never hard-code them in source.
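
A minimal sketch of that pattern: read the key from the environment at startup and fail loudly if it is absent, so the key never appears in source or version control.

import os

# Never hard-code the key; load it from the environment set earlier.
private_key = os.environ.get("PRIVATE_KEY")
if not private_key:
    raise RuntimeError("PRIVATE_KEY is not set; export it before starting the server.")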


Troubleshooting

Common Issues

  1. Server Won’t Start:

    • Check that environment variables are set correctly (see the diagnostic sketch after this list)
    • Verify API keys are valid
    • Ensure port 8000 is available
  2. Authentication Errors:

    • Verify private key format
    • Check wallet has sufficient funds
    • Ensure proper settlement headers
  3. Model Loading Issues:

    • Verify model name is correct
    • Check API quota and limits
    • Ensure network connectivity
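
A small diagnostic sketch for the first two classes of issues: it reports which required environment variables are set and whether something is already listening on port 8000.

import os
import socket

# Report the environment variables the server depends on.
for name in ("PRIVATE_KEY", "RSA_PRIVATE_KEY_BASE64"):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")

# Check whether port 8000 is already in use on localhost.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    in_use = s.connect_ex(("127.0.0.1", 8000)) == 0
print("Port 8000:", "in use" if in_use else "available")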
