Lesson 03: Plugin Creation
Develop your first vision plugin and extend AVA's capabilities
At GDC 2026, Razer announced that Project AVA is now agentic. This opens a new paradigm for plugins: they can now plan and execute complex tasks autonomously. The plugins you build today can scale to these new capabilities.
Introduction
The AVA SDK lets you extend your assistant's capabilities through plugins. In this lesson you will learn to create plugins for computer vision, image processing, and real-time analysis by integrating tools like OpenCV and CLIP into the AVA ecosystem.
Plugin Architecture
Each AVA SDK plugin follows a modular structure with three main components:
Input Handler
Processes input (image, text, audio) and normalizes it for the inference engine
Processor
Executes the plugin logic: analysis, transformation, or data classification
Output Formatter
Converts the result into a structured message for AVA
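The three components can be pictured as a small pipeline. The sketch below is only an illustration of that structure: the class names InputHandler, Processor, OutputFormatter, and Plugin, their method names, and the message layout are all hypothetical and are not taken from the AVA SDK.

```python
from dataclasses import dataclass
from typing import Any


class InputHandler:
    """Normalizes raw input before processing (hypothetical interface)."""

    def handle(self, raw: str) -> str:
        # For this toy example, normalizing means trimming and lowercasing text.
        return raw.strip().lower()


class Processor:
    """Runs the plugin logic; here, a toy text classifier (hypothetical)."""

    def run(self, data: str) -> dict[str, Any]:
        label = "question" if data.endswith("?") else "statement"
        return {"label": label, "length": len(data)}


class OutputFormatter:
    """Wraps the result in a structured message (assumed layout)."""

    def format(self, plugin_name: str, result: dict[str, Any]) -> dict[str, Any]:
        return {"plugin": plugin_name, "type": "result", "payload": result}


@dataclass
class Plugin:
    """Chains the three components: handle -> run -> format."""

    name: str
    input_handler: InputHandler
    processor: Processor
    output_formatter: OutputFormatter

    def __call__(self, raw: str) -> dict[str, Any]:
        normalized = self.input_handler.handle(raw)
        result = self.processor.run(normalized)
        return self.output_formatter.format(self.name, result)


def build_demo_plugin() -> Plugin:
    """Assembles a demo plugin from the three toy components."""
    return Plugin("demo", InputHandler(), Processor(), OutputFormatter())
```

Keeping the stages separate means a vision plugin could swap in an image-loading input handler and a model-backed processor without touching the output formatting.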
Step 1: Project structure
Create the plugins folder inside your AVA SDK installation:
mkdir -p ava-sdk-plugins/vision
cd ava-sdk-plugins
pip install opencv-python pillow openai-clip torch

Then describe the plugin in a plugin.json manifest inside the vision folder:

{
  "name": "vision-analyzer",
  "version": "1.0.0",
  "type": "vision",
  "description": "Analyzes images using CLIP and provides descriptions",
  "entry": "analyzer.py",
  "dependencies": ["opencv-python", "pillow", "torch"]
}

Step 2: Basic Vision Plugin
Let us create a plugin that analyzes images using CLIP (Contrastive Language-Image Pre-training). CLIP, by OpenAI, scores an image against a set of text labels, so your plugin can classify images without training a task-specific model:
import cv2
import torch
import clip
from PIL import Image

class VisionAnalyzer:
    def __init__(self):
        # Use the GPU when available, otherwise fall back to the CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model, self.preprocess = clip.load("ViT-B/32", device=self.device)

    async def process(self, image_path: str) -> dict:
        # Load and preprocess the image
        image = self.preprocess(Image.open(image_path)).unsqueeze(0).to(self.device)

        # Candidate descriptions
        candidates = [
            "a person", "a gaming setup", "code on screen",
            "a landscape", "a device", "text document",
            "a hologram display", "a keyboard"
        ]
        text = clip.tokenize(candidates).to(self.device)

        # Score the image against every candidate and convert to probabilities
        with torch.no_grad():
            logits_per_image, _ = self.model(image, text)
            probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

        best_idx = int(probs.argmax())
        return {
            "label": candidates[best_idx],
            "confidence": float(probs[best_idx]),
            "all_predictions": dict(zip(candidates, probs.tolist()))
        }

Step 3: Register the plugin in AVA
Once the plugin is created, register it in the AVA SDK configuration:
# AVA SDK - plugin configuration
plugins:
  - name: "vision-analyzer"
    enabled: true
    path: "./ava-sdk-plugins/vision"
    config:
      auto_analyze: true
      max_image_size_mb: 10
      supported_formats: [jpg, png, webp]

Then start AVA with the plugins directory:

from ava_sdk import AVA

# Initialize AVA with plugins
ava = AVA(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    plugins_dir="./ava-sdk-plugins"
)

# The vision-analyzer plugin is loaded automatically
ava.run()

AVA SDK automatically detects plugins in the configured directory by reading the plugin.json file in each subfolder.
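The lesson does not show how the SDK performs this detection. As a rough mental model only, scanning a directory for manifests could look like the sketch below. discover_plugins and REQUIRED_KEYS are hypothetical names, and the set of required keys is an assumption based on the manifest fields used in Step 1, not a documented SDK rule.

```python
import json
from pathlib import Path

# Assumption: fields every manifest must contain, taken from the Step 1 example.
REQUIRED_KEYS = {"name", "version", "type", "entry"}


def discover_plugins(plugins_dir: str) -> list[dict]:
    """Return the parsed plugin.json of every valid plugin subfolder."""
    manifests = []
    # Only direct subfolders are inspected: <plugins_dir>/<plugin>/plugin.json
    for manifest_path in sorted(Path(plugins_dir).glob("*/plugin.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # Skip folders whose manifest is not valid JSON
        if not isinstance(manifest, dict) or not REQUIRED_KEYS.issubset(manifest):
            continue  # Skip manifests that lack a required field
        if not (manifest_path.parent / manifest["entry"]).is_file():
            continue  # Skip plugins whose entry file does not exist
        # Record where the plugin lives so a loader could import it later
        manifest["path"] = str(manifest_path.parent)
        manifests.append(manifest)
    return manifests
```

Skipping invalid folders instead of raising keeps one broken plugin from blocking the others; a stricter loader might prefer to log or fail loudly.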
Step 4: Test the plugin
To verify the plugin, call it directly on a test image from inside the ava-sdk-plugins folder:
# Quick plugin test
from vision.analyzer import VisionAnalyzer
import asyncio

async def test():
    analyzer = VisionAnalyzer()
    result = await analyzer.process("test_image.jpg")
    print(f"Detected: {result['label']}")
    print(f"Confidence: {result['confidence']:.2%}")

asyncio.run(test())
# Example output (actual values depend on the image):
# Detected: a gaming setup
# Confidence: 87.30%

Advanced Plugins
Batch Processing
Analyze multiple images in a single call, reducing per-image overhead
Video Streaming
Process frames in real time from a camera or video file
Multimodal Analysis
Combine vision, text, and audio in a single analysis pipeline
Custom Plugins
Create specific handlers for your use case: OCR, object detection, classification
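As one illustration of the batch processing idea, the sketch below groups image paths into fixed-size batches so that a model is invoked once per batch instead of once per image. chunked and analyze_in_batches are hypothetical helpers, and analyze_batch stands in for whatever batched inference call your plugin provides; none of these names come from the AVA SDK.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_in_batches(
    image_paths: Sequence[str],
    analyze_batch: Callable[[Sequence[str]], list[dict]],
    batch_size: int = 8,
) -> list[dict]:
    """Call `analyze_batch` once per batch and return results in input order."""
    results: list[dict] = []
    for batch in chunked(image_paths, batch_size):
        batch_results = analyze_batch(batch)
        # Guard against a batch function that drops or duplicates items
        if len(batch_results) != len(batch):
            raise ValueError("analyze_batch must return one result per image")
        results.extend(batch_results)
    return results
```

With a CLIP-based plugin, analyze_batch would typically stack the preprocessed images into one tensor and run the model once for the whole batch; the batch size is a trade-off between throughput and available memory.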