LESSON 03

Lesson 03: Plugin Creation

Develop your first vision plugin and extend AVA's capabilities

GDC 2026 Update

At GDC 2026, Razer announced that Project AVA is now agentic. This opens a new paradigm for plugins: they can plan and execute complex tasks autonomously. The plugins you build today can scale to these new capabilities.

Introduction

The AVA SDK lets you extend your assistant's capabilities through plugins. In this lesson you will learn to create plugins for computer vision, image processing, and real-time analysis, integrating tools like OpenCV and CLIP into the AVA ecosystem.

Plugin Architecture

Each AVA SDK plugin follows a modular structure with three main components:

Input Handler

Processes input (image, text, audio) and normalizes it for the inference engine

Processor

Executes the plugin logic: analysis, transformation, or data classification

Output Formatter

Converts the result into a structured message for AVA
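
The three components above can be sketched as plain Python classes. This is only an illustration of the data flow, not the AVA SDK's actual interface: the class names mirror the component names, but every method, the `NormalizedInput` type, and the `run_pipeline` helper are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class NormalizedInput:
    kind: str     # "image" or "text" in this sketch
    payload: str  # a file path or raw text


class InputHandler:
    """Normalizes raw input before it reaches the processor (hypothetical API)."""

    def handle(self, raw: str) -> NormalizedInput:
        cleaned = raw.strip()
        suffix = Path(cleaned).suffix.lower().lstrip(".")
        kind = "image" if suffix in {"jpg", "png", "webp"} else "text"
        return NormalizedInput(kind=kind, payload=cleaned)


class Processor:
    """Runs the plugin logic; here a placeholder that only describes the input."""

    def process(self, item: NormalizedInput) -> dict:
        return {"kind": item.kind, "length": len(item.payload)}


class OutputFormatter:
    """Wraps the raw result in a structured message."""

    def format(self, result: dict) -> dict:
        return {"plugin": "example", "status": "ok", "data": result}


def run_pipeline(raw: str) -> dict:
    # Input Handler -> Processor -> Output Formatter
    item = InputHandler().handle(raw)
    result = Processor().process(item)
    return OutputFormatter().format(result)


print(run_pipeline("photo.jpg"))
```

Keeping the three stages separate means the processor can be swapped (for example, for the CLIP analyzer in Step 2) without touching input handling or output formatting.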

Step 1: Project structure

Create the plugins folder inside your AVA SDK installation:

Terminal
mkdir -p ava-sdk-plugins/vision
cd ava-sdk-plugins
pip install opencv-python pillow openai-clip torch
ava-sdk-plugins/vision/plugin.json
{
    "name": "vision-analyzer",
    "version": "1.0.0",
    "type": "vision",
    "description": "Analyzes images using CLIP and provides descriptions",
    "entry": "analyzer.py",
    "dependencies": ["opencv-python", "pillow", "torch"]
}
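
Before wiring a manifest into the SDK, it can help to check it yourself. The sketch below assumes that every field shown in the example manifest is required; the lesson does not state which fields are mandatory, and `validate_manifest` is a hypothetical helper, not part of the AVA SDK.

```python
import json

# Assumption: every field in the example manifest is required.
REQUIRED_FIELDS = {"name", "version", "type", "description", "entry", "dependencies"}


def validate_manifest(text: str) -> dict:
    """Parse plugin.json content and raise ValueError if it looks incomplete."""
    manifest = json.loads(text)
    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        raise ValueError(f"plugin.json is missing fields: {sorted(missing)}")
    if not isinstance(manifest["dependencies"], list):
        raise ValueError("'dependencies' must be a list of package names")
    return manifest


example = """
{
    "name": "vision-analyzer",
    "version": "1.0.0",
    "type": "vision",
    "description": "Analyzes images using CLIP and provides descriptions",
    "entry": "analyzer.py",
    "dependencies": ["opencv-python", "pillow", "torch"]
}
"""
manifest = validate_manifest(example)
print(manifest["name"], manifest["entry"])
```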

Step 2: Basic Vision Plugin

Let us create a plugin that analyzes images using CLIP (Contrastive Language-Image Pre-training):

CLIP

CLIP (Contrastive Language-Image Pre-training) by OpenAI lets your plugin score an image against arbitrary text labels, so it can classify images without training a task-specific model.
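
To make the idea concrete, here is a rough sketch of the scoring step using only the standard library. CLIP embeds the image and each text label into the same vector space, compares them by cosine similarity, and turns the scaled similarities into probabilities with a softmax. The 3-dimensional vectors and the scale of 100 are invented for illustration; real embeddings come from the model and have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot(image_vec, label_vecs, scale=100.0):
    """Return a probability for each label given one image embedding."""
    labels = list(label_vecs)
    logits = [scale * cosine(image_vec, label_vecs[name]) for name in labels]
    return dict(zip(labels, softmax(logits)))


# Toy embeddings, invented for illustration only.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a keyboard": [1.0, 0.0, 0.1],
    "a landscape": [0.0, 1.0, 0.0],
    "a person": [0.1, 0.2, 1.0],
}
probs = zero_shot(image_embedding, label_embeddings)
best = max(probs, key=probs.get)
print(best)
```

Because the labels are just text, changing what the plugin can recognize only requires editing the candidate list, not retraining anything.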

ava-sdk-plugins/vision/analyzer.py
import cv2  # imported but not used in this basic example
import torch
import clip
from PIL import Image

class VisionAnalyzer:
    def __init__(self):
        # Use the GPU when available; otherwise fall back to the CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model, self.preprocess = clip.load("ViT-B/32", device=self.device)

    async def process(self, image_path: str) -> dict:
        # Load and preprocess the image, adding a batch dimension
        image = self.preprocess(Image.open(image_path)).unsqueeze(0).to(self.device)

        # Candidate descriptions to score the image against
        candidates = [
            "a person", "a gaming setup", "code on screen",
            "a landscape", "a device", "text document",
            "a hologram display", "a keyboard"
        ]
        text = clip.tokenize(candidates).to(self.device)

        # Inference only, so gradients are not needed
        with torch.no_grad():
            logits_per_image, _ = self.model(image, text)
            probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

        # Pick the candidate with the highest probability
        best_idx = int(probs.argmax())
        return {
            "label": candidates[best_idx],
            "confidence": float(probs[best_idx]),
            "all_predictions": dict(zip(candidates, probs.tolist()))
        }

Step 3: Register the plugin in AVA

Once the plugin is created, register it in the AVA SDK configuration:

config.yaml
# AVA SDK - Plugin configuration
plugins:
  - name: "vision-analyzer"
    enabled: true
    path: "./ava-sdk-plugins/vision"
    config:
      auto_analyze: true
      max_image_size_mb: 10
      supported_formats: [jpg, png, webp]
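
The lesson does not show how the SDK applies `max_image_size_mb` and `supported_formats`. As a hedged sketch, a plugin could apply the same limits itself before loading an image. The `is_acceptable` helper is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Values copied from the config.yaml example.
PLUGIN_CONFIG = {
    "max_image_size_mb": 10,
    "supported_formats": ["jpg", "png", "webp"],
}


def is_acceptable(filename: str, size_bytes: int, config: dict = PLUGIN_CONFIG) -> bool:
    """Return True if the file extension and size satisfy the plugin config."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in config["supported_formats"]:
        return False
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    max_bytes = config["max_image_size_mb"] * 1024 * 1024
    return size_bytes <= max_bytes


print(is_acceptable("screenshot.PNG", 2_000_000))
print(is_acceptable("clip.gif", 2_000_000))
print(is_acceptable("huge.jpg", 50 * 1024 * 1024))
```
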
main.py
from ava_sdk import AVA

# Initialize AVA with plugins
ava = AVA(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    plugins_dir="./ava-sdk-plugins"
)

# The vision-analyzer plugin is loaded automatically
ava.run()
Note

AVA SDK automatically detects plugins in the configured directory by reading the plugin.json file in each subfolder.
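
The detection described in the note can be pictured roughly as follows. This is a sketch of the idea, not the SDK's loader: `discover_plugins` is a hypothetical function, and the real implementation may validate manifests or handle errors differently.

```python
import json
import tempfile
from pathlib import Path


def discover_plugins(plugins_dir: str) -> dict:
    """Map plugin name -> manifest for each subfolder that contains plugin.json."""
    found = {}
    for child in sorted(Path(plugins_dir).iterdir()):
        manifest_path = child / "plugin.json"
        if child.is_dir() and manifest_path.is_file():
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
            found[manifest["name"]] = manifest
    return found


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    vision = Path(root) / "vision"
    vision.mkdir()
    (vision / "plugin.json").write_text(
        json.dumps({"name": "vision-analyzer", "entry": "analyzer.py"}),
        encoding="utf-8",
    )
    (Path(root) / "notes").mkdir()  # no plugin.json, so it is skipped
    plugins = discover_plugins(root)

print(sorted(plugins))
```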

Step 4: Test the plugin

Verify the plugin by calling it directly on a sample image. The import below assumes the script runs from the ava-sdk-plugins directory:

# Quick plugin test
from vision.analyzer import VisionAnalyzer
import asyncio

async def test():
    analyzer = VisionAnalyzer()
    result = await analyzer.process("test_image.jpg")
    print(f"Detected: {result['label']}")
    print(f"Confidence: {result['confidence']:.2%}")

asyncio.run(test())
# Expected output (exact values depend on the image):
# Detected: a gaming setup
# Confidence: 87.30%

Advanced Plugins

Batch Processing

Analyze multiple images in a single call, reducing overhead

Video Streaming

Process real-time frames from a camera or video file

Multimodal Analysis

Combine vision, text, and audio in a single analysis pipeline

Custom Plugins

Create specific handlers for your use case: OCR, object detection, classification
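
Of the extensions listed above, batch processing is the simplest to sketch. The example below shows one possible calling pattern with `asyncio.gather`, using a stub in place of `VisionAnalyzer` so that it runs without CLIP installed. `StubAnalyzer`, `process_batch`, and the `max_concurrent` parameter are hypothetical. Because the `process` method from Step 2 does blocking work, real speedups would likely require stacking the images into a single forward pass or moving inference to a worker thread.

```python
import asyncio


class StubAnalyzer:
    """Stand-in for VisionAnalyzer so the sketch runs without CLIP installed."""

    async def process(self, image_path: str) -> dict:
        await asyncio.sleep(0)  # yield control, as real I/O would
        return {"label": "stub", "path": image_path}


async def process_batch(analyzer, paths, max_concurrent: int = 4) -> list:
    """Analyze several images concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            return await analyzer.process(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(guarded(p) for p in paths))


results = asyncio.run(process_batch(StubAnalyzer(), ["a.jpg", "b.jpg", "c.jpg"]))
print([r["path"] for r in results])
```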