Two-phase Analysis Concept
The analysis pipeline consists of two distinct phases, applicable to both uploaded video files and live camera streams: Indexing and Querying.
Indexing
In this phase, video inputs are analyzed using advanced deep learning models to generate use-case-independent metadata. These include:
- Object detection (identifying and categorizing humans, vehicles, animals, and various common objects)
- Face detection and recognition
- License plate detection and recognition
- Abstract feature vector extraction (capturing semantic information from both detected objects and background elements)
The combination of these models enables the detection and querying of complex events and behaviors, even those not strictly linked to identifiable objects.
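The exact metadata format is product-specific and not documented here, but conceptually each sampled frame yields a compact record combining detections and feature vectors. The following Python sketch is purely illustrative; none of the field names are the actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Detection:
    """One detected object in a sampled frame (illustrative schema only)."""
    track_id: int                     # stable ID for the same object across frames
    category: str                     # e.g. "person", "vehicle", "animal"
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) in pixels
    embedding: List[float]            # abstract feature vector, used at query time
    plate_text: Optional[str] = None  # filled for recognized license plates
    face_id: Optional[str] = None     # filled when a face is recognized

@dataclass
class FrameMetadata:
    """Use-case-independent metadata produced by indexing for one sampled frame."""
    stream_id: str
    timestamp_ms: int
    detections: List[Detection] = field(default_factory=list)
    scene_embedding: List[float] = field(default_factory=list)  # whole-frame / background features
```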
Through dynamic, quota-based sampling that prioritizes significant objects and motion, the system drastically reduces data volume — typically resulting in metadata occupying less than 10% of the original video size.
This indexing step is performed only once, regardless of how many queries are subsequently executed. Indexing can be run either in batch mode for pre-recorded content or in streaming mode for continuous processing of live camera feeds.
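To illustrate the idea behind quota-based sampling (the actual scoring and quota logic are not documented here), a toy scheme might spend a fixed per-minute frame budget on the most significant frames:

```python
import heapq
from collections import defaultdict

def sample_frames(scored_frames, quota_per_minute=60):
    """Toy quota-based sampler: keep the most significant frames in each minute.

    `scored_frames` is an iterable of (timestamp_ms, significance) pairs, where
    significance could combine detected-object count and motion magnitude.
    This only conveys the concept; the system's real scoring differs.
    """
    by_minute = defaultdict(list)
    for ts, score in scored_frames:
        by_minute[ts // 60_000].append((score, ts))

    selected = []
    for candidates in by_minute.values():
        # Within each minute, keep at most `quota_per_minute` highest-scoring frames.
        selected.extend(ts for _, ts in heapq.nlargest(quota_per_minute, candidates))
    return sorted(selected)
```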
Querying
Once metadata is indexed, it can be queried repeatedly to detect specific events, objects, or behavioral patterns — without reprocessing the original video.
As the system operates entirely on compact metadata, searches are fast and resource-efficient: you can search through days of footage in under 10 seconds.
Querying can be executed in batch mode for historical analysis or in streaming mode to produce live data or event streams, enabling real-time detection and alerts.
Since abstract feature vectors are part of the metadata, object classification can be performed at query time, without needing to revisit the original video.
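As a minimal illustration of querying metadata alone (the built-in Query Types are far richer than this), a batch search for all detections of a given category in a time window could look like the sketch below, reusing the illustrative `FrameMetadata` record from the Indexing section:

```python
def find_detections(metadata, category, start_ms, end_ms):
    """Toy batch query that scans indexed metadata only, never the original video.

    `metadata` is an iterable of FrameMetadata records (see the earlier sketch).
    """
    for frame in metadata:
        if not (start_ms <= frame.timestamp_ms < end_ms):
            continue
        for det in frame.detections:
            if det.category == category:
                yield frame.timestamp_ms, det

# e.g. list every "vehicle" detection in a two-hour window of indexed footage:
# hits = list(find_detections(indexed_metadata, "vehicle", t0_ms, t0_ms + 2 * 3_600_000))
```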
This enables few-shot learning (FSL) techniques to be applied dynamically—users can define a new object class with just a handful of examples, and the system can recognize it retrospectively across historical data or in real time.
The system features a pluggable FSL module, allowing easy integration of new few-shot classifiers, which can be immediately used in both live queries and post-hoc searches.
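To sketch how query-time few-shot classification can work on the stored feature vectors (the pluggable FSL module may use a different algorithm), a nearest-centroid classifier over embeddings is enough to convey the idea:

```python
import numpy as np

class FewShotClassifier:
    """Minimal nearest-centroid few-shot classifier over stored embeddings.

    A new class is defined from a handful of example embeddings; the classifier
    can then be applied to historical metadata or to a live metadata stream.
    This is a conceptual sketch, not the product's actual FSL implementation.
    """
    def __init__(self):
        self.centroids = {}  # class name -> L2-normalized centroid vector

    def add_class(self, name, example_embeddings):
        centroid = np.mean(np.asarray(example_embeddings, dtype=np.float32), axis=0)
        self.centroids[name] = centroid / np.linalg.norm(centroid)

    def classify(self, embedding, threshold=0.7):
        v = np.asarray(embedding, dtype=np.float32)
        v = v / np.linalg.norm(v)
        best_name, best_sim = None, threshold
        for name, centroid in self.centroids.items():
            sim = float(np.dot(v, centroid))  # cosine similarity
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name  # None if no known class is similar enough
```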
In addition, the system supports free-text search over visual content. Users can simply type a textual description (e.g., "a person wearing a blue hat") to find matching segments based on the semantic similarity of the content.
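Conceptually, free-text search ranks indexed content by similarity between an embedding of the query text and the stored visual embeddings. The sketch below assumes a joint text/image embedding space (e.g. a CLIP-style model) and does not show how the product actually embeds the query text:

```python
import numpy as np

def free_text_search(text_embedding, metadata, top_k=5):
    """Rank indexed frames by semantic similarity to an embedded text query.

    Assumes `text_embedding` lives in the same space as `scene_embedding`;
    this is an assumption for illustration, not the product API.
    """
    q = np.asarray(text_embedding, dtype=np.float32)
    q = q / np.linalg.norm(q)

    scored = []
    for frame in metadata:
        v = np.asarray(frame.scene_embedding, dtype=np.float32)
        v = v / np.linalg.norm(v)
        scored.append((float(np.dot(q, v)), frame.timestamp_ms))
    # Highest cosine similarity first: the best-matching moments in the footage.
    return sorted(scored, reverse=True)[:top_k]
```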
The query system is built on a robust dataflow architecture and supports a pluggable model of Query Types.
More than 20 built-in Query Types are available, and the system is designed to be easily extended to support custom or domain-specific queries. Each Query Type is an independent module that can be plugged in to meet specific analytical needs.
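The extension API itself is not documented here, but conceptually each Query Type is an independent module that consumes metadata records and emits events, and works unchanged in batch or streaming mode. A hypothetical plug-in might look like this:

```python
from abc import ABC, abstractmethod

class QueryType(ABC):
    """Hypothetical plug-in interface for a custom Query Type (illustrative only)."""

    @abstractmethod
    def process(self, frame):
        """Consume one FrameMetadata record and return zero or more events."""

class LoiteringQuery(QueryType):
    """Example: flag tracks that stay in view longer than a threshold."""

    def __init__(self, min_duration_ms=60_000):
        self.min_duration_ms = min_duration_ms
        self.first_seen = {}  # track_id -> timestamp of first appearance

    def process(self, frame):
        events = []
        for det in frame.detections:
            start = self.first_seen.setdefault(det.track_id, frame.timestamp_ms)
            if frame.timestamp_ms - start >= self.min_duration_ms:
                events.append(("loitering", det.track_id, frame.timestamp_ms))
        return events
```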
Distributed Scalable Architecture
The two-phase design enables the processing workload to be distributed across multiple nodes, allowing the system to scale efficiently with deployment size.
Indexing is typically performed on dedicated indexing servers, which can be placed close to the cameras—at the edge—dramatically reducing bandwidth usage by transmitting only compact metadata instead of full video streams.
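A back-of-envelope calculation shows why edge indexing matters for the backhaul link. The per-camera bitrate below is an assumed figure; the 10% metadata ratio is the typical value quoted above:

```python
def backhaul_bandwidth_mbps(cameras, video_mbps=4.0, metadata_ratio=0.10):
    """Rough uplink estimate from an edge site to the core.

    Assumes ~4 Mbit/s per camera stream (illustrative) and metadata at ~10%
    of the original video size, as stated in the Indexing section.
    """
    full_video = cameras * video_mbps
    metadata_only = full_video * metadata_ratio
    return full_video, metadata_only

# e.g. a 100-camera site: ~400 Mbit/s of raw video vs. ~40 Mbit/s of metadata
print(backhaul_bandwidth_mbps(100))
```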
Querying is handled by central core nodes that operate on the indexed metadata. Each core node can support up to 4,000 camera channels, and both indexing and querying layers can be scaled horizontally to support large-scale, multi-site deployments involving tens of thousands of cameras.
This architecture allows for flexible deployment topologies, from small on-prem systems to globally distributed infrastructures.
See architecture for more details.
Key Benefits
The two-phase analysis described above has many advantages; here are a few:
- Tested queries, faster deployment

  Since the same pipeline architecture powers both historical and live analysis, queries can be developed and fine-tuned using recorded video, then seamlessly deployed to operate in real time. This approach ensures that only well-tested queries go live, significantly speeding up configuration, reducing trial-and-error, and increasing reliability.

  Validation becomes simple and efficient: clients can provide sample recordings, which can be evaluated on a hosted demo system without the need for any live deployment or on-site infrastructure.

- Edge-enabled, distributed scalability

  Thanks to the two-phase architecture, indexing can be performed close to the source, on edge devices or servers near the cameras, minimizing bandwidth usage by transmitting only lightweight metadata to the central system.

  Querying is handled centrally on scalable core nodes, each capable of processing thousands of streams in parallel. This makes the system ideal for large, distributed environments, from single facilities to multi-site, enterprise-wide deployments.

- Flexible, pluggable query system

  Over 20 built-in Query Types cover common use cases, and the modular architecture makes it easy to extend with custom Query Types.

- Advanced semantic search

  By storing abstract semantic information directly in the metadata, the system enables powerful search capabilities far beyond traditional object filtering. Users can perform free-text searches to find visual content that matches natural-language descriptions (e.g., "a person wearing a red jacket next to a car") without predefined labels.

  In addition, the system supports few-shot learning, allowing users to define new object or behavior classes on the fly from just a handful of example images, then search for them across both live and historical video streams.