Synavis Framework: Scalable Synthetic Data Generation for Plant Science

2024–2025 Software

Synavis is an open-source, modular coupling framework for connecting functional-structural plant models (FSPMs) with advanced visualization, annotation, and deep learning workflows. Designed for high-performance computing (HPC) environments, Synavis enables researchers to generate, annotate, and analyze large-scale synthetic datasets with full control over scene parameters and experimental conditions.

The framework is described in detail in the 2023 article and the associated Springer book chapter; both publications particularly highlight the infrastructural considerations involved in Synavis.

How Synavis Works

Synavis acts as an interconnecting service between FSPM simulations, Unreal Engine visualizations, and deep learning or annotation pipelines. It manages communication using JSON messages and streams data (images or video) via WebRTC, enabling both real-time and buffered data transfer for training and analysis. The architecture supports distributed execution, allowing simulation, rendering, and annotation to run on specialized hardware (e.g., CPU nodes for simulation, GPU nodes for rendering/AI).
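
To make this coupling pattern concrete, the sketch below shows how a client-side script might drive such a setup: JSON control messages over a WebSocket-style signaling channel, with the actual pixel data arriving on a parallel WebRTC media stream. This is a minimal illustration under assumed names, not the actual Synavis API; the endpoint URI and all message fields ("type", "sunIntensity", "cameraPosition") are assumptions.

```python
# Minimal sketch (NOT the actual Synavis API): JSON control messages over a
# WebSocket-style channel; pixel data would travel on a separate WebRTC track.
import asyncio
import json

import websockets  # pip install websockets


async def run_experiment(uri: str) -> None:
    async with websockets.connect(uri) as channel:
        # Send a JSON control message configuring the virtual scene
        # (field names here are illustrative assumptions).
        await channel.send(json.dumps({
            "type": "parameter",
            "sunIntensity": 0.8,
            "cameraPosition": [0.0, 200.0, 150.0],
        }))
        # Receive a JSON reply (metadata, annotations); image frames are
        # streamed separately via WebRTC for real-time or buffered use.
        reply = json.loads(await channel.recv())
        print("scene acknowledged:", reply)


if __name__ == "__main__":
    asyncio.run(run_experiment("ws://localhost:8080"))
```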

This design enables researchers to generate and annotate massive, reproducible datasets for machine learning, with flexible experiment control and integration into automated or interactive workflows. Synavis is particularly suited for applications requiring scalability, reproducibility, and flexibility, as demonstrated in recent large-scale synthetic data generation studies.

We commonly speak of "annotation" when referring to the process of labelling data. Within Synavis, and synthetic data pipelines in general, the labels are generated together with the scene, so the process is not annotation in the traditional sense; we nonetheless keep the term for consistency with common usage.

Synavis for Functional Embeddings

Figure: Exemplary view of the model scene (tinted light for aesthetics), showcasing an example of the functional embedding.

We use Synavis to establish a model scene based on plant-level photosynthetically active radiation (PAR) information. This approach is largely heuristic, and quantifying exactly how the plants interact with their environment depends on experimental validation.

Our approach was used in Baker et al., 2025 to replicate a 2016 experiment at Selhausen, using weather station data available through TERENO to parametrize the virtual world. We subsequently map the light information, in terms of actual CO2 uptake, onto the data scene, i.e., the scene generating the labelled synthetic data.
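
The publications above define the actual parametrization; as a hedged illustration of the general idea, a standard non-rectangular hyperbola light-response curve can translate per-plant PAR into gross CO2 assimilation. The sketch below is such a mapping with purely illustrative parameter values, not calibrated values from the study.

```python
# Hedged sketch: a standard non-rectangular hyperbola light-response curve
# mapping PAR to gross CO2 uptake. Parameter values are illustrative
# assumptions, not calibrated values from the referenced experiment.
import math


def co2_uptake(par: float,
               phi: float = 0.05,    # apparent quantum yield (mol CO2 / mol photons)
               a_max: float = 25.0,  # light-saturated assimilation (umol m^-2 s^-1)
               theta: float = 0.7) -> float:
    """Gross CO2 assimilation for a given PAR (umol photons m^-2 s^-1)."""
    s = phi * par + a_max
    return (s - math.sqrt(s * s - 4.0 * theta * phi * par * a_max)) / (2.0 * theta)


# Example: map a range of PAR values (e.g., sampled per plant in the scene).
for par in (0, 250, 500, 1000, 2000):
    print(f"PAR {par:5d} -> A = {co2_uptake(par):5.1f} umol m^-2 s^-1")
```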

Current Migration Efforts

I am currently migrating Synavis from PixelStreaming to either PixelStreaming2 or direct WebRTC streaming, whichever better supports the primary goal of this migration: dynamically allocating additional data cameras to the scene.

Currently, Synavis is limited to pre-allocating a fixed number of data cameras, which does not cover all situations in which synthetic data might be used. In particular, assessing the best camera positions for a given task is an open research question, so I want to allow for dynamic camera allocation.

The primary use case this migration is built around is direct inference assessment in virtual scenes: spawning a camera in a specific situation to probe the edge cases of AI inference models. This could greatly increase public trust in such models.
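
As a rough sketch of where this is headed, a dynamic allocation request could look like the JSON message below. The schema ("spawnCamera" and its fields) is hypothetical, meant only to illustrate the intended capability, and is not the current Synavis protocol.

```python
# Hypothetical sketch of a dynamic camera allocation request; the message
# schema ("spawnCamera" and its fields) is an assumption for illustration.
import json


def spawn_camera_message(position, look_at, resolution=(1920, 1080)) -> str:
    """Build a JSON request asking the renderer to allocate a new data camera."""
    return json.dumps({
        "type": "spawnCamera",          # hypothetical message type
        "position": list(position),     # world-space camera coordinates
        "lookAt": list(look_at),        # target point the camera should frame
        "resolution": list(resolution),
    })


# Example: place a camera close to an occluded plant to probe a suspected
# edge case of an inference model.
print(spawn_camera_message(position=(120, -40, 85), look_at=(118, -42, 30)))
```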
