Deploying AI/ML Pipelines at Crexi
Crexi (Commercial Real Estate Exchange, Inc.) is a digital marketplace and platform designed to streamline commercial real estate transactions. It allows brokers to manage the entire process, from listing to closing, on one platform, including digital letters of intent, best-and-final-offer negotiations, and transaction management tools. Its data and research features also allow investors and other commercial real estate stakeholders to conduct due diligence and proactively connect with other professionals ahead of the transaction process.
At Crexi, we have developed a versatile framework to deploy open-source models, such as LayoutLM, at scale. We built this tooling because we could not find an available, easy-to-deploy system that enabled us to manage, monitor, and maintain open-source models on our own infrastructure. The framework supports rapid pipeline iteration through DevOps principles and tools: Infrastructure as Code (IaC), GitOps automation, and pipeline observability for MLOps. A modular design provides quick connection points for new or evolving models with a variety of dependency and integration requirements.
The modular components are configured and instantiated using Pulumi's IaC product, an open-source SDK that supports a variety of backend languages and resource providers. For our solution, we used TypeScript with self-hosted state storage, managing primarily AWS resources.
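To make the modular pattern concrete, here is a minimal TypeScript sketch of how component configurations can map to resource factories. The names, component types, and descriptor shapes are illustrative assumptions, not Crexi's actual code; in a real Pulumi program each factory would instantiate cloud resources (e.g., an SQS queue or ECS service) rather than return plain descriptors.

```typescript
// Hypothetical sketch of the modular design: each component type maps to a
// factory. In a real Pulumi program the factories would create resources
// such as aws.sqs.Queue or aws.ecs.Service; here they return descriptor
// objects so the sketch stays self-contained and runnable.

interface ComponentConfig {
  name: string;
  type: "queue" | "inference" | "storage";
  settings?: Record<string, string>;
}

type Factory = (cfg: ComponentConfig) => { urn: string };

// One factory per component type; adding a new model or integration
// means registering a new factory, not rewriting the pipeline.
const factories: Record<ComponentConfig["type"], Factory> = {
  queue: (cfg) => ({ urn: `aws:sqs:Queue::${cfg.name}` }),
  inference: (cfg) => ({ urn: `aws:ecs:Service::${cfg.name}` }),
  storage: (cfg) => ({ urn: `aws:s3:Bucket::${cfg.name}` }),
};

export function buildPipeline(components: ComponentConfig[]) {
  return components.map((cfg) => factories[cfg.type](cfg));
}

// Example: a minimal three-stage document pipeline.
const resources = buildPipeline([
  { name: "doc-intake", type: "queue" },
  { name: "layoutlm-scorer", type: "inference" },
  { name: "predictions", type: "storage" },
]);
console.log(resources.map((r) => r.urn).join("\n"));
```

The key design choice this illustrates is that the pipeline definition is pure data, so swapping a model or adding a stage only touches configuration, not infrastructure code.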
The observability piece of the solution leverages the OpenTelemetry (OTel) connector to Datadog. Any OTel-compatible backend would work, but we were already using Datadog to bring together data from the component services and provide a unified view of the deployed pipelines.
Finally, the operational paradigm for the Crexi framework is that the Data Scientist supplies a YAML configuration file through GitOps, which triggers the Pulumi stack activity. This enables continuous integration and deployment (CI/CD), which lessens the need for operational resources, keeps the infrastructure reproducible, and hardens it against code regression.
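As an illustration, such a configuration file might look like the following. The schema and field names here are hypothetical, chosen only to show the shape of a declarative pipeline definition that a GitOps workflow could feed into a Pulumi stack:

```yaml
# Hypothetical pipeline definition -- field names are illustrative,
# not Crexi's actual schema.
pipeline:
  name: layoutlm-document-extraction
  model:
    source: huggingface
    id: microsoft/layoutlm-base-uncased
  components:
    - type: queue
      name: doc-intake
    - type: inference
      name: layoutlm-scorer
      replicas: 2
    - type: storage
      name: predictions
  observability:
    exporter: datadog
```

Because the file lives in version control, every pipeline change is reviewed, auditable, and automatically reproducible by re-running the stack.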
This presentation covers a short intro of the presenter and the company, acknowledgements of the authors and contributors to the solution, an overview of the modular design, the project requirements and their relevance to the project design, a technical deep dive into the components, and a walkthrough of an example pipeline deployment, as well as some illustrations of the dashboards and workflows.
The ML pipeline deployment framework explored here offers a powerful, scalable, adaptable, and robust solution for Data Scientists and meets Crexi's needs for hassle-free model deployment and pipeline management. With the ability to rapidly build and deploy pipelines, the team can focus more on model experimentation and on testing new ML techniques and technologies. By leveraging commonly available services, the framework facilitates easy integration with third-party services, including data warehouses; the data they accumulate then serves, in a virtuous cycle, as a deep reserve for training new models, which can in turn be easily deployed with new ML pipelines. This cycle of improvement and innovation becomes the engine that propels Crexi's projects forward.