Automating Taxonomy Generation The Future is Here with Compound AI and Agents on Azure Databricks

June 18, 2025

Automating Taxonomy Generation The Future is Here with Compound AI and Agents on Azure Databricks

Recently, I had the privilege of attending an insightful session on “Automating Taxonomy Generation With Compound AI and Agents on Azure Databricks,” presented by Sudhir Gajre, Managing Director of GenAI at Lovelytics. Sudhir, a recognized thought leader in the GenAI space and a frequent speaker at Data and AI conferences, shared how his team is revolutionizing one of the most historically challenging aspects of data management: taxonomy.

The Age-Old Challenge of Taxonomy

Taxonomy, the science of classification, has been a cornerstone of organizing information for centuries. In the business world, it’s crucial for everything from product categorization to document management. Yet, as Sudhir aptly put it, “Taxonomy is hard because people see it like documenting code. It’s not a lot of fun and hard.” It’s a meticulous process, prone to gaps, inconsistencies, and significant manual effort.

Consider the complexity of classifying a product like a pump:

Filling in these granular details and ensuring consistency across vast inventories can be a monumental task. As the saying goes, “The first level of intelligence is calling something by its real name.” Achieving this at scale is precisely where traditional methods fall short.

Why Now? The Compound AI Opportunity

The advent of Large Language Models (LLMs) has transformed many aspects of AI, but the session highlighted a crucial point: LLMs alone aren’t enough. While powerful, they are not all-knowing, and simply increasing model size doesn’t always equate to better performance. They often come with high compute costs, limited reasoning capabilities, and a lack of external memory or tools, leading to issues like hallucinations.

This realization has led to a significant paradigm shift: from monolithic models to modular systems. This is the essence of Compound AI.

Compound AI combines LLMs with:

By breaking down complex problems into smaller, manageable steps, Compound AI architectures significantly reduce hallucinations and unlock faster, cheaper, and more controllable Generative AI systems. This approach aligns perfectly with the insights from the UC Berkeley paper on Compound AI and Andrew Ng’s profound statement: “You can match state-of-the-art performance with smaller models by reducing cognitive load and extending capability using tool use, retrieval, classic ML, and agents.”

A great example shared was using tool calls for route planning in trucking: an AI agent could check weather APIs and automatically reroute a truck if a weather alert is issued.

Lovelytics’ GenAI Application Journey

Lovelytics has built a robust GenAI application journey, conceptualized as a pyramid of increasing business value:

Pyramid

This layered approach, built on the Databricks Data Intelligence Platform (which inherently provides data and governance), enables organizations to achieve remarkable results. Sudhir shared an impressive anecdote: they were able to fine-tune an AI system to achieve the human performance level of a 20-year insurance veteran adjustor in just two weeks! This underscores a critical truth for today’s businesses: “In today’s biz case any human workflow can be somewhat automated today.”

Core Components for Automated Taxonomy Generation

The Compound AI and Agent-based approach for taxonomy generation leverages a powerful ensemble of components:

Why Databricks is the Platform of Choice

The Azure Databricks platform provides the ideal environment for building such sophisticated Compound AI systems:

This combination of tools makes Databricks a true “Agent Brick,” simplifying the development and deployment of advanced AI solutions.

The Power of Precision and Reduced Cognitive Load

The core principle behind Compound AI’s success in taxonomy generation is precision. By having the LLM make smaller, specific calls to various components (Pyfunc wrappers for AI APIs, classic ML models, external tools), the “cognitive load” on the LLM is significantly reduced. This specialized division of labor leads to remarkably better and more precise outputs. The demo showcased how this architecture, combined with both batch and real-time inference capabilities, brings automated taxonomy generation to life within the Databricks App UI.

What’s Next? Deep Dive into Enabling Technologies

For those looking to explore these concepts further, two areas that warrant deeper study emerged from the session:

Conclusion

Automating taxonomy generation is no longer a futuristic dream. With the innovative combination of Compound AI and AI Agents on Azure Databricks, businesses can now achieve more accurate, comprehensive, and adaptable taxonomies that genuinely align with their needs. This modular, tool-augmented approach to AI represents a powerful leap forward in managing and leveraging an organization’s most valuable asset: its data.