Keynote: Governance in the Age of AI

Shirshanka Das - Co-founder & CTO, Acryl Data

Why do so few AI initiatives reach production? Quality failures and compliance worries. By 2025, over 50% of enterprises will be using AI – but very few have governance frameworks to manage it responsibly. As AI transforms society, organizations face unprecedented governance challenges. For an industry that has barely come to terms with data governance, is it ready to tackle AI governance?

This talk examines key trends driving the AI revolution and explores the challenges and opportunities it creates for governance. Using the DataHub project as a case study, we outline a vision for how metadata platforms should evolve to meet these challenges. The presentation is intended for data leaders who are looking to ship AI applications to production and are worried about what happens next.

6 New Data Essentials for the Era of Generative AI

Teresa Tung - Senior Managing Director, Accenture

The good news is that you already own the most valuable asset you can have in the era of generative AI: your data. But is your data ready for generative AI? 47% of CXOs say data readiness is the top challenge in applying generative AI. We’ve identified six essentials of data readiness in the era of generative AI, including how to use generative AI within your data programs to reach readiness faster.

Evolution of Llama: from Herd to Stack

Raghotham Murthy - Engineer, Meta

This session will cover the evolution of Meta’s Llama models, which are the leading open-source LLMs. Over the past year, we’ve achieved significant improvements in the capabilities of the models, from reasoning and multilinguality to agentic capabilities and support for multiple modalities. Our latest release also included Llama Stack, a comprehensive set of standard APIs to customize Llama models and primitives to enable developers to build robust, safety-focused applications across various environments supported by our partners, including on-premises, cloud, single-node, and edge devices.

Panel: AI in Production: Reliability and Safety Considerations

Shirshanka Das - Co-founder and CTO, Acryl Data; Teresa Tung - Senior Managing Director, Accenture; Joe Hellerstein - Jim Gray Professor of Computer Science, UC Berkeley; Deepak Agarwal - Chief AI Officer, VP of Consumer and Trust Engineering, Pinterest; and Alex Demakis - Co-Founder and Chief Scientist, BespokeLabsAi

As artificial intelligence systems become increasingly integrated into critical infrastructure and decision-making processes, ensuring their reliability and safety is paramount. This panel brings together industry experts to explore the challenges and opportunities in scaling AI from research to large-scale production environments. We’ll discuss key considerations in AI governance, model interpretability, MLOps practices, and future-proofing development strategies. Join us for a thought-provoking discussion on balancing innovation with responsible AI deployment in enterprise settings.

Break

Democratizing GenAI @ Scale: Enabling Businesses to Build Self-Serve GenAI Applications

Harsha Gurulingappa - Head of AI & Machine Learning, Merck Group

At Merck, we have a successful model for democratizing (Gen)AI and enabling experts to develop and industrialize (Gen)AI applications at scale on a self-serve basis. We have enabled over 200 use cases and have a central general-purpose GenAI end-user solution that serves close to 20K users. I will share my experience of enabling end consumers and technical developers to leverage GenAI at scale.

Strategies for Effective Data Leadership

Jeff Tackes - Sr. Manager of Data Science, Kraft Heinz

In the modern digital landscape, data is more than just numbers; it is the cornerstone of innovation and strategic decision-making. “Strategies for Effective Data Leadership” is a presentation designed to explore the multifaceted role of data leaders in steering organizations towards success by leveraging data as a strategic asset. This talk will delve into the crucial aspects of establishing a robust data-driven culture, developing comprehensive data strategies, and implementing effective governance to ensure trust and compliance. Attendees will gain insights into the importance of cultivating data literacy within their teams and integrating cutting-edge technologies to enhance analytical capabilities. The discussion will also cover practical strategies for leading change, addressing common challenges, and effectively managing the integration of new technologies into existing business processes. The goal is to empower current and aspiring data leaders with the tools and knowledge needed to transform their organizations by making informed, data-driven decisions. Whether you’re looking to refine your organization’s data strategy, enhance your leadership skills, or simply gain a better understanding of the data landscape, this talk will provide valuable insights and actionable strategies. Join us to discover how you can lead with clarity and precision in the data-driven age.

Why Metadata is Key to Cross-Disciplinary Data Modeling

Joe Reis - Instructor, DeepLearning.ai

In this insightful and wide-ranging talk, Joe Reis delves into the essential role of data modeling, explaining what it is—and just as importantly—what it isn’t. He will address the fundamental principles of data modeling, emphasizing its critical importance in the context of building successful AI systems. As AI applications increasingly draw on data from a wide array of fields, cross-disciplinary data modeling has become an imperative. One of the central themes of the talk is the role of metadata in this process. Joe explains how metadata, often referred to as “data about data,” is crucial for establishing shared understanding between disciplines. Without robust metadata, integrating and interpreting diverse datasets becomes difficult, if not impossible. Joe will illustrate how metadata can enable seamless collaboration, ensuring that data can be used effectively across different fields of expertise, ranging from business and finance to science and healthcare. Through real-world examples, Joe demonstrates how well-structured metadata enhances data discovery, consistency, and usability, ultimately leading to more powerful and reliable AI models. Attendees will leave with a deeper understanding of why metadata is not just an optional add-on, but a foundational element in effective cross-disciplinary data modeling.

The Role of Data in Building Trustworthy AI Systems

Michelle Yi - Co-Founder, Generationship

In an era where AI systems increasingly shape our interactions with the digital world, the need for trustworthy AI has never been more critical. This talk delves into the complexities of designing AI systems that are not only functional but also ethically sound, transparent, and secure. We will explore the six core elements of trustworthy AI—truthfulness, safety, fairness, robustness, privacy, and machine ethics—providing a pragmatic framework for developers and data scientists and ways good data practices can counter some of the top known ethical issues with generative AI. We will focus especially on the latest research in the space and specific examples of how research can be leveraged in implementations to improve both data and, therefore, model behavior.

The Future of Observability: Democratizing Real-Time Analytics and Anomaly Detection for Everyone

Chinmay Soman - Head of Product, StarTree

The rapid growth of complex, distributed systems has made real-time observability a critical necessity for modern businesses. However, traditional monitoring tools often fall short, struggling to keep pace with the sheer volume and velocity of time-series data generated by these systems. This talk will explore the challenges and opportunities in building the next generation of observability platforms that leverage advanced analytics, machine learning, and intuitive interfaces to democratize access to real-time insights.

We will discuss how Apache Pinot, a high-performance distributed database optimized for time-series data, can serve as the foundation for scalable, real-time anomaly detection and root cause analysis. We’ll delve into techniques for incorporating rich metadata to enhance the accuracy and relevance of anomaly detection models, empowering teams across organizations to proactively identify and address issues before they impact users.

This talk is designed for engineers, data scientists, and decision-makers interested in the cutting edge of observability and the transformative potential of real-time analytics to improve system reliability, performance, and overall business outcomes.

Day 2 Kickoff & Intro to DataHub with Maggie Hays, Founding Product Manager, Acryl Data

Maggie Hays - Founding Product Manager, Acryl Data

Let’s face it – navigating and leveraging data within your organization is getting more complex by the day, especially with AI in the mix. But, good news! Our Day 2 sessions are all focused on practical approaches to taming this complexity, and lessons learned along the way.

Join Maggie as she gives a preview of the Day 2 lineup and what you can expect to learn. She’ll also briefly introduce DataHub, the leading open-source metadata platform, and share how it’s making life easier for data teams at Netflix, Slack, and thousands of companies worldwide. Whether you’re just starting your metadata journey or looking to level up your existing setup, you’ll walk away from today with a clear picture of what’s possible with modern metadata management.

From Chaos to Clarity: How Deutsche Telekom Transformed their Data Strategy through Metadata

Shashidhar Singhal - Director Of Engineering, Deutsche Telekom Digital Labs, and Karan Jindal - Senior Data Engineer, Deutsche Telekom Digital Labs

Ever feel like you’re drowning in data tools but still missing the mark on what really matters? You’re not alone. Join Deutsche Telekom’s Shashi Singhal and Karan Jindal as they share their honest journey from a tool-first to an outcome-based data strategy. They’ll reveal how they tackled familiar challenges – from confused business stakeholders to frustrated data engineers – and how implementing DataHub helped them build a truly customer-centric data ecosystem. Learn how they’re now using GenAI to supercharge analytics, catching PII issues before they become problems, and getting new team members up to speed in days instead of weeks.

A Forecast for Generative AI

Joey Gonzalez - Co-Founder, Head of AI, RunLLM

My lab at UC Berkeley has been fortunate to launch several high-profile projects, including Vicuna, Chatbot Arena, Gorilla, RAFT, MemGPT, TAG, vLLM, and SGLang. In this talk, I will reflect on what we learned from this research and where Agentic Systems are headed. I will make three bets on what will happen in 2025 and how research supports these bets.

Do it Yourself – Semantic Code Search (Locally Deployable on Your Laptop)

Raghavan Muthuregunathan - Sr. Engineering Manager, Search AI, LinkedIn

Imagine a new contributor trying to solve a simple beginner task but overwhelmed by the complexity of the repository. Knowing which file to change and where to make the change can be time-consuming. Popular code search tools rely on keyword matching rather than natural language, so they do not semantically understand the codebase. This talk will explore building a DIY Semantic Code Search tool to simplify navigating complex codebases, which is particularly beneficial for new contributors onboarding to open-source GitHub projects. We will demonstrate how to build a locally deployable, natural-language code search tool using open-source LLMs that can help beginners understand any repository and start contributing. Leveraging the RAG (Retrieval-Augmented Generation) paradigm, we show a two-step process: retrieving the files relevant to the task using vector search, then using an open-source LLM to generate answers to natural-language questions about the task at hand. The result is a DIY solution that anyone can build, test, and deploy locally for their own repositories using only open-source technologies.
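The two-step retrieve-then-generate flow described in this abstract can be sketched roughly as follows. This is an illustration only, not the speaker's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the `files` dictionary and its contents are invented sample data, and every function name here is hypothetical. The final call to an open-source LLM is left out; the sketch stops at building the prompt that such a model would receive.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, files, k=2):
    """Step 1: rank file paths by similarity of their contents to the question."""
    q = embed(question)
    ranked = sorted(files, key=lambda p: cosine(q, embed(files[p])), reverse=True)
    return ranked[:k]


def build_prompt(question, files, paths):
    """Step 2 (input side): assemble the retrieved files into an LLM prompt."""
    context = "\n\n".join(f"# {p}\n{files[p]}" for p in paths)
    return f"Use the files below to answer.\n\n{context}\n\nQuestion: {question}"


# Invented sample repository contents, for illustration only.
files = {
    "auth/login.py": "def login(user, password): check password and create session token",
    "db/models.py": "class User: table columns name email",
    "ui/theme.css": "body color background font",
}

question = "Where is the password checked during login?"
top = retrieve(question, files, k=1)
prompt = build_prompt(question, files, top)
```

In a real setup, `embed` would presumably be replaced by an embedding model with a vector index, and `prompt` would be passed to a locally hosted open-source LLM.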

Metadata Journey at Netflix

Alicia Johnson - Product Manager, Data Discovery & Governance; Kevin Chun - Senior Software Engineer, Data Discovery & Governance; and Ashwin Iyer - Engineering Manager, Data Discovery & Governance, Netflix

In this talk, we will provide an overview of how we evolved our Metadata service landscape, starting with Metacat and eventually building the Netflix Data Catalog (NDC). We will also provide an overview of how the use cases around NDC have evolved over time, from Privacy and Data Protection to NDC now serving as a foundational building block for Data Discovery, Governance, and Cost reporting. We will touch on the need for operating an Extensible and Self-serve Data Catalog at a Netflix-wide scale and how we plan to leverage DataHub OSS to meet those needs.

Break

Panel: Metadata in Action – Tips and Tricks from the Field

Maggie Hays - Founding Product Manager, Acryl Data, Harvey Li - Engineering Manager II, Grab, Matthew Coudert - Data Engineer II, Checkout, and Nedra Albrecht - Sr. Data Engineer, Slack

What do you think the biggest problem in metadata is?

– Collecting it?
– Querying it?
– Standardizing it?

We think it is – Putting Metadata to Work!

A survey of data leaders and practitioners around the world reveals a worrying trend: most people understand the value of metadata management, but very few are able to make metadata drive business outcomes.

Metadata Systems over the past decade have gotten a bad reputation because they take too long to set up; by the time metadata has been harvested, people lose patience, funding dries up, and the project becomes shelfware.

Fear not! There is hope. We’re bringing together some inspiring stories from the trenches of data teams around the world, who have crossed the chasm, and have managed to get metadata to drive critical workflows in their organizations.

Evolution of Generative AI

Sadid Hasan - AI Lead, Microsoft

Understanding the decades-long journey of AI that has led us to the recent emergence of Generative AI is important to make the best use of its current capabilities and shape its future potential. This talk will focus on connecting the dots between the past, present, and the evolving future trends of generative AI.

Metadata for AI – Journey, Learning & Challenges

Deepak Chandramouli - Senior Software Engineer, Apple, Ravi Sharma - Senior Engineering Manager, Apple, and Satish Kotha - Senior Engineer, Apple

The ML landscape and its diverse applications are evolving at an unprecedented pace. These advancements pose significant technical challenges in achieving end-to-end observability and governance. Open-source DataHub plays a key role in addressing these challenges by integrating model and workflow metadata with lineage, privacy, and compliance services, forming a robust foundation for observability, governance, and data sharing. In this session, we will explore how DataHub supports managing AI and data metadata and highlight key challenges in this evolving space.

Brought to you by

Privacy Policy                Code of Conduct

When + Where

Virtual Event   |   October 29-30, 2024

Contact
