Jan. 8, 2025

IGHS60 - AI, Data Governance, and the Future: Insights from Box CEO Aaron Levie

In this conversation, Jim Merrifield interviews Aaron Levie, co-founder and CEO of Box, discussing the evolution of Box, the importance of information governance, and the transformative role of AI in data management. Levie emphasizes the need for organizations to adapt their data architectures to leverage AI effectively, the emergence of AI agents tailored to specific roles, and the ethical considerations surrounding AI deployment. He expresses optimism about future innovations and the necessity of maintaining organized data to fully benefit from AI advancements.

Jim Merrifield (00:00.942)
Well, hello and welcome to the InfoGov Hot Seat. I'm your host, Jim Merrifield, and with me today is Aaron Levie. Welcome, Aaron.

Aaron Levie (00:08.09)
Hey, thanks for having me, Jim.

Jim Merrifield (00:09.784)
Yeah, it's great to have you on the hot seat. I know everybody probably knows who you are and where you work, but let's have you introduce yourself, providing a brief introduction of yourself, your current role, and one fun fact about yourself.

Aaron Levie (00:22.953)
Oh boy. So I'm the co-founder and CEO of Box, and at Box we help enterprises manage their most important documents and content across their organization. And one fun fact is I'm an amateur magician. So I'll just throw that out there.

Jim Merrifield (00:47.302)
Awesome, a magician. I don't know how you learned that or keep up with those skills with your busy schedule, but that's pretty awesome.

Aaron Levie (00:55.846)
Yeah, it's more just muscle memory from high school and I didn't have many friends. So that got retained. I retained that set of skills.

Jim Merrifield (01:06.382)
Got it. Well, you know, not many friends in high school, but you probably have a lot of friends now, I'm sure.

Aaron Levie (01:11.397)
I'm just, you know, just trying to sell software to a lot of people. So.

Jim Merrifield (01:16.804)
For sure. And you're doing a great job. Let's talk about Box a little bit. What inspired you to start your company and how has the vision for the business evolved over time?

Aaron Levie (01:28.147)
Yeah, so we started the company because we wanted to create a really easy way to access and share and collaborate around information from anywhere. So we started our sophomore year of college. It was a period, back in 2004, 2005, when if you wanted to access information, you had to have a USB thumb drive, you had to email yourself files, you had to go to an FTP site.

If you were in a corporate environment, you might be using some kind of traditional on-premises software to manage all that data so you could access it. And we decided that that was just way too complicated, in many cases too expensive, and that, given the rate of technology improvement around the cost of storage, the speed of the internet, and the number of devices you were accessing information from, there had to be a better way to do this. And that you could actually store the data centrally in the cloud. We didn't really call it the cloud at the time, but essentially online, and then access it from any device in any location.

So that was how we got the idea. We launched the initial service. It started to get some traction. And then we got really lucky where we had a couple of angel investors come on board and then Mark Cuban actually decided to invest in the company. And once we had Mark Cuban involved, that kind of gave us the impetus to decide to drop out of college. We moved to the Bay Area and then really just took this on full time. And then, you know, since then we've just been scaling. So happy to get into any more of that. But that's a little bit of the origin story of the company.

Jim Merrifield (02:58.308)
Now I appreciate that. So it's been an exciting ride for sure. How long has Box actually been in business? 2005, so almost 20 years. That's amazing. So let's talk. I'm sure there have been challenges along the way, especially with the emergence of AI and information governance in the tech industry. What do you see over the next three to five years?

Aaron Levie (03:04.509)
Since 2005.

Aaron Levie (03:08.627)
We're getting, yeah, getting up there.

Aaron Levie (03:23.207)
Yeah, I mean, I think information governance might be one of the most important topics in technology going forward. And the reason for that is AI. If you think about an AI agent that can access literally anything inside of your corporate systems that you give it permission to access, it can do that at a rate and a speed and a level of parallelism that no human would ever be able to match. What you can basically assume is that, given enough time and enough compute, AI will find any data inside of your organization. And that can be very positive. And then the risk side is it could expose that to people that shouldn't have access to it if there are any mistakes made in how that AI is operating on your information.

So the governance of your data, the understanding of what's in your data, the taxonomies of data, the security and permissions and controls, the life cycle of that data and managing the end of that life cycle: these are maybe the most important IT questions of the 21st century, because we know we're going to have AI. We know that AI is only going to accelerate. So then the question is, do we have the right data architecture, the right organizational paradigm for our information, for this new world that we're entering?

And I'd say most organizations probably realistically don't have that, because the way that we built our IT architectures over the past couple of decades didn't really anticipate these insanely fast, infinitely scalable robots that can go across all of our information. So we really didn't build our IT architectures for that particular paradigm.

So for companies like Box, and there are many others in software to be clear, our value proposition is that we effectively help companies manage their information and get it ready for this AI world that we're now in and that we're only going to accelerate into further. And that means that there are a lot of non-AI things that you have to get really good at. There's, again, permissions and access control lists and data governance policies and retention policies and archival functionality. So we have to do all of that, which you would do with or without AI. And that's kind of been our bread and butter: for the past 20 years we've been building out that architecture.

Aaron Levie (05:45.811)
And then you have to have an AI layer on top of that that understands the context of all of those different tools, all of those different permissions, all those systems, and then can connect to external AI models and make use of the data in the platform. So that's basically what we've been building out and what we've been getting really, really good at all these years.

Jim Merrifield (06:08.908)
That's amazing. I mean, there's a lot to build out there around AI. I love how you said information governance is pretty much center stage these days. It wasn't always the case.

Aaron Levie (06:19.027)
No, and I think most companies could get by not treating it as priority one, because you sort of said, okay, maybe I don't have my information governed in the most pristine way, but the risk and the downside is that one person once might run into something that they shouldn't have access to, or we might have one system that is accessible to the wrong external party. And that's very risky to be clear, but it's not catastrophic in most cases, depending on what the data set is.

In a world of AI, you kind of just have to assume that anything that can get discovered will get discovered. Anything that could be shared in the wrong way will be shared in the wrong way. And AI will find what it's looking for. And it will not only find what it's looking for, but it'll come up with answers and concoct new answers from your information and collate data together. And so you have an insanely heightened risk in this new environment.

And the classic example is, think about a world where you had a bunch of repositories of data. Some of that data is HR data. Some of that data is finance data. Some of that data is medical information. A user comes into the system and they ask a question like, what are the latest HR issues in the company? And what if there's one open repository of information that has somebody's HR issue that that person's not supposed to have access to? That's all it takes, and now all of a sudden I have a data exposure event within the organization, or maybe externally if there's some external access.

So AI will find the information it's looking for and provide it to the user that's asking the question if they have the permission to do so. So understanding your permissions, understanding your access controls, understanding who has access to what is, again, such an incredibly important problem for the 21st century.
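The permission point Levie makes here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Box's implementation: the `Document` class, the group names, and the keyword matching are all hypothetical stand-ins. The one idea it shows is that the access check runs before any retrieval, so content a user cannot read never reaches the AI layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups permitted to read this document


def retrieve_for_user(query, user_groups, corpus):
    """Return documents matching the query that the user is allowed to read."""
    # Filter by permission first, so unreadable content is never searched.
    readable = [d for d in corpus if d.allowed_groups & user_groups]
    # Naive keyword match, standing in for a real search or AI retrieval step.
    terms = query.lower().split()
    return [d for d in readable if any(t in d.text.lower() for t in terms)]


corpus = [
    Document("hr-001", "Open HR issue: complaint filed against a manager", frozenset({"hr"})),
    Document("fin-001", "Q3 finance forecast", frozenset({"finance"})),
    Document("pub-001", "HR onboarding guide for new hires", frozenset({"hr", "all-staff"})),
]

results = retrieve_for_user("latest HR issues", frozenset({"all-staff"}), corpus)
print([d.doc_id for d in results])
```

If `hr-001` were mistakenly tagged with `all-staff`, the same query would return it, which is the single-open-repository exposure described in the example above.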

Jim Merrifield (08:27.546)
Yeah, 100%. But you made me think of another question around personas. I think every organization, whether you're a law firm or a corporate entity, is focused on data hygiene, like you're talking about with information governance and all that stuff and getting rid of things. But there also is this concept of training the AI on personas, right? So that would be training it on maybe a partner that is very influential in the law firm, or training it on maybe an executive like yourself, right? Because you're trying to mentor the next generation of leaders. What do you think about that?

Aaron Levie (09:07.699)
Yeah, so I think that in this new world that we're entering, you're going to have a base layer of AI, which is this general, kind of ChatGPT-esque set of use cases. I can talk to my systems. It's sort of translating natural language questions into queries that then get run against business systems. So you're going to see a lot of that. It's very horizontal. It kind of works for everybody.

Then we have this new era of AI agents, where you can effectively have the AI take on the role of a particular job function in the organization. And what's interesting is it doesn't actually have to be specifically trained on that job function. It has to be effectively told to do that job function. And because of the breadth of training data that these AI models have, usually that's all it really takes to then take on the skill set and the capability set of that role.

So the example would be, let's say I ask a general AI system, hey, write me a blog post about this new product launch. It's going to write a great blog post. It'll be well-written, grammatically correct. The prose will be fine. Okay. But there's another scenario where I could basically create an AI agent and I could say, hey, AI agent, your job is to write blog posts for Coca-Cola. And here's the style that Coca-Cola uses. Here's the template library of all of the ways that Coca-Cola has written all past blog posts. Now I'm talking to this agent that has access to all of this, I'll call it training data, but you're not actually literally changing the underlying model. You're sort of giving it that context when it's giving you an answer. And so now all of a sudden that blog post is going to be a night-and-day difference from the generic one.

And this is what we're starting to see with this idea of AI agents, where with the right level of instructions, with the right amount of tuning at the edges, you'll get very different personalities. You'll get very different levels of skill. And so that actually is great, because now if I'm reading a contract in my company, I can go and talk to the contract AI agent as opposed to the generic AI agent that doesn't have access to all of the specific needs that I have within my organization.

Aaron Levie (11:26.355)
So we think being able to have AI agents that are tailored to different roles in the organization is definitely a breakthrough in AI right now.

Jim Merrifield (11:39.16)
Yeah, it seems like it's the path of least resistance too, because I think organizations are thinking, well, look, I have to clean up all this data when they're thinking about using technologies like Copilot or the others, because a lot of data is on OneDrive or email. And there's a lot of information out there that says, listen, before you can use this type of AI, you've got to clean up. And that's a tall order. So I think maybe this other approach with personas, as we just touched on, could be at least an easier path forward.

Yeah, absolutely. Sure. So let's talk a little bit about the AI solutions. Of course, they're both innovative and of course should be ethically responsible. I know we touched on that a little bit as well, but how is your business maintaining strong information governance around the creation of AI solutions?

Aaron Levie (12:11.101)
Yeah, for sure.

Aaron Levie (12:34.609)
Yeah, so we have a secret weapon and advantage, which is that we run the company on Box. So almost by definition, everything we do, our data is inside of our platform. And so we have a relatively clean data state as a result of that. We don't have on-premises infrastructure. We don't have SharePoint sites sprawled around. Everything's in Box. We can kind of get our arms around it. We know where everything is. And then we add AI on top of that.

And so our AI governance layer is effectively intermediated by our AI platform. And our AI platform leverages the permissions and controls and access that we have within the Box environment. And so some of the use cases are things like we have a new feature called Box Hubs, where you can create an employee HR onboarding site or a sales rep training site or a marketing asset library site. And so these are hubs that we've built out. They point to content within the Box environment. And then with AI, you can go in and ask questions of all the data. But the important thing is you can only ask questions of the information that you have access to. So it's not doing a general query across all of the information in the enterprise. It's only the content that you're supposed to have access to. We sort of live and die by the level of access controls and the security system that we've already built out as a company.

And the use cases are very vast. So being able to interact with and chat with your documents, being able to summarize information, being able to instantly create meeting notes, all of this kind of functionality is built directly into our platform. And then there's the next big breakthrough, and we haven't talked about this too much. Once you have a lot of information in an enterprise, by our estimates, and we work with IDC to get this estimate, about 90% of our data is classically unstructured data. This is your financial documents, your marketing assets, your employee HR documents, your contracts; all of that data represents about 90% of the information in an enterprise. And the challenge is we don't know that much about it. It usually goes and gets stored somewhere. Maybe you see the file names.

Aaron Levie (14:54.469)
Maybe you're tracking a little bit of that data, but in general, you don't have a lot of telemetry. You don't have a lot of insights around that full corpus of information. AI is this crazy breakthrough where all of a sudden I can now know what's inside of my information. I can know what's in my data. And so we've had some very early features, and we'll be releasing them early next year, that will let you extract the metadata from those documents. So take a contract and pull out the renewal dates and the key party names and the clauses in the contract, or take an invoice and pull out the shipping information and the amount of the invoice, or take a digital asset and pull out all of the people that are seen in the digital media.

So you can sort of see the potential: once I can have AI understand everything inside of my enterprise, look over everything, I can put that into a structured database and now I can query that database. I can automate workflows in that database. I can get insights from all of my information. That's the real power of what we're starting to see with AI and content: we can truly unleash the power of your data inside of an organization.
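As a rough sketch of the "structured database" idea described above, the Python below loads a few already-extracted contract fields into an in-memory SQLite table and queries them. It makes several assumptions: the records are made-up sample data standing in for the output of an AI extraction step, and the field names (`file`, `party`, `renewal_date`) are hypothetical rather than any actual Box schema.

```python
import sqlite3

# Hypothetical output of an AI metadata-extraction step (sample data only).
extracted = [
    {"file": "acme_msa.pdf", "party": "Acme Corp", "renewal_date": "2025-03-01"},
    {"file": "globex_nda.pdf", "party": "Globex", "renewal_date": "2025-09-15"},
    {"file": "initech_sow.pdf", "party": "Initech", "renewal_date": "2025-02-10"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (file TEXT, party TEXT, renewal_date TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (:file, :party, :renewal_date)", extracted
)

# ISO-8601 dates sort correctly as text, so a plain string comparison works.
rows = conn.execute(
    "SELECT party, renewal_date FROM contracts "
    "WHERE renewal_date < ? ORDER BY renewal_date",
    ("2025-04-01",),
).fetchall()
print(rows)
```

Once the fields are in a table like this, a question such as "which contracts renew before April?" becomes an ordinary query instead of a manual read through each document.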

Jim Merrifield (16:09.978)
That's awesome. I know you talked about that a little bit at BoxWorks. I got to watch a few of those sessions. It was really enlightening. I can't wait for some more coming out in 2025. I know we've talked about a lot here, Aaron, but is there anything else that you'd like to share with the audience, either around takeaways from BoxWorks or what you're looking forward to in 2025 in general?

Aaron Levie (16:34.567)
Yeah, I'm just unbelievably optimistic about the pace of innovation that we're seeing right now. So in the second and third weeks of December, we've seen breakthroughs from Google. We've seen breakthroughs from OpenAI. We've seen breakthroughs from Anthropic. Almost every major model provider in AI is, on a regular basis, at a minimum monthly and in many cases weekly, continuing to exceed and push the boundaries of their past AI models. And so what this means for all of us is that AI is going to get cheaper, it's going to get faster, it's going to get higher quality. And this is just fantastic news because it means that we're going to get more and more innovation packed into our software.

When we first saw this wave of AI emerge, starting right after ChatGPT, we kind of stepped back and we said, if we were to start the company over right now, would we think of AI as this sort of secondary capability? Like we'd have Box and then we'd have the Box version with AI. And we really said, no, actually, we would probably have AI baked into the core of our technology. And so even when we price our product, we sort of have that mindset, which is we include AI in the core foundation of our platform for a significant portion of our customers.

And so what's great about this is we can kind of expect this from all of our software. All of our software is just gonna become more intelligent. And that's gonna mean we have incredible breakthroughs in drug discovery, in how sales reps work, in our supply chains, in financial services, being able to better serve clients. So we're gonna have just incredible breakthroughs in terms of how businesses can operate.

It has this gigantic asterisk, hence this conversation. If you don't have your data in the right spot, if it's not organized in the right way, if you don't have the right controls in place, you will not get any of those advantages. So we have to actually set up a data environment that lets us take advantage of AI. But once you've done that, what you do know is that you're just going to get accelerated innovation that's going to continue to blow, I think, our minds going forward.

Jim Merrifield (18:53.178)
100%, 100% agree. Who knows, maybe we'll have a part two down the road in 2025 about this topic. I'm sure that we could go on for hours.

Aaron Levie (19:01.491)
It will only get more interesting, so.

Jim Merrifield (19:04.57)
It will, it will for sure. Well listen, thank you so much, Aaron, for taking some time here on the Hot Seat with us. And if you'd like to be a guest on the InfoGov Hot Seat like Aaron here, all you gotta do is submit your information through our website, infogovhotseat.com. And thank you so much and enjoy the rest of your day.

Aaron Levie (19:07.411)
Thanks for your time.

Aaron Levie

Founder & CEO, Box

Aaron Levie is Chief Executive Officer and Cofounder of Box, which he launched in 2005 with CFO and cofounder Dylan Smith. He is the visionary behind the Box product and platform strategy, incorporating the best of secure content collaboration with an intuitive user experience suited to the way people work today. Aaron leads the company in its mission to transform the way people and businesses work so they can achieve their greatest ambitions. He has served on the Board of Directors since April 2005.

Aaron attended the University of Southern California from 2003 to 2005 before leaving to found Box.