Bonnie Chase (00:18)
Today on our Vespa Voice episode, we are talking to Kamran Khan, who is the CEO and president of Pure Insights. Thanks so much for joining today.
Kamran Khan (00:28)
You’re welcome. Thanks for having me.
Bonnie Chase (00:30)
Yeah, so I thought today we could maybe kick things off by you sharing a little bit about yourself and about Pure Insights and maybe the types of customers you engage with.
Kamran Khan (00:40)
Yeah, sure. Absolutely. I’ve been in the enterprise search business for a long time, actually since the late nineties. That was the first time I started working for a search company. And obviously from that time to now, things in search have changed dramatically. And I’ve seen all those changes. I originally started a company called Search Technologies with some colleagues, and that was focused on helping people implement enterprise search.
Enterprise search has always been pretty hard to get right. And so we wanted to apply expertise to help people get the most out of enterprise search. And that company grew to about 200 people in six countries and was eventually acquired by Accenture. And that was a very successful acquisition. And many of the people that I worked with are still there helping Accenture build great solutions.
After three years at Accenture, I decided to join some colleagues and form Pure Insights. And that was driven largely by the fact that AI and search were beginning to come together in a way that was very exciting. And so we wanted to create a company that was focused on that area of search. And that’s why we have Pure Insights today.
Bonnie Chase (01:56)
That’s great. And what a perfect time to have a company like that as well. Now, you’ve written some blogs and you’ve spoken at different conferences about search’s foundational role in AI. So as you know, we’ve seen the rise in GenAI over the last few years. Are you seeing a shift in how search is being used or in how important it is in modern applications?
Kamran Khan (01:59)
Yes.
Yeah, I mean, as you look at search over the years, originally it was just keyword search, but then, you know, in about 2010, 11, we started to play around with machine learning and vectors, but it was very early to augment search with vectors. But really everything changed when BERT was released by Google. And now we were starting to see real vectors being applied to the search world.
But then things changed even more dramatically after ChatGPT. I mean, you can think about the search world before and after ChatGPT. And as I say to my colleagues, really ChatGPT and GenAI have taken search from being what was a niche area before into the mainstream. You know, if we were a big fish in a small pond at Search Technologies, that pond has been overcome by a tsunami. And so now the search world is so much broader.
And the thing that I find really interesting with vector search and GenAI is that search underpins pretty much all GenAI applications, right? If you don’t have good search, then the information that you’re generating from these great tools like ChatGPT and all the other LLMs out there, it’s not going to be relevant. So search now has gone from being, you know, in the background as a niche technology to being front and center, and it really underpins everything that’s going on with GenAI and AI in general.
Bonnie Chase (03:55)
Yeah, absolutely. And as you mentioned, enterprise search has been around for decades, and organizations are investing a lot more heavily in GenAI. A lot of that shift is because of ChatGPT making that experience more available. How do you think large enterprises should be approaching search differently today versus the pre-ChatGPT era?
Kamran Khan (04:20)
Yeah, look, you know, many companies valued search, especially if, for example, having good search meant that you were generating revenue. So enterprise search has been applied to e-commerce as one example for many years, and people have implemented that. Well, now with these other kinds of search, vector search and GenAI, we’re really just increasing the functionality that we can deliver to end users. And so that applies across all of the applications that we’ve worked on before.
And what we find is that companies are more excited about building what I would call these AI search applications because the functionality is there now to more easily provide a good user experience for their customers. Whether it’s an e-commerce company or whether it’s a big company, a large multinational that needs to get its knowledge workers access to information, it’s so much easier. And we can talk about some use cases that I think will be illustrative of what’s happening in some of those companies.
Bonnie Chase (05:20)
Yeah, yeah, I’d love to dig into those use cases. I mean, what are you seeing getting the most traction right now?
Kamran Khan (05:26)
Well, I mean, the first thing that happened was that people were adding RAG, retrieval-augmented generation, to existing search solutions. So we’ve worked with many publishers over the years. And the reason publishers are very focused on search, especially ones who make their money by selling their content, is because if their users don’t have a good experience finding that information, then they’re going to stop paying their subscriptions, right? Simple as that. So they’ve focused on search for a long time. And we have many of those customers that we’ve helped build search solutions for.
Well, the natural thing for them to do is to add these other capabilities: add vector search, so that you’ve now got semantic search, and then ultimately add RAG, which enables them to add this extra layer of information for users. So we’ve seen people who’ve had great search solutions add vector search, and they are now adding retrieval-augmented generation. And it provides all three. Now, in our experience, people who jump straight to RAG are really missing the fact that enterprise search, keyword search, and vector search are just as important, and in fact underpin the GenAI side of things.
So that’s what we’re seeing: people adding RAG and vector search to existing search applications. I’d say that’s the most prevalent. But we’ve seen some other very interesting use cases. And that is really in the area of using agents. Because our customers will add the RAG capability, but then what they want to do is start using agents to interact with their data to provide even more functionality to their customers.
That’s where we’re seeing the evolution of search to vector to GenAI and ultimately to using agents to start doing things with their data that they couldn’t do before.
Bonnie Chase (07:14)
Yeah, I think that’s so interesting because, like you mentioned, I mean, traditionally, I guess keyword search was the main way of doing things. You’re trying to find that exact match. Then we move on to the vectors, which bring the similarity along with them. And then of course, the retrieval-augmented generation. So we can find the information and then augment that with the large language model to generate the best answer possible.
Kamran Khan (07:35)
Yeah.
Bonnie Chase (07:41)
I think another interesting part is moving beyond text into kind of the multimodal search. I mean, it’s always held promise, but it kind of seems more achievable now with some of the newer models that are coming out. Are you seeing some projects that are combining different types of data with text and images and even video or audio?
Kamran Khan (08:04)
Yeah, absolutely. We’re seeing people, again, go through this evolution and get their text search right. But then they want to start looking at images. And so we’re seeing people interested in being able to search directly into images. And I know that Vespa has technology that allows us to do that, which we’re excited to explore. So I think images is going to be a big area that we’re going to focus on.
But I can give you an example of a great use case that involves multimodal search that we’ve worked on and is actually going into production now. So we’re working with a very large European car manufacturer. We’re working with their after-sales division. And this company is very large and it’s a well-known company, and they generate a billion dollars plus from after-sales. That means you take your car into one of their dealers in a specific country, and you have a problem with it. The engineers will work on the car, and typically they would write down what was wrong with the car and try and find the diagnosis. Well, this company has over 200,000 documents, and I’ll describe documents in a second, but they have these documents that have problems and solutions that the engineers out in the field can access in order to diagnose a problem faster.
Many of these documents have audio files associated with them. So that could be the sound of an engine ticking. It could be the sound of a door creaking. Anything that might have been recorded out in the field. So what we first did was we built a search solution, a vector search solution, that allowed the user to describe the issue with the car and then match that directly with the information in the database via text, mostly vectors because we needed it to be in multiple languages. But now what we’ve done is we’ve also added the sound file, and we’ve been able to vectorize a sound file, and we’ve created a custom model that will match the sound with the sound stored in the database. So that gives you a more accurate match. I’ve described it as Shazam for cars, but...
Bonnie Chase (10:10)
Yeah.
Kamran Khan (10:10)
It’s a really great application and it’s actually being rolled out into production. So we started again, we started with text and then we’ve added this audio capability. And so I think we’re going to see more and more of that where a document is now going to be a multimedia document and we’re going to be able to address searching for similar documents or searching through those documents by using the kind of vector technology that Vespa has.
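To make the sound-matching idea concrete, here is a minimal Python sketch. It is not the system described in the conversation: it assumes each recording has already been turned into a fixed-length vector by some audio embedding model (not shown), and the names `cosine_similarity`, `best_match`, and the tiny example vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: list[float], library: dict[str, list[float]]) -> str:
    """Return the name of the stored recording whose vector is closest to the query."""
    if not library:
        raise ValueError("library is empty")
    return max(library, key=lambda name: cosine_similarity(query, library[name]))


# Hypothetical, hand-made vectors standing in for real audio embeddings.
stored_sounds = {
    "engine_tick": [1.0, 0.0, 0.0],
    "door_creak": [0.0, 1.0, 0.0],
}
field_recording = [0.9, 0.1, 0.0]
closest = best_match(field_recording, stored_sounds)
```

A brute-force scan like this only works for a small library; at the scale of hundreds of thousands of documents, a search engine with approximate nearest-neighbor indexing would normally do this step.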
Bonnie Chase (10:35)
Yeah, I mean, that’s such a fascinating use case because I think typically, at least for me, I have not thought of audio beyond music files or things like that. But actually being able to use that to troubleshoot the issue is like, I feel like that’s a game changer.
Kamran Khan (10:43)
Mm-hmm.
The interesting thing is that this company is contemplating putting microphones in the car to capture sound in real time. I mean, that’s not been done yet, but that’s one of the plans is to capture the sound real time to be able to diagnose problems as they are occurring. So that’s going to be a great use case once we get around to that.
Bonnie Chase (11:08)
Nice.
Kamran Khan (11:12)
And then if you think about a car as a machine, machines make noise, and they tend to make noises that are not great when they’re not working well. So we’ve talked to a couple of companies in the heating and cooling industry, air conditioners, where we’re looking to take data from the air conditioning units in real time. And that could be telemetry data, or it could be sound, or at least have the capability for an engineer to be able to record a sound and be able to apply the same thing there. So I think this area of using audio for maintenance is an exciting one. I mean, it’s not the only use case that you can find with multimedia, but it’s definitely one that can deliver a lot of value to customer service.
Bonnie Chase (11:59)
Yeah, absolutely. Yeah, I think that one’s a really, really cool one. Now I imagine that as companies are working toward a GenAI initiative, they’re deciding on what type of data they want to use and what type of content and documents, whether it’s text, audio, whatever that may be. How do they know when they’re ready? What are some things that they should be looking at for data readiness?
Kamran Khan (12:20)
Yeah, you know, look, again, I think that if you look at some of these applications that we’re talking about, I mean, not necessarily the audio, but definitely the text, having a good search practice is important, right? Because why is it important? Because when you have a good search solution, you have to have prepared your data properly in order for it to be found, right? So many people think that you can just take data as it sits, index it, and hey, presto, you get great results. That’s never been the case. There’s a lot of data engineering that has to happen. Some of it automatically. Things like adding metadata, normalizing dates, manipulating the data so that it is relevant for a search index. That same data preparation is important for GenAI.
If you don’t have a good search solution, then the large language model isn’t necessarily going to be able to give you the best results. When you are creating vectors, which the large language models rely on, you have to have a really good strategy for chunking, which is effectively creating the embeddings or creating the vectors. All of those things need to be done upfront. Depending on the content source, you can have a different strategy for how you can prepare those documents in order for them to be used by GenAI. So the same kinds of things that needed to happen for search still need to happen for GenAI and vector search. So that’s an important step. A lot of our work with our customers is in preparing the data in order to get the best results from the AI tools that we’re using.
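As an illustration of what a chunking strategy can look like, here is a minimal Python sketch of one simple option: fixed-size word windows with overlap, where each chunk would then be embedded separately. This is an assumption for illustration, not the approach used in the projects discussed; the function name `chunk_words` and the default sizes are hypothetical, and real strategies often split on headings, sentences, or document structure instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 12-word document, chunked into 5-word windows overlapping by 2.
sample = " ".join(f"w{i}" for i in range(12))
sample_chunks = chunk_words(sample, size=5, overlap=2)
```

The overlap is there so that a sentence falling on a chunk boundary still appears whole in at least one chunk; the right sizes depend on the content source, which is the point made above about having a different strategy per source.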
Bonnie Chase (13:54)
Mm-hmm. Yeah. And that makes sense. I mean, even though we’re trying to take advantage of AI, really we’re feeding it the information that it’s using to create those answers. So we can’t shortcut that part of the process. That still has to happen.
Kamran Khan (14:09)
Yeah. I mean, another interesting thing is that, you know, a large language model could end up surfacing information that you don’t want to surface, right? So security of the documents is paramount. You’ve got to make sure that you’ve got a security model in place so that the large language model isn’t suddenly giving out salary information or information that’s proprietary. So having a data strategy and a security strategy is paramount before you start to implement GenAI.
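One common way to apply a security model in a retrieval-based GenAI system is to filter retrieved documents by the user’s permissions before any text reaches the language model. The Python sketch below shows that idea only; the `Doc` class, the group names, and `visible_docs` are hypothetical, and in practice the filter is usually pushed into the search query itself rather than applied afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this document


def visible_docs(docs: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in docs if d.allowed_groups & user_groups]


# Hypothetical retrieved results for a query.
retrieved = [
    Doc("handbook", "Holiday policy ...", frozenset({"all_staff"})),
    Doc("salaries", "Salary bands ...", frozenset({"hr"})),
]

# Only the filtered list would be passed to the language model as context.
context_for_engineer = visible_docs(retrieved, {"all_staff", "engineering"})
context_for_hr = visible_docs(retrieved, {"all_staff", "hr"})
```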
Bonnie Chase (14:41)
Yeah, absolutely. And I can see that both in building a solution, but also when using an existing tool kind of in your own company, you want to have some sort of governance around how it’s being used, what you’re putting in the system and things like that.
Kamran Khan (14:55)
Yes, yes, definitely. These things need to be thought about upfront and not on the fly.
Bonnie Chase (15:01)
Yeah. Well, let’s take a different angle, thinking about pitfalls with GenAI adoption. I mean, a lot of companies are jumping into it with a lot of enthusiasm, not always with a solid plan ahead of time. You know, we talked about not wanting to do it on the fly, but what are some of the biggest mistakes that you’ve seen with aligning GenAI efforts and business goals? And I guess, what are some ways that companies can avoid investing in the wrong use cases?
Kamran Khan (15:29)
Yeah, I think we see this a lot. Since ChatGPT came out, I would say, I’ve probably heard many times that the board wants us to do something with GenAI. What are we going to do? And I think that’s the wrong approach. There are two main things that I think you should watch out for. One is that using GenAI and AI in general as a solution looking for a problem is not the right way to go about it. I think that identifying a use case that can come... Let’s take this a different way. Identifying a use case that is applicable for AI, where AI can really help that use case move forward, is probably the first thing that people should do. And we tend to do that a lot with our customers. We will sit down and say, what use cases do you think you have that could benefit from having AI introduced? And we’ll help identify the right ones that are going to give them the best results. So doing that is definitely something people need to look at.
The second thing is that GenAI and AI in general, it’s not magic, right? It’s fantastic. It’s amazing. It looks like magic, but there’s a lot of hard work that needs to go into building these solutions in order to make sure they’re effective. So realizing that, you know, the results from AI may not be the right results for every use case is important. So I think being realistic about what’s necessary.
The other thing I’d say is that it’s quite easy to build a POC with AI tools. I mean, you can knock up a retrieval-augmented generation solution quite quickly using tools. Going from a POC to production, that’s a whole different matter. There’s a lot of work that needs to go into that production system versus getting a POC. In that case, with the car company, we were able to build a POC quite quickly. However, getting the accuracy right so that it was valuable to add that audio file alongside the text file to get higher accuracy, that took a lot of tuning. It took a lot of work to find the model. It took a lot of work to create a chunking strategy for the documents. So I’d say that’s another area.
So get the right use case. Have a good strategy for how you’re going to apply AI to those use cases, and don’t underestimate what it’s going to take to get something from POC into production. Those are the areas that we tend to focus on with the customers that are interested in building AI solutions.
Bonnie Chase (17:58)
Yeah, I think that’s great advice. When building RAG systems, for example, there are so many different ways you can do it, but having that use case in mind can help with how you’re building it as well as the tuning that’s needed to make sure that it’s working in the right way.
Kamran Khan (18:13)
Yeah, yeah, and that’s what we’re going to be doing with you and your team, right? You’re the company that creates this great technology. And I think it’s one of the most exciting technologies that are out there right now. And we’re the people who are going to use that technology to help build these great solutions by applying the discipline and the knowledge of how to use these great tools. Like, for example, you know, Vespa is great at creating hybrid search, right? So mixing the results between keyword and vector search. It’s great at using multimedia tools. And so our goal is to take this great technology and apply it to practical use cases to build these solutions that are really going to add value for the customers that are going to use Vespa.
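For readers unfamiliar with hybrid search, one widely used way to mix a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The Python sketch below is a generic illustration of that technique, not Vespa’s API and not necessarily how any system mentioned here combines results; the document IDs are hypothetical, and `k = 60` is simply a commonly used default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same query from two retrievers.
keyword_results = ["a", "b", "c"]
vector_results = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that both retrievers agree on ("b" and "c" here) rise above documents found by only one of them, which is the practical benefit of mixing keyword and vector results.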
Bonnie Chase (18:54)
Absolutely. Now, looking ahead at, you know, the future of GenAI, what are you most excited about? Are there any trends or capabilities or breakthroughs that you’re keeping a close eye on?
Kamran Khan (19:07)
Yeah. Yeah. I would say, look, it’s probably no surprise to people who are going to watch this, but, you know, you hear a lot about agentic AI, and I think that’s the next phase that people will look at. Right. And I’m excited about the possibilities of that because agentic AI and agents can take data and, instead of just surfacing it, they can actually perform tasks to help automate business processes. So we go beyond search into business processes.
I’ll give you an example. One of our customers that we’ve taken through this whole life cycle started with search, added vector search, added RAG, and is now looking at agents. And it’s a publishing company, right? It’s a market research company. And what they do is sell their content to their customers. They have so much valuable content, and we’re going to start looking at using agents to try and surface niche content, right? So if you are, for example, interested in a very niche area in a niche country and you’re looking to do market research, well, that information lives in a multitude of documents. And so we’re going to start creating agents that can then create new content to service that customer whose needs aren’t met by the general corpus that they have right now. And so that’s exciting. And I think that I’m optimistic about agents. I’m not one for hype. I’m cautiously optimistic, and we’re already exploring agents for various use cases.
Yeah, one great one that is a perfect example for agentic use is a law firm that we’re working with. They’re a law firm that does these mass litigations that you might see on an ad on TV, and they have thousands of people who need to apply to see whether they’re eligible for this litigation. The onboarding of those people is very difficult and very manual. And so using agents to interact with the people who are trying to see if they’re eligible is a great way to do it. We’ve already put some of that into production, and it really does cut down the time.
I mean, onboarding for financial services companies is another area that we’re looking at. It takes a long time for people to be onboarded to a financial services company because of compliance. We’re using agents to reduce the manual work that’s needed in that onboarding situation. So just a couple of use cases that are exciting and, I think, are at the very early stage. Using agents alongside search is what I think we’re most excited about at the moment.
Bonnie Chase (21:38)
Yeah.
Yeah, no, that really is exciting. I’ve definitely been seeing a lot about the agentic AI and kind of what’s happening there. So it’ll be interesting to see how that evolves over the next two to and how well we can really put it to use.
You’re in such an interesting place, kind of sitting between the technologies and the use case application with these companies. So very exciting time, I’m sure.
Kamran Khan (22:03)
Yeah, no, it is. I think, as our former CTO at Search Technologies used to say, people like you make the camera and we make the movies. So, you know, I really always liked that analogy that he came up with. And so, yeah, we’re really excited about using Vespa and the capabilities that you have, not just in the search side of things, but definitely in the multimedia side of things, in the scalability capabilities that Vespa has.
And I think that using Vespa as a platform to do all of those things, search and agents, is something that we’re very excited about doing. Yeah, and we appreciate the partnership that we have with Vespa.
Bonnie Chase (22:44)
Yeah, absolutely. Well, I guess with that, any final use cases you want to share, any final thoughts?
Kamran Khan (22:51)
Let me think about other use cases that might be of interest. Yeah, we’re working with a large US newspaper, and a company inside of that newspaper is working on building tools with AI. We’re helping them, and the first application we’ve built with them is to help their journalists be more efficient when writing stories by surfacing other information that they may well have in their archives, in order to help them produce these new stories more effectively. So yeah, just lots of different things like that.
And I think that, you know, like, what I would say is that if you’re going to build a GenAI solution or an AI solution, then it does need some planning. It’s great to build these POCs, but getting those POCs to production requires some discipline and some planning if you’re really going to get the most out of these AI tools. And that’s where we can help.
Bonnie Chase (23:45)
Absolutely. Well, I really appreciate you bringing your experience and expertise to this discussion. It sounds like you’ve got a lot of really cool use cases across various industries that you’re working on. So thanks so much for sharing.
Kamran Khan (23:57)
Yeah, no, thank you. I really appreciate you inviting us to this podcast.
Bonnie Chase (24:01)
Have a great day.
Kamran Khan (24:02)
Okay, thank you.