Christina Cardoza: Hello and welcome to the IoT Chat, where we explore the latest developments in the Internet of Things. I’m your host, Christina Cardoza, Editorial Director of insight.tech, and today we’re going to be talking about generative AI trends and opportunities with Waleed Kadous from Anyscale, Teresa Tung from Accenture, and Ramtin Davanlou, also from Accenture. But before we get started, let’s get to know our guests. Teresa, I’ll start with you. What can you tell us about yourself and Accenture?
Teresa Tung: I’m our Cloud First Chief Technologist, where I have the best job because I get to predict the next generation of impactful technologies. In this role I’ve gotten to work on big data, edge computing, AI, and lately you can bet it’s generative AI almost all the time.
Christina Cardoza: Great. That’s not an easy job you have there—having to predict what’s coming next or stay on top of the trends. But excited to get a little bit in to this conversation with you.
Before we go there, Ramtin, you’re also from Accenture. But what do you do there, and what can you tell us about your role at the company?
Ramtin Davanlou: Sure. Thanks a lot for having me, Christina. I’m a Director from Accenture Data and AI Group, based out of San Francisco, and I also serve as the CTO in the Accenture and Intel partnership, where we have a big focus on scaling GenAI applications and solutions with Intel hardware and software.
Christina Cardoza: Yeah, absolutely. I know Intel, the hardware and the software—they’re powering a lot of the use cases and the opportunities that we’re going to get into in just a moment.
But before we get there, Waleed, last but not least, what can you tell us about yourself and Anyscale?
Waleed Kadous: Yeah, so my name’s Waleed Kadous. I’m the Chief Scientist at Anyscale. Anyscale is a company that makes it easy for companies to scale up their machine learning infrastructure, with a special focus on scaling up generative AI—both the training, fine tuning, and inference stages. And we help our customers to deploy applications that run at huge scale.
We’re the company behind an open source project called Ray that’s very popular for doing distributed computing. It’s used by OpenAI and Cohere to train their models. So, yeah, it’s a really exciting job I have helping customers to work out how to deploy really cutting edge technologies, including IoT.
Christina Cardoza: Yeah, absolutely. And I can imagine open source—that behind a lot of the use cases it’s helping developers and businesses get started, and just the community out there to work together and build upon some of the successes that we’re seeing in that community.
So, like Teresa mentioned, generative AI—it seems to be the next big thing. We keep hearing about it; at all the latest conferences we’re talking about generative AI. A lot of it is around ChatGPT, so we’re not exactly sure—it’s a new thing, not exactly sure what this is. So, Ramtin, I’d love to start there, if you can give us a little explanation about, when we’re talking about generative AI and especially the opportunities out there for businesses, what do we—what is this idea of generative AI, and what are really the opportunities businesses are looking for?
Ramtin Davanlou: Yeah, yeah, sure. I think everybody has used generative AI or knows about it by now—unless they’ve been living off grid in the past year and this is the first content that they watch after they come back. But I think I can just put it in terms of its potentials and a summary of what is this basic entity that we have created. And it’s like—this is especially true when, over Thanksgiving dinner, all the conversation is centered around generative AI. So that’s really, really popular, and everybody is trying to make sense of it.
And so what a lot of companies like OpenAI, Google, AWS, and many more were able to do was they use their massive compute resources and massive data sets—a majority of the texts that you can find on the internet or on specific subjects—and they train these AI models—aka large language models—capable of generating new content. And this content is in many different forms. It is text, images, video, voice, or even code, computer code. And text is especially important because that’s what all humans basically do in communication, right?
So it boils down to text, and many of these AI models are able to generate responses that are really good on any given topic—better than an average person or even an average expert on that topic, right? And that creates a lot of new opportunities. Companies can now take these models, fine tune them a little bit so that the model behaves in certain ways and gains knowledge more about a specific context. And if you think about most of what white collar labor is doing every day at work—the conversation that we are having right now, it is basically text that we are generating, right? And that is the means of communication and building net new knowledge.
So what these LLMs cannot do now, but may soon be able to do, is creating that net new knowledge. But they can do everything that has been created, and that poses an opportunity for us to just focus on that net new value that we can generate. So the current workforce will focus on creating net new things.
And companies are thinking about how to use this entity to kind of enhance or substitute a lot of things that we are doing these days—sending emails, creating slides, right? All of the content that you’re creating to be able to communicate with each other. So this has huge implications for service industries and for manufacturing when you combine it with robotics. And at Accenture we are helping a lot of our clients to reinvent themselves using this transformative technology.
Christina Cardoza: Yeah. You mentioned the manufacturing spaces and other industries: it’s exciting when you have a capability like this that can really help transform all industries, and everybody’s using it in different ways. And I love how at the beginning of your response you talked about how it was dominating the Thanksgiving conversations much better than politics. But I’m sure—you know what? You can find, having these everyday conversations about this, is there comes a lot of misconceptions. A lot of interest comes with a lot of misconceptions.
So, Waleed, since you’re working in the open source world, you’re working with a lot of developers and people who are looking to develop, deploy, and manage these types of applications. Curious what you think. What are some of the considerations out there that businesses should be thinking about when developing these solutions? What are some of the misconceptions you’re seeing out there?
Waleed Kadous: Yeah. So in terms of the things that businesses should consider, I think one of them is to consider the quality of the output from these models. And in particular there’s a problem with LLMs called hallucination, and that’s where they may—they confidently assert things that are completely untrue. Now, over the last six months we’ve seen developments in an area called retrieval augmented generation which helps to minimize that. And we can talk a little bit about that, but making sure that the quality of the responses is accurate is one of the key considerations.
A second consideration is data cleanliness and making sure what is the information that these LLMs have access to? What are they disclosing, and what do they have the power to inform people of? Is there leakage between different users? Can someone reverse engineer the data that was used to train these models from the data themselves? It’s still a new frontier, and we’re still seeing issues that crop up in that front.
And then the final one is LLMs are expensive. I mean, really expensive. If you naively go and use GPT-4, which is probably the leading LLM out there today, you can easily blow a hundred thousand dollars in a month. So you really have to think about the cost and how you keep the cost under control.
And finally, there’s the final thing I would say is evaluation. How do you make sure that the system is actually producing high-quality results? What base data are you using to ensure that it’s doing the right thing? And that’s an area—I mean, the other misconception that we sometimes see people have is there’s this mechanism called fine tuning: let’s just fine tune and solve the problem. There’s a lot more nuance to solving problems with LLMs than we see, but there’s also a lot of potential applications.
And if I try to think about the applications, often when we talk to our users, they’re like, “Well, I can’t conceptualize what I need in terms of using an LLM.” And so we’ve worked out: these are the top things to look for in terms of use cases. The first one is summarization. Are there areas where you have a lot of information that you can condense, and where condensing it is useful? So we have a company called Merlin that provides a Chrome plugin that summarizes data.
The second is the retrieval-augmented-generation family. And that’s situations where you don’t just naively ask the LLM for questions; you actually provide it with context that helps answer the questions. So, for example, if it’s a customer support application, you wouldn’t just ask it about the customer support. You have an existing knowledge base of answers to questions that you pull the data from, and then say, “By the way, Mr. LLM or Mrs. LLM, here are the data points that you can use to help answer this question.”
I think one of the most interesting applications was what you might call “talk to the system” applications—especially interesting in IoT. So, imagine this as kind of a dashboard on steroids, a dashboard you can talk to. And I’ve seen a company that does expertly, that does Wi-Fi installations across companies for retailers, and what you can do is you can ask it questions, like, “Hey, which areas are we seeing the routers working too hard in?” And it will go, and it will query that information in real time and give you an update. And I really think that model is kind of the most interesting one for IoT.
And then the final one is really this in-context development and in-context application development. Perhaps the best known one of those is Copilot, right? When you’re writing your code, as Ramtin was talking about, it would give you suggestions about how to write even better, higher-quality code. And we’ve seen some of our companies deploying that. And, roughly, that order is the order of difficulty. In-context applications are the most difficult, but they’re also the highest potential.
And one thing I’d especially like to highlight is the idea that I know a lot of people are not quite sure: is this hype, or, is this not real? The really interesting thing to me is already we’re seeing massive proof points of the effectiveness of generative AI. We’ve seen GitHub do studies that show that it boosts the developer productivity by 55%. We’ve seen research about a Fortune 500 company that has shown that it can boost customer support quality and responsiveness by 14%. It’s just like, this is not kind of a hype thing. There’s already, even at this early stage, some incredible proof points that this works. But we’re still at the genesis of this technology, and we’re still collectively all learning how to use it.
Christina Cardoza: Yeah, absolutely agree. There’s always that concern: is the hype reality? Am I going to invest X amount of dollars in this and then tomorrow there’s going to be a new thing? So businesses are always making sure—are looking to future proof the investments that they are making. But, like you said, we have seen some proofs of concept, some success in the early days of experimentation. But, like you mentioned, cost is a big factor, and how can we go from those proofs of concept, experimentation solutions, and actually bring these solutions to life at production and at scale?
So, Teresa, I know Accenture does a lot of work with your end users trying to help them get to the next level of some of their efforts. So what can you tell us about where your customers are and what the next level is?
Teresa Tung: I would say, as an industry, most companies have started their proofs of concept, and many are starting with managed models like OpenAI. And these amazing general-purpose models address many use cases and offer a really great way to get started. But, as Waleed mentioned, cost in the long term is a factor, and this could be an order of magnitude bigger than many companies might be willing to pay. So, as generative AI pilots mature into production, companies now need to look at rightsizing that cost, rightsizing it for the performance, and even factoring in their sustainability goals.
When these models become more critical to the business, we’re also seeing companies want to take ownership and to control their own destiny, right? Rather than using a managed model, they might want to be able to take and create their own task-specific, enterprise-specific model. So these sub–10 billion parameter models are just more customized for them. And so what we’re seeing is companies beginning to adopt a number of models for different needs. So, yes, there will still be the general-purpose model available, but we’ll also have these fit-for-purpose models as well.
Waleed Kadous: So, to give a concrete example of what Teresa’s talking about: one of the experiments we did at Anyscale is we looked at translating natural language to SQL queries. And the general-purpose model, GPT-4, was able to produce an accuracy of about 86%, 80%. But by training a small, specific model that was only 7 billion parameters, that cost about one 100th the cost, we were able to achieve 86% accuracy in conversion. So this challenging mode of what are now being called SMSs versus LLMs—small specific models versus large language models—is kind of the evolving discussion that’s happening in the industry right now.
Christina Cardoza: Great. And we’ve been talking about these industries and these use cases at a high level, but I would love to dig in a little bit more and learn exactly how customers are using this—in what industries, where really are they finding that the biggest opportunities are, or the biggest areas that they want to start applying these generative AI solutions. So, Teresa, I’m wondering if you have any examples or use cases that you can share with us?
Teresa Tung: Yeah, I have a few. And I think Waleed had already done a great overview of some. I’m going to give a different perspective; I’m going to think about it in terms of things you can buy, things you’re going to boost, and things you’re going to build, right? So buying, being able to buy these generative-AI-powered applications for things like software development, marketing, some of these enterprise applications—that’s quickly becoming the new normal. So these applications use a model trained on these third-party data, and it gives everyone a jumpstart. And that’s the point—everyone is going to be able to capture these efficiencies, so don’t get left behind.
Boost is a second category, and this is where things like knowledge management or being able to apply a company’s first-party data—so, data about your products, your customers, your processes. And to do that you’re going to need to get your data foundation in order. And so using something like retrieval augmented generation is a great way to start, right? You don’t have to get a whole lot of data, and as you go along you can be able to create that data foundation.
And then, finally, in terms of build, we’re talking about companies being able to even maintain their own custom models. So, likely starting with the pre-trained open model and adding their own data to it. And this, again, gives them a lot more control and a lot more customization within the model.
Christina Cardoza: Great. And as we’re talking about building and boosting, deploying, managing these applications, and being able to work with large-scale models, it just comes to me that it’s—there’s a lot that goes into building these applications, and at insight.tech we’re always talking about “better together.” It’s usually not one company that’s doing this alone; it’s really an ecosystem or partnership that’s making this happen.
And, Ramtin, you talked about Intel hardware and software at the beginning of the conversation. I should mention, the “IoT Chat” and insight.tech, we are owned and sponsored by Intel, but of course it creates some of these great partnerships with companies like Anyscale and Accenture. So, I’m just curious, in terms of the hardware and the software, and the build and boost, how important it is to work with companies like Intel or any other partners that you have out there, Ramtin?
Ramtin Davanlou: Yeah, yeah, of course. I think partnerships are essentially very important in this area. Because if you are trying to build an end-to-end GenAI application and a developer ecosystem, you’re going to need a few of these suppliers, technology suppliers, to come together, right? And companies typically have to solve for a few things. This includes infrastructure and compute resources. You need a very efficient ML Ops, basically, tool to help you kind of manage this—everything you do, from development to managing and monitoring and deploying the models in production, right? And you also need a third-party software in a lot of cases, or open source software. A lot of clients are thinking about building larger platforms that could support several different use cases.
And this is an effort, like what Teresa mentioned and Waleed mentioned as well, to reduce the cost of this when you do this at scale. So instead of just using or building new platforms for every new use case, companies realize that they’re going to need this for many, many different use cases. So why not build a platform that you can reuse for all of these different cases? And this helps basically with total cost of ownership at the end of the day. And that means bringing several different technology pieces together, right?
For example, what we have built with Intel is a generative-AI playground, where we have used Intel Cloud, Intel Developer Cloud, and Gaudi tools—which is an AI accelerator specifically built for deep-learning applications, both training and—. So you can basically use GaudiTools in Intel Developer Cloud to fine-tune your models. But once you want to deploy that in scale you can go and use AWS, right? And that’s what we have done. In this playground you can basically bring in—do the development and fine-tuning of your models in IDC—Intel Developer Cloud—and then deploy at scale on AWS.
And we’ve used some of the Intel software like Converge, which is the ML Ops tool that you can use to make your data scientists and engineers collaborate in the same environment. And one of the big advantages of Converge is that it also allows you to use different compute resources across different cloud. So you can use compute resources in your on-prem environment, on Intel Developer Cloud, and on AWS all in the same workflow, right? Which is a huge advantage.
So at the time of deployment we realized that we need a tool that—or library that—helps us distribute the workloads. So if you’re getting 1,000 inferences per second, and then this has fluctuating demand—it goes up and down based on what’s going on, the time of day, and stuff like that—so you need to have a very efficient way to distribute the workloads. And we learned that this library called TGI from Hugging Face is very helpful. And that’s when you see there’s a lot of these different components and pieces that need to come together so that you can have an end-to-end GenAI application.
Christina Cardoza: Absolutely. It all goes back to the future proofing and protecting your investments, making sure that the hype is reality. I love, like you mentioned, being able to reuse some of the models or the applications and solutions you have out there—that’s always important.
So, Waleed, given that you are in the open source community and you are working with a lot of customers to make sort of these partnerships happen, I’m curious what you’re seeing. The use cases that you’re seeing, and how partners like Intel or any other partners you’re working with make those use cases a reality.
Waleed Kadous: Yeah, we’re definitely seeing a lot of interest in the different stages—the training, the functioning, and inference stages. So I think the points that Ramtin is making are valid, but one particular thing that has come up is this idea of open source. So there’s both open source models—so, for example, Meta has released a model called Llama 2 that we’ve seen very, very good results with. It’s maybe not quite on par with GPT-4, but it’s definitely close to GPT-3.5, the model one notch down.
And so there’s both open models and of course open source software. So TGI, for example, is not quite open source. It’s got a bit of a weird license, but there are systems like VLLM out of Berkeley, which is a really high-performance deployment system, as well as Ray LLMs. VLLM manages a single machine; Ray LLM gives you that kind of scalability across multiple machines, to deal with spikes and auto-scaling and so on.
We’re seeing a flourishing of the open source world in particular, because there are certain things that people like. Not everybody likes entrusting their entire data to one or two large companies, and vendor lock-in is a real concern. So we’re seeing people flock to open source solutions for cost reasons—they are cheaper—but mainly for flexibility reasons in terms of: I can deploy this in my data center, or I can deploy this in my own AWS Cloud and nobody has access to it except me.
And that flexibility of deployment and that availability of models at every size—from 180 billion down to 7 billion and below—these are the reasons that people are tending towards open source. And we’ve seen many of our customers—we did a profile of what it would take to build an email summarization engine, where if you used something like GPT-4 it would cost $36,000, and if you used open source technologies it would be closer to $1,000, for example.
And what that shows is not just that—it’s the other question, is are you really comfortable sending all of the emails that are in your company to any third party, whether it’s OpenAI or someone else. And so we’ve seen a lot of interest from all levels, from startups that tend to be more cost focused, to enterprises that tend to be more privacy- and data-control focused in open source models.
It’s not that open source models are perfect and open source technologies are perfect, it’s just that they’re flexible and you become part of a community. I think when Teresa was talking earlier about the—or you were talking, Christina, earlier—about this idea of building things together, that’s really the ethos behind the open source movement that we’ve seen, and we’ve really seen a lot of dynamism in that area, and every week there’s new models. It’s just a really, really dynamic space right now. And maybe open source models lag a little bit, but they’re continually improving at a very, very fast rate.
And if you look at the history of things like Linux and so on, you see that same pattern, that sometimes they lag a little bit, but just the breadth of applications that they end up being part of becomes the reason that people flock to these open source models. Just the fact that they’re part of a community that exists, that’s also one of the reasons that there are places like Hugging Face that are incredibly popular in the community as locuses of this open source movement.
Christina Cardoza: Absolutely. And I think, being in this open source space, it’s also when we talk about hype versus reality. There’s always the hype of what businesses want or what they think these solutions can do, and the reality of what they actually can do—not only if generative AI is hype. So I think being in the open source space you’re in an interesting area, where you can see the limitations. We’re still need to figure out exactly how to work with LLMs in all different types of use cases and scenarios and industries.
So I’m curious, Waleed, how do you think this space will mature? What do you think really needs to happen within the community—open source or outside of open source—to really make generative AI more mainstream?
Waleed Kadous: What we also see is open source models becoming easier to use. So, for example, Anyscale now offers a hosted version of many of these models that’s both price effective but also is identical to the OpenAI API. So you can literally just change a few variables and use it.
And that effort to make LLMs easier to use is one of the increasing trends. I think what will also happen is the continual looking of the life cycle. So really what we haven’t worked out so far is how to make LLMs better over time. If an LLM makes a mistake, how do you correct it? That sounds like such a simple question, but the answer is actually nuanced. And so what we’re seeing is a massive improvement in the evaluation and monitoring stages that companies like LangChain and LangFusion are really taking the lead on.
And then, finally, so far the focus has been on Large Language Models—text in, text out—but as Ramtin pointed out, we’re starting to see the evolution of both open source and closed, multimodal models—things that can process images or output images as well as text. And just as there is Llama for text, there’s now LLaVA for video and vision-based processing. And so we’re going to see some multimodal applications start to come up in the coming years.
I would say though, that I still think much of the focus would be on Large Language Models. Every business in the world uses language, uses words, as Ramtin pointed out. Not everybody, not every business in the world really needs to process images. Lawyers probably don’t need to process images all that often, right? So I think much of the growth will be in the single mode, text-based mode as things move on.
Christina Cardoza: Well, I’m excited to see where else this space is going to go. Like we’ve talked about in this conversation, there are a lot of opportunities, a lot of places to get started, but there’s obviously a lot of work still to be done and a lot we can still look forward to.
So, we are running a bit out of time, but before we go I would love to throw it back to each one of you—any final thoughts or takeaways? So, Teresa, I’ll start with you. Is there anything else you wanted to add, or anything looking towards the future you’re excited that we’re going to get to?
Teresa Tung: I think we still just need to remember that AI is about the data, and so being able to use this new class of generative AI for your business and to differentiate. One, hopefully the takeaway is that you could realize how easy it is to begin owning your own model. But it starts with that investment—getting your data foundation ready. And the good news is you could also use generative AI to help get that data supply chain going. So it is a win-win.
Christina Cardoza: Yeah. I love that final thought, that AI is about the data. There’s always such a—when something like this happens and there’s a big trend in a space, there’s always that instinct to jump on it. But you have to actually be solving a problem or resolving something that the businesses need. Don’t just jump on it just to jump on it. What is the data telling you, and how can you use these solutions to help tell that story through your data? So that’s great to hear.
And, Ramtin, anything you would like to add?
Ramtin Davanlou: Yeah. I think regulatory and ethical compliance and dealing with hallucinations, like Waleed mentioned, and other topics under what we call responsible AI are the biggest challenges for companies to overcome. And also navigating the cultural change that’s needed to use GenAI at scale is really key to success.
Christina Cardoza: Yeah. And I think it’s ethical AI and responsible AI—it’s not only an issue for businesses, but also consumers or customers. They can be skeptical of these solutions sometimes, so building these with responsible AI and ethical AI in mind, that’s definitely important.
Waleed, last but not least—again, is there anything else you wanted to add to round out this conversation?
Waleed Kadous: I think it’s important to get started now, and it doesn’t have to be complicated. The tools for prototyping are becoming easier and easier. OpenAI, for example, recently released GPTs, that make it very easy to build a retrieval augmented generation system with a clean UI.
There will be these stages, but think about it as a staged process. Let’s build a prototype, make sure that users like it, even if it costs a lot of money. We’ll build that on top of GPT-4 turbo just to prove that there’s value there. And then come at the cost—and to some extent the quality issues—as a secondary issue, as the usage of those particular tools come up. So it’s now becoming much easier to prototype.
And one other thing is to not just think about this in terms of central enterprise organizations, but how can you empower employees to build their own LLMs? And for that initiative around LLMs to come not from some central machine learning group or something like that, but to give people tools to optimize their own workflows and to improve themselves.
And I think that’s really one of the most exciting trends. Rather than seeing this as a substitute technology, to see it as an augmentative technology that helps people do their jobs better. And I really like that mode of us focusing in AI on that, and empowering people to use LLMs in a way that makes them feel empowered rather than eliminated.
Christina Cardoza: Yeah, absolutely. I think my biggest takeaway from this conversation is generative AI is here to stay, and it’s only going to get bigger and help us do better things. So, like you said, Waleed, get started now, and if you’re skeptical or you don’t know where to start, or you don’t know how to take your small wins to the next level, we have great partners like the ones on this podcast: Anyscale or Accenture.
I encourage all of our listeners to visit their websites to stay up to date in this space, and to see how they can help you take your solutions and efforts to the next level, as well as visit insight.tech as we continue to cover these partners in this space. So I want to thank you guys all again for joining the conversation and for the insightful, informative conversation. And thanks to our listeners for tuning in. Until next time, this has been the IoT Chat.
The preceding transcript is provided to ensure accessibility and is intended to accurately capture an informal conversation. The transcript may contain improper uses of trademarked terms and as such should not be used for any other purposes. For more information, please see the Intel® trademark information.
This transcript was edited by Erin Noble, copy editor.