Episode 6 of The Applied AI Podcast
Jacob Andra interviews Dr. Alexandra Pasi about AI beyond LLMs.
About the episode
Dr. Alexandra Pasi (Lucidity Sciences) joins Talbot West CEO Jacob Andra to explore why conflating AI with large language models creates blind spots in enterprise technology strategy. With 15 years in machine learning, Dr. Pasi brings mathematical rigor to practical AI deployment.
Discussion topics
Jacob identifies the linguistic synecdoche in AI discourse: taking LLM characteristics, such as hallucination, and incorrectly applying them to all AI. Dr. Pasi expands on this, explaining that LLMs are just one application of AI to language data. The broader landscape includes supervised learning, computer vision, anomaly detection, and time series forecasting, all of which operate on different principles.
When Jacob presents real-world scenarios, Dr. Pasi demonstrates technology selection. For supply chain optimization, she recommends supervised structured learning over LLMs. These problems need historical data analysis and forecasting under new conditions. LLMs lack organizational context and carry irrelevant noise. For structured data in spreadsheets or databases, specialized models outperform language models.
The generalizability problem
Dr. Pasi explains why machine learning often fails: models excel on training data but collapse in production. AutoML combines multiple models, which yields a good initial fit but often poor generalization. Her company's AF1 technology addresses this through new mathematical frameworks that find non-linear patterns traditional algorithms miss.
Three implementations demonstrate this approach. In clinical care, AF1 predicts ICU pressure injuries better than an AutoML ensemble of 80 models. Financial trading applications find actual market dynamics rather than historical coincidences. Particle physics implementations detect rare events without losing the signal in the noise.
Digital transformation insights
Organizations miss opportunities by automating tasks without questioning why they exist. Dr. Pasi explains how companies created siloed roles that now reveal workflow gaps when automated. The real value comes from reorganizing information flow, not just automating existing processes.
For problems without historical data, she describes using directed acyclic graphs to map causality, then generating synthetic data with controlled variations. This enables simulation and optimization without costly real-world experiments.
Practical implementation guidance
Both experts emphasize starting with business problems, not technology. Many challenges need basic algebra, not complex AI. Dr. Pasi advocates explicit modeling for understood problems, adding machine learning only where external variables create uncertainty.
She addresses risk concerns, noting that hallucination affects only certain AI types, not supervised learning on structured data. Warning against compute-heavy solutions that produce surprise cloud bills, she recommends lightweight alternatives that maintain accuracy while enabling edge deployment on mobile devices and wearables.
Success requires identifying where AI impacts the P&L. The best-executed project means nothing without clear financial outcomes or a defined place in a roadmap that shows ROI.
Episode transcript
Jacob Andra:
I'm here with Dr. Alexandra Pasi, who is the founder of Lucidity Sciences. Dr. Pasi, why don't you go ahead and introduce yourself for our audience.
Alexandra Pasi:
Hi, yeah, I'm Dr. Alexandra Pasi, but you can just call me Lexi. I'm a mathematician by training, and I've been in the machine learning and AI data space for a little over a decade now, about 15 years. I've done a lot on the research side, but I've also worked in industry: in corporate finance settings, in some very dynamic data product startups. And of course I have that theoretical understanding of what the technology does. So I'm very interested in bridging from that foundational understanding all the way down to the real nuts and bolts of implementation: how it affects businesses, and how you navigate the risk and reward dynamics within that.
Jacob Andra:
Absolutely. And the overall AI and machine learning ecosystem takes all parts: the people doing the actual theoretical research, the people building foundation models. And then I'm sort of on the other side of that, literally just applying this stuff to solve business problems and processes. That's what my company is all about. So it's great that we can have this conversation, because we come at it from two different approaches, but we're both very interested in the same things: creating and deploying these capabilities to solve real-world problems and create more opportunity and more capability.
Alexandra Pasi:
Yeah, exactly.
Jacob Andra:
Yeah. So why don't we start with disambiguating a couple of concepts. You hear people talk about AI as though AI were equivalent to large language models, because those have sucked up all the oxygen in the room. So you'll hear people make statements such as "AI hallucinates" or "we're getting closer to AGI" when they're really referring to large language models. And this is a sort of logical fallacy, a kind of misappropriation known as synecdoche. That's the linguistic term for it: you take a subset of something, take some quality or characteristic of that subset, and refer to the entire set with it. When you say AI hallucinates, that's obviously not accurate for all types of AI. It would be the equivalent of saying, well, I have asthma and I can't blow very well, so I can't play musical instruments. And it's like, well, obviously there are lots of musical instruments you don't have to blow into to make music. Right?
Alexandra Pasi:
Yeah, it's funny, I've never heard somebody refer to this as synecdoche before, but you're exactly right. That's a really great way of expressing it. Like I said, I've been around the block in AI and machine learning longer than most people who are in the space right now. Obviously some people have been in it much longer than me, but I have this kind of hipster cred of having been in it before it was cool in this wave. And one of the things I've found really interesting is how the meaning of AI and machine learning has drifted over time. Looking way back, and I'm sure we'll get into some of these different systems and approaches, AI used to be this larger, grander, theoretical thing, meaning something more like expert systems, explicit logic around how to play chess, while machine learning was the more probabilistic approach. That's kind of fallen by the wayside. Then it came to mean that AI is neural networks and ML is everything else, but now it's almost become that AI is large language models, like you said. And large language models are interesting because they're not even really AI or machine learning per se. They are an application of AI and machine learning to an enormous corpus of language-based data with a bunch of engineering wrapped around it, which is still a really cool engineering feat. But I think people would be surprised to know how deep the layers go: what you're interacting with, when you're interacting with a use-case-specific LLM-based agent, is actually the very tip of an iceberg that goes probably seven or more layers down from there.
Jacob Andra:
Yeah. It sounds like you're saying that AI and machine learning are the overall discipline brought to bear to create this experience people are having when they interact with a large language model, which is built, like you say, on this large, multi-layered conglomeration of stacked technologies and disciplines.
Alexandra Pasi:
Yeah, exactly. In some ways, AI has become more of a marketing term than a technical term, which is fine, but it's really ambiguous as a category, right? So in a lot of cases you're better served looking at the problem first, then having a conversation: okay, what's the shape of this problem? Based on the shape of that problem, what tools are going to be the best fit? You can also work backwards, doing some exploratory work: I have this incredible technology, where does it connect with some of my opportunities? But I think we are ill-served to just go into the world and say, I am going to implement an AI strategy. That's kind of a meaningless notion. You've got to start with: I'm going to try to make a change and an impact for my organization of this type, and I know that all of this incredible technology is available for me to do that. Or: I have a really deep understanding of what technology is available, and I'm going to look out into the world and see what opportunities that opens up.
Jacob Andra:
Absolutely. You're speaking my language, because at Talbot West we help companies adopt these types of technologies in a way that makes sense for them. And we always start with: what's going on in your company, and what are the actual needs? What's going to make a difference? That might not even be AI. It might be some type of automation that doesn't rely on AI at all. It might be some very deterministic, simplistic algorithm. It might be some type of machine learning. So it's like: oh, you need better visibility into your supply chain, to be able to predict when your supply chain is going to break down or is at risk? Well, that might be some type of predictive algorithm involving machine learning trained on past data. That's not going to be a large language model. You need to generate marketing materials faster with a minimum of human oversight? That's probably going to be a large language model. So: what do you need first? And then scoping that to the capabilities out there that fit that need.
Alexandra Pasi:
Yeah, exactly. And I think a lot of businesses out there would be either delighted or horrified to find out how many millions of dollars they could save with just a beginning-algebra equation, right? It's not always going to be that, but there are a lot of cases where the right answer is a simple Algebra 1 problem; you just have to look at it in the right way. I'm in the machine learning space. The technology we've developed answers some really difficult questions within that space that are otherwise hard to approach. But if there is a better solution with a simple equation, I have nothing philosophical against that. In fact, I would recommend that kind of approach every time. Usually, though, it's going to be in concert with a lot of different technologies, and that's where this question of architecting your strategy, on both the technology and the business side, really comes into play.
Jacob Andra:
A hundred percent. A huge portion of the work is just in correctly scoping and defining the problem, and then building an overall strategy that ties a lot of moving parts together. Because any business or organization is a very complex system with a lot going on, and there are a lot of unintended consequences if you don't properly scope all the dependencies, precursors, and adjacencies, rather than just coming in with: hey, I have a technology that's so awesome, and I know it's so awesome because I'm the one selling it, so therefore it's the answer to every business's problems. That's not the right approach.
Alexandra Pasi:
Exactly. Well, and I think people learn certain patterns around a tool they're comfortable with. I saw a lot of this with AI chatbots for internal knowledge bases, and I have nothing whatsoever against those; there's obviously a place and utility for that. But because a lot of people learned and picked up the tools to do it, that suddenly became the big play. And I think what you have to do before you invest a lot of time into building something like that out is step back and say: what is this going to change for my business? Let's say I knock this AI project out of the park. Where does that show up in my P&L? And then, does it make sense for me to invest in this? Because you can have the most perfectly executed AI project in the world, and if that project wasn't tied to a lever you could pull that would change something you wanted to see in your financials, you're going to be disappointed with the ROI.
Jacob Andra:
Yeah, I agree with that. You definitely need to tie these to specific business outcomes. In some cases it might not be directly traceable to your P&L. Let's say you're building a long-term roadmap toward greater organizational intelligence, where your whole organization is going to be much more efficient, and you can clearly articulate what that future state looks like and architect out some steps on the roadmap. Not every one of those steps might directly move the needle, but some of them might be important precursors to other steps that will. So if you're doing it as part of an overall strategy where you've clearly shown how that strategy is going to deliver ROI, then absolutely, but not necessarily every single step is going to directly tie to P&L on its own. And that's actually a lot of the problem we see: people looking at individual steps or solutions in isolation from the bigger picture.
Alexandra Pasi:
That is such a good point, because what you're really talking about is this question of digital transformation versus task-level initiatives. This is just my personal observation from watching this space, but one of the things I think I've seen happen, say, macroeconomically, is that during the zero-interest-rate period there was a lot of padding of teams with technical people, and in order to justify that, a lot of organizations created increasingly siloed roles. This wasn't just in tech, though tech certainly experienced a lot of it. Within those siloed roles, the tasks to do became a lot more well-defined. In an era where businesses are now trying to become more efficient, it's really easy to say: oh, well, let's automate away all those tasks. But I think the more powerful thing to ask is: why did these tasks exist in the first place, and was there a better way to organize information and work across our organization? I think a lot of businesses, as they try to automate these tasks with AI, are going to find there are actually a lot of gaps that things are falling through, and that's going to open their eyes to what it would look like for the business to be shaped a totally different way. And that transformation is what's going to create the real impact. Everybody will say, oh, AI automated a bunch of stuff, and sure. But the real transformation is not going to be the automation of tasks; it's going to be the transformation that is forced by trying to automate all of those tasks from the inefficient business model, seeing what falls through the gaps, and adjusting and correcting for that.
Jacob Andra:
Oh yeah, that's really well said. I've talked about the same thing; I don't think I've ever said it quite as nicely as you just did, but that's such a valuable and important truth for business executives to grasp.
Alexandra Pasi:
Yeah, I think we're still figuring it out. But we'll get there, and we'll look back, and I'm hoping we can see that arc a little more clearly.
Jacob Andra:
Absolutely. I'd like to learn more about your specific company and product, but before we jump into that, how about I pitch you some real-world business problems? We talked a minute ago about scoping the right technology to the right task, so you can tell me: oh, I would go to this type of technology or that type of technology for this specific instance. Does that sound okay?
Alexandra Pasi:
Yeah, I love that. And this is completely on the fly, so full caveat: there's always going to be a deeper dive you want to do. But I love this idea.
Jacob Andra:
For the audience: we have not scripted this, we have not rehearsed this. I didn't even tell Dr. Pasi I was going to throw this at her, so this is completely thinking on her feet, and we won't hold you too accountable to anything, because it is hard to think on the fly. But let's go back
Alexandra Pasi:
I love this though.
Jacob Andra:
to the example I was talking about a moment ago, where a company wants to optimize its supply chain. Obviously supply chains have so much inefficiency, and you're probably talking about an entire suite of different technologies for different aspects of supply chain optimization. But let's just say one aspect is that you want to be able to predict when a supply chain breakdown is going to happen, or when it's likely to happen. There's probably going to be a probability score assigned, and you're going to be looking at past data: the different ways that breakdowns happen. A supplier is underpowered and just can't deliver, or global macroeconomic factors, political factors, who knows what, right? There are all kinds of things that can cause a disruption in your supply chain, and you want to be able to essentially learn from the past. So there's a monitoring aspect: you want to be able to monitor what's going on, and then flag with some degree of probability when a disruption is likely to happen, so you can take some sort of mitigating action. How would you go about constructing an AI solution for that? What types of AI technologies would you use?
Alexandra Pasi:
Yeah, I love this question. On paper, if this is a perfectly observed system, if you can observe and understand all of the pieces that go into it, there are probably a lot of ways in which this can be explicitly modeled. You've got a graph logistics problem, and you can solve it mathematically using explicit equations or graph heuristics. In reality, that's often not going to be the case. If it is the case for you, fantastic: you've hit the jackpot, a very low-risk implementation of an explicit solution to this problem. But like you talked about, there are often these macroeconomic conditions that we understand have an impact on supply chains; we just don't know exactly how the impact ripples through. And that's the perfect setting for what we'll call structured supervised learning. That's kind of a mouthful, but I'll explain what shape of problems fall into that camp. If you have a bunch of historical data where you're pretty sure you can observe the major contributing factors, and you also have the outcome, say what quantity of a particular item arrived and at what price, and that's the thing you're trying to forecast out under new conditions, conditions you've never seen before, that's where you actually want one of these supervised structured machine learning approaches. Machine learning and AI in general find patterns, right? So the question is: how well do those patterns generalize to different kinds of data? Because when you're talking about planning for the future, you're talking about data that the machine, definitionally, hasn't seen before. That's why you're using something like machine learning in the first place. That's actually the space we operate very well in. One of the things we did when we designed this technology was try to solve an issue we saw around these models not generalizing adequately. And the other thing I'll mention, and I think this is really important for people who aren't deep in the weeds of AI; I'm not going to try to get too deep into the weeds, but...
Jacob Andra:
Get in the weeds. It's okay to get just a little in the weeds. Don't worry about it.
Alexandra Pasi:
Fantastic. I love to hear that. You might regret telling me I can. So, structured data you can think of as data that goes into a spreadsheet. That's very different from text, image, or video data. You've got these different fields, they all have values, and they're related to each other in some predefined configuration. An LLM is not going to be the right tool for structured data in a use case like this, and that's really because the LLM understands the world through the lens of the language it learned from: a bunch of articles, but also the history of tweets and all those other things that have nothing to do with solving your particular supply chain issue. You're also going to have a ton of historical data and insight on your problem that an LLM will not have. So it's got all this noise from irrelevant context, and it doesn't have the context of your organization. This is where you would want to use something more specific. Getting further down into the weeds: historically, what a lot of people have used here is something like AutoML. AutoML is basically: try a bunch of different machine learning models at once and jam them together. That has the upside that the fit is often good, the downside that sometimes it doesn't generalize well to new scenarios, and the further downside that those models can be kind of hefty. So if you're trying to implement them on small devices, or run inference very quickly, they're not a great choice. But AutoML has often been one that people go to. XGBoost is another. And then our technology is a new one on the market that slots into exactly that kind of problem space, but with better accuracy than AutoML while staying lightweight like XGBoost, so it's a little more portable.
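To make that concrete: below is a minimal sketch of supervised structured learning on a toy supply-chain forecasting problem, using scikit-learn's gradient boosting as a generic stand-in (XGBoost would be a common production choice; AF1 is proprietary). All column names and the synthetic data are hypothetical.

```python
# Minimal sketch: forecast a structured outcome from historical
# structured features, evaluating on a later time period.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 1000

# Hypothetical historical records: observable contributing factors plus
# the outcome we want to forecast (delivered quantity).
df = pd.DataFrame({
    "supplier_capacity": rng.uniform(50, 200, n),
    "lead_time_days": rng.integers(5, 60, n),
    "fuel_price_index": rng.normal(100, 15, n),   # deliberately just noise here
    "political_risk_score": rng.uniform(0, 1, n),
})
df["delivered_qty"] = (
    0.8 * df["supplier_capacity"]
    - 0.5 * df["lead_time_days"]
    - 40 * df["political_risk_score"]
    + rng.normal(0, 5, n)
)

# Time-ordered split: train on the past, evaluate on the most recent 20%
# to mimic forecasting under conditions the model hasn't seen.
split = int(0.8 * n)
X, y = df.drop(columns="delivered_qty"), df["delivered_qty"]
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])

print("MAE on held-out recent data:",
      mean_absolute_error(y[split:], model.predict(X[split:])))
```

The time-ordered split is the point: shuffling rows before splitting would leak future conditions into training and overstate how well the model forecasts genuinely new scenarios.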
Jacob Andra:
Yeah. This is probably a great place to jump in and pitch a little bit about what you're building. I have a couple of other use cases I want to throw at you, but since you already created the perfect segue, why don't you tell us more about AF1, your product, and what you're doing there.
Alexandra Pasi:
Yeah, so our machine learning approach really aims to solve two big problems in that supervised machine learning space. We've already talked a little bit about when you'd want to use a supervised approach for structured data, and there are a ton of these problems. This market has been around for decades, and it will continue to be around; the space of problems is not going anywhere. But there has not been a lot of algorithmic advancement in this space for a while. Most of the advancement has come, like I said, from taking a bunch of different models and jamming them together. One of the things you see when you take that approach is that the models lose a lot of their generalizability. What I mean by generalizability is: you have a specific training set, the world outside your training data is very dynamic, and the question is how well the algorithm will predict the future, under a new set of situations, given its limited training data. So one, we set out to solve that model generalizability problem, and we do see significantly improved generalization: there doesn't tend to be any difference between the model's fit to training data and its fit to test or validation data. The second piece we wanted to solve was model size. These AutoML solutions are often not super accessible to businesses that want to train on premise, depending on the scale of their infrastructure and data. And even if you do have the ability to train them, the models themselves can get quite large. If you want very fast inference when you're using them, or you want them small enough to put on a wearable or other small device, if you want that edge compute optionality, maybe a mobile device, you're quite limited there. So we've solved both this problem of model size, lightness, and speed, and the problem of model generalizability. The way we did that: we're a team of mathematicians, so we went back to the fundamental question of how machines learn patterns. Our thesis, and this has been borne out, is that neural networks and decision trees, which are the other algorithms out there, learn patterns in very naive ways. It's sort of like connect-the-dots, and the world is not linear, so the world does not connect the dots. If you look at a connect-the-dots picture of a cat, you can figure out there should be a curve between these two dots, and that's not something these algorithms are natively good at. So we've introduced a new mathematical framework that's natively non-linear, natively finds those curves, and finds a global solution, so you're really answering the question for the whole space of possibilities and not just for what you happened to put in as training data.
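The two problems she names, generalizability and model footprint, can both be measured for any candidate model. Here is a rough sketch with stand-in models (a large random forest versus a small linear model); this is a generic diagnostic, not a description of AF1's internals:

```python
# Two diagnostics for any trained model: the generalization gap
# (train fit minus held-out fit) and the serialized footprint/speed.
import pickle
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=20, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for name, model in [
    ("hefty ensemble", RandomForestRegressor(n_estimators=500, random_state=1)),
    ("lightweight model", Ridge()),
]:
    model.fit(X_tr, y_tr)
    gap = model.score(X_tr, y_tr) - model.score(X_te, y_te)  # overfitting signal
    size_kb = len(pickle.dumps(model)) / 1024                # edge-deployment proxy
    t0 = time.perf_counter()
    model.predict(X_te)
    ms = (time.perf_counter() - t0) * 1000
    print(f"{name}: gap={gap:.3f}  size={size_kb:.0f} KB  inference={ms:.1f} ms")
```

A model whose fit to training data matches its fit to validation data, as described above, would show a gap near zero here.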
Jacob Andra:
That's really cool. I love that. So far, what specific use cases have you found the most applicability for?
Alexandra Pasi:
Yeah, we've spent a lot of time looking into higher-risk settings to start, because those are the places where generalizability and accuracy are going to matter the most. We've taken a look with collaborators at some projects in clinical care, predicting and preventing pressure injuries and other adverse hospital events. That's something we're able to do much better than the existing AutoML solutions: you compare against something like 80 models jammed together, and we're outperforming it in sensitivity and precision. That's looking at how well you predict the thing, how often you get it right, and how often you sound a false alarm. Both of those are important in business settings, right? You want to get the answer right, but not at the expense of raising the alarm every single time, because you just don't have the resources to address that. Certainly in an ICU setting that's a major issue: you have limited staffing resources, so you need higher precision with the alerts. We've found we're able to significantly improve on the state of the art there. We've also looked at applications in financial trading. These are, again, very dynamic datasets where the market is always changing; you can't count on what's happened historically to tell you what's going to happen tomorrow unless you're finding the right patterns, the ones that actually speak to the governing dynamics. We're able to increase returns in these highly liquid markets while reducing some of that risk. And the last application, which I think is really interesting from a scientific perspective, is particle physics data, where we're trying to detect rare events. These are often very rare events being looked for, and I think that's another interesting application of machine learning: you might be able to simulate what a rare event looks like, but finding a machine that can pick those patterns out and not lose them in the noise is an algorithmic challenge for sure. So those are the high-risk, high-precision, high-accuracy settings we've focused on so far.
Jacob Andra:
That's great. But if you were to generalize, like, "our AF1 model is good for the following types of situations," in the broadest generalization possible, not specific to industry, how would you broadly categorize the types of real-world challenges or use cases? I know one prerequisite would be having enough data to actually train the model on, and there would probably be a few others. So how would you frame that in the broadest terms?
Alexandra Pasi:
That's a good question. I would say structured data, though it doesn't have to be structured originally; we see a lot of cases where you pull a structured value out of unstructured data. You might have a sentiment score that ends up as one variable next to something like the consumer price index. In a lot of cases where you're doing something like demand forecasting, you'll have a lot of original structured data sources, but when you pull them in, you want to structure them: here's some inflation metric, here's some consumer sentiment metric, here's some historical demand information. The point is you have to have that data in structured form; you can think of that as spreadsheet or database form. The second piece is that you want to actually be forecasting or predicting or automating something. You want a specific label you're trying to produce, and often that's a forecast. As far as classes of problems: you might be forecasting cost or demand in scenarios where you don't have perfect insight into the exact mechanics of what creates demand or cost, in these more opaque supply chain situations. Another application is anomaly detection or failure states: there's some failure state you're trying to preempt, you have historical examples of failures, and you can learn what the monitored system looks like right before a failure occurs and use that to intervene earlier. You might also have applications where you're making real-time decisions, say multi-sensor synthesis in a robotics or wearables application, where you're taking information from all of those sensors and making a decision to send some signal or move in some direction. And then the last case, which I think is really interesting and underexplored, is exploratory machine learning. This could be things like simulation. One of the things machine learning provides is not just a prediction of what's going to happen but a model of how outcomes happen. Because of that, you can use machine learning models that are good ones, with good fit and good generalizability, to go in and say: okay, this is what happened, this is what will happen, but what would happen if I changed this? Then you can really start to interrogate those questions of intervenability, optimization, and what-if planning in very complex, dynamical systems without having to go make that bet for your business.
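The failure-state case she describes has a standard formulation worth sketching: relabel each monitoring window by whether a failure occurs within the next k steps, so the model learns the pre-failure signature and can alert early. The sensor data and failure mechanics below are invented for illustration; a real system would use actual logs and tune the horizon k.

```python
# Early-warning sketch: predict "failure within the next k steps" from
# current sensor readings, then check sensitivity (recall) and precision.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(2)
n = 5000
log = pd.DataFrame({
    "vibration": rng.normal(0, 1, n),
    "temperature": rng.normal(70, 5, n),
})
# Hypothetical ground truth: failures follow sustained high vibration.
failure = (log["vibration"].rolling(20).mean() > 0.5).astype(int)

k = 10  # alert horizon
# Reverse, take a rolling max over k steps, reverse back: this marks
# every window that sits within k steps *before* a failure.
log["alert"] = failure[::-1].rolling(k, min_periods=1).max()[::-1].astype(int)

split = int(0.8 * n)
X, y = log[["vibration", "temperature"]], log["alert"]
clf = RandomForestClassifier(random_state=2).fit(X[:split], y[:split])
pred = clf.predict(X[split:])

# Both numbers matter: catch real failures (recall) without sounding the
# alarm constantly (precision), the ICU trade-off discussed above.
print("recall:   ", recall_score(y[split:], pred))
print("precision:", precision_score(y[split:], pred))
```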
Jacob Andra:
Yeah, that's sort of like the big ML breakthrough back when they finally beat humans at Go and other games. They figured out they could have the system simulate millions of games and learn from its own simulations, right? So I think that has relevance to that last point you were making.
Alexandra Pasi:
Yeah, synthetic data is a big area of interest right now, and there are a few different ways to generate it. One of them is through machine learning models. But then there's this really good question: if I'm training other models on synthetic data I generated with a machine learning model, and that machine learning model was not good, if it didn't generalize outside the original real-world training set I put in, then I'm in trouble as I ingest and make decisions off of subsequent synthetic data. So you really want that machine learning model to be solid if you're going to do simulation and create synthetic data on top of it.
Jacob Andra:
Yeah. There are a couple of use cases, and this is circling back to what I was originally throwing at you, but it ties it all together. You said a prerequisite is having enough data collected, but I want to throw a couple of use cases at you where maybe there isn't data, yet you still have something you want to solve or a problem you want to optimize, and I want to see if synthetic data can help with that. You can comment on whether AF1 would be applicable, or on whatever type of AI or machine learning you might bring to bear; it doesn't matter. The first relates to what you said about pulling in a lot of sensor data, and we can generalize beyond sensors to say this could be a lot of data from a variety of sources. You may not have past data, but what you're trying to do is synthesize and cross-reference all of it to find a signal in the noise. You might be trying to find a specific pattern, and you may not even know what you're looking for, but you can retroactively identify it once it happens. Let's just say it's fraud detection, which of course has already been pretty well figured out by a lot of the financial companies, but it's the first thing that comes to mind: you're synthesizing all of these signals, and retroactively you can say, yes, that was fraud, and it can learn as it goes. But let's say you don't necessarily have a large body of data to train it on, so you're learning on the fly and giving it, in a sense, reinforcement, labeling things as they appear. Essentially you want it to get better at identifying these patterns on its own and eventually be able to predict them. So you're pulling in all these signals and trying to find patterns in them, looking for specific things: could be fraud detection, could be the likelihood of a terrorist attack. There are a lot of places you could use this sort of thing across the intelligence community, financial markets, government, et cetera. So that's one I want to throw at you. I'll let you respond to that, and then I have another one.
Alexandra Pasi:
Yeah, certainly fraud and anomaly detection is, like you said, the classical application for this class of machine learning and AI tools. But it's interesting, because if you're in a case where you can model some aspects of the dynamics of the system a little more explicitly, you can create synthetic data off of that. There's something called a DAG, a directed acyclic graph, that you can use to map out the causality of a particular situation. So if you're in a case where you understand what causes failure, you understand the mechanics that cause failure, and you just need to be able to make that decision really quickly, or it's these mechanics plus a little bit of noise, what you can do is model that out on paper using this DAG, which is a graph that just says: if this happens, then this happens; if this happens, then this happens. You can use it to simulate a bunch of different scenarios to produce data, then artificially add a little noise, say the sensor had this kind of measurement error, and use that to create a synthetic dataset to train a machine learning model. You had the mechanics to construct the dataset, but you also understood where the error was, and the machine learns the patterns of how all of these pieces interact to create patterns it can identify as an error state, a security breach, a terrorist event, whatever the case may be.
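A compact sketch of that DAG workflow under invented mechanics: a three-node graph (load drives temperature, and both drive failure) is simulated many times, measurement noise is layered on top, and the noisy observations become synthetic training data. The causal equations, noise levels, and variable names are all assumptions for illustration.

```python
# DAG-based synthetic data: explicit causal mechanics, simulated
# scenarios, added sensor noise, then training data for an ML model.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)

def simulate(n):
    # DAG: load -> temperature -> failure, with load -> failure directly.
    load = rng.uniform(0, 1, n)
    temperature = 60 + 40 * load + rng.normal(0, 2, n)
    failure = ((temperature > 90) & (load > 0.6)).astype(int)
    # Measurement noise: what the deployed model would actually observe.
    X = pd.DataFrame({
        "load_reading": load + rng.normal(0, 0.05, n),
        "temp_reading": temperature + rng.normal(0, 3, n),
    })
    return X, failure

X_train, y_train = simulate(10_000)
X_new, y_new = simulate(2_000)  # fresh scenarios from the same mechanics

clf = GradientBoostingClassifier().fit(X_train, y_train)
print("accuracy on unseen simulated scenarios:", clf.score(X_new, y_new))
```

Because the mechanics are explicit, you can also simulate regimes that never occurred in real operations and see how the trained model behaves there.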
Jacob Andra:
Yeah, I like that. Okay, I'll throw another one at you. You're trying to optimize a lot of factors. You have an outcome you want to achieve, you don't have past data to guide you, but you need machine learning to say: here's my set of constraints, here's the outcome I want, what's the most optimal arrangement? I'll throw out a real example of this; it's a current client of Talbot West. We haven't created a solution around this, we're just initially scoping it, so this is perfect timing. It's a company that does environmental monitoring. They have a lot of clients across hundreds of sites spread over several states, and about 100 to 120 technicians who have to go monitor all these sites. Each site might be on a different schedule, each technician lives in a different location, and it just causes your brain to explode to figure out who should visit which site when. You're trying to get the best bang for your buck with human hours spent, have people drive the least distance, all of that, right? And it's not like you have a lot of past data. You have data on where the locations are, data on the schedule of when each location needs to be visited and reported on, and data on where each person lives. Given that, you want this machine learning model to just tell you an optimal schedule that optimizes everything. I suspect, and I'll let you speak to it, that the answer is having it run simulations, grade itself on those, and create its own synthetic data. But why don't you tell me how you would approach that and whether AF1 would be a good candidate for it.
Alexandra Pasi:
Yeah, this is a really interesting example, because it does have a really explicit underlying structure, so you can absolutely generate more data. It's really similar to the prior example: you have this underlying graph structure of who lives where, where they're trying to go, and how long it's going to take, so you can come up with an explicit solution in those cases. But then there are probably a lot of places where you want to be able to add in some error, on time for instance, and you can model some of that explicitly too. As you bring in more and more extrinsic variables that you want to use for specific pieces of the forecast, though, that's where you might start to want a machine learning solution. So what I would recommend in a case like this: there's a lot of explicit modeling that can be done, depending on how broad you want the set of scenarios you can accommodate to be. And for the pieces that are blind spots in that explicit model, you can start to gather additional data to supplement, call it a synthetic dataset, and turn it into a machine learning problem that expands the domain the model operates on. So it's a really cool example where you have some explicit modelability within a specific domain, and then, depending on how far you want to expand that, you may bring in a machine learning solution, where something like AF1 can really help run simulations and generalize over external variables that aren't yet accommodated within that explicit model.
Jacob Andra:
And so the model could run a lot of simulations, and you could build in all the variables you needed. You could even imagine a situation many times more complicated than the one I just described, far more complex, with far more variability and far more factors. The one I described is actually, like you said, relatively simple, because you really have three factors. But you could bring in dozens of factors with tons of variables, right? And the model could run all of these simulations and essentially grade itself. You could give it the parameters it's optimizing for, least miles driven, least human hours spent, something like that. It could run a lot of different simulations of how it would route drivers in different ways, or different schedules it could create, grade itself on those, eventually come up with the optimal one, and report it to you, right?
Alexandra Pasi:
Yeah, that's exactly right. You might, for instance, just care about distance, and distance is fixed; that's really easy to model explicitly. But if you start to care about time, then you're taking in things like traffic, and there's a bunch of external variables coming into that which are a little foggier in how they interplay. So you can use machine learning solutions to help predict some of those variables, and then either plug them in explicitly or invert the problem. Either way, you have this explicit, modelable core, and then machine learning can be brought in to carry some of the uncertainty from extrinsic factors, like traffic, if that's one of the variables you're optimizing for, or if some of the variables you're optimizing for have external factors that aren't as explicitly modelable.
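A toy version of that split: the scheduling core below is a plain greedy heuristic over straight-line distance (the explicitly modelable part), with a single function, travel_cost, marking the seam where a learned travel-time model could be swapped in once traffic and other foggy variables matter. The technician counts, coordinates, and names are made up, and a production solver would also handle visit schedules and working-hours constraints.

```python
# Greedy technician-to-site assignment over an explicit distance core,
# with a hook where a learned travel-time predictor could slot in.
import math
import random

random.seed(4)
techs = {f"tech_{i}": (random.uniform(0, 100), random.uniform(0, 100))
         for i in range(5)}
sites = {f"site_{j}": (random.uniform(0, 100), random.uniform(0, 100))
         for j in range(20)}

def travel_cost(a, b):
    # Explicit core: Euclidean distance. Swap in an ML model here to
    # predict door-to-door time from traffic, weather, time of day, etc.
    return math.dist(a, b)

position = dict(techs)           # each technician's current location
assignment, total = {}, 0.0
for site, loc in sites.items():
    # Send whichever technician is currently cheapest to reach the site.
    tech = min(position, key=lambda t: travel_cost(position[t], loc))
    assignment[site] = tech
    total += travel_cost(position[tech], loc)
    position[tech] = loc         # technician moves on from this site

print(f"total travel cost: {total:.1f}")
```

Running many randomized variants of a loop like this and keeping the best schedule is the simulate-and-grade pattern discussed above; machine learning enters only where the cost function itself is uncertain.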
Jacob Andra:
Yeah, that's great. I like that. Well, anything else you'd like people to know about what you're up to?
Alexandra Pasi:
I think a lot of it is where we started: there are different kinds of machine learning. The hallucination question is not as relevant with certain types of approaches, so there are really good ways to mitigate and quantify the risk you're taking on with AI adoption. A lot of that just comes down to understanding the landscape of what's out there. There are a lot of options for businesses looking to implement AI to really significant impact, and a lot of that will come down to, like we discussed, understanding your strategy, not just looking to automate a specific task; you've got to think bigger. On the flip side, as you're looking to mitigate the risk of making those changes, if you're transforming your business or the way you do business, the last thing you want is the machine learning model, some technical quirk of it, some hallucination, being a major part of that risk. Luckily, there are important ways you can mitigate that, and a lot of it is just using the right technology. Another piece is that a lot of businesses are concerned about cost going into AI adoption, and there's good news on that front as well: there are a lot of machine learning and AI solutions that require less compute, so you're not going to wake up in the morning to a devastating surprise cloud bill, and they're a little more approachable to implement. You just have to be willing to do the critical thinking about how you're trying to change your business, and then work with people who can help you, or take it to your team, who can help you understand where to bring the right technologies to bear on the problem.
Jacob Andra:
I couldn't have said it better myself. Well, thank you for coming on the podcast. We'll have to have you back for a follow-up discussion.
Alexandra Pasi:
Yeah, that would be fantastic. Thanks so much for having me.