In a world of ever-increasing attacks, this session discussed modern techniques for mitigating the biggest cost drivers should your company suffer a breach. The session facilitated a collaborative discussion on the different types of Artificial Intelligence and Machine Learning workflows available in the market today.
A thoughtful approach to deploying these powerful solutions will provide the quickest time to insights, delivering the strategic advantage needed to lower costs when analyzing exfiltrated data.
This transcript was generated using AI and reviewed by an editor.

Welcome to ACED's webinar channel. My name is Deja Miller, and I am the marketing manager here at ACED. Today, we are hosting a webinar with our fabulous partner, BDO, and we'll be discussing artificial intelligence and machine learning in the age of ransomware and data breaches. Before we get started, please know that we are always happy to take your questions. So if you have any questions, please submit them in the Q and A widget located at the bottom of your screen. All questions will remain anonymous, and all questions will also be saved until the end of the presentation. If you would like a copy of today's slide deck, it can be downloaded from the resource widget, also located at the bottom of your screen. Without further delay, I am very pleased to introduce Brian Wilson, who will be kicking off the presentation today. Take it away, Brian.

Will do. Thank you very much. And first, let me say thank you for the opportunity to share this information through ACED, obviously a great partner to BDO, and we're really keen to get into what this presentation is all about. Just to level set at the outset: this is not a course in machine learning or data science. So if you have those types of questions, or you're looking to figure out how to build your own neural network, that will not be covered today. But we do have a lot of good insight that we'll share with you. As for the agenda, we kept the slides pretty light; the areas we're gonna talk about are on the screen there for you. In terms of intros, I'll go ahead and introduce myself first. My background's a little bit interesting in that I've spent twenty-five years in the industry and helped build several forensic practices, including technology practices, across a couple of the different Big Four firms.
At KPMG, I was a partner for the better part of a decade, and I was at Deloitte as well, mostly over in Asia, frankly. I spent ten years living in Asia, working out of Shanghai and then Hong Kong. And in that role, I got the opportunity to build a lot of technology, first focused on cyber and incident response. And then, as data breaches became more common and the privacy laws continued to develop, particularly in that region, I was very much involved in building out those service offerings there. So that's a little bit about me. In terms of my role at BDO, I globally lead our data breach advisory services. And really, what Jay and I are particularly focused on is the meaty part of that: if you've had a data breach and you've got data outside of the data estate, our job is to help you resolve that and get the quickest time to insights at the lowest total cost for clients, with the approach and the methodologies that we use. So, a little bit about me. Jay, do you want to go ahead and introduce yourself?

Yep. I'm Jason Park. I'm the global lead for technology innovation for BDO in the data breach service line. I have about twenty-two years of experience in technology. I too have experience inside of the Big Four; I started my career at Andersen as a technology operations lead for our forensic technology practice. From there I moved to KPMG when Enron happened. As part of that, I spent ten years inside of forensic technology with KPMG, and then I had a little bit of a career change: I was an enterprise solutions architect for KPMG for another ten years, building advisory solutions for the different service lines. So I did a little bit of a boomerang and came back to forensic technology, working with Brian, looking at technology and innovation and solving data problems for data breach.

Great. Thanks, Jay.
One other thing I will note: if you've got a burning question on a particular topic that Jay and I are talking about, please do put it in the Q and A. We are monitoring those, so we'd like to have this be as interactive as we can, be it over the internet. But we also will reserve some time at the end if questions just kind of pile up for Q and A. We're planning to have at least ten minutes at the end to try to address all of those, if we can, in the time that we have. So maybe we'll get into it then. The first bit we're gonna talk about is really artificial intelligence. The reason why we like to start with this is more level setting than anything else, because depending on who you ask, you may get a lot of different answers about what AI is and what it isn't, particularly in the e-discovery world, and how folks are using it to solve problems and the types of problems that they're solving. So, I guess, Jay, I'll take the one on the left, just to level set on where we are today. The AI that we use and deploy, and that is commonly available in the marketplace, is really narrow AI, focused on a particular problem that's trying to be solved, right? And so there are lots of different use cases that you see in the marketplace, but it's really, again, task-based: we're trying to solve a problem, and we use a lot of algorithms and other technologies to solve that particular problem. That's where we are today. Now, general AI is a different story, right? I mean, general AI can handle lots of types of requests, etcetera. We're not there yet. I think, at the end of the day, we will be there; I think most of the pundits say we will be there in a decade or two. Super AI, you know, I think of Terminator, if you ever saw those films. That's how I see it. I don't know if we'll ever be there.
So at the end of the day, time will tell. But those are kind of the three general categories of artificial intelligence. And again, we're really focused on the narrow AI, particularly as it solves data challenges for our clients before, during, and after a data breach, from a data breach advisory perspective, but specifically as it relates to data breach review. So, I guess with that, Jay, do you wanna talk a little bit more about how we are looking at AI generally and within the e-discovery business, and some of the use cases?

Yeah, I think so. Just foundationally, from a data breach perspective, I wanna set the table as far as what the problem is. On average, a company manages about a hundred and sixty-three terabytes of data, and an enterprise about three hundred and fifty. Worldwide, today, we generate about one hundred and twenty zettabytes of data per year; a zettabyte is a billion terabytes. That's a lot of data. So, with traditional e-discovery and the methods that we use in the EDRM model, from a data breach perspective, because of the SEC and various other rulings, the notification requirements are getting shorter and shorter. So we have a compounding problem: data growth and shorter timelines. We need AI and machine learning to be able to interrogate datasets for PII, PHI, and those types of things, so that we can stay within the guardrails of what the regulators are asking us to do. From a machine learning perspective, we want to be able to take a data set and cull it down to what is most likely to have sensitive data in it, interrogate that with eyeballs on documents, and be able to extract data out, or have machine learning as well as the AI models extract those entities for you. So, today, in the landscape inside of e-discovery, everybody's familiar with Relativity.
There are other tools like Reveal, Canopy, and Nuix that are doing good things, but I don't think there is a best-in-class platform. Wouldn't you say, Brian? There's no single best in class, but we are striving to get there.

Yeah, I think that's right. I mean, even outside of the traditional e-discovery platforms, you've also gotta look at cloud-based services, particularly in the Microsoft suite, in terms of how the data can be analyzed as well. We'll get into this a little bit later on one of the other slides, but the goal really should be to keep the data behind the firewall. If you already have data outside of the data estate, and, as a professional who's helping a client solve the problem, you can minimize the amount of additional data coming out of the organization by using tools they probably already have access to, you're helping to minimize the risk of that extra data having to come out of the organization and get analyzed somewhere else. I mean, we have all sorts of security controls and protocols around all of that. But at the end of the day, we tell our clients that threat actors, one, are using artificial intelligence and machine learning to optimize their attack vectors and their threat opportunities, and they only really need to be right once, right? If we're receiving client data or helping a client, we have to be right all the time, right? And so it's one of those things where, if you can minimize the amount of data that gets out of an organization and do that work behind the firewall, it's gonna be more efficient, and it'll probably be less costly, depending on the tool set being used and the expertise involved. So I agree with you.
There's no silver bullet at the moment, right? I think that's one of the reasons why, Jay, you and I love talking about it. We get excited about it because we like really sticky, difficult data problems. The messier the data, the better. We've had, over the last couple of months, some pretty interesting legacy data. If you're a startup dealing in the SaaS world, you may not be thinking about some of the legacy mainframes that are out there from more mature organizations that have been around twenty years or more. You don't think about those, and we need to, as professionals in this space, be ready to solve those problems for our clients. So it's really fun and very challenging. And I think, Jay, maybe you can touch a little bit on these four kind of pillars of AI on the slide, and maybe provide some use cases around, for example, computer vision or natural language processing. How do we see that used daily in what we deal with?

I think the most important quadrant, though all of it is important, is natural language processing. Inside of data breaches, it's extremely important. If you have a good natural language processing engine, you can identify PII more accurately inside of the data estate, especially if it's been written out in verbiage. Complementary to that, and let me take a step back: from a natural language processing perspective, the context of a nine-digit number is extremely important, because otherwise it's just a nine-digit number that we're looking for. And I know we're gonna touch on this in a little bit, but search terms and regexes are very specific. Yes, they can be fuzzy. Yes, they can do a lot of different things.
You can look, you know, however many characters out from that search term hit or regex expression. But understanding the context of the entire document will help you identify bits of PII much more accurately. And that's where machine learning comes in, right, to be able to take what it's seen before and the decisions that you've made before and improve that model, depending on the number of vectors that you wanna use inside your learning model. We've created multiple models that have hundreds of vectors inside of them, to be able to identify and interrogate a data set, to be able to find a social security number and tell the difference between that, a Medicare ID, and a student ID number; contextually, those are all gonna be very, very different. Then there's computer vision. One of the things that I read, and I know I gave a bunch of statistics before, is that we create about a hundred and five hours of video content per minute, whether it's in a Zoom meeting or something that's shared on Snapchat, YouTube, all of those things. But to be able to interrogate a video and extract a driver's license plate on a car, or to do facial recognition and identify that, even if it's not John Doe specifically, there is a person inside of a video; being able to do those things takes us outside of the email and text that the EDRM and traditional e-discovery handle fairly well, and into those other formats that actually could contain information. And then, going into what we do, we want to be able to create models not just from the entities that we want to target, but by jurisdiction. California, of the fifty states, has one of the highest amounts of regulation and the strictest compliance and notification requirements in the United States.
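Jay's point about context can be made concrete with a small sketch. This is purely illustrative, not BDO's model: the regex, the keyword lists, and the context window are all assumptions, and a production engine would use trained NLP rather than keyword matching.

```python
import re

# Illustrative sketch: a bare nine-digit number is ambiguous, so look at
# nearby words to guess whether it is an SSN, a Medicare ID, or a student
# ID. All keyword lists and the window size are made-up assumptions.

NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

CONTEXT_HINTS = {
    "ssn": {"ssn", "social", "security"},
    "medicare_id": {"medicare", "beneficiary", "hicn"},
    "student_id": {"student", "enrollment", "campus"},
}

def classify_hits(text: str, window: int = 3) -> list[tuple[str, str]]:
    """Label each nine-digit hit using the tokens around it."""
    tokens = text.lower().split()
    results = []
    for match in NINE_DIGITS.finditer(text):
        # Token position of the hit, so we can grab its neighbours.
        i = len(text[:match.start()].split())
        context = set(tokens[max(0, i - window): i + window])
        label = "unknown"
        for candidate, hints in CONTEXT_HINTS.items():
            if context & hints:
                label = candidate
                break
        results.append((match.group(), label))
    return results

hits = classify_hits("Employee SSN 123-45-6789 on file; student ID 987654321 issued.")
# → [('123-45-6789', 'ssn'), ('987654321', 'student_id')]
```

The same nine digits get different labels purely from surrounding words, which is the "context of the entire document" idea in miniature; a learned model replaces the keyword sets with hundreds of features.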
We wanna be able to prioritize data that comes from California first when you're looking at a large breach, because those have the strictest requirements. So when we talk about our workflow, and we'll get into some of that in the next bit, being able to prioritize and orchestrate a workflow that services our client is extremely important, and leveraging technology in the right order is extremely important.

I appreciate that. And we got a couple of good questions in the Q and A, but we gotta talk about it, because everybody's talking about it: generative AI, Jay. Can you just talk about what that is, what the use case is, and why it's not on the screen here? Why is that not one of our pillars at this time?

Well, I think by the nature of generative AI, it's generative, right? It's using data in a model to create unique content based on the prompts that you provide. I think Microsoft just released a Copilot for Security, I think last week. I think that's really, really exciting. We can talk more about behind the firewall and how we think the future of data breach is gonna go, to answer some of the questions: being able to provide good data mapping and data governance behind the firewall, knowing where your data lives, being able to use generative AI to tell you where your data lives. That's gonna be super, super exciting, but I think there's some road that we need to pave before we can get to that point. But AI and machine learning relative to generative AI: it creates data.
And I think the prompts and the way that people use generative AI could be a potential vector for a data breach; the data that it creates, and where it's stored, I think is also super interesting. Because with these models, if you are providing them your IP, whether it's ChatGPT or some other tool that uses one of those AI models, and you send IP from your company out into the world, then that potentially could be construed as a data breach. That's why I think people and companies are creating private generative AI models, to sort of control that. From a data growth perspective, by some estimations, the one hundred and twenty zettabytes of data that the globe creates per year is estimated to reach one hundred and eighty-one zettabytes by two thousand twenty-five, with generative AI contributing to that, and that's only gonna grow exponentially. So where the prompts are stored and where the results are stored is extremely important. I think companies are wising up and understanding, from a generative AI perspective, that the prompts as well as the output need to be secure, so they're putting that behind the firewall.

Yeah, I think that's right, and I appreciate you expanding on that. Actually, I'm glad you mentioned Copilot, because we got a couple of questions in the Q and A around preventative measures that can be taken, either in anticipation of litigation or kind of pre-breach, if you will, that can obviously help to automate and optimize some of the security risk assessment and the processes you can put in place to reduce your overall risk of a data breach.
And as you know, that's part of our solution in the data breach advisory services world, and we have a very strong team that focuses on that. But maybe take a few more minutes to talk about your thoughts around, for example, Copilot, it being Microsoft's product and, once fully leveraged, I think potentially very useful from a proactive standpoint. What are your thoughts about that?

Well, this is all speculative at this point, because I have not had an opportunity to test-drive Copilot from a security perspective. But I like the idea that I can provide prompts around security and compliance and generate a report on where, from a data governance perspective, areas of improvement are potentially needed; or, if you're responding to an incident, doing incident response, being able to generate customized reports from multiple data sources, aggregating that into a consumable format very, very quickly. One of the biggest criticisms, what would you say, from the Microsoft critics, is that some of Microsoft's tools aren't integrated enough and are hard to use; it's a little convoluted. I'm hoping that Copilot can become that easy vector for folks to be able to get to their information much quicker. I know it's helping students, report writers, folks that want to generate content for presentations like the one we're giving here. Copilot and ChatGPT are a great resource for aggregating large amounts of data into a simplified, consumable format. But I think these are all accelerators. You still need to have the practitioners that know what they're looking for, to be able to prompt on the things that they want to focus on. And our threat actors are evolving as well. Threat actors are very, very sophisticated and have access to commercial resources.
A lot of these clearing houses for ransomware are corporatized, where they have help desks both for people that want to pay a ransom and for people that want help getting ransomware working, because they do both.

Yeah, good point. Good point. Another comment in the Q and A is about how large language models or AIs could eventually be used and targeted towards a data breach. I mean, I would love to be able to help a client ask, "Where's all my PII or PHI within my environment?", prompt it that way, and get a response that says, as you noted earlier, it's in these areas, and it's targeted and all that good stuff. So I do have hope, but as you know, with all AI, it needs to be trained, it needs to be tested and validated, and then, as it matures over time, it needs to be monitored. But I think there's definitely hope there. Maybe we'll move forward into the next slide. I thought this one is useful because not everybody has really spent time in a cyber incident: responding to it, solving it, leading into a data breach review, which is where we spend a lot of our time. So I thought this slide was helpful just to level set. On the left, just for the audience here: with threat actors, it depends on the size, scope, and scale of the organization and the level of sophistication of their cybersecurity practices, but at the end of the day, the average dwell time for a threat actor inside of a corporate environment is months. I'll just leave it at months. It could be six months, it could be more, it could be less, again depending on the level of sophistication within the organization and their cybersecurity maturity.
But once they get access, however they gain it, because there are lots of attack vectors and methods, they generally spend some time and really do their homework. It's rare that you get somebody who gets inside of an organization and, lightning fast, within minutes, is taking data out and then detonating the ransomware. Where we hear these stories and it sounds like things are moving very, very fast, what's really happening is the threat actor was in there for months and did everything, what they call living off the land, under the radar. They use the tools built into, like, the Windows platform or what have you, to do all the surveillance that they need to prep their other activities. And if you think about it, for those of you who have had the unfortunate experience of having to deal with a ransomware case two, three, maybe four years ago, you'd have threat actors get in, you would have the ransomware detonated, and they just wanted you to pay the ransom to decrypt your files. Well, then, as the marketplace kind of caught up, organizations became more mature; everybody's doing a lot better job at getting their backups online, offline, and inaccessible to threat actors. And so now the threat actors have moved, very commonly, into the double-ransomware scenario, where they come in quiet, they live off the land, they leak out the data, they get what they want, and then they detonate the ransomware.
And so now you've got a situation where you need to pay a ransom to decrypt the files and, potentially, to keep them from disclosing the data that they just exfiltrated from your environment; that's what's referred to as double ransomware. And there's also triple ransomware, where they have a look at your customers and your supply chain and target them as well, to put additional pressure on the ransom event. So I think it's important that we all understand that, because when the data breach happens, we're all running very fast. There are a lot of fast-moving conversations. Hopefully, you've got an incident response plan in place that's been tested and vetted and kept up to date from a communication standpoint, but generally, you respond to contain the incident. You bring in a team to assess the compromise and the different types of logs and artifacts that might be across your systems. And at the same time, you've got strategic communications going on internally and externally, and you're thinking about regulatory obligations, depending on how big your operations are and where they sit in the world. Some of the timelines are tight; I like to use the example of France. If you have a data breach in France, you have to report it to the police within forty-eight hours; otherwise, your cyber insurance, if you have it, may not kick in. So a lot of things are moving very, very fast for the team that's responding to an incident, but I always try to keep in mind that the threat actors have more than likely been in there for months, and they've taken stock. They've done what they can.
Now, I can share with you that, in some of the work we've done just recently, there's definitely a quality spectrum. There are really good ones; you may hear of nation-state threat actors and others that are organized crime, right, and they're usually pretty good. They know what they're doing. But through this evolution of what's called ransomware as a service, you also see some folks who aren't as sophisticated, or who are copying playbooks from other ransomware groups who have recently been taken down by federal authorities or law enforcement. So you kind of see it all. Jay and I had a recent case where we had a threat actor that, had they known what they had and the access that they had at the time, it could have been a very, very bad outcome. But they didn't do as much homework as they should have. They stuck to what they had access to and exfiltrated some data. And so that was still a painful event for the organization after the ransomware was detonated. But fortunately for the company, they were able to stop the ransomware from fully detonating, and, as I said, the one piece of their network that was really very important for their operations, the threat actors ignored. So it's a very interesting scenario, but I wanted to round that out. In terms of the privacy notifications, I touched on the timing for France, and other places have seventy-two hours. I think California's is seventy-two hours for notification. And we now have the SEC requirements for public companies, which I believe is four days if it's material to a listed organization. So that's kind of all in play.
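The notification clocks Brian walks through can be tracked programmatically. A minimal sketch, with the caveat that the windows below are simplified assumptions for the example (the SEC window, for instance, is measured in business days and turns on a materiality determination); real deadlines depend on the facts and current regulations:

```python
from datetime import datetime, timedelta

# Illustrative notification windows only; not legal advice. Real
# obligations vary by jurisdiction, sector, and materiality.
NOTIFICATION_WINDOWS = {
    "gdpr_supervisory_authority": timedelta(hours=72),
    "france_police_report": timedelta(hours=48),
    "california_ag": timedelta(hours=72),
    "sec_form_8k": timedelta(days=4),  # business days in reality; simplified
}

def notification_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Compute each regulator's deadline from the discovery timestamp."""
    return {name: discovered_at + window
            for name, window in NOTIFICATION_WINDOWS.items()}

deadlines = notification_deadlines(datetime(2024, 3, 1, 9, 0))
# The 48-hour police window is the earliest clock to expire.
earliest = min(deadlines, key=deadlines.get)
```

This is the "compounding problem" in code: the clocks start at discovery, not at the end of the review, which is why time to insights matters so much.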
And then we've got a tail, and it's really these class actions. The state AGs and regulators, as breaches are being reported in, are posting them on their websites. And so we've got a plaintiffs' bar that's pretty sophisticated, or becoming more sophisticated, and they're tracking that. So now it's not that uncommon that we have a client we're assisting through their data breach; they've gotten to the point where they need to report the breach to whichever regulators or authorities are notifiable and start notifying data subjects, and within weeks, class actions are being filed, right, and campaigns are being run. So it's definitely becoming more and more high-stakes for organizations. And, again, time to insights is incredibly important, and using the right technology to get the right answers is really, really key to being successful, or at least to mitigating organizational risk and reputational risk alongside the legal risk. So, Jay, I'll pause there. I said a lot in a little bit of time. What are your thoughts on this, particularly around expectations and timing?

Well, I'm gonna almost flip-flop; you're usually the optimist and I'm the pessimist. But lately, with the types of jobs that have come in, yes, there was a data breach, there was exfiltration of data, but I think the industry is moving in the right direction, because we've seen at least two incidents where there was only partial detonation, and internal IT, with the tools that they had internally, whether it's CrowdStrike or Unit 42's set of tools, Rapid7, or the Microsoft tools, was able to contain the breach so that there was only partial detonation.
I know that's not necessarily the best scenario, but partial is better than complete, right? Better than your entire estate being leaked to a ransomware site, your data posted to the dark web. Where we come in is the data piece, but I think the proactive nature of the services that BDO provides, as well as a lot of good practitioners out there in the industry, matters too. You have to have the reactive and the proactive, as well as making sure that, from a traditional e-discovery perspective, you're ready to do that sort of work when the class action lawsuits come. I was talking to another practitioner earlier this week, and they get excited about finding that needle in the haystack, the smoking gun in an e-discovery case. For data breach, I have to find all the guns and all the bullets and all the needles in that haystack. So we have to strive for precision rather than stumbling on something that's relevant or not relevant or whatnot. I think the industry is moving in the right direction; we're all definitely getting smarter. But I think there are still companies out there that need incident response, doing tabletop exercises, understanding their data landscape, where their data is, where their crown jewels are. I think that was a huge message at the NetDiligence conference a couple of weeks ago. So those are my thoughts on the context here.

I appreciate that.
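Jay's "all the guns and all the bullets" framing is, in measurement terms, a recall requirement: for a breach review, missing a true PII document is worse than routing an extra one to human review. A minimal sketch of scoring a detection model against a reviewer-labeled sample, with made-up document IDs:

```python
# Illustrative scoring sketch; the document IDs and labels are invented.
# Precision = how clean the flags are; recall = how many true PII
# documents were found. Breach review pushes hard on recall.

def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Score a set of flagged documents against reviewer-confirmed truth."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

flagged = {"doc-01", "doc-02", "doc-03", "doc-04"}   # model says these hold PII
truth = {"doc-01", "doc-02", "doc-05"}               # reviewer-confirmed PII docs

p, r = precision_recall(flagged, truth)
# p = 0.5 (2 of 4 flags were right); r ≈ 0.667 (doc-05 was missed)
```

The missed doc-05 is the failure mode Jay is describing: a data subject nobody gets notified about. That is why models are validated on labeled samples before anyone trusts them at scale.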
I think you did touch on one thing, and we'll get into this on the next slide, where we talk about how we do the work. I remind people almost daily: eighteen months ago, people had a data breach and they were thinking, oh, it was a phishing exercise, and I've got executive inboxes that have been compromised. And so the natural thought was, I'm gonna use traditional e-discovery platforms, probably a little bit of technology-assisted review, right, to seed our data and kind of get through that review quickly, because it's email, and we've been working with email for decades. Today, it's different. I mean, there's still email, for sure. There's still the business email compromise that's suitable for that type of workflow. But in the last six months, really the last year, we're seeing entire file shares exfiltrated, terabytes of data, and very large-scale structured data coming out of the organization, to the point where databases can't even be put into e-discovery platforms. That said, I think there's a lot of work being done on how to better ingest and use structured data within the e-discovery platforms. But it's just the nature of the evolution of this space, and I think that's one of the things that makes it so interesting, but also very challenging, as you said. You're not looking for the one smoking gun.
You're looking, as you said, for all the guns and all the bullets. Not just the needle in the haystack; you've got to sort all the needles in the haystack, make sure they're grouped and identified against the regulations that are applicable, and prioritized in terms of notifications to regulators and, obviously, data subjects. So it's still a really unique space. I'm a big fan of leveraging AI and machine learning, and I know we had a question in the Q&A about the difference between the two. Simply put, AI covers a wide variety of analytic use cases, and machine learning is part of that. Then there are deep learning models and neural networks; to contextualize the work being done, you really need to get into the deep learning exercise and build and train those models. I tell my clients and my friends: even our own models, having done this work for so long, started off as babies. We fed them data, trained them on that data, validated it. It's an evolution; they need to mature over time to get better and better, to the point where, I don't know that we're ever going to be comfortable just pushing the AI button and trusting it found all of our data elements, with automated workflows to link in the data subjects. I don't know if we're going to get a hundred percent there, but that would be my utopia: models so good that the false positives are very minimal, and the quality of identification and the linking to data subjects as automated as they can be.
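The validation step the speaker describes, checking a maturing model's false-positive rate against human review, usually boils down to measuring precision and recall on a hand-labeled sample. A minimal sketch, not BDO's actual pipeline; the document IDs and numbers are invented for illustration:

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Compare the documents a model flagged as containing PII
    (predicted) against a reviewer-labeled ground-truth set (actual)."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical validation round: the model flagged 4 documents,
# reviewers confirmed PII in 5, and 3 of those overlap.
flagged = {"doc1", "doc2", "doc3", "doc9"}
labeled = {"doc1", "doc2", "doc3", "doc4", "doc5"}
p, r = precision_recall(flagged, labeled)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Low precision means reviewers waste time on false positives; low recall means missed data subjects, which is the costlier failure in a notification context.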
So that would really take the cost out of the data breach scenario, but also accelerate, as you mentioned, the time to insights. But it's going to take a lot of time and a lot of data. Part of the challenge in this space is the use of large language models, which a lot of the technology and discovery platforms are looking at now to train up different models: there's synthetic data being used in one, large language models in others. At the end of the day, you have to be really thoughtful about how you're training these models. LLMs are great for lots of different things, but they may not be the best way to build an AI model for data breach; what you really need is actual data that has PII in it that you can train those models on, so they continue to grow and mature. So we can get to the next slide. It's our last slide; as I promised the audience, we don't have a lot of them. This one is really about how the work gets done at the end of the day. I want to leave about ten minutes at the end for any further Q&A that comes in. This is a little bit of the sausage-making, if you will, and there are a lot more steps than are on this slide. But at the end of the day, this is how we see the world today, with today's environment and the types of data that we're receiving and having to manage, process, and ultimately report out on. Maybe I'll take the one on the left, in terms of some leading practices, Jay, and then you can walk through this workflow with your thoughts, if that's okay.
I think the three main points for leading practices right now in the data breach space are: when you get the data, inspect the data, reconcile the data, and know what you've got. This goes back to my earlier reference: eighteen months ago, we were just getting a bunch of PSTs. We can't do that anymore, because we're getting a lot of data that the platforms are not designed to intake. It'll either run very, very slowly and take forever to get results, fail to process at all, or halt those jobs, if you drop the data into a platform without taking careful stock of what you've got, having a conversation with the organization about the types of data involved, and bringing in counsel as well; usually privacy counsel is involved, of course. So it's a meaningful conversation: we don't want to just dump terabytes and terabytes of data into an e-discovery platform. Here are the file types we have, this is what we're seeing. We do some sampling, we discuss it with the client or the organization and counsel, and we ask what the best approach is. So definitely inspect, and make sure you're picking the right tool for the job. For us at BDO, we provide our clients with optionality: we've got several different platforms we have access to, we also have cloud-based services, and a whole team of data science and data forensics folks who can solve some very complex data problems for us. So we're well equipped, but we really are thoughtful about what the right workflow is and what the right tool for the job is. The second point I mentioned is around log files and legacy file formats.
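The "inspect before you ingest" step can start with something as simple as a script that inventories file types and sizes before anything is loaded into a review platform, giving counsel concrete numbers to discuss. A minimal sketch, assuming a local working copy of the collection; this is an illustration, not any specific BDO tooling:

```python
import os
from collections import Counter

def inventory(root: str) -> tuple:
    """Walk a collection and tally file count and total bytes per
    extension, so the ingestion conversation starts from facts."""
    counts, sizes = Counter(), Counter()
    for dirpath, _, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "<none>"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return counts, sizes

if __name__ == "__main__":
    # Point at the evidence working copy, never the originals.
    counts, sizes = inventory(".")
    for ext, n in counts.most_common():
        print(f"{ext:10s} {n:8d} files {sizes[ext]:12d} bytes")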
I mentioned that earlier, but Jay, we had some very interesting data types in the last six months, and requests involving even newer SaaS-based platforms that folks are using to manage their operations, for which connectors aren't built out. You really have to work through the best way to obtain the data and validate it, and obviously maintain chain of custody, because we're doing forensic work; that's very, very important, since at the end of the day we have to have a defensible argument around the workflow and what we did. And then the last one is really knowing the AI. I touched on this earlier, but particularly within the discovery platforms, understand how those platforms' models are built. Some of them are continuous learning models that grow and mature as you put more data through them; they're trained on data that has PII/PHI in it. Some have synthetic data models that use very high-quality synthetic data, but again, it's synthetic, so there's some risk in that. Some of the models are based on LLMs, but they're very specific in their approach to model design and development. And again, we touched on it: there's no silver bullet, there's no perfect answer right now. That's why it's so important, if you find yourself in this situation, to be thoughtful. The clock's always ticking and you've got to run very fast, but be very thoughtful about the tools you use, the data you have, and the AI that's applied, because if you don't manage it correctly, you could end up costing yourself more time and cost than you would have otherwise needed, and you may miss deadlines.
So those are really the leading practices, my thoughts on how to get this done quickly, efficiently, and effectively using AI. Jay, do you want to, one, contribute additional thoughts on leading practices from your perspective, and two, walk through the workflow we have on this last slide? I think collaboration is the key from a leading practice perspective: having a great relationship with privacy and outside counsel. Number one, that's extremely important, being on the same page about what you're looking for, what you're targeting, what they think has been exfiltrated, and understanding the regulatory requirements across the board. Number two, it's important to work with the client to understand what their end goal is. On one of our previous engagements, there was data exfiltrated, but there wasn't enough information inside the data that was stolen to trigger notification to regulators; still, the company wanted to be able to notify their employees. So it's extremely important that privacy counsel understands what the client wants to do and can prepare for that notification, because at the end of the day, class action risk is definitely out there. Even when notifying people just to do the right thing, you want to make sure the plan of action is laid out. We want to do right by the client, but they need to understand the implications of what they're doing. Over-notification is a huge risk, right? So we want to strike the balance of doing right by the folks who are affected, but also by the client.
The other piece, and I think you touched on it a little before, is the data landscape. Business email compromise and phishing are still real today, but the large data breaches that have happened may start there and then move to exfiltrating across the network. Whether they're compromising MOVEit or some other file share, or coming in over RDP, they get in and they expand. Once they're inside your network, as long as there are paths to other areas, whether it's a PeopleSoft database or some other CRM or financial information database, they could be taking it all, and you need the right tools to detect that. So the third area where a relationship is extremely important when a breach happens is with the cyber forensics team doing the incident response, the identification, and the containment: understanding what was taken, how it was taken, and whether there's an opportunity to enrich data later on to fulfill a client request. Okay. A couple of thoughts come to mind, and then we'll turn it over to you; we also had a couple more questions in the Q&A. But before we do: some of the more challenging engagements we've had in the last six months involved highly structured legacy information that was so large you couldn't even open the file, so we had to write scripts and code to programmatically analyze it. Can you talk a little bit more about what that looks like, how we solve those challenges for clients at scale, and what that means for the future?
Should folks be gearing up for that type of scenario, or do you think that's a one-off? I think we're going to continue to see unique situations and different types of data, because in the last year things have moved, as I said, from business email compromise to some of these enterprise systems. The enterprise systems are where the crown jewels are, so threat actors are going to be targeting them, and you don't know what they're going to get. They may have direct access to a database. They may have backup files that you need to be able to interrogate and extract, and these proprietary tools have different ways of storing data that aren't consistent or industry standard; the format is proprietary to the system. So having enterprise data architects and data engineers who understand a wide variety of these tools, and people who are proficient in Python and text manipulation languages such as Perl, is extremely important: being able to take data that is structured or semi-structured and extract entities out of it. I can tell you that none of the current platforms do that; this work happens outside of platforms like Relativity and Reveal. You need data engineers and data analysts who are going to dig into the data, make sense of it, and extract the data out. And that's why we have the bifurcation here.
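The "scripts to programmatically analyze a file too large to open" approach usually means streaming the file and scanning for entity patterns as it goes, so memory use stays flat regardless of file size. A simplified, hypothetical sketch; the two patterns are illustrative, and real engagements use many more patterns plus validation logic:

```python
import re

# Illustrative patterns for two common breach-review entities.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def scan_large_file(path: str) -> dict:
    """Stream a file line by line, counting candidate entities,
    without ever loading the whole file into memory."""
    counts = {"ssn": 0, "email": 0}
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            counts["ssn"] += len(SSN.findall(line))
            counts["email"] += len(EMAIL.findall(line))
    return counts
```

For a proprietary backup format with no usable line structure, the same idea applies but the reader is replaced with a format-specific parser; the streaming discipline is what keeps a multi-terabyte export tractable.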
Right. Inside our workflow, traditional discovery covers the emails, the Office files, the things you get out of Microsoft 365. Data analytics covers the semi-structured and structured data, where we're using the tools that data engineers and data analysts use and turning the data into a consumable format, so that our machine learning and AI models can consume it and extract, or at least categorize it. You said time to insights is extremely important. What we strive for inside our business is that within forty-two to seventy-two hours of processing the data, we have a full categorization of it. That's the who, what, and where: What are the jurisdictions? How many names do you have? How many phone numbers and Social Security numbers? All within that first window. The data subject linking and the extraction of the entities, depending on how much data there is, could take a few days to two weeks. We've seen data sets come in that are close to ten terabytes, and that is the challenge: data growth within the enterprise is the biggest obstacle to meeting these regulatory requirements. Yeah, I appreciate that. One of the things we've been providing thoughts and insights on to, frankly, all of the discovery platforms is clustering of documents. It's really important. We got a question here about use cases and case studies, and I'll give a good one: if you've got a file share being dumped out of an organization, and it covers accounting and HR and IT and everything under the sun, within that there are a lot of documents that are categorically the same, over and over and over.
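Data subject linking, as described here, amounts to resolving many extracted entity records down to one entry per person, so each data subject is notified once rather than once per document. A toy sketch, where the field names and the matching rule (SSN when present, otherwise normalized name) are assumptions; production entity resolution is far more involved:

```python
def link_subjects(records: list) -> dict:
    """Group extracted entity records under a normalized key so all
    documents mentioning the same person collapse to one subject."""
    subjects = {}
    for rec in records:
        key = rec.get("ssn") or rec["name"].strip().lower()
        entry = subjects.setdefault(key, {"name": rec["name"], "docs": set()})
        entry["docs"].add(rec["doc_id"])
    return subjects

# Hypothetical records extracted from three documents.
records = [
    {"name": "Jane Doe", "ssn": "123-45-6789", "doc_id": "d1"},
    {"name": "JANE DOE", "ssn": "123-45-6789", "doc_id": "d2"},
    {"name": "John Roe", "ssn": None, "doc_id": "d3"},
]
linked = link_subjects(records)
print(len(linked), "unique data subjects")  # 2 unique data subjects
```

This is also where the timeline stretches from hours to days: the categorization pass only counts entities, while linking has to reconcile spelling variants, partial identifiers, and conflicts across sources.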
W-2s for employees, W-4s, the employment records: there are large groups of documents where, rather than just dumping them into a platform, it pays to take the time to understand what you have and then decide: one, is that a category of documents I need to look at, period? Two, if it is, what's the best approach to analyzing those documents? Which tool in the tool belt is the right one? That's what gets you quicker time to insights. Even though it sounds like we're taking more time upfront, and we are, in many cases we can carve out a large chunk of data where we agree with counsel and the company that, due to the nature of the documents and the information included, we don't need to process it: supplier invoices, for example, where they're not associated with employee information or protected health information. So big chunks of data, even though they were exfiltrated, we can set aside, make a reasonable argument, and have it be defensible. Encrypted data too: for any data that was encrypted before the incident, even if it was exfiltrated, there's an argument to be made that the threat actors probably didn't have the keys or the passwords, so we don't need to look at those either. Being really thoughtful about the approaches is important. We've got a couple of good questions and three minutes left here. I just gave an example about one of them, on case studies. Certainly, Jay's and my emails are available; you can reach us anytime, and we can talk through not just case studies but the broader use cases for these types of things, particularly with organizations. We do a lot of work with small and medium-sized businesses.
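The grouping idea, many documents that are "categorically the same over and over", can be approximated by fingerprinting each document's template: strip out the variable parts and hash what remains, so filled-in copies of the same form collapse into one group. A deliberately simplified sketch (real review platforms cluster over text embeddings, not digit-masked hashes):

```python
import hashlib
import re

def template_fingerprint(text: str) -> str:
    """Mask variable content (digits) and collapse whitespace, so two
    filled-in copies of the same form hash to the same value."""
    normalized = re.sub(r"\d+", "#", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

def group_by_template(docs: dict) -> dict:
    """Map fingerprint -> list of document IDs sharing a template."""
    groups = {}
    for doc_id, text in docs.items():
        groups.setdefault(template_fingerprint(text), []).append(doc_id)
    return groups

# Hypothetical mini-collection: two W-2s and one supplier invoice.
docs = {
    "w2_a": "Form W-2 Wages 51000 Employee ID 1044",
    "w2_b": "Form W-2 Wages 62500 Employee ID 2191",
    "invoice": "Invoice 778 Net 30 Supplier Acme",
}
print(sorted(len(ids) for ids in group_by_template(docs).values()))  # [1, 2]
```

A decision made once per group ("review this category" or "carve it out with counsel's agreement") then covers every document in the group, which is where the upfront time buys back review cost.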
Right? Just given the way BDO is structured and where we fit within the market, we have a lot of SMB-type clients. In many cases they have a lot of tools, licenses, and third-party contracts that they're using to try to solve the cybersecurity problem, and not all of those third-party providers or tool sets work well together. In some cases there's even redundancy, with different tools doing very similar things. And internally, they don't really have the right team to review, tune, and tweak the different alerts that might be thrown from an intrusion detection system or an incident response system. So give us a call, reach out to us; we'll share lots of information about what we've seen and what we think might make sense, particularly as it relates to your organization. In terms of identifying and apprehending bad actors, one of the first things you should do if you're in the United States and you've been breached, and I'd always defer to privacy counsel, is generally to get the FBI involved. That's a really good idea, particularly in a ransomware case where ransom demands are being made, because the FBI can help. We generally provide information to the FBI, and they can determine whether this is a known threat actor, whether they're likely to follow through on their threats, whether they'll actually decrypt your data if you do pay them, et cetera. They can also help with coordination across other law enforcement agencies. So, general rule of thumb: have the FBI involved. A lot of times the FBI, and CISA, another part of the government, are already looking at these threat actors; they may be on the verge of taking them down.
There's been a lot of that, and they work with some of the industry leaders as well, like Microsoft and others, to try to minimize the impact of these threat actors. There was a question about tools and best practices on threat or intrusion response when talking to a lawyer; this is really about risk and governance, and I know insurance is in there, so I think this will be the last question. I'll just say cyber insurance is a great thing for a lot of organizations because it provides a stopgap. But we've seen very often that there's not a great deal of thought around rightsizing the policy to the risk, frankly. And often, when we're doing our investigations and insurance is involved, the insurance companies have very specific views on what needs to be done versus what doesn't: what counts as remediation and betterment versus what needs to be done to investigate, all within what's covered by the policy and what's not. So the insurance piece and the risk mitigation are a complex conversation across the industry, whether it's your privacy counsel, your insurance counsel, the insurance provider, or the broker. Our role, at least within BDO, is to get the facts: quickest time to insight, lowest cost to the client. That generally keeps everybody happy, but how much we do here versus there can vary depending on who's seated at the table. And with that, I guess we're at time. I appreciate everybody's time and questions, and certainly, if there are more questions that we didn't cover on this webinar, reach out to Jay or me; we'd be happy to hear from you and continue the conversation.
And with that, I will hand it back to ACEDS to close us out. Alright. Thank you, everyone, for joining us on the ACEDS webinar channel. Thank you to our partner BDO and, of course, to Brian and Jason for an amazing presentation today. Please visit aceds.org for a complete list of our upcoming webinars. Have a great day, and we hope you can join us again next time. See you, everybody.