Data Challenges in the Defense Sector

Episode Summary: This week, we’re talking about the defense sector. We interview Ryan Welsh, CEO of Kyndi, a company working on explainable AI. We focus specifically on the unique data challenges of the defense industry, as well as the use of AI in defense writ large. Many of the challenges the defense sector deals with transfer to other spaces and sectors. Business leaders who deal with extremely disjointed text information (what is sometimes called “dark data”) and with information in various languages or dialects will recognize some of the challenges discussed in this episode, and may even gain some insight into how to handle them.

Subscribe to our AI in Industry Podcast with your favorite podcast service:

Guest: Ryan Welsh, Founder and CEO – Kyndi

Expertise: economics

Brief Recognition: Welsh holds a Master’s in Applied Mathematics – Economics from Rutgers University and a Master’s in Business Administration from the University of Notre Dame.

Interview Highlights

(03:00) What’s unique about defense when it comes to bringing in AI?

RW: I would say the most challenging thing in defense is the data you’re working with. It is often extremely dirty data: fragments of text. Unfortunately, the people you’re getting it from don’t label it for you first, so that’s also a challenge. So I think the biggest issue in defense is the actual data that you’re doing the operations on.

And of course, when you’re applying AI and machine learning techniques, it really does come down to: “Do you have to label the data? Is the data clean?” Ultimately, that determines the value of the system you’re using. So the biggest challenge is the dirtiness of the data.

(04:00) What makes data extra challenging?

RW: Oftentimes, you’ll do a collection on the battlefield. For example, if you take a town, you’re going to collect hard drives, receipts, all that stuff out in the field. And you’re going to want to do analytics on it. You want to give it to analysts and see what folks are doing. This isn’t anything new. Militaries have done it throughout history: collecting information from enemy combatants.

So when you collect those snippets, they may be all torn up because someone threw a grenade in there, and you have to piece together all that information. That’s very, very dirty data, compared with very clean data written internally by college graduates.

We’re not orchestrating it. We’re ripping it out of the world in whatever gruesome format it’s in. It could be in Arabic, stored somewhere in some bunker. And now we have to figure out: is there a plan in here to attack these two forts, or any sign that preemptive strikes might be coming up?

I think the best way to handle it is to go down to the character level, to the individual letters within words. If you’re building a knowledge-graph-type network that you reason over, you may just be chunking words, but you actually need to go down a level of granularity to the individual characters and start drawing connections from characters to words, words to sentences, and sentences to other sentences. Local dialects make this especially challenging. If you take a Stanford NLP parser for Arabic, it’s trained on standard Arabic rather than a local dialect, so it’ll fall over.
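The character-level idea can be illustrated with a minimal Python sketch. It assumes character trigram overlap, scored with Jaccard similarity, as a simple stand-in for the character-to-word connections described above; it is not Kyndi’s method, and the function names and the example place-name spellings are illustrative assumptions. Two transliterations of the same name share no whole word, so a word-level match fails, while most of their character trigrams still overlap.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams, padding with spaces so word edges count."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def word_match(x: str, y: str) -> bool:
    """Word-level comparison: do the two strings share any whole token?"""
    return bool(set(x.lower().split()) & set(y.lower().split()))


def char_similarity(x: str, y: str, n: int = 3) -> float:
    """Character-level comparison via n-gram Jaccard similarity."""
    return jaccard(char_ngrams(x, n), char_ngrams(y, n))


if __name__ == "__main__":
    # Hypothetical example: two common transliterations of the same place name.
    a, b = "Kandahar", "Qandahar"
    print(word_match(a, b))                 # prints False
    print(round(char_similarity(a, b), 2))  # prints 0.6
```

Character n-grams tolerate spelling variation and torn fragments because a single changed or missing letter only disturbs the few n-grams that contain it; the trade-off is a larger, noisier feature set than whole words.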

You have to do it in the field. The reason I bring that up is that one of the big theses behind why AI works today is the amount of data and computational resources available. Well, how much compute can you put out at the edge? So you now need efficient algorithms that work on limited resources, out in the field, on dirty data. These are the kinds of challenges that come with using AI and other machine learning techniques in defense.

(11:00) Who cares about this? Is it the wacky innovation arms? Or are there other elements of defense that are starting to wake up and say “Can we use this?”

RW: Yeah, what I think most people miss about defense and intelligence is that they lump it in with the government sector, and they think it’s red tape and slow bureaucracy. Candidly, the folks in defense and intelligence are extremely discerning, analytical customers. They’ve been doing this forever; their job is to get intelligence out of data. So who is interested in AI in the defense sector? Every department, and they have been for a very long time. This isn’t anything new to them, and they’ve always had the data and the computational resources to make these techniques effective.

We need to start thinking about AI as a feature, not the product, right? A lot of folks think of it as the product, whereas it’s just a feature of a product that delivers value to a customer. So we need to change our way of thinking and stop saying “I’m selling you artificial intelligence,” and instead say “I’m selling you a solution to your problem.”

Take something as simple as enterprise search. A lot of folks are using Elasticsearch, and it’s a phenomenal search engine for terms. Then take an AI-based search, like the question-answering systems being published on the Stanford Question Answering Dataset (SQuAD), and no one would confuse Elasticsearch’s term-based search with these kinds of advanced question-answering systems.
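To make the term-based side of that contrast concrete, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch works internally (Elasticsearch ranks with BM25 by default, among much else), and the memo texts, document IDs, and queries are hypothetical. A query that paraphrases a document without sharing any of its terms returns nothing, which is the gap that question-answering systems aim to close.

```python
from collections import defaultdict

# Hypothetical documents for illustration only.
DOCS = {
    "memo-1": "Convoy departs the northern depot at dawn.",
    "memo-2": "Fuel resupply scheduled for Tuesday.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def term_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many distinct query terms they contain."""
    scores: dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    # Highest overlap first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(term_search(index, "convoy departs"))            # prints ['memo-1']
    print(term_search(index, "When do trucks head out?"))  # prints []
```

The second query asks about the same event as memo-1, but because it shares no terms with it, pure term matching cannot connect them.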

I think we need to pitch something as simple as that: “Hey, I now have AI within my enterprise search capability. I can get you better information much faster than you could with a term-based search.” With something that simple, you can go to any executive and say, “I have an enterprise search capability for you. AI is a part of it, and because of that, your people are going to save two hours every day.”

It’s so simple, yet I feel like with AI, people gravitate toward the bigger picture of automating away giant groups of people, as opposed to, “Hey, just make enterprise search better and save an hour of everyone’s day.”

(16:00) In terms of cutting-edge innovation, do we expect the real bleeding-edge adoption, whether it’s search, vision, sensors, or whatever, to happen within the DoD itself, or more broadly across the defense sector?

RW: From my experience, the larger systems integrators (SIs) are really good at taking techniques that have been proven to work on a class of problems and applying them at scale. Companies like mine push the limits of what’s possible with AI, and there are groups within the defense industry that are willing to work directly with us to test things out. Then either we scale it ourselves, or someone else does.

Take a company like Palantir: they were able to scale with the size of the contracts they were getting. For someone like us, we’re still figuring that out. Do we work with a partner, like a larger SI? Or do we build out that services component ourselves, like Palantir did?

The one thing the government has done really well, probably since Palantir (I guess Palantir was founded about 2005), is engage directly with startups. They’ve done a phenomenal job of meeting directly with people in Silicon Valley.

Every time I need to meet someone from DC, at a conference or anything else, they’ll typically say, “I’ll be in Silicon Valley within the next 30 days, so we should meet at your office.” I think the government has done a phenomenal job since 2005.

They’ve got arms for this too, right? DIU, which used to be DIUx, is out here, and they’ve got one in Austin and, I think, one in Boston. And I heard at the National Defense University that there’s an attempt to keep up with the times. I think they realized this isn’t something you can let trickle in; we have to jump because other countries are jumping. So it sounds like there actually is an interface between the startup world and government, maybe a better one than people think?

Well, they’ve adapted, because if you think about the history of innovation prior to the ’90s, all of it was coming out of large institutions, whether it’s Bell Labs, DARPA, or other big institutions. Then all of a sudden, they started to see it diffuse: smaller startups could build phenomenal stuff because it’s pretty cheap to write code, right?

And they had to adapt to that. They could no longer just talk to five organizations and get all the innovation in the United States; they now have to talk to 5,000. I think that was a bit of a challenge for them from the mid ’80s or early ’90s up until 2005, but then they recognized it and changed really quickly from 2005 to today.

(21:00) What might be lower-hanging fruit than other areas when it comes to AI becoming the norm in the future?

RW: Yeah, everything that’s built on top of your classic structured enterprise data will have machine learning in it. Take every application of the last 25 years and add ML. Now you’ve got predictive capabilities, right? That’s effectively what’s going on. But what’s really going to move the needle and transform everything is AI on unstructured data: on text data, on human communications. That’s a really tough nut to crack.

Some folks are starting to address that, like us, by focusing on unstructured text data, and I think that is going to be fundamentally transformative. I think there are three legs to the stool for enterprises: your classic structured data; your machine-generated data, like IoT platforms and other things; and unstructured text data.

We’ve solved storage and analytics for your IoT data and your structured data, but we still haven’t solved storage, and ultimately analytics, for unstructured text data. That’s really because machines don’t understand language very well.
