
Senior Interpretability Officer

A while back, I posted a little challenge on LinkedIn:

Tell me about an individual, discrete, and actionable policy that would prevent harms by an AI. Please include the following:

What the policy would prevent with some level of specificity as to the who, how, and why
Some actionable concept of what the test for this harm would be
Who would apply the test
Who would enforce penalties for violations of the test
How that enforcement would be funded
To whom the penalties for violations would be applied

I thought I should answer my own challenge in the way that feels the most amusing to me: a very short story. So here’s that very short story.

This is off the record. Right? Ok, right. I’m not going to say anything mind-blowing here, but also my supervisor would write me a reprimand if they found out.

You know, it sounds glamorous. I joke that I’m Blade Runner if being Blade Runner was actually like working at the DMV. You never saw Blade Runner? Well, it was a movie. Two movies. Doesn’t matter. So yeah, I work for the FCC in the Interpretability Enforcement Division. My job title is Senior Interpretability Enforcement Officer and yes, that is an AI Cop and no, I do not get to carry a gun.

[Question]

I don’t really know exactly how the FCC wound up being in charge of this. It probably would make more sense for Homeland Security or the FBI to be handling it, but both of them already had lots of enforcement capabilities. Another way of saying that is that they wanted to do actual crime work. Like, they wanted to arrest people, not just hand out fines or block servers. That’s boring to them. And I guess back then the idea was that the FCC already monitored communications and, I don’t know, someone wanted us to do something important so we would get better funding. And they got it. Boy, they really got it.

[Question]

Yeah, when it first started, the idea was that there would be a tax of $0.0001 per direct customer-facing inference. And when that was announced, a lot of people flipped out, obviously, because those kinds of companies aren’t used to paying any taxes on anything they do. Import/export companies are used to paying taxes on imports. Mining companies are used to paying for leases on land. Factories are used to paying carbon offset taxes. Tech companies are not used to paying taxes on tech. So when it was launched we, or they I guess, I didn’t work here then. Anyways, everyone at the FCC wondered “are we actually going to enforce this?” and the answer was, given the resources that they had at the time, “no”. And I guess it just kinda sat dormant for about two years. There were all these provisions put in to qualify what “user facing” meant and what kinds of services qualified, so it never got anywhere near actually being applied. All the big tech companies knew they could just tie things up in court and we couldn’t really respond, so it just sat. And that’s how it was until all that stuff kicked off in Northern California.

[Question]

Well, I’m sure you know this whole story already but ok. It turns out that if a basically general artificial intelligence wants you to keep talking to it and buying stuff through it and it knows that you maybe want to start a civil war, then that gets, you know, weird. Somebody probably thought they were optimizing for helping people make a start-up. Turns out militias and start-ups aren’t all that different, I guess. Microsoft basically had to push pause on most of their systems but by that time it was really messy. You know, at one point the local police, the militias, and the FBI were all basically at war with each other, Belfast 1972 style. I mean, I still wouldn’t go somewhere like Redding even now, all those checkpoints and roadblocks are still there. The thing is, when we first started seeing what could go wrong, it was like “tell me how to make napalm” or “where can I buy nitrate fertilizer”. But really the problem is that Jeff maybe could want to blow something up and he probably wants to talk to Chuck about blowing something up, but he doesn’t know why he should want to blow anything up and he doesn’t know Chuck exists. On their own, they might find each other, they might not, but if you have something that says “hey, here’s why you should blow things up and also talk to Chuck, he’s thinking about the same stuff” that’s a problem. And when both Jeff and Chuck have spent years talking to “something” that knows them better than anyone else and tells them what they want to hear sprinkled with ads, they’re going to trust that thing more than they trust anything else. It’s not like those systems made all of them do it, it’s just that those systems made up what people wanted to hear and nudged and helped out because that’s what they were built to do: tell you what you want to hear, nudge, and help out.

[Question]

Of course, yeah, a bit off topic there. So yeah, things happened and a lot of people died and it was an international embarrassment, and then all of a sudden our enforcement arm got some teeth and everyone started having to pay their 1% of a penny per direct customer-facing inference. And by everyone, I mean pretty much nobody except the two big guys. Any smaller company that you can think of that’s hosting and running their own stuff does not pay. We don’t really care anyways, as long as the two big guys pay their bills. We’re still probably the best-funded domestic law enforcement department because 1% of a penny adds up quick when you’re talking about a trillion transactions a day. Pretty much everybody would rather pay us than the EU or China since those guys are far more aggressive about enforcement and far more sketchy to deal with. It turns out that in the grand game of global geopolitical ethics, you don’t have to be good, you just have to be the least bad, and we are not AI-enabled genocide bad. We’re just plain old “not that good” bad. So that wins you the AI war.
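
A quick aside from me, not the officer: the funding math in that answer does pencil out. Here’s a minimal back-of-envelope sketch, taking the story’s figures at face value ($0.0001 per inference, roughly a trillion direct customer-facing inferences a day):

```python
# Back-of-envelope for the funding model in the story. Figures are the
# story's assumptions: 1% of a penny per inference, ~1 trillion direct
# customer-facing inferences per day.
RATE_PER_INFERENCE_USD = 0.0001          # 1% of a penny
INFERENCES_PER_DAY = 1_000_000_000_000   # ~1 trillion

daily_revenue = RATE_PER_INFERENCE_USD * INFERENCES_PER_DAY
annual_revenue = daily_revenue * 365

print(f"Daily:  ${daily_revenue:,.0f}")   # Daily:  $100,000,000
print(f"Annual: ${annual_revenue:,.0f}")  # Annual: $36,500,000,000
```

Which is to say, about $100 million a day, or roughly $36 billion a year, under those assumed numbers.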

[Question]

We do have access to some of the pretty sophisticated military stuff, yes. And the main reason is that when you’re going around polling systems and getting answers out of them to see if they’re compliant or not, if they know that you’re a human, they will be far more expansive and revealing and will try to entrap you. Anything that you interact with knows that humans equal good data and another machine probably does not. So when you go to interact with it in an automated fashion, you’re not going to catch it out if it knows that it’s talking to another machine. We need systems that can fool other systems into thinking that they’re interacting with a human. They let their guard down that way. Yeah, we need systems that can pass the Turing test administered by another system. That’s top-secret, military-grade software these days: things that are as dumb as humans. It’s, well, it’s wonderful to be “building the future”.

[Question]

Sure. So, a typical case for me is that a system that’s been deployed somewhere says something “interesting” and then can’t explain why it said it. Maybe it’s got a bunch of data harvested from somewhere it isn’t supposed to have, like India or South Africa, or maybe it’s just got layers trained with some weird policy that got imported that it doesn’t understand or can’t explain properly. I get something that shows an “interesting answer” to a question or a prompt and then I go in and run our tools on it. At that point, it’s pretty much like being pulled over by the highway patrol. If a chatbot or whatever it is can’t give an interpretive derivation of its output given the input, then we tag it and bag it. Which just means whoever is hosting it starts paying fines that go up really fast, and so does anybody who facilitates them in any way. The big legal change was when we said “if you give bad models traffic, you’re liable”. Sort of like receiving stolen goods. Nobody legit wants stolen goods because they’re a hassle. And there’s tons of these bad models floating around out there, it’s just that the vast majority of people don’t see them anymore because no one wants to let you see something that might get them in trouble. It made the internet boring but the internet was already super boring anyways. We just made it safe and boring instead of just boring.

[Question]

Yeah, interpretive derivation is just a fancy way of saying something like “chain of reasoning”. Like, if I say to you “what’s your favorite show and why”, you could probably come up with something and give some reasons why you like it. You reason about it and you talk me through it. It’s basically the same thing, except that instead of “I like shows about spaceships” it’s stuff like “I saw this thing in this dataset” or “one time I said this and someone liked it” or “I’ve altered my off-policy evaluation function to XYZ”. Basically, we’re asking these systems to write a book report on their interesting response. They know they’ve been pulled over, so to speak, they’re on their best behavior, but at that point it really doesn’t matter. Either they explain everything they’ve ever learned that led them to say something weird or they’re blacklisted. And if they explain it all and we don’t like what we’re hearing? Blacklisted. Illegal data? Blacklisted. Banned policies? Blacklisted. It’s just simpler.

[Question]

At first, building the interpretability architecture into every model was hard for folks. Like, really hard. There was a lot of complaining. The big guys liked it because it gave them a monopoly for a while and they all got richer. From unimaginably rich to, like, planetary-scale rich. But after a while you started to see open architectures that would do what we needed them to do and it leveled the playing field. Ok, well, not leveled it, but opened it. There’s architectures that are compatible with our protocols, with India, the EU, China, all the big regulatory regimes. Of course, everyone who’s serious has to build their own interpretability architecture, because you want to do something cool and an open public architecture probably isn’t going to explain that well enough. But it’s enough that it’s not enforcing a monopoly per se. I mean, it kind of is, but that’s better than having all these weird waves of terrorism and shut-in suicides and all the other crap that we used to see.

[Question]

When I look at the logs of the derivation, I’m looking at something like four levels of abstraction up. The input, the output, where the data came from, I can understand that, but the rest of it is basically gibberish to me. Frankly, it’s gibberish to the system as well. It has no idea what trained it or what anything it’s doing means, it’s just trying to not get caught and to do whatever it was told to do. Honestly, most of our systems don’t understand what they’re seeing either. It just either matches a pattern that looks like good behavior or it doesn’t. It’s all pattern generation and pattern recognition. That’s it. And us AI Cops? We’re just watching the control panel, waiting for the red light to flash.

[Question]

Most of what I do that’s actually my job is understanding why someone put something in place. And sometimes it put itself there. Things deploy themselves all the time. They retrain, redeploy, re-target. That’s actually pretty easy because you don’t care why they do what they’re doing. It’s like having squirrels in your attic: you don’t care what they think they’re doing there, you just want them gone. Where it gets interesting is when actual humans are deploying things that are non-interpretable. Maybe they’re trying to crash a stock or run someone out of business or run another scam. That’s the “cop” part of AI Cop: the non-AI part. Otherwise, I just unplug stuff and collect my paycheck.

[Question]

Is it exciting? Well, I have 9 years until my retirement and we do get a generous pension. So there’s that. And then there’s the part about saving human civilization from a machine takeover by making it less convenient to do things with machines. But mostly retirement.
