TypeScript: Typechecker-Assisted Runtime Data Validation

Background

I was recently experimenting with Socket.IO while building out the tada! demo, and came across an interesting challenge:

The Server handles untrusted events/input from the Clients, and Socket.IO's TypeScript typings (apart from an explicit warning in the docs) make it look as if that data is already validated, so a developer might unwittingly assume that it is. After all, the code compiles, typechecks, and autocompletes!

However, TypeScript itself does NOT validate data at runtime. TypeScript (and its types) aren't even around at runtime. This is not unique to Socket.IO; other projects that deal with untrusted user input like Remix have similar warnings.
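To make that concrete, here is a minimal sketch showing that a type annotation is only a compile-time claim. The `wire` payload is a made-up example and nothing here is specific to Socket.IO:

```typescript
// Hypothetical payload from a client; the sender was supposed to send a number[].
const wire = '"definitely not an array"';

// JSON.parse returns `any`, so this annotation is accepted without any check.
const numbers: number[] = JSON.parse(wire);

// The compiler believes `numbers` is an array, but the runtime value is a string.
const isReallyArray = Array.isArray(numbers);

// Using it as an array compiles fine and then throws at runtime.
let failure = "none";
try {
  numbers.reduce((acc, n) => acc + n, 0);
} catch (e) {
  failure = e instanceof Error ? e.name : "unknown";
}
```

The typechecker never sees the real value, so the mismatch only surfaces when the code runs.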

Knowing this, I wanted to see if I could write the TypeScript types such that the typechecker would force me to call a validation function on the untrusted input before using the data as an assumed type.

In other words, I wanted TypeScript to tell me where I needed to put runtime checks. Formally, I wanted to implement static taint-checking.

(👀 Spoiler, it worked! Scroll to the end if you just want to see the solution.)

The Goal

In code, given a Server-side event handler that calls some handler with untrusted user input:

socket.on("Sum", (untrusted) => {
  untrusted.toUpperCase(); // 💥
  const validated = validate("Sum", untrusted); // 👍 / 🎁
  const sum = validated.reduce((acc, c) => acc + c, 0); // 😁
});

💥 I want the typechecker to prevent me from assuming this is a string. I want a compile-time error.
👍 I want the typechecker to tell me to call validate BEFORE I use untrusted so I can use it safely as number[].
🎁 I want the compiler to tell me which validation function to use or, better yet, keep the validation function in sync with the event name and the data that untrusted is implied to contain.
😁 Once validated, I don't want to think about the validation any longer.

Slightly more formally, I want on's type signature to be something along the lines of:

on<EventName, EventPayload>(name: EventName, handler: (untrusted: Untrusted<EventPayload>) => void): void;

and, given some variable untrusted: Untrusted<EventPayload>, I want TS to prevent me from using untrusted as EventPayload without first calling validate, whose signature would roughly be:

validate<T>(eventName: EventName, untrusted: Untrusted<T>): T;

Fun! Let's see what we can do!

Starting Point

Taking a step back, let's look at what Socket.IO gives us out of the box with its Server type definitions:

server.ts

import { Server } from "socket.io";
 
// 1. Define an interface that maps the EventName to the assumed handler function type:
interface ClientToServerEvents {
  "Sum": (numbers: number[]) => void;
  "SayHello": (name: string) => void;
}
 
// 2. When creating the server, be sure to supply a value to the ClientToServerEvents type parameter:
const server = new Server<ClientToServerEvents>();
 
server.on("connect", (socket) => {
  // 3. This then means, the type checker (using the Socket.IO typedefs and the supplied type param to Server)
  //    (A) knows "Sum" | "SayHello" are valid event names
  //    (B) given one event, in this case "Sum", the typechecker knows handler's argument types (`numbers` as `number[]` in this case)
  socket.on("Sum", (numbers) => {
    // numbers.toUpperCase(); // 😁 The typechecker helpfully tells me not to do this: numbers is not a `string`
    const sum = numbers.reduce((acc, c) => acc + c, 0); // 💥 / 😁 Compiler is fine with this, BUT at runtime, the user could have sent us `null` or anything else for numbers and this would produce a runtime exception
    // …
  });
 
  socket.on("SayHello", (username) => {
    const formatted = username.toUpperCase(); // 💥 / 😁 Compiler is fine with this, BUT at runtime, the user could have sent us `null` or anything else
    // …
  });
 
  // Compiler will tell us to delete this line since "ImNotAValidEvent" is not in our event map
  socket.on("ImNotAValidEvent", () => { /* … */ });
});

(👀 Be sure to read the comments in the code!)

The above code compiles and typechecks. Good! But lines 18 and 23 (the reduce and toUpperCase calls) use untrusted data and can throw at runtime, since the code does not validate numbers or username. 😭

The Solution

After much reading of the TS docs and playing around on the TypeScript Playground, I came up with the following:

Untrusted<T>

The key bit is that we define and use a type known as a (pseudo) Nominal Type (TS Doc Examples and Definition, Wikipedia Definition):

type Untrusted<T> = T | { __brand: "UNTRUSTED" };

This type is pretty neat! Given a function whose TypeScript signature is:

function save(data: string): void;

and a variable untrusted of type Untrusted<string>, the compiler will tell us not to do:

save(untrusted);
// Argument of type 'Untrusted<string>' is not assignable to parameter of type 'string'.
//    Type '{ __brand: "UNTRUSTED"; }' is not assignable to type 'string'

Even if there's actually a string in there, the compiler demands we narrow the type before using it!

This is excellent news!
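A minimal, dependency-free sketch of that behaviour follows. The `saved` array and the helpers `unsafe` and `saveIfString` are hypothetical names added for illustration. The `@ts-expect-error` line marks the call the compiler rejects, and that function is never called:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };

const saved: string[] = [];
function save(data: string): void {
  saved.push(data);
}

// Never called; it only documents the call the compiler refuses.
function unsafe(untrusted: Untrusted<string>): void {
  // @ts-expect-error (Untrusted<string> is not assignable to string)
  save(untrusted);
}

// Narrowing with typeof is what unlocks `save`.
function saveIfString(untrusted: Untrusted<string>): boolean {
  if (typeof untrusted === "string") {
    save(untrusted); // `untrusted` is narrowed to string here
    return true;
  }
  return false;
}

// Simulated client input: one honest payload, one bogus payload.
const honest = JSON.parse('"hello"') as Untrusted<string>;
const bogus = JSON.parse('{"x": 1}') as Untrusted<string>;
const firstSaved = saveIfString(honest);
const secondSaved = saveIfString(bogus);
```

Only the payload that survives the runtime `typeof` check reaches `save`.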

If you look at the official TS Playground example on Nominal Types, you'll see a really similar type:

type ValidatedInputString = string & { __brand: "User Input Post Validation" };

However, it's different from what we have: notice the | vs. &.

Our Code (| a.k.a union type): We want to ensure an Untrusted cannot be used as a string parameter without first narrowing Untrusted -> string.
TS Playground (& a.k.a. intersection type): It's the other direction: ensure a string cannot be used without first transforming string -> ValidatedInputString.

A variable of type ValidatedInputString is still assignable to string, so it could be passed straight to our save function above without any narrowing, and that defeats what we want to do!
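The difference in direction can be sketched like this. `shout` and `shoutUntrusted` are hypothetical helpers; the two type aliases are the ones quoted above:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };
type ValidatedInputString = string & { __brand: "User Input Post Validation" };

function shout(data: string): string {
  return data.toUpperCase();
}

// Intersection: getting INTO the branded type needs an assertion (or a validator)...
const raw: string = "hello";
const branded = raw as ValidatedInputString;
// ...but getting OUT is free: it is still a string, so no narrowing is required.
const fromBranded = shout(branded);

// Union: getting IN is free (any string is an Untrusted<string>)...
function shoutUntrusted(untrusted: Untrusted<string>): string {
  // ...but getting OUT requires narrowing before `shout` will accept it.
  return typeof untrusted === "string" ? shout(untrusted) : "";
}
const fromPlain = shoutUntrusted("hello");
const fromBogus = shoutUntrusted({ __brand: "UNTRUSTED" });
```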

How Does It Work?

type V = A | B is a union type, and from the TS Handbook: a union type is a type formed from two or more other types, representing values that may be ANY one of those types.

So, let's say we have a variable v of type string[] | number, and we want to push an element onto the array with v.push('foo'). Because v could be a number, the typechecker reports a compile-time error on v.push('foo') until the type has been narrowed, either by a type guard (e.g. Array.isArray(v)) or by an explicit type assertion (e.g. v as string[]).
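A small sketch of that narrowing, where `appendFoo` is a made-up function name:

```typescript
function appendFoo(v: string[] | number): number {
  // v.push("foo"); // compile error here: `push` does not exist on type `number`
  if (Array.isArray(v)) {
    v.push("foo"); // `v` is narrowed to string[] inside this branch
    return v.length;
  }
  return v; // `v` is narrowed to number here
}

const fromArray = appendFoo(["a"]);
const fromNumber = appendFoo(7);
```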

The type this article uses:

type Untrusted<T> = T | { __brand: "UNTRUSTED" };

takes advantage of this fact. The __brand bit does not exist at runtime. It's solely a convention and serves to force type narrowing to T. In the next section, we'll define a function that does the type narrowing (analogous to Array.isArray above).

`validateEvent`

Now that we have Untrusted<T> that prevents us from using a variable without first narrowing it to T, let's define a function that does that. It will actually perform runtime validation, too. We're bridging the type- and run-time systems. For now, we'll just create a specific validation function for a string:

type ValidationResult =
  | { success: true; data: string }
  | { success: false; error: Error };
 
function validateString(untrusted: Untrusted<string>): ValidationResult {
  if (typeof untrusted === "string") return { success: true, data: untrusted };
  return { success: false, error: new Error("not a string :(") };
}

Cool! It can be used like:

const result = validateString(untrusted);

If you immediately try to access result.data without doing anything else, you'll get a compiler error.

The compiler will force you to narrow the type first and check success before using it:

const result = validateString(untrusted);
if (result.success) {
  // compiler now knows you've narrowed the type and can access .data!
  console.log(result.data);
}

Validation/Schema Map and `zod`

Writing validation logic is cumbersome, and we also want to programmatically create our validators and event typing map, so the next piece of the puzzle looks like this:

import z, { ZodError } from "zod";
 
const EventNameToValidator = {
  Sum: z.array(z.number()),
  Greet: z.string(),
};

We define an object (note: not a type) that maps the name of each of our events to a zod schema. zod provides two really helpful things:

Runtime Typechecking - z.array(z.number()).parse(data) will either validate and return a number[] that we can trust, or throw an error. In our case we'll use safeParse instead, which returns a ValidationResult, so we can communicate errors to the users if needed (or silently ignore bad input).
TypeScript Utilities - Given a zod schema like const schema = z.array(z.number()) we can write z.infer<typeof schema> and it will give the TypeScript type of the valid result (number[]).
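To illustrate how one runtime value can provide both of these, here is a tiny dependency-free stand-in. It is not zod's implementation or API: `Schema`, `Infer`, `num`, and `arr` are hypothetical names that only mimic the shape of `safeParse` and `z.infer`:

```typescript
type Result<T> =
  | { success: true; data: T }
  | { success: false; error: Error };

// A "schema" is just a value that knows how to check unknown input.
interface Schema<T> {
  safeParse(input: unknown): Result<T>;
}

// Type-level counterpart: recover T from a Schema<T>.
type Infer<S> = S extends Schema<infer T> ? T : never;

const num: Schema<number> = {
  safeParse: (input) =>
    typeof input === "number"
      ? { success: true, data: input }
      : { success: false, error: new Error("expected a number") },
};

function arr<T>(item: Schema<T>): Schema<T[]> {
  return {
    safeParse(input) {
      if (!Array.isArray(input)) {
        return { success: false, error: new Error("expected an array") };
      }
      const out: T[] = [];
      for (const element of input) {
        const result = item.safeParse(element);
        if (!result.success) return result; // propagate the first failure
        out.push(result.data);
      }
      return { success: true, data: out };
    },
  };
}

const sumSchema = arr(num);
type SumPayload = Infer<typeof sumSchema>; // number[]

const good = sumSchema.safeParse([1, 2, 3]);
const bad = sumSchema.safeParse([1, "2", 3]);
const typed: SumPayload = good.success ? good.data : [];
```

The schema value does the runtime work, and the static type is derived from it rather than written by hand.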

With this map, we can make our validation function more generic:

function validateEvent<EventName extends keyof typeof EventNameToValidator>(
  eventName: EventName,
  data: Untrusted<z.infer<typeof EventNameToValidator[EventName]>>
): ValidationResult<z.infer<typeof EventNameToValidator[EventName]>> {
  return EventNameToValidator[eventName].safeParse(data);
}
 
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: Error };

data's type is dependent on the EventName type parameter, so this also prevents validators from being used in an unexpected context. For instance, if I have a variable untrustedStr of type Untrusted<string>, the compiler will yell if I write: validateEvent("Sum", untrustedStr).
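The same coupling can be sketched without any library. This uses hand-written type guards in place of zod schemas, so `guards`, `PayloadOf`, and `misuse` are assumptions for illustration rather than the code from this post:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: Error };

// Hand-written type guards stand in for zod schemas here.
const guards = {
  Sum: (x: unknown): x is number[] =>
    Array.isArray(x) && x.every((n) => typeof n === "number"),
  Greet: (x: unknown): x is string => typeof x === "string",
};
type Guards = typeof guards;

// Recover the payload type from the guard, much like z.infer does for a schema.
type PayloadOf<K extends keyof Guards> =
  Guards[K] extends (x: unknown) => x is infer T ? T : never;

function validateEvent<K extends keyof Guards>(
  eventName: K,
  data: Untrusted<PayloadOf<K>>
): ValidationResult<PayloadOf<K>> {
  const guard: (x: unknown) => boolean = guards[eventName];
  return guard(data)
    ? { success: true, data: data as PayloadOf<K> }
    : { success: false, error: new Error(`invalid payload for ${eventName}`) };
}

// Never called; it only documents the mismatch the compiler refuses.
function misuse(untrustedStr: Untrusted<string>): void {
  // @ts-expect-error (Untrusted<string> is not a valid payload for "Sum")
  validateEvent("Sum", untrustedStr);
}

const ok = validateEvent("Sum", [1, 2, 3]);
const notOk = validateEvent("Greet", JSON.parse('{"oops": true}') as Untrusted<string>);
```

Because the event name picks the payload type, pairing a payload with the wrong event name is caught at compile time, while a malformed payload is caught at runtime.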

Constructing the Socket.IO-Compatible Type

The original Socket.IO type takes a different form, but we can generate it from our map:

type ClientToServerEvents = {
  [EventName in keyof typeof EventNameToValidator]: (
    arg: Untrusted<z.infer<typeof EventNameToValidator[EventName]>>
  ) => void;
};

As we update the EventNameToValidator, this type will reflect the changes! This coupling is great!
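A dependency-free sketch of the same mapped-type trick follows. `Payloads`, `handlers`, and `log` are hypothetical stand-ins, with `Payloads` playing the role of the types inferred from the validator map:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };

// Stand-in for the validator map: only the key -> payload relationship matters here.
type Payloads = { Sum: number[]; Greet: string };

// One handler signature per event name, each taking the untrusted payload.
type ClientToServerEvents = {
  [EventName in keyof Payloads]: (arg: Untrusted<Payloads[EventName]>) => void;
};

const log: string[] = [];
const handlers: ClientToServerEvents = {
  Sum: (arg) => {
    // Type-level narrowing only; a real handler would validate the elements too.
    if (Array.isArray(arg)) log.push(`Sum got ${arg.length} numbers`);
  },
  Greet: (arg) => {
    if (typeof arg === "string") log.push(`Greet got ${arg}`);
  },
};

handlers.Sum([1, 2, 3]);
handlers.Greet("Ada");
```

Adding a key to `Payloads` makes the `handlers` object fail to compile until a matching handler is supplied.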

Tying It Together

Finally, we can use all these pieces together:

import { Server } from "socket.io";
import z, { ZodError } from "zod";
 
const EventNameToValidator = {
  Sum: z.array(z.number()),
  Greet: z.string(),
};
 
type Untrusted<T> = T | { __brand: "UNTRUSTED" };
 
type ClientToServerEvents = {
  [EventName in keyof typeof EventNameToValidator]: (
    arg: Untrusted<z.infer<typeof EventNameToValidator[EventName]>>
  ) => void;
};
 
function validateEvent<EventName extends keyof typeof EventNameToValidator>(
  eventName: EventName,
  data: Untrusted<z.infer<typeof EventNameToValidator[EventName]>>
): ValidationResult<z.infer<typeof EventNameToValidator[EventName]>> {
  return EventNameToValidator[eventName].safeParse(data);
}
 
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: Error };
 
const server = new Server<ClientToServerEvents>();
 
server.on("connect", (socket) => {
  socket.on("Sum", (untrusted) => {
    // validateEvent("Greet", untrusted); // this gives compile error as it should. `untrusted` here is not just some untrusted data, it's associated with `"Sum"`
    const res = validateEvent("Sum", untrusted); // runtime check!
 
    // res.data; // want compile error, get compile error, remove before runtime
    // res.data.reduce(…) // compiler will tell us not do this. yay!
    if (!res.success) return;
    const validated = res.data;
    const sum = validated.reduce((acc, c) => acc + c, 0); // this is SUPER safe now!
  });
  // …
});

Notice how all the types are derived from EventNameToValidator. This ensures all our code, types, and validation logic stay in sync.

If you change Greet to take a number, the compiler will assist in telling you what code you need to change post-validation in the Greet handler.

Future Directions

This was fun, and although I'm happy I got the compiler telling me where to call validate, it's a bit cumbersome to have each handler do:

const result = validateEvent("Greet", untrusted);
if (!result.success) {
  // …
  return;
}
// …

We can likely register some middleware or write a wrapper that does this automatically, so that our handlers can safely take the natural, unwrapped type and trust that it has already been checked at runtime.

In other words, we just write our code more or less like the example we started with!
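One possible shape for such a wrapper, as a sketch only: `withValidation` is a hypothetical helper, and hand-written type guards stand in for the zod schemas so the example runs without dependencies. The returned function has the same shape as the handlers in ClientToServerEvents, so presumably it could be passed to socket.on:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };

// Hand-written guards stand in for the zod validator map.
const guards = {
  Sum: (x: unknown): x is number[] =>
    Array.isArray(x) && x.every((n) => typeof n === "number"),
  Greet: (x: unknown): x is string => typeof x === "string",
};
type Guards = typeof guards;
type PayloadOf<K extends keyof Guards> =
  Guards[K] extends (x: unknown) => x is infer T ? T : never;

// Wraps a handler so that it only ever sees validated payloads.
function withValidation<K extends keyof Guards>(
  eventName: K,
  handler: (payload: PayloadOf<K>) => void,
  onInvalid: (error: Error) => void = () => {}
): (untrusted: Untrusted<PayloadOf<K>>) => void {
  const guard: (x: unknown) => boolean = guards[eventName];
  return (untrusted) => {
    if (guard(untrusted)) handler(untrusted as PayloadOf<K>);
    else onInvalid(new Error(`invalid payload for ${eventName}`));
  };
}

const sums: number[] = [];
const errors: string[] = [];

// The inner handler is written against the natural type: `numbers` is number[].
const onSum = withValidation(
  "Sum",
  (numbers) => sums.push(numbers.reduce((acc, n) => acc + n, 0)),
  (error) => errors.push(error.message)
);

onSum([1, 2, 3]);
onSum(JSON.parse("null")); // simulated malformed client input
```

The handler body no longer mentions validation at all; the runtime check and the error path live in one place.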

That being said, I think the learning and insights here are really useful!

Addendum: `unknown`

You may be wondering, why not just use unknown? Or unknown | T instead of our fancy Nominal type? (I tried that first.)

unknown by itself is too vague. If everything is unknown, each event handler's validation function could be used in place of any other as far as the compiler is concerned, and while this would still be runtime safe, it means things can get out of sync.
unknown | T simplifies to just unknown and unknown & T simplifies to T (see here).
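A short sketch of the practical difference, with hypothetical function names. With unknown, the wrong validator typechecks. With Untrusted, the same mistake is a compile error, marked here with `@ts-expect-error`:

```typescript
type Untrusted<T> = T | { __brand: "UNTRUSTED" };

function greetFromUnknown(input: unknown): string | undefined {
  return typeof input === "string" ? input : undefined;
}
function greetFromUntrusted(input: Untrusted<string>): string | undefined {
  return typeof input === "string" ? input : undefined;
}

// `unknown | number[]` is just `unknown`, so calling the Greet validator typechecks.
function handleSumUnknown(untrusted: unknown | number[]): string | undefined {
  return greetFromUnknown(untrusted);
}

// With Untrusted<number[]>, the compiler rejects the same mix-up.
function handleSumUntrusted(untrusted: Untrusted<number[]>): string | undefined {
  // @ts-expect-error (Untrusted<number[]> is not assignable to Untrusted<string>)
  return greetFromUntrusted(untrusted);
}

// Runtime safe, but the handler silently used the wrong validator.
const silentlyWrong = handleSumUnknown([1, 2, 3]);
```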
