TypeScript: Typechecker-Assisted Runtime Data Validation
A neat typing trick to have the typechecker tell you where you need to validate your untrusted user data.
Background
I was recently experimenting with Socket.IO while building out the tada! demo, and came across an interesting challenge:
The Server handles untrusted events and input from the Clients, and the Socket.IO TypeScript type documentation (apart from an explicit warning in the docs) could lead a developer to unwittingly assume that the data is validated. After all, the code compiles, typechecks, and autocompletes!
However, TypeScript itself does NOT validate data at runtime. TypeScript (and its types) aren't even around at runtime. This is not unique to Socket.IO; other projects that deal with untrusted user input like Remix have similar warnings.
Knowing this, I wanted to see if I could write the TypeScript types such that the typechecker would force me to call a validation function on the untrusted input before using the data as an assumed type.
In other words, I wanted TypeScript to tell me where I needed to put runtime checks. Formally, I wanted to implement static taint-checking.
(👀 Spoiler, it worked! Scroll to the end if you just want to see the solution.)
The Goal
In code, given a Server-side event handler that calls some handler with untrusted user input:
- 💥 I want the typechecker to prevent me from assuming this is a string. I want a compile-time error.
- 👍 I want the typechecker to tell me to call validate BEFORE I use untrusted, so I can use it safely as number[].
- 🎁 I want the compiler to tell me which validation function to use, or, better yet, keep the validation function in sync with the event name and the data that untrusted is implied to contain.
- 😁 Once validated, I don't want to think about the validation any longer.
Slightly more formally, I want on's type signature to be something along the lines of:
and given some variable untrusted: Untrusted<EventPayload>, I wanted TS to prevent me from using untrusted as EventPayload without first calling validate, whose signature would roughly be:
Fun! Let's see what we can do!
Starting Point
Taking a step back, let's look at what Socket.IO gives us out of the box with its Server type definitions:
(👀 Be sure to read the comments in the code!)
The above code compiles and typechecks. Good! But lines 18 and 23 use untrusted data and can error at runtime, since the code does not validate numbers or username. 😭
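To make the problem concrete without pulling in Socket.IO, here is a minimal sketch. It assumes two events, Sum and Greet, with number[] and string payloads (the names follow the handlers discussed in this article), and it uses a hypothetical receiveFromClient function as a stand-in for the network layer:

```typescript
// Event map in the style of Socket.IO's typed events.
// The event names and payload types are assumptions for illustration.
interface ClientToServerEvents {
  Sum: (numbers: number[]) => void;
  Greet: (username: string) => void;
}

const handlers: ClientToServerEvents = {
  // Typechecks, but nothing guarantees `numbers` is an array at runtime.
  Sum: (numbers) => {
    console.log(numbers.reduce((total, n) => total + n, 0));
  },
  // Typechecks, but nothing guarantees `username` is a string at runtime.
  Greet: (username) => {
    console.log(`Hello, ${username.toUpperCase()}!`);
  },
};

// Hypothetical stand-in for the network: whatever a client sends is handed
// to the handler as-is, just like an unvalidated socket payload.
function receiveFromClient(eventName: keyof ClientToServerEvents, payload: unknown): void {
  (handlers[eventName] as (data: unknown) => void)(payload);
}

receiveFromClient("Sum", [1, 2, 3]); // a well-behaved client

try {
  receiveFromClient("Sum", "not an array"); // a malicious or buggy client
} catch (err) {
  console.log("runtime failure:", (err as Error).message);
}
```

The types promise number[] and string, yet the second call only fails once the handler is already running.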
The Solution
After much reading of the TS docs and playing in the TypeScript Playground, I came up with the following:
Untrusted<T>
The key bit is that we define and use a type known as a (pseudo) Nominal Type (TS Doc Examples and Definition, Wikipedia Definition):
This type is pretty neat! Given a function whose TypeScript signature is:
and a variable untrusted of type Untrusted<string>, the compiler will tell us not to do:
Even if there's actually a string in there, the compiler demands we narrow the type before using it!
This is excellent news!
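A minimal sketch of the idea, assuming Untrusted<T> is a union of T with a branded object type. The brand string "Untrusted" and the body of save are illustrative guesses, not necessarily the exact code behind this article:

```typescript
// Assumed definition: either a real T, or an object carrying a
// compile-time-only brand. The brand never exists at runtime.
type Untrusted<T> = T | { __brand: "Untrusted" };

// Hypothetical function that only accepts a real string.
function save(value: string): string {
  return `saved: ${value}`;
}

// Simulate data arriving from a client; the assertion marks it as untrusted.
const untrusted = "hello" as Untrusted<string>;

// @ts-expect-error: Untrusted<string> is not assignable to string.
save(untrusted);

// Narrowing first satisfies the compiler and checks the value at runtime.
if (typeof untrusted === "string") {
  console.log(save(untrusted));
}
```

The @ts-expect-error comment documents that the unguarded call is rejected at compile time, even though the value happens to be a string.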
If you look at the official TS Playground example on Nominal Types, you'll see a really similar type:
However, it's different from what we have: notice the | vs. &.
- Our Code (| a.k.a. union type): We want to ensure an Untrusted cannot be used as a string parameter without first narrowing Untrusted -> string.
- TS Playground (& a.k.a. intersection type): It's the other direction: ensure a string cannot be used without first transforming string -> ValidatedInputString.
If we had a variable unvalidated of type ValidatedInputString, it could be used in our save function above, but this defeats what we want to do!
How Does It Work?
type V = A | B is a union type, and from the TS Handbook: a union type is a type formed from two or more other types, representing values that may be ANY one of those types.
So, let's say we have a variable v that is type Array | number, and we want to push an element onto the array like v.push('foo'). Since v can be either an Array or a number, the typechecker raises a compile-time error if I call v.push('foo'), because v could be of type number. The error only goes away after a type narrowing (e.g. Array.isArray(v)) or an explicit type assertion (e.g. v as Array).
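Here is that behavior as a small runnable sketch. It uses string[] | number as a concrete version of the Array | number shorthand above:

```typescript
// The assertion keeps the declared type as the full union.
const v = ["bar"] as string[] | number;

// @ts-expect-error: Property 'push' does not exist on type 'number'.
v.push("foo");

// After narrowing, the compiler knows v is a string[].
if (Array.isArray(v)) {
  v.push("baz");
}

console.log(v);
```

The first push is a compile-time error even though it would succeed at runtime; the typechecker only reasons about what v could be.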
The type this article uses:
takes advantage of this fact. The __brand bit does not exist at runtime. It's solely a convention and serves to force type narrowing to T. In the next section, we'll define a function that does the type narrowing (analogous to Array.isArray above).
validateEvent
Now that we have Untrusted<T> that prevents us from using a variable without first narrowing it to T, let's define a function that does that. It will actually perform runtime validation, too. We're bridging the type- and run-time systems. For now, we'll just create a specific validation function for a string:
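A minimal sketch of such a function. The name validateString, the error field, and the exact shape of ValidationResult are assumptions; the success and data fields follow the usage described in this section:

```typescript
// Assumed definition of the branded union type.
type Untrusted<T> = T | { __brand: "Untrusted" };

// Tagged result: `data` only exists on the success branch.
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical validator for a single string payload.
function validateString(untrusted: Untrusted<string>): ValidationResult<string> {
  // The runtime check and the compile-time narrowing happen together.
  if (typeof untrusted === "string") {
    return { success: true, data: untrusted };
  }
  return { success: false, error: `expected a string, got ${typeof untrusted}` };
}
```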
Cool! It can be used like:
If you immediately try to access result.data without doing anything else, you'll get a compiler error.
The compiler will force you to narrow the type first and check success before using it:
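The snippet below sketches both situations. It repeats a compact copy of a hypothetical string validator so that it runs on its own; greet is an illustrative caller, not code from Socket.IO:

```typescript
type Untrusted<T> = T | { __brand: "Untrusted" };
type ValidationResult<T> = { success: true; data: T } | { success: false; error: string };

// Compact, hypothetical string validator so this snippet is self-contained.
const validateString = (u: Untrusted<string>): ValidationResult<string> =>
  typeof u === "string" ? { success: true, data: u } : { success: false, error: "expected a string" };

function greet(untrusted: Untrusted<string>): string {
  const result = validateString(untrusted);

  // @ts-expect-error: `data` does not exist on the failure branch of the union.
  void result.data;

  if (!result.success) {
    return `rejected: ${result.error}`;
  }
  // From here on, result is narrowed to the success branch.
  return `Hello, ${result.data.toUpperCase()}!`;
}

console.log(greet("ada"));
console.log(greet(7 as unknown as Untrusted<string>));
```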
Validation/Schema Map and zod
Writing validation logic is cumbersome, and we also want to programmatically create our validators and event typing map, so the next piece of the puzzle looks like this:
We define an object (note: not a type) that maps the name of each of our events to a zod Schema. zod provides two really helpful things:
- Runtime Typechecking - z.array(z.number()).parse(data) will either validate and return a number[] that we can trust, or throw an error. In our case, we'll use safeParse to return a ValidationResult so we can communicate errors to the users if needed (or silently ignore input).
- TypeScript Utilities - Given a zod schema like const schema = z.array(z.number()), we can write z.infer<typeof schema> and it will give the TypeScript type of the valid result (number[]).
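Both points can be seen in a short sketch. It assumes the zod package is installed and that the map contains Sum and Greet events with number[] and string payloads; the real map may hold different events:

```typescript
import { z } from "zod";

// Assumed contents of the map: event name -> zod schema.
const EventNameToValidator = {
  Sum: z.array(z.number()),
  Greet: z.string(),
};

// TypeScript utility: derive the static type from the runtime schema.
type SumPayload = z.infer<typeof EventNameToValidator.Sum>; // number[]

// Runtime typechecking: safeParse does not throw, it returns a tagged result.
const good = EventNameToValidator.Sum.safeParse([1, 2, 3]);
if (good.success) {
  const numbers: SumPayload = good.data;
  console.log(numbers.reduce((total, n) => total + n, 0));
}

const bad = EventNameToValidator.Sum.safeParse("not an array");
if (!bad.success) {
  // bad.error is a ZodError describing what went wrong.
  console.log(bad.error.issues.length);
}
```

Calling parse instead of safeParse on the same schema would throw on the second input rather than return a failure object.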
With this map, we can make our validation function more generic:
data's type is dependent on the EventName type parameter, so this also prevents validators from being used in an unexpected context. For instance, if I have a variable untrustedStr of type Untrusted<string>, the compiler will yell if I write: validateEvent("Sum", untrustedStr).
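A dependency-free sketch of the generic version follows. To keep it runnable without installing anything, it swaps zod for a hand-rolled Validator interface that models only safeParse, and a Payload helper plays the role of z.infer. Those two names, the event payloads, and the overload used to keep the implementation free of casts are all assumptions of this sketch:

```typescript
type Untrusted<T> = T | { __brand: "Untrusted" };
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Stand-in for a zod schema: only safeParse is modelled.
interface Validator<T> {
  safeParse(data: unknown): ValidationResult<T>;
}

const EventNameToValidator = {
  Sum: {
    safeParse: (data: unknown): ValidationResult<number[]> =>
      Array.isArray(data) && data.every((n) => typeof n === "number")
        ? { success: true, data }
        : { success: false, error: "expected number[]" },
  },
  Greet: {
    safeParse: (data: unknown): ValidationResult<string> =>
      typeof data === "string"
        ? { success: true, data }
        : { success: false, error: "expected string" },
  },
};

type Validators = typeof EventNameToValidator;
type EventName = keyof Validators;
// Plays the role of z.infer: extract the payload type from a validator.
type Payload<N extends EventName> = Validators[N] extends Validator<infer T> ? T : never;

// Callers see the generic overload; the implementation stays loosely typed.
function validateEvent<N extends EventName>(
  eventName: N,
  data: Untrusted<Payload<N>>,
): ValidationResult<Payload<N>>;
function validateEvent(eventName: EventName, data: unknown): ValidationResult<unknown> {
  return EventNameToValidator[eventName].safeParse(data);
}

const untrustedStr = "hello" as Untrusted<string>;
// @ts-expect-error: "Sum" expects Untrusted<number[]>, not Untrusted<string>.
validateEvent("Sum", untrustedStr);

console.log(validateEvent("Greet", untrustedStr));
```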
Constructing the Socket.IO-Compatible Type
The original Socket.IO type takes a different form, but we can generate it from our map:
As we update the EventNameToValidator, this type will reflect the changes! This coupling is great!
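One way to generate that shape is a mapped type over the keys of the map. The sketch below is an assumption-laden illustration: Validator and Infer are hand-rolled stand-ins for a zod schema and z.infer, the map is only declared (its type is all that matters here), and handlers is a plain object rather than a real Socket.IO server:

```typescript
type Untrusted<T> = T | { __brand: "Untrusted" };

// Stand-in for a zod schema; only the payload type matters here.
interface Validator<T> {
  safeParse(data: unknown): { success: true; data: T } | { success: false; error: string };
}

// Declared rather than built: this snippet only needs the map's type.
declare const EventNameToValidator: {
  Sum: Validator<number[]>;
  Greet: Validator<string>;
};

type Validators = typeof EventNameToValidator;
// Plays the role of z.infer.
type Infer<V> = V extends Validator<infer T> ? T : never;

// Event name -> handler signature, with every payload wrapped in Untrusted.
type ClientToServerEvents = {
  [N in keyof Validators]: (data: Untrusted<Infer<Validators[N]>>) => void;
};

const seen: string[] = [];
const handlers: ClientToServerEvents = {
  Sum: (data) => {
    // data is Untrusted<number[]>, so it must be narrowed before use.
    if (Array.isArray(data)) seen.push(`sum of ${data.length} numbers`);
  },
  Greet: (data) => {
    if (typeof data === "string") seen.push(`greet ${data}`);
  },
};

// Compile-time check: the generated signature matches a hand-written one.
const sumHandler: (data: Untrusted<number[]>) => void = handlers.Sum;

sumHandler([1, 2, 3]);
handlers.Greet("Ada");
console.log(seen);
```

Adding an entry to the map's type would make the compiler demand a matching handler in handlers.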
Tying It Together
Finally, we can use all these pieces together:
Notice how all the types are derived from EventNameToValidator. This ensures all our code, types, and validation logic stay in sync.
If you change Greet to take a number, the compiler will assist in telling you what code you need to change post-validation in the Greet handler.
Future Directions
This was fun, and although I'm happy I got the compiler telling me where to call validate, it's a bit cumbersome to have each handler do:
We can likely register some middleware or a wrapper that does this automatically, so we can safely write handlers against the natural, unwrapped type and trust that the middleware has already runtime-checked it.
In other words, we just write our code more or less like the example we started with!
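One possible shape for such a wrapper, offered as a sketch only. withValidation is a hypothetical helper (not part of Socket.IO or zod), and Validator is a hand-rolled stand-in for a zod schema that models just safeParse:

```typescript
type Untrusted<T> = T | { __brand: "Untrusted" };
type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Stand-in for a zod schema.
interface Validator<T> {
  safeParse(data: unknown): ValidationResult<T>;
}

// Hypothetical helper: wraps a handler written against the trusted type
// and returns a handler that accepts the untrusted type.
function withValidation<T>(
  validator: Validator<T>,
  handler: (data: T) => void,
  onInvalid: (error: string) => void = () => {},
): (untrusted: Untrusted<T>) => void {
  return (untrusted) => {
    const result = validator.safeParse(untrusted);
    if (result.success) {
      handler(result.data);
    } else {
      onInvalid(result.error);
    }
  };
}

const stringValidator: Validator<string> = {
  safeParse: (data) =>
    typeof data === "string"
      ? { success: true, data }
      : { success: false, error: "expected string" },
};

const greetings: string[] = [];
// The handler body sees a plain string; validation happens in the wrapper.
const onGreet = withValidation(stringValidator, (username) => {
  greetings.push(`Hello, ${username}!`);
});

onGreet("Ada");
onGreet(42 as unknown as Untrusted<string>); // silently ignored by default
console.log(greetings);
```

The validation call still exists, but it lives in one place instead of being repeated at the top of every handler.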
That being said, I think the learning and insights here are really useful!
Addendum: unknown
You may be wondering, why not just use unknown? Or unknown | T instead of our fancy Nominal type? (I tried that first.)
- unknown by itself is too vague. If everything is unknown, each event handler's validation function could be used in place of any other from the compiler's standpoint. While this would be runtime safe, it means things can get out of sync.
- unknown | T simplifies to just unknown, and unknown & T simplifies to T (see here).
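A small sketch of the difference, assuming the branded union definition of Untrusted<T>. The two validator functions are hypothetical and only illustrate what the compiler accepts:

```typescript
type Untrusted<T> = T | { __brand: "Untrusted" };

type Collapsed = unknown | string; // resolves to unknown
type AlsoCollapsed = unknown & string; // resolves to string

// Typed with unknown | string, this "string validator" accepts anything.
function validateLoosely(data: Collapsed): boolean {
  return typeof data === "string";
}

// Typed with Untrusted<string>, it only accepts payloads meant to be strings.
function validateStrictly(data: Untrusted<string>): boolean {
  return typeof data === "string";
}

const untrustedNumbers = [1, 2, 3] as Untrusted<number[]>;

// Compiles: the wrong validator for this payload goes unnoticed.
console.log(validateLoosely(untrustedNumbers));

// @ts-expect-error: Untrusted<number[]> is not assignable to Untrusted<string>.
validateStrictly(untrustedNumbers);
```

Both calls are safe at runtime, but only the branded version lets the compiler notice that a validator is being applied to the wrong event's data.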