[Jake Miller] of Bishop Fox Labs wrote a great intro to a subject I’ve never considered: odd JSON constructions, and how different implementations handle them. An example will help.
obj = {"test": 1, "test": 2}
So what’s the value of obj["test"]? It’s complicated. Some JSON parsers will choose the first definition of a key, while others choose the last. Still others will throw an error in response. What makes this a particularly serious problem is that the same data may be parsed by different implementations in a single transaction. The example given in the post is of an online store, where the payment processing is handled by a third party.
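As a quick illustration of one implementation's answer, here is a minimal sketch using Python's standard json module, which keeps the last definition of a repeated key. The variable names are only for illustration; other parsers may behave differently, as described above.

```python
import json

raw = '{"test": 1, "test": 2}'

# Python's standard json module silently keeps the last definition.
obj = json.loads(raw)
print(obj["test"])

# object_pairs_hook receives every key/value pair in document order,
# so both definitions are still visible before they collapse into a dict.
pairs = json.loads(raw, object_pairs_hook=list)
print(pairs)
```

The second call shows that the parser does see both definitions; which one survives is purely a policy decision made by each implementation.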
The attack works by manipulating the JSON object sent by the browser, injecting a second value definition for the quantity of items purchased. The store itself sees the higher value, which determines the actual items shipped. The payment backend uses a different JSON parser, which sees the smaller value. The backend actually handles payment processing, so the amount charged is that of the smaller quantity.
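The mismatch can be simulated in a few lines. This is only a sketch under stated assumptions: the payload, the field names, the unit price, and the choice of which side uses which policy are all hypothetical, and the first-wins parser is emulated with a hook rather than being a real third-party library.

```python
import json

def first_wins(pairs):
    """Emulate a parser that keeps the first definition of a repeated key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)  # later duplicates are ignored
    return result

# Hypothetical cart payload with an injected second "quantity".
payload = '{"item": "widget", "quantity": 10, "quantity": 1}'

# Assumption: the store's parser keeps the first value it sees.
store_view = json.loads(payload, object_pairs_hook=first_wins)

# Assumption: the payment backend's parser keeps the last value
# (this is the default behavior of Python's json module).
payment_view = json.loads(payload)

UNIT_PRICE = 5  # hypothetical price per item
shipped = store_view["quantity"]
charged = payment_view["quantity"] * UNIT_PRICE

print(f"shipped {shipped} items, charged for {payment_view['quantity']} ({charged})")
```

Both sides parsed the same bytes without complaint, yet they disagree on the order, which is exactly what makes this class of bug hard to spot.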
The article goes on to describe issues with invalid Unicode embedded in JSON and valid key-value pairs that have been /*commented out*/, and what happens when you re-serialize this quirky data. Another interesting edge case is the handling of very large numbers, where some parsers return 0, others return a null, and some an approximation in scientific notation.
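To see the large-number case in one concrete implementation, here is a small sketch with Python's standard json module. It covers only that edge case, and the results are specific to Python; they are not a claim about the parsers tested in the article.

```python
import json
from decimal import Decimal

# A float literal too large for a 64-bit double overflows to infinity.
big_float = json.loads("1e400")

# Integer literals stay exact, because Python ints are arbitrary precision.
big_int = json.loads("1" + "0" * 400)

# Passing parse_float=Decimal keeps the exact value instead of a float.
exact = json.loads("1e400", parse_float=Decimal)

print(big_float, big_int == 10**400, exact)
```

The same number therefore comes back as infinity, an exact integer, or an exact decimal depending on how it was written and how the parser was configured.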
All told, JSON deserialization is a mess. There are sure to be many hard-to-spot bugs in web applications that use multiple parsers. The author makes a few recommendations at the end of the post. The most important is that parsers should produce a fatal error on quirky JSON input, rather than returning a guess at what data was intended.
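For duplicate keys, that fail-loudly recommendation might look like the following in Python. This is a sketch, not the author's code: reject_duplicates and strict_loads are hypothetical helper names, built on the standard json module's object_pairs_hook parameter.

```python
import json

def reject_duplicates(pairs):
    """Build a dict from key/value pairs, failing on any repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result

def strict_loads(text):
    """Parse JSON, raising ValueError instead of guessing between duplicates."""
    return json.loads(text, object_pairs_hook=reject_duplicates)

print(strict_loads('{"test": 1}'))

try:
    strict_loads('{"test": 1, "test": 2}')
except ValueError as err:
    print("rejected:", err)
```

Because the hook runs for every object in the document, nested objects with repeated keys are rejected as well.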