Jonathan Blow: JSON Validation and Custom File Formats

https://www.youtube.com/watch?v=fr6v70BkU1s

Here are detailed notes from the transcript:

Detailed Notes: JSON vs. Custom File Formats for Data Serialization

Main Topics Discussed:

  1. The Nature of Parsing: A re-evaluation of what "parsing" truly entails, differentiating between reading primitive types and validating data structure.
  2. Critique of JSON/YAML: Arguments against using generalized structured file formats (like JSON, YAML, XML) for application data, especially concerning validation and runtime performance.
  3. Advocacy for Custom File Formats: The benefits of designing specific, custom file formats (both text-based and binary) for game or application data.
  4. Versioning of Serialized Data: Dispelling the myth that versioning data is inherently hard and arguing that JSON doesn't solve this problem.
  5. Human Readability: Comparing the human readability of JSON/XML with custom text-based formats.

Key Points and Arguments:

  • Parsing is Misunderstood:

    • The common belief is that JSON/YAML are great because an off-the-shelf library handles the parsing for you, and parsing is hard.
    • Speaker's Counter: Parsing is not hard, especially for binary formats.
    • Two Parts of Parsing:
      1. Reading primitive types: Numbers, strings, etc. (JSON handles this well).
      2. Validation and Structure: Ensuring the file contains what's expected, knowing where things are, and placing data correctly. (JSON fails at this second part).
    • JSON files "could have anything," offering no inherent validation of the structure or content's meaning.
  • Consequences of JSON's Validation Weakness:

    • With Custom Formats: Validation happens once at load time. Afterwards, there's "no uncertainty or ambiguity" about data; it's known to be correct and in place.
    • With JSON: Loading JSON into a generic tree still leaves "no idea what's in there."
    • Common "Fail Case" for JSON Users: Reaching into the JSON tree ad hoc at runtime, wherever the data happens to be needed, because it's "easy," rather than validating once at load time and copying the data into (or setting up pointers to) the program's own structures.
    • Performance Impact: Random access to "weird tree ass data" at runtime is "much slower" than accessing validated, structured data.
    • Runtime Uncertainty: The "frontier of uncertainty" is moved from load time into the entire program's runtime.
    • Error Handling: Programs must constantly check whether data exists in the JSON structure. When it doesn't, the result is often "weird errors" that get "dropped on the floor," which the speaker sees as typical of poor error reporting in modern software.
    • While disciplined programmers can mitigate these issues with JSON (e.g., a separate validation step), "most people don't."
  • Versioning of Serialized Data:

    • Common belief: Versioning serialized data is a "fairly hard problem."
    • Speaker's Counter: "No," it's not that hard, even for forward compatibility.
    • JSON Doesn't Solve Versioning: JSON provides no inherent solution for versioning. If you add a field that only exists in later versions of an entity (e.g., "entity version 37 and later"), JSON has no knowledge of this.
    • Developer Responsibility: Developers "have to write that [versioning logic] yourself anyway" for JSON, undermining claims that JSON simplifies serialization problems.
    • Conclusion on Arguments: Arguments for using specific serialization systems (like JSON) based on solving problems like versioning are "deeply incorrect arguments most of the time."
  • Human Readability:

    • Common argument for JSON: "Human readable via text editor."
    • Speaker's Counter: Custom entity formats can also be human readable and "more readable because it's not so full of crap."
    • Comparison: JSON is better than XML ("absolute worst"), but still has issues.
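
The load-once discipline described under "Parsing is Misunderstood" and "Versioning of Serialized Data" can be sketched as follows. This is a minimal Python sketch, not the speaker's code, and it assumes a hypothetical entity with `version`, `name`, `position`, and `glow` fields (none of these names come from the talk). `json.loads` covers only the first part of parsing (primitive types); everything after it is the second part, which JSON leaves to you, including the hand-written rule for a field that exists only from version 37 on.

```python
import json
from dataclasses import dataclass


class LoadError(Exception):
    """Raised once, at load time, when the file does not match what we expect."""


# Hypothetical entity layout for illustration only; not taken from the talk.
@dataclass
class Entity:
    version: int
    name: str
    position: tuple[float, float, float]
    glow: float  # hypothetical field that only exists in version 37 and later


GLOW_MIN_VERSION = 37  # echoes the "entity version 37 and later" example
GLOW_DEFAULT = 0.5     # hypothetical default for files older than version 37


def _field(obj: dict, key: str, where: str):
    # Fail loudly, with context, instead of letting a missing key surface later.
    if key not in obj:
        raise LoadError(f"{where}: missing field {key!r}")
    return obj[key]


def _number(value, key: str, where: str) -> float:
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise LoadError(f"{where}: {key!r} must be a number, got {value!r}")
    return float(value)


def load_entity(text: str, where: str = "<input>") -> Entity:
    # Part 1 of parsing (primitive types): the JSON library does this.
    try:
        raw = json.loads(text)
    except json.JSONDecodeError as e:
        raise LoadError(f"{where}: not valid JSON: {e}") from e

    # Part 2 of parsing (structure and meaning): all of this is on us.
    if not isinstance(raw, dict):
        raise LoadError(f"{where}: top level must be an object")

    version = _field(raw, "version", where)
    if isinstance(version, bool) or not isinstance(version, int):
        raise LoadError(f"{where}: 'version' must be an integer")

    name = _field(raw, "name", where)
    if not isinstance(name, str):
        raise LoadError(f"{where}: 'name' must be a string")

    pos = _field(raw, "position", where)
    if not isinstance(pos, list) or len(pos) != 3:
        raise LoadError(f"{where}: 'position' must be a list of 3 numbers")
    position = tuple(_number(v, "position", where) for v in pos)

    # JSON knows nothing about versions: this rule has to be written by hand.
    if version >= GLOW_MIN_VERSION:
        glow = _number(_field(raw, "glow", where), "glow", where)
    else:
        glow = GLOW_DEFAULT

    return Entity(version, name, position, glow)
```

After `load_entity` returns, the rest of the program can read `entity.position` or `entity.glow` without checking whether they exist, which is the "no uncertainty or ambiguity" property the speaker attributes to validating once. A missing or mistyped field is reported once, at load time, instead of surfacing later as a runtime error that gets dropped.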

Important Facts or Data Mentioned (Demonstration of Custom Format):

  • Speaker's Custom Entity Format: Demonstrated with an example of a game character entity (57 43 60) from the "Sokoban run tree data levels/Overworld" directory.
  • Format Characteristics:
    • Text Format: Human-readable (used for development/editing).
    • Binary Format: Used for shipping the game (faster loading, packed).
    • Readability: Presented as "tremendously more readable" and "faster to load" than JSON.
    • Floating Point Values: Stores floats as hex floats, which encode the exact binary value, so values round-trip losslessly between saving and loading and "will never diverge."
    • Default Value Handling:
      • A * prefix indicates a value has changed from its default and will be loaded.
      • No * means it's the default value; these lines are discarded, and the default is used.
      • Benefit: Allows changing the default value of a field, and it will automatically apply to all entities that haven't explicitly overridden it.
    • Hierarchy: Supports "limited hierarchies" (e.g., an array of 12 strings, even if empty in the example), but warns against "wackadoodle" hierarchical data in entity systems.
    • Comments/Field Names: Semicolon-prefixed lines (e.g., ; orientation, ; position) are comments used to "make it readable" for humans.
      • The loading system skips these lines.
      • The system knows the order of fields based on the "version number" (which is also present in the file).
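
The format rules above (semicolon comment lines, the `*` marker for overridden values, hex floats, and a version number that fixes field order) can be combined into a small loader and saver. This is a minimal Python sketch under invented assumptions: the notes do not show the real file layout, so the `version N` header line, the one-value-per-line layout, the field table `FIELDS_BY_VERSION`, and the field names are all hypothetical. Python's built-in `float.hex` and `float.fromhex` stand in for the hex-float encoding.

```python
# Hypothetical field table: for each file version, the fields in file order,
# with their type and *current* default. Names and versions are invented.
FIELDS_BY_VERSION = {
    1: [("name", str, "unnamed"), ("mass", float, 1.0)],
    2: [("name", str, "unnamed"), ("mass", float, 1.0), ("orientation", float, 0.0)],
}
LATEST_VERSION = 2


def encode(kind: type, value) -> str:
    # Hex floats round-trip bit-exactly, unlike rounded decimal output.
    return float(value).hex() if kind is float else str(value)


def decode(kind: type, text: str):
    return float.fromhex(text) if kind is float else text


def save(entity: dict) -> str:
    lines = [f"version {LATEST_VERSION}"]
    for name, kind, default in FIELDS_BY_VERSION[LATEST_VERSION]:
        value = entity.get(name, default)
        lines.append(f"; {name}")  # comment for human readers; skipped on load
        # '*' marks a value that differs from the default and must be loaded.
        star = "*" if value != default else ""
        lines.append(star + encode(kind, value))
    return "\n".join(lines) + "\n"


def load(text: str) -> dict:
    # Assumes values contain no newlines and unstarred values never start with ';'.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(";")]
    if not lines or not lines[0].startswith("version "):
        raise ValueError("missing version line")
    version = int(lines[0].split()[1])
    fields = FIELDS_BY_VERSION.get(version)
    if fields is None:
        raise ValueError(f"unknown version {version}")
    values = lines[1:]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    # Start from the current defaults, so fields absent from older versions get them too.
    entity = {name: default for name, _, default in FIELDS_BY_VERSION[LATEST_VERSION]}
    for (name, kind, _), line in zip(fields, values):
        if line.startswith("*"):
            entity[name] = decode(kind, line[1:])
        # Lines without '*' hold a default written at save time: discard them.
    return entity
```

With these definitions, saving a crate whose `mass` equals the default writes the mass line without `*`, so a later change to the `mass` default in `FIELDS_BY_VERSION` reaches that crate on its next load. That is the behavior the notes describe for fields an entity never overrode.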

Conclusions or Recommendations:

  • Generalized structured data formats like JSON/YAML often fail to provide robust data validation and structural certainty, leading to runtime performance issues and increased complexity in application code.
  • Custom file formats, even simple text-based ones, can be more readable, faster to load, more specific, and offer better control over data validation, versioning, and default value handling.
  • The perceived "hardness" of problems like parsing or versioning is often overstated, and JSON/YAML do not inherently solve these problems in a way that truly benefits developers beyond primitive type handling.
  • While JSON might be "fine for prototyping," the speaker strongly implies that investing in a custom format early on pays significant dividends in terms of clarity, performance, and maintainability.

Generated with Tapescript
7f0104f - 03/02/2026