June 29, 2020

Clojure's keyword namespacing convention Considered Harmful

  1. The great benefits of namespaced keys
  2. Advantages of 'snake case': portability and ubiquity
  3. Frequent objections
    1. 'This is not idiomatic Clojure'
    2. 'The lisp-case convention lets me destructure keywords'
    3. 'But clojure.spec encourages the use of Clojure-namespaced keywords!'
    4. 'This will create inconsistencies in our code style'
    5. 'But I can just write a key-translation layer at the edge of my Clojure program...'
    6. 'You will need a data-marshalling layer anyway, so why not convert keys while you're at it?'
    7. 'My stack is full-Clojure, keywords supported everywhere, so I don't need a portable naming convention'
  4. Conclusion

Thank you for taking the bait of this inflammatory and simplistic title. I promise you that the rest of the article will be more reasoned and nuanced.

In summary: for far-ranging data attributes, such as database columns and API fields, I recommend namespacing keys using 'snake case', contrary to the current Clojure convention of using 'lisp-case' (for example: favour :myapp_user_first_name over :myapp.user/first-name), because the portability benefits of the former notation outweigh whatever affordances Clojure provides for the latter. This is an instance of trading local conveniences for system-wide benefits.

You may already be convinced at this point, in which case the rest of this article will be of little value to you. Otherwise, I want to provoke you to go through the following mental process:

  1. Consider :namespacing_keys_in_snake_case for data attributes in Clojure, rather than the conventional :namespacing.keys/in-lisp-case.
  2. Get angry, because that's disgusting to any self-respecting Clojure-bred programmer.
  3. Recognize that you're angry because you've got attached to an arbitrary convention, and superficial ergonomics around it.
  4. Optional: try to bargain with reality, by attempting to find some hacky mechanisms to keep both notations around. Realize that it's not satisfactory.
  5. Give up, be at peace, and reap the benefits of designing your programs system-first rather than language-first.

I went slowly through this process myself, with some maintenance pains along the way, which hopefully this article can spare you.

The great benefits of namespaced keys

First, it's worth emphasizing that the naming of data attributes is an important issue, however innocuous it may feel. Data attributes such as database columns or API fields are not only the bread and butter of our code, they're also some of the strongest commitments we make when growing an information system, often stronger than the choice of programming language. Once a data attribute is part of the contract between several components of the system, it becomes very hard to change. This is true even of small systems such as web or mobile apps.

In recent years, Clojure has encouraged the programming convention of conveying data using namespaced keys, e.g. using :myapp.user/id rather than just :id. Namespacing is great, because by reducing the potential for name collisions, it eliminates a lot of ambiguity about names.

The significant benefits of this approach are:

  1. context-free readability: when you see :myapp.user/id in your code, thanks to the myapp.user part, you can tell immediately what kind of data it conveys, and what type of entity it operates on. If you just saw :id, you'd have to figure that out from context.
  2. data traceability: with a simple text search in the code, you can immediately follow all the places where this piece of data is used across your entire system, whatever the language used at each place. This basic ability is significantly helpful for maintainability. I think many developers don't realize how big a difference it makes.

Observe that these benefits apply regardless of the choice of namespacing notation: you would reap them whether you write :myapp.user/id, :myapp-user-id, :myappUserId or :myapp_user_id. It does not matter which namespacing notation you choose, as long as you use it everywhere.

In other languages, programmers have traditionally relied on type systems to remove such ambiguity. Type systems are not as good for this purpose, because they don't reach beyond language boundaries.

Clojure's specific convention also offers some comparatively insignificant benefits:

  • prettiness: "look at :myapp.user/first-name, it's so beautiful! I can use slashes and dashes in programmatic names, this is THE POWER OF LISP!"
  • concision affordances: in Clojure code, using namespace aliases, you can write ::user/first-name as a shorthand for :myapp.user/first-name. Big deal. I mean, I can relate to how pleasing this feels when coding, but again, please consider that thinking of the whole system may be more important than this sort of local preference.

Advantages of 'snake case': portability and ubiquity

In a real-world system, data attributes are bound to travel through many media: SQL columns, ElasticSearch fields, GraphQL fields, JSON documents... if the system involves languages other than Clojure, they may be represented as class members. As mentioned above, using the same name - spelled in exactly the same way - for the data attribute in all these representations is a precious thing, because you can trace it across your codebase with one basic text search. You can track its usage not only in Clojure code, but also in SQL queries, ElasticSearch queries, JavaScript client code, etc.

Clojure's conventional notation for keys (e.g. :myapp.person/first-name), a.k.a. lisp-case, is portable to almost none of these other platforms: it's not suitable for SQL column names, nor for GraphQL field names, nor for ElasticSearch fields, nor for Java/Python class members... Some people have argued that in those systems you should just drop the entity-name part (myapp.person), as it will be represented in another construct such as the SQL table name, but that's generally misguided IMO, because you're back to having to disambiguate meaning from context, and you're making the fragile assumption that colocated keys should always have the same entity-name part (think e.g. of :myapp.person/name and :myapp.admin/password).

On the other hand, as far as I can tell, it's hard to come by a platform that does not support snake_case. Using it may not always be idiomatic, but it's almost always supported. That's reason enough to make snake_case a better default, because having one ubiquitous notation is much preferable to having many locally idiomatic ones.
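
To make the portability argument concrete, here is a minimal sketch of what happens when each notation is pasted as-is into a SQL string. The helper names (key->text, select-column) and the table name myapp_users are made up for illustration, and the query builder is deliberately naive (no escaping).

```clojure
;; Attribute names taken from the examples in this article.
(def snake-key :myapp_user_first_name)
(def lisp-key  :myapp.user/first-name)

(defn key->text
  "Full textual form of a keyword, without the leading colon."
  [k]
  (subs (str k) 1))

(defn select-column
  "Hypothetical, naive query builder: illustration only, no escaping."
  [k table]
  (str "SELECT " (key->text k) " FROM " table))

(select-column snake-key "myapp_users")
;; => "SELECT myapp_user_first_name FROM myapp_users"

(select-column lisp-key "myapp_users")
;; => "SELECT myapp.user/first-name FROM myapp_users"
;; The dot, slash and dash all have their own meaning in SQL, so this
;; no longer names a single column.

(name lisp-key)
;; => "first-name"  (portable-ish, but the entity-name part is gone)
```

The snake_case keyword yields the very same character sequence in Clojure and in SQL, which is what keeps a plain text search useful across both.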

Frequent objections

'This is not idiomatic Clojure'

Arguably, your programs have more important requirements than being idiomatic. Programming history is riddled with bad design decisions made in the name of being idiomatic. Anyone who's worked through a nasty Scala class hierarchy knows how much incidental complexity some programmers are willing to inflict upon themselves for the sake of being idiomatic ("because it's SO much better to write subject.verb(complement) than verb(subject, complement). It's more idiomatic, you see."). Let's avoid doing that to your program, or the Clojure ecosystem.

'The lisp-case convention lets me destructure keywords'

I like the ability of destructuring my keywords into an entity-name part and an attribute part, for instance:

(namespace :myapp.user/first-name)
=> "myapp.user"

(name :myapp.user/first-name)
=> "first-name"

I can leverage that to manipulate my data attributes generically in my programs.

Don't do that. Don't treat Clojure keywords as composite data structures. This is accidental complexity waiting to happen. Programmatic names are meant for humans to read, not for programs to interpret. Changing an attribute name should not be able to change the behaviour of your program. In Hickeyian terms: you'd be complecting naming with structure.

As a basic example of how this may break, consider that it's normal and expected to find keys with different namespaces in the same entity, e.g. :person/first-name and :myapp.user/signup-date. If you have a SQL database, there's a high chance that you need both attributes as columns of the same table (1): yet the default behaviour of a namespace-aware tool like next.jdbc is to constrain both keywords to have the same namespace, which would be problematic in this case, and may be viewed as revealing a complecting of attribute naming and storage layout (2).

Notes:

  • (1) Yes, I know about SQL table normalization... and that you can do too much of it.
  • (2) Don't worry, that won't prevent you from using next.jdbc: this default behaviour is easily opted out of.
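
Here is a small sketch of the failure mode described above. The entity-names helper is hypothetical: it stands for any piece of code that interprets keyword namespaces as structure, for instance to guess which table or entity a map belongs to.

```clojure
(defn entity-names
  "Hypothetical helper: guesses which entities a map describes by
  treating keyword namespaces as structure."
  [m]
  (set (map namespace (keys m))))

;; One entity, stored in one table, with keys from two namespaces.
(def user-row
  {:person/first-name      "Ada"
   :myapp.user/signup-date "2020-06-29"})

(entity-names user-row)
;; => #{"person" "myapp.user"}
;; Code that assumed "one row, one namespace" now sees two entities.

(entity-names {:person_first_name      "Ada"
               :myapp_user_signup_date "2020-06-29"})
;; => #{nil}
;; With snake_case there is no namespace part to interpret, so the
;; name stays a name.
```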

'But clojure.spec encourages the use of Clojure-namespaced keywords!'

Yeah... I know. In a way, Clojure Spec does what I've told you not to do in the previous section: relying programmatically on a naming convention for keywords, as Spec expects the keys you register to be Clojure-namespaced. Pushing further in that direction would be, in my opinion, a design error of clojure.spec.

That said, clojure.spec does quite sensibly make room for other namespacing conventions (via :req-un and :opt-un), and so clojure.spec is compatible with the recommendation this article is making. The semantics of Clojure Spec would be completely broken if name collisions were allowed, and so it's understandable that it's decided to check for namespacing.
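
As a minimal sketch of how this can look in practice: the specs below are registered under a Clojure-namespaced name, as Spec requires, while the data itself carries snake_case keys. The myapp.specs namespace and the attribute names are assumptions made up for this example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Spec names must be namespace-qualified; :myapp.specs is a
;; hypothetical namespace chosen for this sketch.
(s/def :myapp.specs/myapp_user_id pos-int?)
(s/def :myapp.specs/myapp_user_first_name string?)

;; :req-un / :opt-un match on the unqualified part of the spec name,
;; so the maps being validated use plain snake_case keys.
(s/def :myapp.specs/user
  (s/keys :req-un [:myapp.specs/myapp_user_id]
          :opt-un [:myapp.specs/myapp_user_first_name]))

(s/valid? :myapp.specs/user
          {:myapp_user_id 42 :myapp_user_first_name "Ada"})
;; => true

(s/valid? :myapp.specs/user {:myapp_user_first_name "Ada"})
;; => false (the required key is missing)

(s/valid? :myapp.specs/user
          {:myapp_user_id 42 :myapp_user_first_name 7})
;; => false (the optional key is present but not a string)
```

The Clojure-namespaced name stays local to the spec registry, whereas the snake_case key is the one that travels across the system.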

'This will create inconsistencies in our code style'

What might worry you: some parts of your code might be forced to use keywords in lisp-case - for instance, because libraries like Integrant impose them on you. Having these keys in lisp-case and others in snake_case might be disturbing.

If that's troubling you, you're in for a pleasant surprise: the visual contrast between snake_case and lisp-case actually makes the code more readable, because it signals which keys are meant for local use and which are meant to travel across the system.

By the way, you have already seen an instance of readability enhanced by contrasted notation: in Clojure's syntax itself, where parens (... ) are used to denote invocations, and square brackets [... ] are used to denote lexical bindings, departing from the Lisp tradition of using parens for everything.

Again, I don't want to put too much emphasis on this aspect, because I think it's a relatively minor issue. Even without this bonus point, snake_case would be preferable.

'But I can just write a key-translation layer at the edge of my Clojure program...'

... and then you'd lose the main benefit of namespacing, which is the ability to track a data attribute across your entire system rather than just one component of it.

Allow me to insist: the global searchability of programmatic names is much more important than their conformance to local naming customs.
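
To illustrate what gets lost, here is a toy simulation of a text search across three components. The file names and one-line excerpts are invented for the example, and grep here is just a small Clojure function, not the real tool.

```clojure
(require '[clojure.string :as str])

;; Hypothetical one-line excerpts from three components of one system,
;; where the Clojure side translates keys to lisp-case at its edge.
(def sources-with-translation
  {"api.clj"   "(get user :myapp.user/first-name)"
   "query.sql" "SELECT myapp_user_first_name FROM myapp_users"
   "client.js" "render(user.myapp_user_first_name)"})

;; Same system, but the Clojure side keeps the snake_case key.
(def sources-without-translation
  (assoc sources-with-translation
         "api.clj" "(get user :myapp_user_first_name)"))

(defn grep
  "Returns the sorted names of the files whose text contains needle."
  [needle sources]
  (sort (for [[file text] sources
              :when (str/includes? text needle)]
          file)))

(grep "myapp_user_first_name" sources-with-translation)
;; => ("client.js" "query.sql")   ; the Clojure usage is invisible

(grep "myapp_user_first_name" sources-without-translation)
;; => ("api.clj" "client.js" "query.sql")
```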

'You will need a data-marshalling layer anyway, so why not convert keys while you're at it?'

This misses the point, because the key benefit of a ubiquitous naming convention is not to save you the implementation of a data-marshalling layer. It's really about code readability / searchability.

For example, people have argued that other languages don't have a Keyword type, and so having keys in different formats in your system is unavoidable. But that's not an issue. The same key may appear as :myapp_customer_first_name in Clojure, myapp_customer_first_name in GraphQL and "myapp_customer_first_name" in ElasticSearch, but it will be obvious to both you and grep that these denote the same thing.

'My stack is full-Clojure, keywords supported everywhere, so I don't need a portable naming convention'

Lucky you! But are you sure things will stay that way? Isn't there a risk your Datomic database will eventually be followed by an ElasticSearch materialized view, or that your EQL API will be complemented by a GraphQL or REST API, or that a scientific-computing Python component will grow in your project, or that a JavaScript or ReasonML client will join your system? If that happens, you'll be happy to read myapp_customer_id in the code of these things rather than just id!

Conclusion

This article makes 2 unintuitive claims: that the choice of notation for namespaced keys matters, and that the one used conventionally in Clojure is often suboptimal. It proposes to replace it with :snake_case, the main drawback being that it looks ugly and awkward, which seems like a good deal as design tradeoffs go.

2 years ago, I opened a discussion on ClojureVerse questioning the use of Clojure's namespacing convention. Objections were raised, but none that convinced me or brought up issues I had overlooked, and I'm now confident that this article makes the best default recommendation.

EDIT: That being said, as with most design problems, please don't follow this advice blindly: make it a conscious decision based on the specific requirements of your system. Hopefully this article will have given you a keener awareness of the tradeoffs involved.

In my experience, this proposal tends to be met with reluctance, and remembered without regrets. I myself came to it begrudgingly (a coworker once phrased it well: "I hate it, but it's right."). Clojure developers program with love, and love drives us to cherish little idiosyncrasies. That said, I find it paradoxical that most of the resistance to this idea was along the lines of favouring 'local-language convenience', in a community where talks like The Language of the System and Narcissistic Design have championed adaptability and friendliness to a varied surrounding system as higher principles.

I hope the ideas presented here can help you program your systems smoothly and harmoniously. Thank you for reading!

Tags: Clojure Architecture Programming