Little languages are the past and yes, the future. We just don't recognise them.
It was common in the 60s and 70s to have the hardware manufacturer ship the OS and languages with their hardware. The languages were often designed for specific problem domains. The idea of general-purpose languages (FORTRAN, PL/1, etc) was uncommon. You can see this in K&R (the original edition anyway) where they justify the idea of a general-purpose language, even though C itself derived from prior general languages (B & BCPL) and they had gotten the idea from their experience on Multics (written in PL/1, a radical idea at the time). So a 20-year-old idea was still barely diffused into the computing Zeitgeist.
Most Lisp development (since the early 70s at least) is writing a domain-specific representation (data structures and functions) and then writing your actual problem in it. I used both Lisp and Smalltalk this way at PARC in the early 80s.
More rigid languages (the more modern algolish languages like Python, C++, Rust, C, JS, etc -- almost every one of them) don't have these kinds of affordances but instead do the same via APIs. Every API is itself a "little language".
What are called little languages in the Bentley sense are simply a direct interface for domain experts. And after all, what was a language like, say, Macsyma but a (large) "little language"?
I came to this conclusion early in my career. It went something like this:
A - "To do this, just create this object, fill in these properties, and call these methods."
B - "Okay, I did that, but it crashed."
A - "Yeah, it's because you set the properties in the wrong order. This property relies on this other property under the hood. Set them in this order."
B - "Still crashes."
A - "Yeah, you called the methods in the wrong order. This method relies on that method. Call them in this order and it works."
My conclusion was that the Lisp philosophy of building a lot of little sub-languages was equivalent to what people were doing with OO in C#/Java. Either way you have to learn the "right" way to put things together, which is dictated by unseen forces behind the scenes.
Of course, I also concluded that most people work differently than I do. For most people, if the code "looks right" (i.e. recognizable syntax), they're able to tell themselves a story that it's familiar, and their intuition picks up the slack in finding a good-enough way to use most arbitrary APIs (as long as they don't exceed some level of incomprehensibility). On the other hand, I have to understand the underlying logic, or I use the API the wrong way pretty much every time.
So for most people lots of APIs is actually a much better cognitive way for them to work whereas for me API soup and lisp macros are the same conundrum.
DSLs and APIs are fundamentally the same thing: they make decisions for the user, speeding them up and reducing cognitive load. The only difference is the manner in which it's expressed. It turns out functions and interfaces are a pretty good way to express... most things you want to do in a program. Lisp acknowledges this and runs with it.
IMO we need a language (or library?) that forces builders of an API to make incorrect behavior hard or impossible. Here's some scribbled out ideas: https://packetlost.dev/blog/lang-scribbles/
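The A/B dialogue above describes temporal coupling: hidden ordering constraints between properties and methods. One common mitigation, sketched below in Python with hypothetical names, is to make the object impossible to construct half-configured, so there is no wrong order for a caller to discover at runtime:

```python
# Sketch: instead of "create the object, set properties, call methods in
# the right order", require everything at construction time, so a
# half-configured object can never exist. All names are hypothetical.

class Connection:
    def __init__(self, host: str, port: int, timeout: float = 5.0):
        # All validation happens once, up front; there is no hidden
        # dependency between setter calls because there are no setters.
        if not host:
            raise ValueError("host must be non-empty")
        if not (0 < port < 65536):
            raise ValueError("port out of range")
        self._host, self._port, self._timeout = host, port, timeout

    def address(self) -> str:
        return f"{self._host}:{self._port}"

conn = Connection("db.example.com", 5432)
print(conn.address())  # db.example.com:5432
```

Statically typed languages can push this further (builder types, typestate), but even in a dynamic language the principle holds: misuse becomes an immediate, local error instead of a crash two method calls later.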
I don't think it's so much "little" languages (commonly DSL) that matter. It's more the jumps in expressivity. You don't use a full on Turing-complete language when you need to match strings written in a regular language. Instead, we write the language we want as a regexp, and then use a regexp engine to match it.
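As a concrete instance of matching expressivity to the problem, assuming a simple invented log format:

```python
import re

# The "little language" is the regex itself: a regular pattern gets a
# regular-language tool rather than a hand-rolled parsing loop.
log_line = "2024-05-01 12:30:45 ERROR disk full"
m = re.match(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)", log_line)
date, timestamp, level, message = m.groups()
print(level, message)  # ERROR disk full
```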
I agree with much of the problems listed in the article. The author even manages to stumble onto some of the solutions (e.g. Dhall being a total language).
"Expressiveness is co-decidability" is the main theme of these things. The crux of the issue is that in our everyday programming tasks we have many levels of decidability, ranging from regular expressions all the way to things that require full Turing completeness.
The majority of work, however, lies in the middle. There are so many things that can be done with pushdown automata, or with deterministic finite automata. Most codebases don't actually use those though. An issue is that there is a dearth of "mini" languages that support these things.
Another issue is that somehow we are enamoured with the idea that our languages must be able to express everything under the sun (up to TC/Recursively Enumerable). This seems to be more of an industry attitude than anything - there is this chase for the most powerful language (a lisp, clearly... everything else is a blub).
I've recently experimented with embedding an APL into my usual programming language, and it was a very interesting experience. It feels like having the power to do regular expression stuff, but with arrays. I want to do the same for the other levels of expressiveness.
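A toy illustration of what embedding an array sublanguage feels like: a list subclass that lifts arithmetic and comparisons to whole vectors, so code reads array-at-a-time. This is a sketch of the flavor, not a real APL embedding:

```python
# A minimal "array sublanguage" embedded in Python: elementwise
# operators with no explicit loops. Purely illustrative.

class V(list):
    def __add__(self, other):
        return V(a + b for a, b in zip(self, other))
    def __mul__(self, k):
        return V(a * k for a in self)
    def __gt__(self, k):
        return V(a > k for a in self)

xs = V([1, 2, 3, 4])
ys = xs * 2 + V([10, 10, 10, 10])  # elementwise: [12, 14, 16, 18]
mask = xs > 2                      # boolean vector: [False, False, True, True]
print(ys, sum(mask))
```

Real embeddings (NumPy in Python, for instance) do exactly this at scale, which is why they feel like "regular expressions, but for arrays".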
Is the engineering footprint of an organization really better if everything is implemented in twenty different languages, versus just three or four? Setting everything else aside (quality of the languages, scope, etc.), just the number. You have to expect everyone to know each language: the ins, outs, idioms, gotchas, etc. You have to be able to hire for the languages. You need the language runtimes in your environment, everywhere: Docker, local dev machines. You have to keep up with X times more changelogs, version upgrades, CVEs.
The article pulls Shell as an early example. Shell did not become the powerhouse it is because it's "great" (though some would argue it is; I'm not here to debate that), or because it's small, or because it's general-purpose, or because it's single-purpose. It became a powerhouse because it's Old and Omnipresent. See, the problem with inventing New Things is that they are, by definition, not Old, nor Omnipresent. New Things have to start somewhere, but you're starting in last place.
> Regular expressions and SQL won’t let you express anything but text search and database operations, respectively.
Oh mylanta. Did you know that with the addition of backreferences, regular expressions stopped being regular? Matching with backreferences is NP-hard, and PCRE-style recursion goes further still. They are, functionally, a real programming construct; well, except far more annoying to write. And naturally, SQL "won't let you express anything but database operations", which is to say nothing of `SELECT 1+1`... let alone recursive CTEs, or the little corner of the language called "Stored Procedures".
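A quick demonstration of a backreference recognizing a non-regular language: the set of strings of the form ww (a word repeated twice), which no finite automaton can match:

```python
import re

# (.+)\1 anchored at both ends matches exactly the strings that are
# some word repeated twice -- a classic non-regular language.
doubled = re.compile(r"^(.+)\1$")
print(bool(doubled.match("abcabc")))  # True:  "abc" + "abc"
print(bool(doubled.match("abcabd")))  # False: halves differ
```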
There’s a Groovy DSL that will allow you to perform operations on collections using SQL syntax. I’d argue it’s an improvement over the traditional procedural or functional approaches.
Well, maybe this will start a flame war: I will absolutely die on the hill that SQL is shit. But, like shell, it is old (48 years old!) and omnipresent, so it sticks around.
Like, the thing I find comical about the original article is: they reach for shell, regex, and SQL as Prime Examples of "mini languages done right". Be more domain specific, look at this, it's already happening. All of those examples are kinda shit languages! They're popular because of their omnipresence. Maybe regex is "fine", though the line "try to solve a problem with a regex and now you have two problems" is well known for a reason.
But my broader opinion is: if you're building an application, service, tool, whatever in LANGUAGE_X, having to "dip out" of that language into an entirely different language should be viewed as, fundamentally, a Negative Thing. There may be reasons why you should; the good may outweigh the bad; but there is Bad there. There will always be an interpretation layer and extra tooling; that's More Things that can go wrong, that have to be configured, statically analyzed, tested; it's a failure point. SQL injection is a thing. Why? Because, for a time, people had the thought "hey, it's a string, let's just template the string"; but that's no good, so now we have A Layer between the Java and the SQL to keep us safe. We can only support So Many Layers; we need to make things simpler, not more complex; there need to be Fewer Things.
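The "template the string" failure mode, and The Layer that fixes it, sketched with Python's built-in sqlite3 driver (in-memory database, invented schema):

```python
import sqlite3

# Injection in miniature: templating puts user input into the SQL
# *language*; placeholders keep it in the *data*.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
db.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

user_input = "alice' OR '1'='1"

# Bad: the input becomes part of the query text.
unsafe = f"SELECT name FROM users WHERE name = '{user_input}'"
print(len(db.execute(unsafe).fetchall()))  # 2 rows: injection succeeded

# Good: the driver binds the value; the query shape cannot change.
safe = "SELECT name FROM users WHERE name = ?"
print(len(db.execute(safe, (user_input,)).fetchall()))  # 0 rows
```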
Its time is gone, but I'll always view Heroku Buildpacks as a paragon of system design. Consider: you could write a Node.js application in JavaScript, write a package.json in JSON (also JavaScript), and end-to-end get that thing on a URL on the internet, all in one language. Fantastic! Today, a typical app will have the app (say, Go), a go.mod (different syntax than Go itself), Dockerfiles (language 3), Kubernetes YAMLs (language 4), maybe Helm or cdk8s (language 5), shell scripts, Makefiles, maybe you're also writing SQL... This isn't better, and it doesn't have to be like this. But right now it is, and to some degree I'm happy for it, because it's 80% of the reason why I get paid six figures.
On that note: Am I the only one that's constantly surprised by the absence of proper sandboxing solutions when so many programming languages now provide (otherwise pretty useful) means of running code dynamically in a script-like fashion?
In C#, I can pull in Roslyn and compile a string on the fly as a C# script; but the way the .NET standard library is structured makes it pretty much unfeasible to prohibit outside interactions I don't want to allow (in my case, e.g. `DateTime.Now`, while still allowing handling of `DateTime` values).
It's possible to embed the TypeScript compiler into a website, but running code on the fly with even simple sandboxing is not feasible without a serious pile of hacks.
I've recently read a forum thread about a library for compiling/running Elixir code as a script, but guess what: The runtime (apparently) makes sandboxing really hard.
And so on and so on. I just wish that the Lua approach of "if I don't give you a hook, you cannot do that" were the default. I've seen so many overcomplicated enterprise-y solutions that are basically just a plea for a well-designed, local, small scripting API…
> And so on and so on. I just wish that the Lua approach of "if I don't give you a hook, you cannot do that" were the default.
Yes, because languages are still not capability-secure. Memory-safe languages are inherently secure up until you introduce mutable global state, and that's how they typically leak all kinds of authority. If you had no mutable global state, then you could eval() all the live-long day and you wouldn't be able to escape the sandbox of the parameters passed in.
Examples of mutable global state:
* APIs: you can construct any string you like, and somehow you can then access any file or directory object using only File.Open(madeUpString). This is called a "rights amplification" pattern: the runtime permits you to amplify the permissions granted by one object (a harmless string) into a new object that gives you considerably more permissions.
* Mutable global variables: as you point out, eval() can access any mutable global state it likes, thus easily escaping any kind of attempt to sandbox it.
If these holes are closed then memory-safe languages are inherently sandboxed at the granularity of individual objects.
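The mutable-global-state leak is easy to demonstrate in any mainstream language. A Python sketch (note that CPython's eval is famously not capability-secure, so the "sealed" environment below only illustrates the shape of the idea, not a real sandbox):

```python
# SECRET stands in for any ambient mutable state: a global config
# object, a connection cache, etc. Names are illustrative only.
SECRET = "hunter2"

# Naive "sandboxing": no environment passed, so eval() sees the
# caller's module globals and reaches SECRET just fine.
print(eval("SECRET"))  # hunter2

# Explicit environment: only what you hand in is reachable.
env = {"__builtins__": {}, "x": 41}
print(eval("x + 1", env))  # 42
# eval("SECRET", env) raises NameError: nothing ambient to amplify.
```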
As far as I know, most in-process sandboxing has been deprecated because it works against maintainability. E.g. Java decided against its SecurityManager because it is way too easy to leave the proper checks out of a new feature, leaving the whole thing vulnerable with a false sense of safety. Instead, process-level isolation is recommended.
Ruby used to have the $SAFE feature for sandboxing, but it was removed because it was buggy, added a lot of complexity, and wasn't actually that useful. Linux has all the various isolation features that make Docker work, but people still recommend not running untrusted code in Docker containers because of the potential for oversights/"bugs" in Linux's API. I suspect that programming languages / VMs don't include these features because they are very hard to get right and add a disproportionate amount of complexity for their utility.
I think WASM is filling in that gap to some extent. From the spec:
> Any interaction with the environment, such as I/O, access to resources, or operating system calls, can only be performed by invoking functions provided by the embedder and imported into a WebAssembly module
And IIRC, the core instruction set is reasonably compact.
Your best approach is to run untrusted code in a separate process in a sandbox. Language developers don't normally deal with hostile users in the same way that OS developers do.
A "little language" is just an abstraction over a small part of the domain. Equivalent abstractions can be and are written as libraries. In a Turing-complete host language, the primary difference is that these libraries don't get the privilege of inventing new syntax, but that's almost always a good thing.
We already have major headaches switching between JS, SQL, and {insert backend language here}. Introducing tens of little languages into a codebase may marginally increase readability of each chunk of code in isolation, but the amount of context-switching and required background knowledge it introduces would more than make up the difference.
In an abstraction strategy that's based around libraries, every library agrees on the same basic syntax and semantics. You interact with an embedded DSL via a well-defined interface that all libraries respect. The syntax may not be perfectly ideal for any one part of the project, but it's consistent throughout every part of the project. That has real value.
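For instance, an internal DSL might look like this hypothetical fluent query builder: the "syntax" is ordinary method calls, so the host language's debugger, linter, and IDE all work on it unchanged. All names are invented for illustration:

```python
# A tiny internal DSL: SQL-shaped intent expressed as chained method
# calls that compile down to a query string. No new parser needed.

class Query:
    def __init__(self, table):
        self._table, self._wheres, self._limit = table, [], None

    def where(self, cond):
        self._wheres.append(cond)
        return self

    def limit(self, n):
        self._limit = n
        return self

    def to_sql(self):
        sql = f"SELECT * FROM {self._table}"
        if self._wheres:
            sql += " WHERE " + " AND ".join(self._wheres)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql

q = Query("users").where("age > 21").where("active = 1").limit(10)
print(q.to_sql())
# SELECT * FROM users WHERE age > 21 AND active = 1 LIMIT 10
```

(A production builder would also bind parameters rather than splice condition strings; this only sketches the interface style.)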
I think it's also a red herring to argue about lines of code: stringing together a bunch of little languages is not likely to lead to fewer lines of code than pulling in an equivalent number of third-party libraries, and it will almost certainly increase the total amount of code in your distribution, because each little language must not only implement the functionality desired, it also needs its own parser and interpreter. Take McIlroy's shell script: if you add up the C code to implement each of those little languages, you're at about 10k lines of code to make that bit of code golf possible.
I'm a huge fan of DSLs, and I like the analogy of modern programming as pyramid building. I just don't think independent, chained-together DSLs are the answer. I'd rather have a language like Kotlin that is designed around embedded DSLs that all respect the same rules and use the same runtime.
> Take McIlroy's shell script: if you add up the C code to implement each of those little languages, you're at about 10k lines of code to make that bit of code golf possible.
This is a fair point, but part of this has to be how battle tested the language in question is, right?
Bringing in a single language that's been run through its paces (Bourne shell in this case) for text processing seems like much lower risk than bringing in a dozen different languages and hoping the places where they interface don't blow up (hope someone tried that particular combination before).
> We already have major headaches switching between JS, SQL, and {insert backend language here}.
I used to agree wholeheartedly with this, but now I pretty strongly disagree; in my decade or so of cumulative code-monkeying experience, having to understand the nuances of some "convert $BACKEND_LANGUAGE to JS and SQL" layer has been far more headache-prone than just, you know, writing JS and SQL. All about using the right tool for the job - and I know of very few languages that are the right tool for all three of those jobs (let alone the myriad other jobs that might pop up as soon as you expand beyond a simple CRUD app).
I disagree and would say DSLs should go away if possible.
DSLs like SQL are the norm and you can see the problem of them in basically every project.
You either use ORMs or you end up hand rolling SQL rows into Structs or Classes.
The whole mapping usually looks like crap and contains a bunch of implicit corner cases, which eventually end up being a footgun for someone.
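The hand-rolled mapping, in miniature, with one of those implicit corner cases (a NULL column quietly becomes None despite the type annotation). A sketch using Python's sqlite3 and an invented schema:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str  # the annotation promises str, but nothing enforces it

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'a@example.com'), (2, NULL)")

rows = db.execute("SELECT id, email FROM users ORDER BY id")
users = [User(*row) for row in rows]
print(users[1].email)  # None, despite the str annotation
# users[1].email.lower() would raise AttributeError far from this code:
# the footgun fires at the point of use, not the point of mapping.
```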
Usually the SQL server runs somewhere else, the ports are wrong, the language version is wrong, or a migration failed and a function is missing, yada yada...
The same is usually true for Regexp. There are a billion dialects and every single one of them is basically unreadable, incomplete or just weird.
The same is true for microservices with tons of config files for dev, staging, testing and production...
Everything has its own version, can be down, or mutates some random state somewhere while depending on other services.
It always breaks at the seams.
Increasing the amount of DSLs increases the amount of seams and thus makes software worse.
But your ORM example could be considered a DSL in itself, so it doesn't really work that well. ORMs can be implemented in various ways too: as a syntactically distinct DSL, or as something like objects and method calls, or whatever the language already provides.
Regexes can be abstracted over exactly by making a more readable DSL.
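A sketch of that abstraction: tiny composable helpers (invented names) that compile down to an ordinary regex, so the unreadable dialect is generated rather than hand-written:

```python
import re

# Each helper returns a regex fragment; the composition reads like
# the intent instead of like punctuation soup.
def seq(*parts):    return "".join(parts)
def group(p):       return f"({p})"
def digits(n):      return rf"\d{{{n}}}"

date = seq(group(digits(4)), "-", group(digits(2)), "-", group(digits(2)))
print(date)                                    # (\d{4})-(\d{2})-(\d{2})
print(re.match(date, "2024-05-01").groups())   # ('2024', '05', '01')
```

Libraries in this style exist for several languages; the point is that the readable layer costs nothing at match time, because it emits a plain regex for the existing engine.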
Config files and the stuff accepted therein are small languages as well.
What are the alternatives to making little languages?
> DSLs like SQL are the norm and you can see the problem of them in basically every project.
I think your points re: sql are true of straightforward crud apps, but not true at all of an analytics app. In those cases, the sql is often _very_ complex, and while the results of a query may be eventually mapped into a struct or something, the query generation is rarely a simple mapping of properties in an object to select columns.
> There are a billion dialects and every single one of them is basically unreadable, incomplete or just weird.
Sure, but in the vast majority of cases, one only has to deal with at most 3 dialects, and there's a good chance you won't be hitting the corner cases that make each dialect significantly different.
This article claims the problems it's trying to solve are: Hard to onboard new hires, code breaks because of lack of understanding of dependencies, and code changes become harder to manage. In my experience SQL, regexes, unix shell, and listening to Alan Kay, far from solving those problems, are the very things that most exacerbate them. General-purpose languages that are expressive enough to let one write business logic in the language of the domain, but without breaking the rules of the language or requiring new tooling - "internal" rather than "external" DSLs - are a far better way forward.
> " General-purpose languages that are expressive enough to let one write business logic in the language of the domain, but without breaking the rules of the language or requiring new tooling .."
gawk compiled to WebAssembly would seem to fit the bill -- it just shifts gawk from "external" DSL to "internal" DSL. Orthogonality allows for usage of all the modern interface trappings without any retooling and/or breaking the rules of the gawk language. It makes gawk one module in a group of customized modules that form a general-purpose program out of DSLs/little languages.
Transpiling to your main language doesn't make it an internal DSL. Reimplementing what gawk does as a library you can use within your language (and not just by passing opaque strings to it, but by actually expressing awk-style commands in a proper datastructure) is the kind of thing I'm advocating for.
Transpiling the language changes the operational domain.
Reimplementing/supporting the library in-language means having to do that for each OS operational domain that the gawk interpreter runs in (i.e. each relevant OS & OS version for Mac, Unix, Windows, non-PC OSes, plus mobile versions).
Modifying the language also means that anything that relies on the language will have to be re-evaluated to assess potential language-change issues/implications.
Transpiling to change the operational domain means you can just focus on the task at hand, as long as you're not tied to specifics outside the operational domain of WebAssembly (e.g. file paths).
Avoids the need to assess potential language change issues/implications.
Changing the operational domain in this case means you can leverage the operational environment's orthogonality to "express awk-style commands in a proper data structure", which awk doesn't provide without modifying the language (or writing the feature in awk).
Changing the operational environment also means no new retooling is required. Changing the language may force/require retooling.
Case study context example:
Long term 10-40 year studies where same code written at start of study needs to be runnable / usable / understandable at the end of the study without being rewritten.
Note: Hardware and/or OS used at start may be entirely different than that used at end of study.
Fairly simple to get 10,000+ awk scripts written under IBM/DEC/Mac/Windows/Unix/BeOS running under any of the other OS domains by transpiling awk to WebAssembly.
Fairly trivial to leverage the browser to permit a user-customizable GUI usable under any of the aforementioned OS domains without having to modify/recertify the awk scripts.
Finding/recompiling/supporting awk for each of those OS domains (OS & version variants, GUI libs) is a bit more time consuming (short term & long term).
> "... (and not just by passing opaque strings to it, but by actually expressing awk-style commands in a proper datastructure) ..."
Explicit, in-language data structures are not a feature at the Chomsky language level that traditional awk targeted, but in-place file-block state manipulation can be set up/arranged in a data-structure form.
Something like a BNF/EBNF language data-structure description would need to be used/processed by an awk script to make use of data structures within an awk script (aka the ANI spec in the BEGIN section). Something akin to overlaying a script on top of a script, and/or printf being passed a description of the string format.
awk is not traditionally set up to do homoiconic forms within the same script instance (abuse of system calls & gawk extensions notwithstanding).
Add what the ARC language provides to the mix to get one common customizable reference for an HTML template interface across multiple OSes/platforms (vs. multiple file copies of an HTML template instance).
> SQL is a little language for describing database operations
Yeah, SQL is not a little language anymore.
It started as one, but a lot has been added to it, and the SQL flavors for Oracle or Postgres are anything but tiny. Windowing, nesting, JSON handling...
I think the author is kinda proving with this that a successful little language does not stay little, and hence little languages are not the future.
And don't get me started on DSL in general. Just lookup my username and "DSL" on hackernews for endless rambling.
Maybe the term modular would be more appropriate than little? Every solution that has gained widespread dominance (take the shipping container) has done so thanks to its flexibility and modularity.
People have been making this argument since the 80s and possibly even earlier. My experience is often the opposite. Little languages are usually far, far harder than (mis-)using "big" languages for small tasks.
The problem is that your DSL has to be understood by other people, including future you. Programming tasks are vast, combinatorially explosive state spaces full of weird potential interactions between features. Once you get above the complexity and universal familiarity of say, arithmetic, it's difficult for others to understand what's going on just by looking at 1-2 live examples. You have to heavily invest in proper docs and tooling (if your language doesn't provide it for free). By the time you've completed that your "little language" usually isn't such a little effort anymore.
If you don't, you've just made the next CMake. Congrats you monster.
That's why we have languages with functions now, because people didn't want to manually do a register dance in assembly.
That's why we have name spaces, because naming conventions only take you so far.
That's why we have map and filter (or equivalent) because that's what most loops are doing anyway.
Generation after generation, we discover that we all use common abstractions. We name them design patterns, then we integrate them in the language, and now they are primitives.
And the languages grow, grow, bigger and bigger. But they are better.
More productive. Less error prone. And code bases become standardized, simple problems have known solutions, jumping to a new project doesn't mean relearning the entire paradigm of the system in place.
Small languages either become big, or are replaced by things that are big, for the same reason most people prefer a car to a horse to go shopping.
Not that horse riding will totally disappear, but it will stay in its optimal niche; it is not "the future".
SQL persists because it's an interchange format, not just a programming language. It's one of the few programming languages you'll see embedded in other languages - and generating SQL from other languages is a common source of security bugs. You can't upgrade away from SQL without changing both ends of that connection.
You can write large programs in SQL, but it's generally considered good practice not to.
(I feel I ought to mention LINQ here, not to make any specific point but just to fanboy about it)
same, and in fact, despite using C# since before LINQ even existed, I don't even know how to write the sugar candy version of it.
Part of that is me coming from C++ and its algorithm header, and the other part is that the code is just easier to read and understand than the sugar-candy version (to me, at least).
Somewhat pedantic argument: SQL is kinda dead. Sure, modern databases use upgraded dialects of it, but they are custom to each database and often incompatible with plain SQL. There are even many cases where modern databases don't support even standardized SQL constructs.
The easiest example of where databases and SQL part ways: UPSERT. It doesn't exist in standard SQL.
Then let me address the forest. Modern DBs offer SQL as an on-ramp for the most common DML and querying. This further entrenches it as a data language and why SQL just won't die.
There are always more proprietary methods where one needs them. That doesn't mean SQL is dying, it means SQL will likely grow to include some of those too.
> Small languages either become big, or are replaced by things that are big, for the same reason most people prefer a car to a horse to go shopping.
So why are shell languages still around? Why are they not replaced by C#, C++, Java or another big (=general purpose) language?
I find your horse->car comparison more akin to the sh->bash->zsh transition. Zsh is not as small as sh, but it is still in the small league if you ask me.
Small does not mean w/o functions, without NS, without map/filter: it means "not general purpose".
Shell languages are a very good example, because they have been replaced mostly by bigger languages: first by Perl, then by Python.
Nowadays, most people don't write bash if it must be more than a few lines: it fits its niche perfectly, like horse riding.
But you are not going to build a website with bash anymore, do server-swarm deployment with PowerShell, or build your encoding pipeline in fish. Those are tasks that we used to do with those small languages, until we found out that we prefer a car to do the shopping.
I do this routinely with PowerShell. PowerShell is not bash or fish; it's highly usable in all situations except those that require the greatest performance, and even then there are solutions for various types of problems.
I'd say you're in the extreme minority then. I'm not a bash hacker but I usually end up writing a little script for myself at least once a month. Even just doing `command && command` is technically using a shell language
I said I write custom scripts about once a month. I use bash literally every day. But I also wasn't arguing the future of programming languages. The parent here said that shell languages aren't being used
Still alive and quite well. In fact, as more and more programs are written, they become more and more useful. We're already well into an era where programming can be nearly entirely ignored in favor of merely scripting new behavior out of the interactions between existing programs.
Shell languages have "must be easy to type in execution order" and "must integrate with random programs on the filesystem" as an overriding consideration. You're not going to get that with Java.
gawk compiled via WebAssembly to run in the browser allows "modern" GUI input/output beyond the command-line interface, while still retaining the ability to be just a CLI program.
The irony of your horse analogy is that a horse is more general purpose than a car. A horse can travel along train tracks, roads, sidewalks, and hiking trails.
There are two things that the article mentions that your omni-language has massive problems with:
1.) Performance. For example, a runtime for a user-friendly little language that maps HTTP requests to SQL queries can be much faster than a language that does the same thing by plugging user-friendly APIs together. A custom runtime can parse an HTTP request string directly into SQLite opcodes while the JS developer is still writing glue code that takes orders of magnitude more memory and CPU time.
2.) Static analysis. This means tools that are better at finding bugs, finding optimizations, visualizing structure, etc.
Both of these things are theoretically true, but only if the little language has enough resources behind it to optimize and build enough tooling. A big language is much more likely to have those resources because the target market is big enough to justify it. A niche little language will likely never get that kind of mass behind it, so the tools will be lacking and the runtime won't be optimized.
Big general purpose languages don’t have the capability for certain kinds of performance improvements or static analysis so the market size isn’t a factor.
My point isn't that the big languages can do everything that a little language could theoretically do, it's that the little languages won't have the resources to pull them off, nor will they have the resources to even do what the big languages do.
Proper debugging, syntax highlighting, language servers, security audits. These are things that engineers in the real world expect a language to have, and each little language would have to reinvent each of them. In contrast, a library can piggyback off of the tooling provided in the host language.
So even if a language can deliver on the performance and static analysis that it promises (which few will), it cannot reach adoption because it cannot provide the infrastructure needed.
(That doesn't even get into the onboarding concerns that I and others have raised about having a codebase strung together from a dozen tiny languages.)
> Proper debugging, syntax highlighting, language servers, security audits. These are things that engineers in the real world expect a language to have, and each little language would have to reinvent each of them.
I don’t always see this as the case and this might be a very fruitful area for research. What I mean is, much like how we have tools like bison, antlr, and the k framework, I could easily see this notion extending to language servers, etc.
As for onboarding, well, who are we onboarding? New software engineers working on a general purpose language or new operations members working on a DSL?
I remember a time when CSS and HTML had yet to be consumed by a general purpose language and the onboarding for new web designers was significantly easier.
> I don’t always see this as the case and this might be a very fruitful area for research. What I mean is, much like how we have tools like bison, antlr, and the k framework, I could easily see this notion extending to language servers, etc.
Yeah, I could see this happening, to a point. I'm not convinced it will happen, because what would be the impetus?
> I remember a time when CSS and HTML had yet to be consumed by a general purpose language and the onboarding for new web designers was significantly easier.
So do I. I had a friend whose dad wanted to start building web pages for a living, right at the moment when that job was starting to vanish.
At least two factors were involved in that shift:
1. More and more companies wanted web applications, not simple web sites. A functional web app requires engineering effort or it falls apart, whereas a simple web site just requires making things look right. Once these companies had engineers for a web app, it was simpler just to pay them to build out a marketing page than to hire that out (even if hiring it out would have been cheaper).
2. WordPress was becoming more and more approachable, and others like SquareSpace stepped in to make it even easier for a non-technical person to build a website. Companies that didn't need engineers for an app realized that they could do a good enough job for their needs by just using these tools in-house.
I suspect this is the fate of any little language that successfully eliminates the need for engineers: if we can take a piece of a domain and describe it in a DSL that is streamlined enough for no-effort onboarding, it won't take very long for the DSL to be made redundant by GUI tools that are even easier to use.
At that point, the DSL either goes away entirely or it morphs into something bigger to meet more complex needs (as HTML/CSS/JS did).
(This doesn't apply to DSLs like Regex and SQL that are designed for use by engineers, but then we're back to the question of whether an external DSL is pulling its weight relative to an equivalent library.)
That there are many issues with general purpose programming languages such as those outlined by the author in the article?
> I suspect this is the fate of any little language that successfully eliminates the need for engineers: if we can take a piece of a domain and describe it in a DSL that is streamlined enough for no-effort onboarding, it won't take very long for the DSL to be made redundant by GUI tools that are even easier to use.
That seems ideal! Domain-specific languages have always benefited from GUI tooling alongside, or in place of, a clunkier textual representation.
It is a fruitful area for research! Truffle is an example of the sort of framework you mean. Implement a parser+interpreter using Truffle and you get JIT compilation, GC, debugging, profiling and more stuff for free on top of the JVM.
I don’t want to spend all day playing human debugger because your little language doesn’t support debugging.
I’m not here to tell you how very clever you are for reinventing the wheel. I have my own work to get done, and that’s easier when we make a smaller language out of our general purpose language by agreeing to style guidelines, rather than trying to solve a social problem with technology by inventing your own game so you can change the rules.
When everyone in your group expects the rest of the team to commit 10% of their attention to their tool or module, that doesn’t scale. If we go past 50% in total, our capacity to solve problems becomes hamstrung.
If you work someplace where new devs are useless for a year, you’ve likely already got the snowflake disease.
> People have been making this argument since the 80s and possibly even earlier. My experience is often the opposite. Little languages are usually far, far harder than (mis-)using "big" languages for small tasks.
From the top level comment.
It’s also the primary failure mode for DSLs. There’s never any thought for tracing and debugging, and so it becomes a paean to the primary author(‘s ego).
Once you no longer have impostor syndrome it gets much harder to play along with these ego trips. It’s not that I’m too dumb to understand your DSL, it’s that you’re a fool who thinks writing your own language is the pinnacle of success. It’s very rare for it to be better than a decent API, and you’re thinking about yourself, not your coworkers.
That’s why everyone always mentions SQL. It’s the exception to the rule, not an example of when DSLs can be good. Remember XSLT. Remember a dozen other failed DSLs. Per decade. Forever and ever.
So every new language is written by a fool on an ego trip? Or are you calling me the fool?
I'm having a very hard time separating what seems to be a painful personal experience from a productive conversation about the future of programming languages.
- Hard to onboard new hires
- Code breaks because of lack of understanding of dependencies
- Code changes become harder to manage
Anyone who finds these problems improved by a DSL isn’t spending time looking at what is actually making life hard for their coworkers, and that is arrogance.
What helps me with those problems is tool selection. Picking community supported tools that fight some of these problems by allowing me to hire people who already know part of our system, and who can use a wider community as a knowledge base instead of just internal folks who have run off to work on other interesting things and don’t have the time anymore for something cool they did three years ago.
A veterinarian explained part of the dynamic to me quite some time ago: it’s better for your emotional relationship with your pet if a stranger does an uncomfortable procedure rather than you. If we use a tool with poor support at least we can bond over it as a team. If it’s Steve’s baby and he’s terrible at support that animosity turns inward, which is not just a problem, it’s the beginning of the end.
I think I figured out the issue in this discussion. The article is actually about general purpose vs domain specific and not really about big and little… and unfortunately, since most “DSLs” are APIs written in general purpose languages, the author resorted to using the term “little language”.
2.) SQL is not a general purpose programming language and is used as an example of a little language by the author.
I can clearly imagine a future where instead of a few large general purpose languages we have a multitude of niche languages that have better performance characteristics, better tooling and smaller individual learning curves.
And even if you do invest in proper documentation and support, you still have to overcome the hurdle that people just _don’t want to spend time learning your one-off language_ - there’s nontrivial opportunity cost in learning something that won’t be useful anywhere else. So people will just do the bare minimum which will lead to misunderstanding and bugs.
> you still have to overcome the hurdle that people just _don’t want to spend time learning your one-off language_
That's an important point. Maybe as an academic someone is more inclined in learning new languages for the sake of intellectual interest, but on the engineering side, having uniformity of language is a big plus.
By the way, I wonder if the author misses the point by not considering that all the code a programmer writes is translated into machine language / byte code made of elementary instructions: those instructions are the primitive language. But the programmer uses a more elevated language because he wants something more expressive.
Academics are not so hot on DSLs as one may think. Typically, they are viewed as padding material to the real research contribution. Anecdotally, I can recall papers being bashed because of the use of a DSL, but not applauded because of it.
> people just _don’t want to spend time learning your one-off language_
Which people though? If you make a DSL that non programmers in your organization use, I'm sure they will appreciate not having to learn the intricacies of Rust or whatever's in fashion this week.
We don't use DSL per se, but a custom tool for writing QA tests, which looks like a kinda Visio block diagram software, only each block is a function or other logical entity. Anyway, after a few years struggling with it, for many different reasons, we are slowly and painfully migrating to writing tests in Python, and every single QA supports it.
Custom languages, with limited support, limited community, limited extendability etc. are just like that - limited. And as soon as you hit a wall with them, transition will cost more (in both time and money) than saved in the first place by using "easier" tooling for non-programmers.
And you probably are going to hit a wall, because human desires expand. Your program does X? That was great, when you wrote it. But now, can you make it do Y? How about Z? Can you integrate it with system W? What do you mean, your little language doesn't support that?
While they are arguably "little languages", shells don't have this problem, because they allow you to invoke any program written in any language, which is an infinite-sized escape hatch for this issue. SQL kind of doesn't have this problem, because it has stored procedures (and also because people don't usually expect general computation from SQL). So SQL and shells are both "little" in some sense, but very much not little in others. Any other small language must also have some similar escape hatch, or it will trap you.
Digression: Reading the comments, SQL and shells keep coming up as the examples of "little languages". But SQL, for all its power, is not "the future". It's going to be part of the future, but it's sure not going to replace everything else. Neither are shells. And I don't see many other examples coming up. This doesn't sell me on the article's claim.
> a custom tool for writing QA tests ... we are slowly and painfully migrating to writing tests in Python, and every single QA supports it
QA is a programming related activity. These aren't the non programmers you are looking for.
I'm thinking more of shops that aren't pure software dev. Where you have specialists in <whatever the company does> that could use writing some automation themselves but don't have the time or inclination to learn all the modern meta-meta-programming stuff. 30 years ago they would have written some quickie BASIC for their formulas but now the software is based on Rust and C++ 2025 and they don't have time for that.
Basically, programming is best handled by ... programming languages. However, a domain that's not programming can be handled by a DSL.
> which looks like a kinda Visio block diagram software
But in this case there's your problem right there. That's not a DSL it's a visual code generation tool. Can you think of even one tool like that that hasn't proved itself useless?
part of me wants to argue that the interface to a complex piece of software is a language, really, and if it were a self-consciously made language, it could be a lot better in a number of ways.
It really isn't, but it is a bit like democracy, it could be better, but given the landscape and IDE integration capabilities, it is the best the C and C++ community alongside tool vendors have agreed upon.
I certainly rather use CMake, even if I need an open book on the side, than Gradle, Blaze, autotools, yet another Python based build tool,....
However, most of the times, since my use of C++ is related to personal projects, IDE project files are more than enough, they have been serving me well for the last 30 years.
Certainly not better in their dependency on Python and JVM, or IDE integration across Qt Creator, KDevelop, Visual Studio, Android Studio, Clion, VSCode, C++ Builder.
CMake has working Xcode support, while Meson has an unmaintained proof-of-concept hackjob that doesn't work. I really like a lot of what Meson does, but as long as I have to choose between working IDE support and a good project definition language I'm always going to choose the first.
I would add that when people move on to the next job, the DSL dies; I have no need for that DSL at the next company. I could try to implement it there, but IP rights would prevent that, and getting new people on board with my ideas is so much work that it isn't worth it.
That is like learning a SaaS application's ins and outs: you switch jobs, and that specific experience is not useful to you at all.
A general purpose language, on the other hand, is useful even if you move from one country to another and take a job in a different business niche.
As a developer, there is no upside for me in spending my time diving into some DSL I won't use in my next job.
As a business person, there is no upside for me in spending my time learning a DSL, or a specific application's interface inside and out, that I won't use in my next job or in a different position.
I have only one experience to share. Back in the mid 90s, I was tasked with developing a webserver that provided targeted advertising. A requirement was to provide the marketing team with an accessible mechanism for defining rules: basic stuff like encoding a marketing/ad-sales rule such as "show the truck ad if the user is male and in some age group". A little scripting language was developed; nothing fancier than conditional branching was involved on the surface. And the user base immediately got it and started using it, because it was a "little language".
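For flavor, here is a minimal sketch in Python of how such a rule engine might work. The rule syntax and all names below are invented for illustration; they are not taken from the original system.

```python
# Minimal sketch of an ad-targeting "little language". Rules have the form:
#   show <ad> if <field> <op> <value> [and <field> <op> <value> ...]
# All names and the rule syntax here are invented for illustration.

OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}

def coerce(token):
    """Numeric-looking tokens compare as numbers, everything else as strings."""
    try:
        return int(token)
    except ValueError:
        return token

def parse_rule(line):
    """Parse 'show truck if gender == male and age >= 30' into (ad, conditions)."""
    head, _, body = line.partition(" if ")
    ad = head.split()[1]                      # the token after "show"
    conds = []
    for clause in body.split(" and "):
        field, op, value = clause.split()
        conds.append((field, OPS[op], coerce(value)))
    return ad, conds

def select_ads(rules, user):
    """Return every ad whose conditions all hold for the given user profile."""
    ads = []
    for ad, conds in rules:
        if all(op(coerce(str(user.get(field, ""))), value)
               for field, op, value in conds):
            ads.append(ad)
    return ads

rules = [parse_rule("show truck if gender == male and age >= 30")]
print(select_ads(rules, {"gender": "male", "age": 35}))  # -> ['truck']
```

The appeal to non-programmers is that the whole surface area is one sentence-shaped form, rather than an API with ordering constraints.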
Most of what Lua offers will not be a requirement for what the ad ops team needs to create and maintain campaigns and there will be specific requirements that are not easily expressed in Lua.
But this was in spring of '95. Ruby released later in December of that year. Lua was first publicly released in '94. I learned about the existence of these two a few years later (Lua first, and then Ruby via RoR).
mea culpa: there was no reddit or github or HN back in '95. Usenet would have been helpful but it wasn't on my radar in those days. I was just 2 years out of architecture grad school, and not exposed to the CS communities in my student years.
I'd very much like to see someone coming up with nice syntax for hierarchical finite state machines and entity components systems just like people came up with nice syntax for queries in the form of linq and nice syntax for html generation in the form of jsx.
Doing these things in vanilla syntax of general purpose programing languages is not exactly great.
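In the meantime, the embedded route looks something like this: a state machine written as plain Python where the transition table at least reads like a small declarative language. This is a sketch with invented names, a long way from the dedicated syntax the parent is asking for.

```python
# Sketch of an embedded "little language" for finite state machines in
# plain Python; the class and method names are invented for illustration.

class StateMachine:
    def __init__(self, start, transitions):
        # transitions: {(state, event): next_state}
        self.state = start
        self.transitions = transitions

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state

# Declarative description of a traffic light: the dict reads like a table,
# which is about as close to "nice syntax" as vanilla Python gets.
light = StateMachine("red", {
    ("red",    "timer"): "green",
    ("green",  "timer"): "yellow",
    ("yellow", "timer"): "red",
})

print(light.fire("timer"))  # -> green
```

Hierarchy (nested states, entry/exit actions) is exactly where this table style starts to creak, which supports the parent's point.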
Not to mention the human tendency to align with certain ideals. For example, does it use 1-based indexing like Lua and Julia? If so, I just can't bring myself to use it.
The notions of first index, current index, previous index, next index, last index, and each index are all invariant under shifts of the index set... yet this is the hill people choose to die on.
But little languages could be a nice interface for the non- or semi-programming tasks. Do you really want your domain experts to fiddle with the core of your application or do you want your programmers to do that? A little language could be a great interface to encode specific business rules and domain logic.
The author gives SQL as an example of a little language and we do indeed already provide SQL interfaces to analysts and let them do their thing.
> Do you really want your domain experts to fiddle with the core of your application or do you want your programmers to do that?
The Curse Of Almost: Your tool is great, it's almost perfect... except for that one little thing it can't do, which your users need to do, which, therefore, leads to masses of ugly hacks unless you provide access to an escape hatch where sufficiently motivated experts can drop down to a real language which doesn't have your DSL's limitations and get the job done.
It's the Curse Of Almost because, if it were too much worse of a fit for the problem, nobody would even think of using it to solve that problem. Getting someone 90% there and crapping out puts your users in a more awkward position, especially if they feel they've invested effort in whatever tool they have.
An example is Talend versus CSV: Talend is an ETL Solution which Extracts data from some source, Transforms it according to a graphical DAG of ideally stateless components, and Loads it into some other storage. It's also a happy, friendly GUI on top of Java, which is nice, because the Real World isn't kind to happy, friendly GUI solutions which expect CSV is going to conform to any of your syntax rules or other misguided preconceptions about files having structure. So, when you have to run a Talend pipeline on vaguely-comma-delimited text files which may once have been machine-readable, you can make your own component which is literally just a block of Java code to parse the file using the Zerg Rush Of Ad-Hoc Rules Technique, an oft-overlooked method for designing parsers. You can also use that kind of thing to make components which are tasteless enough to demand state variables other than the stereotyped kind Talend itself provides.
It seems entirely possible to create a tool for making little languages that also supports interfaces for tooling and documentation. Tooling is actually quite abysmal for general purpose languages and as the article points out the tooling for little languages can be much more powerful when there’s a smaller surface area. Also we could build languages that are primarily geared around tooling and documentation instead of languages designed around different manners of defining functions and iterating over lists.
I also don’t think that anyone has ever suggested that making a custom language is a small endeavor.
Whatever the future of programming languages it will definitively not be popular at first and negative criticisms will be the top-rated commentary. And when the new paradigm comes I can almost guarantee that the majority of the HN crowd will be too old and set in its ways to make the transition. Why would the future be any different than the past with regards to paradigms shifts?
That's not the way to see the process. We have been highly successful at little languages already: they are, in essence, why when I write something like "a = a + 1" I can assume it works identically in C, Javascript, and Python. (Semantically, it doesn't! But it is a portable intent.)
You might object and say, "but variable assignment and addition, that's a big language thing." It isn't, though; it's just an infix expression. And infix didn't pop out of nowhere; it had to be invented as part of the gradual creep upwards from machine level "coding" into a more abstract semantics. Infix parsers are small, and while the complete language is larger, what it's presenting is infix-compatible. "Regex" is the same way: there is a general definition of regular expressions, and then there are some common variants of regex, the implemented semantics.
The boundary between "the language needs its own compiler and runtime support" and "the language exists as an API call you pass a string into, which compiles into data structures visible to the host language" is a fluid one. And the most reasonable way of making little languages involves seeing the pattern you're making in your host and "harvesting" it. In the previous eras, there were severe performance penalties to trying to bootstrap in this way, and so generating a binary was essential to success. But nowadays, it's another form of glue overhead. If you define syntactic boundaries on your glue, it actually becomes easier to deal with over time.
Documentation-wise, it's the same: if the language is sufficiently small, it feels like documentation for a library, not a language.
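To back up the claim that infix parsers are small: here is a minimal Pratt-style precedence-climbing evaluator in Python. It is a sketch, not a production parser, but the tokenizer plus the precedence table really is the whole specification.

```python
# A minimal Pratt-style (precedence-climbing) infix evaluator, showing how
# small an infix "little language" can be.

import re

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
         "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def evaluate(tokens, min_prec=1):
    left = prefix(tokens)
    # Consume operators at or above the current precedence, left to right.
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = evaluate(tokens, PREC[op] + 1)
        left = APPLY[op](left, right)
    return left

def prefix(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        value = evaluate(tokens)
        tokens.pop(0)  # discard ")"
        return value
    return int(tok)

print(evaluate(tokenize("1 + 2 * (3 + 4)")))  # -> 15
```

Everything a "big" language adds to this core (statements, scoping, types) is larger, but the infix expression sublanguage itself stays this compact.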
To be clear, I'm not at all opposed to DSLs. I just think that creating a useful one is much more difficult and expensive than is typically acknowledged in these discussions. Creating a new DSL is probably not the first solution you should reach for before trying alternatives.
Or CSS (and every one of its derivatives), GraphQL, SQL, regex. Maybe I'm misinterpreting, but each one is a language and something I'm currently using or have used in the past. Little languages are everywhere.
Yes, to add to your point: nobody has managed to use the "outputs" of the STEPS project to do something useful.
There was a cool "wordprocessor-like" system (Franken?) demonstrated, created with a small number of lines; it should have been a huge success in the FOSS world, no?
Well, no: nobody managed to make it work.
In addition, “little” languages tend to eventually become turing-complete, because you keep needing that little extra bit of functionality.
And then you want to modularize your code because it becomes too big, and you want to create libraries for code reuse.
You end up wanting static typing for the usual reasons, which eventually leads to needing parametric types and recursively defined types, and the type system becoming turing-complete as well.
Or you keep working around the limitations of the little language, writing code generators and wrapping it in general-purpose language APIs.
Would you (mis)use C to do text processing, or would you use shell tools?
I suppose all this leads me to the suspicion that little languages fill in for shortcomings in big languages. Big languages can absorb the things that work, thus negating the need for small languages in that sphere.
Although how far can that go? Can we keep making ever bigger languages? Or at some point does it crumble under its own weight?
Regarding CMake’s horrific documentation: I will literally pay money for someone who can show me how (or whether) it’s possible to wire a different language into CMake! I believe it’s possible; I’ve seen some functions deep in the crappy docs that make it look like it is, but I cannot for the life of me work it out. The language in question produces C object files!
CMake is pretty simple internally. Every "keyword" is a function call. Every function call is implemented as a class. The simplest way to hijack it would be to have a function that calls out to your external interpreter/tool with the state you want. I could see doing that in a couple of ways:
* Add a builtin command [1] that takes a string or filename and calls the interpreter with any additional data you want to pass.
* Add a flow control command [1] that passes the inline block to the interpreter of your choice. You'd probably have to override cmFunctionBlocker as well for this.
Note that this can't fix the deep design issues in CMake like the insane string representation.
And no, I'm definitely not in therapy from CMake-induced PTSD.
Champion! That’s exactly what I’ve been looking for haha
I’m trying to remove the need for external commands to compile Nim when used with ESP-IDF for embedded firmware development, which is dependent on CMake.
That’s exactly what we’re doing, yeah. --compileOnly gets pretty far, and I’d love to remove the need for having to run that compilation step separately before CMake builds the firmware from the generated C sources
This is a “deepity”. We already do this. We constantly do this in programming:
“The idea is that as you start to find patterns in your application, you can encode them in a little language—this language would then allow you to express these patterns in a more compact manner than would be possible by other means of abstraction. Not only could this buck the trend of ever-growing applications, it would actually allow the code base to shrink during the course of development!”
Functions, frameworks, little languages. It’s all abstractions on top of abstractions. You are shifting the knowledge of the abstraction for the more fundamental knowledge underneath that does the actual work.
You end up just sweeping the codebase growth under some other layer’s rug and blissfully forgetting about the woes of future maintainers. The code is still there, abstracted and exposed by the “little language”. Hiding it behind a cute moniker doesn’t change that.
This isn’t the future of programming. This is already programming.
Maybe the author is trying to predict that there will be boom in DSLs like there was for JS frameworks? Funnily enough, I'm just wrapping up a DSL for our in-house web component engine that creates an abstract data layer all components share. The pattern was easy enough that I'll probably build more DSLs like it when an API isn't flexible enough
The better the programming language, the less need for a custom language: you can then create the DSL inside the host language. This requires flexible syntax to some degree, and a fairly advanced type system if it should be statically typed; hence not too many languages are a good choice here.
I don't think there's anything in that comment to indicate that they didn't do it that way. There's a robust discussion about this elsewhere, but people making embedded DSLs is basically routine. They just don't always call them that or have awareness that what they're doing fits that description. But almost all of those will be written in the "host language" of just whatever the containing system happens to be written in.
Some languages are particularly suited for this but other than racket and ruby it seems mostly accidental/incidental to their design.
Here's something I don't understand: How are "little languages" different from a bunch of functionality wrapped into a library/module? Is it just that (with some convenient syntax sprinkled on top), or is there more to it?
I would imagine that most of the value comes from being able to "refactor" thought patterns to match the best way to cleave the domain into composable concepts -- and it seems like we do this all the time (and in all programming languages?).
"How are "little languages" different from a bunch of functionality wrapped into a library/module?"
This is called a shallow embedding in the Haskell world.
A deep embedding is more like writing a full-blown interpreter.
Then there is tagless-final style (Oleg Kiselyov): it feels shallow, but it is more flexible than simple library functions and optimisable like deeply embedded DSLs.
I agree, they are the same! Regex is an example of an eDSL that most general-purpose languages support. The king of eDSLs is of course Haskell. I recommend looking up parser combinators for anyone who has ever struggled to understand an overly complex regex.
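A hand-rolled sketch of the parser-combinator idea, in Python rather than Haskell (this mirrors the style but is not any particular library's API): each parser is a function from a string to either `(value, rest)` or `None`, and combinators build big parsers out of small ones.

```python
# Hand-rolled parser-combinator sketch (not any particular library's API):
# each parser takes a string and returns (value, rest_of_string) or None.

def char(c):
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def many1(p):
    """One or more repetitions of p, joined into a string."""
    def parse(s):
        out = []
        while (r := p(s)) is not None:
            value, s = r
            out.append(value)
        return ("".join(out), s) if out else None
    return parse

def seq(*ps):
    """All parsers in order, collecting their values."""
    def parse(s):
        values = []
        for p in ps:
            r = p(s)
            if r is None:
                return None
            value, s = r
            values.append(value)
        return (values, s)
    return parse

def alt(*ps):
    """First parser that succeeds."""
    def parse(s):
        for p in ps:
            if (r := p(s)) is not None:
                return r
        return None
    return parse

digit = alt(*[char(d) for d in "0123456789"])
number = many1(digit)                   # roughly the regex [0-9]+
pair = seq(number, char(","), number)   # roughly [0-9]+,[0-9]+

print(pair("12,34"))  # -> (['12', ',', '34'], '')
```

Unlike a regex string, each piece here is a named, testable value you can compose and reuse, which is the point the parent is making.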
An “internal DSL” is a library with a design which makes it “feel” like a language. JQuery is the classic example. An “external DSL” has its own syntax, like regexes.
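To make the internal/external distinction concrete, here is a tiny internal DSL sketched in Python via method chaining, in the jQuery/LINQ fluent style; all class and method names are invented for illustration.

```python
# Sketch of an "internal DSL": ordinary method calls, chained so the code
# reads like a small query language. All names are invented for illustration.

class Query:
    def __init__(self, items):
        self.items = list(items)

    def where(self, pred):
        return Query(i for i in self.items if pred(i))

    def order_by(self, key):
        return Query(sorted(self.items, key=key))

    def take(self, n):
        return Query(self.items[:n])

    def to_list(self):
        return self.items

result = (Query([5, 3, 8, 1, 9])
          .where(lambda n: n > 2)
          .order_by(lambda n: n)
          .take(2)
          .to_list())
print(result)  # -> [3, 5]
```

Everything here is host-language syntax, so existing tooling (debuggers, highlighting, type checkers) works unchanged; an external DSL like regex trades that away for its own notation.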
This is what I always heard Lisp was best at. Instead of making totally new languages (with parsers, tooling, etc) you'd create little DSLs within your own code in the form of macros: come up with a "little language" for describing one part of your app, write a macro for it, and then it integrates smoothly with everything around it
Whether or not you agree this philosophy is a good one, and whether or not you like Lisp specifically, I think we can all agree that macros (in whichever language) are a much better way to do it than creating a bunch of tiny languages from scratch. I was surprised not to see the word "macro" appear in the article at all
Lisp macros are nice, but as Gumby said above [0] a common approach is to use Lisp's quoting abilities to construct a data structure that represents the problem at hand (in its own terms) and then create functions to parse / manipulate that data structure.
A classic example comes from Peter Norvig's "Paradigms of Artificial Intelligence Programming", wherein he defines a subset of English grammar as a data structure [1]:
'((sentence -> (noun-phrase verb-phrase))
(noun-phrase -> (Article Noun))
(verb-phrase -> (Verb noun-phrase))
(Article -> the a)
(Noun -> man ball woman table)
(Verb -> hit took saw liked))
He then goes on to define a function "generate" that uses the above to create simplistic English sentences.
Additional rules can be added by a non-programmer, so long as they understand how their domain logic has been mapped to Lisp.
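For readers without the book at hand, here is a rough Python transliteration of the idea. Norvig's original is in Common Lisp, so this is a sketch of the technique, not his code.

```python
import random

# The grammar from the comment above, transliterated into a Python dict.
# Each symbol maps to a list of possible rewrites (lists of symbols/words).
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["Article", "Noun"]],
    "verb-phrase": [["Verb", "noun-phrase"]],
    "Article":     [["the"], ["a"]],
    "Noun":        [["man"], ["ball"], ["woman"], ["table"]],
    "Verb":        [["hit"], ["took"], ["saw"], ["liked"]],
}

def generate(symbol):
    """Expand a grammar symbol into a flat list of words."""
    if symbol not in GRAMMAR:
        return [symbol]                      # a terminal word
    rule = random.choice(GRAMMAR[symbol])    # pick one rewrite at random
    return [word for part in rule for word in generate(part)]

print(" ".join(generate("sentence")))  # e.g. "the woman liked a table"
```

The data structure is the "little language"; `generate` is one tiny interpreter for it, and a parser or a random-sentence counter would be other interpreters over the same data.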
Macros are very hard to write tooling for, and hence difficult to author and use. I think that, perhaps something like Polyglot GraalVM[1], which was designed to host many languages and let them all seamlessly talk with each other, while automatically making existing tooling "just work" with any new languages created with the framework [2] would be a better way.
> I’ve become convinced that “little languages”—small languages designed to solve very specific problems—are the future of programming,
Yes, but over time very specific problems become bigger/different problems which the little language isn't ideal for, the original developers move on leaving someone new to figure out the problem and language which is probably poorly documented and very brittle. Application developers probably aren't suited to writing and maintaining language code.
The only caveat is an external system provided with its own language, like RDB/SQL, which is proven and well maintained; but it's hard to call SQL a little language.
Join any company and organisation and look at their build and deployment tooling.
Unless they are using Kubernetes and even then, you shall find a very complicated bunch of languages:
- shell scripts
- Dockerfiles
- Kubernetes YAML
- Makefiles
- Bazel
- Ansible
- python scripts
- Jenkins XML
- Groovy scripts
- Ruby scripts
- CloudFormation
- Terraform
- Fabric or other deployment scripts
It's very hard to fit together and understand from a high level.
The last thing they were working on at my previous company was a YAML format to define a server, to go through the organisational structure of the company to manage computer systems.
Some people mentioned LISP in this comment thread. For me LISP is an intermediate language, I would never want to build a large system in LISP. It's not how I think about computation.
Can you expand on what you mean when you say that Lisp is not ‘how [you] think about computation’? I have never seen that phrase used regarding Lisp; usually it’s mentioned in regard to logic or other paradigms much less mainstream than the imperative/semi-functional paradigm of Lisp.
I see LISP as being useful for codegen, intermediate representation, and AST representation, but I wouldn't want to program in it directly without a tool to create a structure that is understandable to me. All the parentheses!
I wouldn't want to maintain or work on a large Clojure codebase written by other people. I've done that three times.
For reference, I think Python is easy to write and read and understand.
I wrote a simple toy multithreaded interpreter, and I've written part of a compiler that does codegen targeting the imaginary interpreter. It's basic, but my AST is a tree that could be represented in LISP. I use a handwritten recursive-descent Pratt parser. The language looks similar to Javascript.
I know it can fixate your thinking if you dwell on it too much, but I think of modern computers as Turing machines. They loop or iterate over memory addresses, which hold data or instructions, and execute them.
That said, my perspective is not traditional. I design and try implementing programming languages. I am interested in the structure of problems and code, asynchrony, coroutines, parallelism and multithreading more than anything else. Even more than type systems.
I think the expression problem is a huge problem that doesn't have good solutions for managing complexity.
I find other people's LISP code difficult to read, whereas I can understand a Python or Java algorithm.
What am I trying to say? The structure of the program in the developer and compiler's head is different from the instructions actually executed by the computer. LISP is nearer to the instructions executed by the computer than what exists in my mind. In my mind exists relationships, ideas more complicated and not structured in post order traversal. A post order traversal of LISP is the codegen.
It doesn't have to be a big standalone DSL with a separate compiler or preprocessor. It can also be an embedded little language, like when you sprinkle HTML templates throughout your normal general-purpose language, and as only a syntax extension: https://docs.racket-lang.org/html-template/
(Aside: I'm seeing tasteful Racket and Scheme influences in Rust, even though they're very-very different languages. I'm hoping to contribute a little more influences.)
DSLs are not a replacement for but a complement to any existing language, general purpose or specialized. I have come to think of DSLs as programs for writing programs (similar to but not identical to macros). With a DSL, you can specify the grammar of a specific problem/program. Once that has been done, it is often quite straightforward to implement the grammar in any number of target languages. As an application developer, this may not be a huge advantage (though DSLs can also shine in any client/server interactions), but if you are a library author this can be very compelling because your library may be easily portable to most commonly used language runtimes in a generally rote kind of way. The port might not be optimal, but it should be correct, provided the high level logic of the DSL is. Performance optimizations can be done where needed.
What is great about this approach as an individual is that it requires you to tighten your ideas. When you have to implement all of the functionality in a DSL, you really start thinking about what you truly need. A big language nudges you towards using all of its features while a small language challenges you to consider what is truly essential.
Of course DSLs always run the risk of being write only and/or only comprehensible by the original author. Like any powerful tool, DSLs should be used judiciously and responsibly. Often that isn't the case, in part because I don't think the tooling for writing DSLs is generally very good. But I am betting that new tools that make DSL writing easy will have a profound effect on software development.
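To sketch the "specify the grammar, then implement it rote" idea from the comment above, here is a deliberately tiny arithmetic DSL in Python; the grammar and all names are invented for illustration, and the same recursive-descent shape ports mechanically to other languages:

```python
import re

# Grammar of the tiny DSL:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")

def tokenize(src):
    return TOKEN.findall(src)

def parse_expr(toks):
    val = parse_term(toks)
    while toks and toks[0] in "+-":
        op = toks.pop(0)
        rhs = parse_term(toks)
        val = val + rhs if op == "+" else val - rhs
    return val

def parse_term(toks):
    val = parse_factor(toks)
    while toks and toks[0] in "*/":
        op = toks.pop(0)
        rhs = parse_factor(toks)
        val = val * rhs if op == "*" else val / rhs
    return val

def parse_factor(toks):
    tok = toks.pop(0)
    if tok == "(":
        val = parse_expr(toks)
        toks.pop(0)  # consume the closing ')'
        return val
    return int(tok)

def evaluate(src):
    return parse_expr(tokenize(src))

print(evaluate("2 * (3 + 4)"))  # 14
```

Each grammar rule becomes one function; translating the whole thing to, say, Java or Go is exactly the kind of rote port the comment describes.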
Calling SQL a "little language" must surely be a joke. Even if you ignore the differences between its many dialects, SQL is only "little" in that it is domain-specific rather than general purpose.
In fact the entire article seems to boil down to "DSLs are the future", which I'm sure I saw articles about back when Ruby on Rails was dominating web technologies, Cucumber (and its various ports) created the "BDD" testing fad, and DevOps started gaining traction on top of the various "Ruby DSLs" used as configuration formats.
I don't think DSLs are going to go away any time soon. But there is a trade-off between domain-specific "little languages" and general purpose programming languages (or "DSLs" that are actually subsets of the latter). It can be fun to work with a little language; it's not so fun to work with dozens of them, each with different rules you have to memorize, instead of just being able to use the same language for everything (and in truth, this was the source of the Ruby DSL craze: developers were already using Ruby on Rails).
hm, I'm a little unhappy about the author comparing Knuth's solution to a handful of shell utilities.
for one, the author says Knuth's program written in WEB was 10 pages long, discounting the fact that these 10 pages are HEAVILY annotated.
my other point is:
tr has 1917 LOC,
sort has 4856,
uniq has 663,
and sed is in its own package at around 10 MB,
all including comments and docs.
it's fine and good that you can use composition with shell utilities, but come on, write that example program in C99 and you'll be a not very happy coder at all. in general I find the comparison rather rude. Knuth was supposed to show his programming system WEB, and as a "critique" McIlroy farts out a shell script like "lmao first".
indeed, you do not often need to count word frequencies.
but what was this article supposed to be really about? software engineering 101, aka don't-reinvent-the-wheel/DRY?
As you say, Knuth was asked to demonstrate his literate programming... In some ways this is a direct request for the non-pithy, articulated, first principles answer. I would more say Knuth was set up than that he was framed, but tomato-tomato. :)
The size of the language is a red herring. You really just want programs that are well structured, which can be greatly helped by choosing a "perfect language" for each task, but often helped just as well by choosing (or creating) a great library to express the business logic.
Unlikely. On one hand notations should be a commodity and they indeed should be unique to the task. There is no point and no way to try making a unified notation for music and chess. On the other hand there is no point to make multiple notations for the same thing. And programming is indeed the same thing from lowest to highest levels, composable like a Russian doll, or we won't be able to build large systems. So the future of programming is a single notation that actually reflects what programming is. We do not have it yet, this is why we have so many "programming languages".
I've seen other replies highlighting Lisp in general and Racket in particular, but when it comes to a "little language" I think it's valuable to have a link to Paul Graham's article on The Roots of Lisp: http://www.paulgraham.com/rootsoflisp.html
It underlines all the points already made here about Lisp, and also includes a CL version of the original code. 67 lines including comments and empty lines. Endless possibilities.
The Rust `nom` parser crate works very well IMHO for creating little languages, DSLs, custom markup, and your own kinds of annotations, all with the Rust compiler guarantees and good speed.
Oh, please, making a reasonably good DSL might take a month or even years.
Also, wrestling with the parser and the borrow checker is not an easy task for the average user.
I'm no Rust zealot but the borrow checker is arguably one of the core benefits for the average user. Wrestling with it means they are not yet ready for systems programming and need to understand move semantics more deeply.
The borrow checker is fine. But from the library writer's perspective it's a pain and takes an enormous amount of time to make an API sound.
One does not simply check out nom and test things in a few minutes.
I've always regarded the principal task when implementing a program to be creating the necessary data structures and a set of functions that represent a vocabulary for manipulating them. So every program I write contains a small domain specific language. Isn't this what everyone does?
'Official' small languages only make sense when the user base is large; this is why SQL exists but a mainstream language for manipulating Maxwell's equations does not.
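The "vocabulary" style the parent comment describes can be made concrete with a minimal Python sketch; the inventory domain and every function name here are invented for illustration:

```python
# A tiny "vocabulary" for a hypothetical inventory domain: the data
# structure plus a handful of functions that manipulate it together
# form an informal little language, in which the rest of the program
# is then written.

def new_inventory():
    return {}

def add_item(inv, name, qty):
    inv[name] = inv.get(name, 0) + qty
    return inv

def remove_item(inv, name, qty):
    if inv.get(name, 0) < qty:
        raise ValueError(f"not enough {name}")
    inv[name] -= qty
    return inv

def total(inv):
    return sum(inv.values())

# The actual problem is expressed in the vocabulary, not in raw dicts:
inv = new_inventory()
add_item(inv, "widget", 10)
remove_item(inv, "widget", 3)
print(total(inv))  # 7
```

The point is that callers never touch the dict directly; the function layer is the "language" of the program.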
DSLs and general purpose languages are not comparable. DSLs are generally designed to handle data, while general purpose languages can handle everything. Where DSLs fall apart is when you either need to (a) work with I/O in a different format than the DSL handles data in (including working with hardware) or (b) the scope of data changes into something that now incorporates information outside the DSL’s domain.
For (a), a simple example is an application that reads information from a DAQ, stores it in a database in a compressed format, and later sends the data to a printer. We have a DSL that can easily implement the database, but we need to translate the data from the DAQ into a compatible input, so that requires a different language. Furthermore, the printer requires postscript, so we need some other language or tool for that. Then we need to figure out how to glue it all together. In some cases, it becomes tempting to try to ‘hack’ the DSL into trying to do things it was never designed for.
Or we could just use a general purpose language and libraries.
If a lot of little languages/DSLs were going to be practical as another layer of abstraction up from APIs, I think they would have to be implemented in something like Unison. Unison is designed so that you can have different syntaxes for the language, or for parts of the language, because the language is not stored as text but more or less as the abstract syntax tree.
Then again so is Lisp more or less.
I suspect that instead what will happen as the next big thing in terms of new abstractions in programming is that AI code generators will keep getting better and get better tooling and that's going to be the tool of choice for high level needs.
While these AI tools may not be that accurate today, there is vast potential in improving the models and tooling. In the IDE it could be more like: you describe what you want, it generates some code and tests and then runs the tests, and maybe a visual time-traveling debugger lets you see whether it's doing the right thing.
My concern was how will putting these little languages together work and it turns out that the author also had these questions.
Where do we draw the line on "enough DSLs" for example? And what happens to the gains from using several DSLs in tandem as opposed to a high-level language with libraries that accomplish the same thing?
The author and comments are sticking to text-based languages for some reason.
I think visual ones have proven to be more accessible to non-programmers. There are downsides ofc ... for programmers =) As a programmer myself I don't think it would be great to push visual languages too far, but business is business.
I am currently building a tiny language designed only for data manipulation in a long-term personal project: data made of numbers/strings/booleans in maps or lists goes in, can be re-shaped, and goes out. It's written in TypeScript as part of a web app.
It just has map/list access & creation, string manipulation & concatenation, basic arithmetic/comparison, lambdas as first-class values, and function calls. No way to do I/O, just data in, data out.
I need some non-programmer friends to collaborate on the tiny data transforms, and thanks to its limited features it is quite easy to write a 1:1 text <> visual editor for the language, allowing you to go from visual to text or text to visual. The exercise in itself is interesting and worth it.
I don't think I would do that in a commercial context, though.
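The commenter's implementation is in TypeScript and not shown; as a hedged illustration of how small such a data-in/data-out interpreter can be when I/O is excluded, here is a Python sketch with invented operation names:

```python
# Programs are plain nested tuples; evaluation is one recursive walk.
# Supported forms: literals, ("get", key) on the input map,
# ("concat", a, b), ("add", a, b), ("lam", var, body), ("map", fn, items).
def run(prog, env):
    if not isinstance(prog, tuple):
        return prog                      # literal number/string/bool
    op = prog[0]
    if op == "get":
        return env[prog[1]]
    if op == "concat":
        return str(run(prog[1], env)) + str(run(prog[2], env))
    if op == "add":
        return run(prog[1], env) + run(prog[2], env)
    if op == "lam":                      # lambdas as first-class values
        var, body = prog[1], prog[2]
        return lambda x: run(body, {**env, var: x})
    if op == "map":
        fn = run(prog[1], env)
        return [fn(item) for item in run(prog[2], env)]
    raise ValueError(f"unknown op {op}")

data = {"name": "Ada", "scores": [1, 2, 3]}
prog = ("map", ("lam", "s", ("add", ("get", "s"), 10)), ("get", "scores"))
print(run(prog, data))  # [11, 12, 13]
```

Because programs are just data with a handful of forms, a 1:1 mapping to a visual block editor is straightforward: each tuple shape becomes one block type.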
While I also share several commenters' opinion that libraries and frameworks can easily become DSLs themselves (e.g. isn't Java Streams basically a DSL for stream processing inside another language?), one really can't talk about polyglottism without mentioning the ingenuity of GraalVM.
You basically write a dumb, easy parser and AST interpreter for your language, and it will magically turn into a JIT-compiled dynamic language with state-of-the-art GCs and better performance than what you could likely come up with yourself. And the best thing is that it really unifies the computing model: you can pass a Python object to some JS library, effectively giving you every library ever written for any (implemented) language, which is the real productivity booster.
Yes, but a library that turns part of a language's syntax into a DSL to solve a certain problem has the advantages that developers already know that language, that they need to learn only one language, and that anything else the language's libraries offer can be used alongside that DSL.
The problem is really "does the language the app uses work for the target audience that will be writing the DSL". If it does (Ruby makes a pretty decent one), job done; if it doesn't, we end up in a mess of toy languages (usually because developers want to do something fun, and writing a new small language can be pretty fun) or of plugging something like Lua into the app.
> For example, SQL is a little language for describing database operations. Regular expressions is a little language for text matching. Dhall is a little language for configuration management, and so on.
> There are a few other names for these languages: Domain-specific languages (DSL:s), problem-oriented languages, etc. However, I like the term “little languages”, partially because the term “DSL” has become overloaded to mean anything from a library with a fluent interface to a full-blown query language like SQL, but also because “little languages” emphasizes their diminutive nature.
Ahem, so is SQL diminutive or not? Because SQL is NOT diminutive. SQL is Turing complete (recursive CTEs alone get you there).
I think this works best if the little languages all share as much syntax and semantics as possible. A good example of this is OpenBSD's assortment of configuration file syntaxen for OpenSMTPd, pf, httpd/relayd, etc.; each of those "little languages" differ considerably in their problem domains, but they all seem to share a vaguely-Tcl-ish syntax and have largely converged in semantics and typical structure.
Another important consideration is that these languages are typically best when declarative as possible; if you can avoid Turing-completeness and stick entirely to something representing static data, then that's the ideal.
Interesting article; I immediately had to think about Terraform. The problem I have with specialized languages (even with SQL) is that they always create additional interfaces, and almost always ugly string formatting if you integrate them (SQL) or ugly duplication (Terraform) if you keep them separate.
I like the idea of abstraction but in my mind it is very easy to have the power of a "little language" inside an all purpose language by using a package. E.g. SQLAlchemy or Pulumi as the alternative to the little languages of SQL and TF.
I've always enjoyed reading this book that espouses this same idea:
Constructing Language Processors for Little Languages 1st Edition by Randy M. Kaplan [1].
It's no longer in print, but used it's $6 or less.
It's dated, from 1994, but it is a fun enjoyable discussion on the benefit of tiny specific languages. It also has a nice tutorial on the use of lex and yacc.
I think it's more likely the future will be ever more 'languages' which conform to a schema but are simply expressed as JSON/YAML. Being able to trivially deserialize with a simple JSON.parse or equivalent provides a huge head start.
DSLs defined in a type/schema system atop JSON/YAML end up being far easier to write tools around than DSLs which require a custom parser (e.g. Dockerfile.)
That said, there is definitely a subset of languages, like JSONPath, that would not work written out as an AST.
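As a hedged sketch of that head start (all op names here are invented): a JSON-shaped pipeline language needs no custom parser at all, just `json.loads` plus a dispatcher.

```python
import json

# A pipeline "language" expressed as JSON: each step is an object with
# an "op" field. Deserialization is free; only dispatch remains.
SOURCE = """
[
  {"op": "filter", "field": "qty", "min": 2},
  {"op": "pick",   "field": "name"},
  {"op": "sort"}
]
"""

def run_pipeline(steps, rows):
    for step in steps:
        if step["op"] == "filter":
            rows = [r for r in rows if r[step["field"]] >= step["min"]]
        elif step["op"] == "pick":
            rows = [r[step["field"]] for r in rows]
        elif step["op"] == "sort":
            rows = sorted(rows)
        else:
            raise ValueError(f"unknown op {step['op']}")
    return rows

rows = [{"name": "b", "qty": 3}, {"name": "a", "qty": 5}, {"name": "c", "qty": 1}]
print(run_pipeline(json.loads(SOURCE), rows))  # ['a', 'b']
```

Tooling (schema validation, editor completion, linting) can then be built against the JSON schema rather than against a bespoke grammar.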
Having dabbled in this a bit myself (using small dsl's to solve specific problems), some problems that immediately occur to me:
- more difficult onboarding
- more difficulty adding new features
- more challenges with best practices/linting/code reviews
- different runtime behaviour between different languages
This all adds up to more complexity.
New languages are often really neat, and enjoyable for their own sake, but I'm not all that interested in maintaining a large swath of different languages for different tools.
A huge problem with this approach is that very few general purpose languages reach the point where they have big standard libraries, great external libraries and tools on top of them. Every time someone invents a new language it wipes away the decades of progress of the general purpose language that could be used instead, so it needs to have a massive advantage or be completely necessary.
Among the problems this article ignores is the relatively large fixed cost needed to go from a framework/internal DSL to a separate "language." Unless you are in a situation where lots of people have the same problem, a separate language to solve the sub-problem actually creates more complexity than it solves.
My first goal with a challenge at the office is "make the problem smaller".
Now I have a folder of little one-offs and REPL scraps. A triumph of tactics over strategy that defies passing on to anyone who didn't author it.
Looking at the JVM and JSON, I wonder to what degree that languages contribute some piece of goodness or idea toward some final Tool To Rule Them All...
Empirically speaking, it seems more like rich languages are the past, present and future of programming. Rich as in, has a strong standard library. Python, go, etc.
The only "little" languages I can think of that I'd reasonably ask people to use at work are lua, make(~), awk, and (ba)sh.
Wouldn't that require programmers to use 50 languages in a project though?
Assuming that this is the future of programming, the number of languages that a programmer would have to learn to create a simple project would be ridiculous.
Why use many languages when you can do the same work in one? Even if the "big" language isn't specifically made for a certain job, doing that job in a little language first requires the programmer to spend time learning the little language.
The syntax may be different, so converting from big languages to little languages takes time. This is why this hasn't happened yet: people are reluctant to learn new languages, so they just learn the "big" languages that can complete all the tasks they require.
I am very familiar with DSLs but had not heard them called "little languages". I cannot say I find it fitting, as it makes me think more of languages with reduced syntax or semantics.
I haven't done any research on this but I suspect that usage predates "DSL," or at least was popular before the term DSL was widely known and used for it. I mostly see "little language" used in older sources, mid-90s or earlier.
It seems to be making a comeback though! I prefer it too. It's not really more descriptive than DSL but it's not much less, and is less jargony and just cuter.
If the point is that DSLs will be around, sure. They already are. If the point is that we'll replace complex full-featured languages with a stack of DSLs - no way.
just what I want, a zillion little languages whose syntax and semantic quirks I have to remember... it's bad enough the way it is right now, where everyone tells me "the right tool for the right job" and then I get there and realize they are all the same tool
AI will kill the need for all these abstractions. There will be as many or as little abstractions you want. It doesn't matter. New languages of all shapes and forms will be generated eventually on the fly. You will just have to know what you want.
I was a little surprised he took until the final paragraph to mention Racket. If he had played around with it then I think the article might have read differently.
"Racket is a Lisp dialect that's specifically designed for creating new languages (a technique sometimes referred to as language-oriented programming). I haven't had time to play around much with Racket myself, but it looks like a very suitable tool for creating 'little languages'."