Dgsh – Directed graph shell

esafak · 2025-09-30T15:51:20 1759247480

This would have been great 10-20 years ago, or even at the coining of Unix pipes. By today's standards, however, the syntax feels clunky and dated. I'd like to see contemporary shells like nushell and elvish copy these ideas, with attribution of course, in a more modern way. That is the best way I can see to honor this stagnant project: https://github.com/dspinellis/dgsh

DSpinellis · 2025-09-30T17:06:04 1759251964

I went through two iterations before adopting the current syntax. Truth is neither me nor Doug McIlroy, the inventor of Unix pipes, who kindly and generously provided feedback during dgsh's development, had something better to propose.

What syntax would you propose?

esafak · 2025-09-30T17:32:26 1759253546

Greetings, Diomidis.

I would suggest a familiar notation like "[a, b] -> c" in a dedicated dag block:

  dag text_stats {
    tee -> [ split_words, count_chars ]

    # word-based frequencies
    split_words -> tee_words
    tee_words -> ngram2 -> save_digram
    tee_words -> ngram3 -> save_trigram
    tee_words -> ranked_frequency -> save_words

    # character-based frequencies
    count_chars -> add_percentage
    chars_to_lines -> ranked_frequency -> add_percentage -> save_chars
  }

  run text_stats < input.txt

https://www2.dmst.aueb.gr/dds/sw/dgsh/#text-properties

or

  dag commit_graph {
    git_log -> filter_recent -> sort -n -> [ uniq_committers, sort_by_email ]

    uniq_committers -> [ last_commit, first_commit, committer_positions ]
    [ last_commit, first_commit ] -> cat -> tr '\n' ' ' -> days_between

    [ committer_positions, sort_by_email ] -> join_by_email -> sort -k2n -> [ make_bitmap_header, plot_per_day ]

    [ uniq_committers, days_between ] -> emit_dims -> plot_per_day

    make_bitmap_header -> cat
    plot_per_day -> morphconv -> [ to_png_large, to_png_small ]
  }

  run commit_graph

https://www2.dmst.aueb.gr/dds/sw/dgsh/#committer-plot

The translations above are computer-assisted and may contain mistakes, but you get the idea.

shanemhansen · 2025-09-30T18:26:56 1759256816

The closeness of this syntax to graphviz dot is very interesting.

having dgsh output a graphvis file in dry-run mode would be a neat feature.

DSpinellis · 2025-09-30T17:59:19 1759255159

Thank you for the suggestion. This would mean that you'd also then create some mapping from each name (like git_log) to its implementation, right?

esafak · 2025-09-30T18:00:59 1759255259

Yes, using shell functions:

  git_log() {
    git log --pretty=tformat:'%at %ae'
  }

Separating function definitions allows you to run, test, and re-use them.

DSpinellis · 2025-09-30T20:20:02 1759263602

And, more importantly, assign a name to a process, so that it can appear multiple times in the graph.

UltraSane · 2025-10-01T00:07:02 1759277222

You might want to try looking at the Neo4j query language Cypher for some possible inspiration for the syntax.

zokier · 2025-09-30T17:00:33 1759251633

Well, the project started 12 years ago (as sgsh), so that fits into your 10-20 years ago window :)

hnlmorg · 2025-09-30T16:15:26 1759248926

Murex has had this capability for years. (https://github.com/lmorg/murex)

I’m on my phone at the moment and cooking so cannot type any examples, but if I get time, I’ll throw together some comparisons later tonight

esafak · 2025-09-30T16:24:47 1759249487

I could not find any mention of DAGs or directed acyclic graphs in the documentation.

hnlmorg · 2025-09-30T17:01:41 1759251701

Yeah it’s not technically DAG since it uses iteration, but then dgsh will use iteration under the hood too.

However Murex does support CSP-style concurrency. So while there’s no syntax sugar for writing graphs, you can very easily create adhoc pipes and pass them around instead of using stdout / stderr.

So it wouldn’t actually take much to refine that with some DAG-friendly syntax.

In fact maybe that can be my next project…

DSpinellis · 2025-09-30T18:31:36 1759257096

I'm curious: what do you mean by "dgsh will use iteration under the hood too"? Dgsh does several things under the hood, but I wouldn't characterize any of them as iteration.

hnlmorg · 2025-09-30T18:47:25 1759258045

Yes you’re right. My apologies. I was glancing at the examples while cooking, specifically the git example (https://www2.dmst.aueb.gr/dds/sw/dgsh/#commit-stats) thinking that it was iterating over the lines output from git, but clearly that’s not even how bash would work. That will teach me for commenting without giving something my full attention first doh!

Looking properly at this, I can see no iteration is needed. Which actually makes the Murex implementation even easier because Murex already has tee pipes just like dgsh. It’s just not (yet) particularly well documented.

DSpinellis · 2025-09-30T18:55:53 1759258553

Admiring your multi-tasking!

hnlmorg · 2025-10-01T14:17:35 1759328255

Haha thank you but it’s really not that good otherwise I’d have grokked the iteration thing before making a fool out of myself!

em-bee · 2025-09-30T18:49:24 1759258164

would you be able to share or point to some examples? i am curious.

hnlmorg · 2025-10-01T14:18:56 1759328336

Sorry, it’s been a very busy couple of days so not had 5 minutes to test anything.

Stay tuned though, What I’m going to do is write a blog post about it. It’s an interesting enough topic to deserve one

em-bee · 2025-10-01T20:46:47 1759351607

awesome. i am looking forward to that. that blog on the murex website could use some attention anyways :-)

o11c · 2025-09-30T17:10:03 1759252203

Frankly, I find that anything more than some preparatory `exec {my_fd}< <(commands ...)` is an unmaintainable mess, so bash is plenty for any program that should be implemented in bash.

DSpinellis · 2025-09-30T18:01:28 1759255288

Manually playing around with fds is definitely unmaintainable. My hope is that a clean syntax can help create maintainable complex pipelines.

dang · 2025-09-30T19:09:47 1759259387

Related. Others?

Dgsh – Directed Graph Shell - https://news.ycombinator.com/item?id=21700014 - Dec 2019 (11 comments)

Dgsh – Directed graph shell - https://news.ycombinator.com/item?id=13352659 - Jan 2017 (51 comments)

politician · 2025-09-30T17:06:51 1759252011

A solution to the One Billion Row Challenge (1brc.dev) written in dgsh would be a interesting as a benchmark.

DSpinellis · 2025-09-30T17:29:17 1759253357

Nice benchmark! This is a (not at all efficient) awk one-liner.

awk -F\; ' $2 > max[$1] { max[$1] = $2 } !($1 in min) || $2 < min[$1] { min[$1] = $2 } { sum[$1] += $2; count[$1]++} END { for (n in sum) printf("%s=%.1f/%.1f/%.1f, ", n, min[n], sum[n] / count[n], max[n])}'

Can't see how dgsh could be applied to it.

jimbokun · 2025-09-30T15:22:22 1759245742

This is very interesting, but I'm wondering how it compares to just using a dynamic language like Python or Ruby for the same tasks. Curious how the line count to express the same tasks would come out.

everforward · 2025-09-30T16:57:04 1759251424

From a glance, it looks like very similar tradeoffs vs bash. Much harder to read in a medium-large application, but much more ergonomic IO and process control.

I.e. much faster to use dgsh for a basic processing DAG, much more painful to use dgsh for a large ETL pipeline.

Python with something like Prefect isn't something you'd use a REPL to bang out a one-off on, but it'd be more maintainable. dgsh would let you use a REPL to bang out a quick and dirty DAG.

DSpinellis · 2025-09-30T17:10:28 1759252228

I've found creating pipelines with Python to be messy and intuitive. Other than creating a DSL to express them I can't see how DAGs can be expressed naturally with Python's syntax.

Even creating tools in Python that can be connected together in a Unix shell pipeline isn't trivial. By default if a downstream program stops processing Python's output you get an unsightly broken pipe exception, so you need to execute signal.signal(signal.SIGPIPE, signal.SIG_DFL) to avoid this.

PaulHoule · 2025-09-30T15:34:08 1759246448

There is a lot of stuff for Python which follows the "express computation as a dag" approach, especially Apache Airflow

https://airflow.apache.org/

croemer · 2025-09-30T17:17:42 1759252662

I was curious but the docs are a nightmare. I clicked through a couple of pages and couldn't see a single simple non-trivial example.

DSpinellis · 2025-09-30T16:58:56 1759251536

Apache Airflow solves a very different problem. Its DAGs are static dependencies between sequentially executed processing steps, whereas the DAGs of dgsh express live direct data flows.

PaulHoule · 2025-09-30T17:20:12 1759252812

Yeah, there are also the boxes and lines tools like

https://www.knime.com/

which have their own subculture. You could solve the same problems they do with pandas and scikit-learn but people who use those tools would never use pandas and scikit-learn and vice versa.

Circa 2015 I was thinking those tools all had the architectural flaw that they pass relational rows over the lines as opposed to JSON objects (or equivalent) which means you had to realize joins as highly complex graphs where things that seem like local concerns to me require a global structure and where what seems like a little change to management changes the whole graph in a big way.

I found the people who were buying up that sort of tools didn’t give a damn because they thought customers demanded the speed of columnar execution which our way couldn’t deliver.

I made a prototype that gave the right answers every time and then went to work for a place which had some luck selling their own version that didn’t always give the right answers because: they didn’t know what algebra it supported, didn’t believe something like that had an algebra, and didn’t properly tear the pipeline down at the end.

jpitz · 2025-09-30T20:10:12 1759263012

Do you mean to say that two non-dependant tasks in an Airflow DAG aren't able to concurrently execute? Thats not my experience. I'm also confused by the use of 'static' in this context.

DSpinellis · 2025-09-30T20:24:42 1759263882

That's the point: non-dependant tasks can run concurrently in Airflow. In sh/BAsh/dgsh dependant tasks can also run concurrently, as in tar cf - . | xz.

jpitz · 2025-10-01T02:10:51 1759284651

Ok. thank you!

sunshine-o · 2025-09-30T15:54:34 1759247674

I respect Python but the upgrade to Python 3 showed that data processing workloads that can be handled by standard Unix tooling should stay there.

The upgrade was a nightmare for so many organizations. It shouldn't be that way but it was.

procaryote · 2025-09-30T16:53:55 1759251235

spawning shell commands and the equivalent of piping is surprisingly hard in python. It's almost easier to do in C

There are probably libraries that could help, but then you need to install dependencies which is sad in python for other reasons

croemer · 2025-09-30T17:14:24 1759252464

We use snakemake a lot in bioinformatics to take advantage of parallelism in workflows while staying close to Python: https://github.com/snakemake/snakemake

Others use nextflow but that requires learning Groovy and it's less intuitive.

UltraSane · 2025-10-01T00:05:35 1759277135

As someone who loves graphs and Neo4j I wish I had though of this.

byearthithatius · 2025-09-30T19:33:26 1759260806

Interesting. What are the benefits of thinking of data pipelines in terms of a DAG? Why cant it be cyclical with exit conditions?

DSpinellis · 2025-09-30T20:22:12 1759263732

A nicer syntax and a lower probability of deadlocks.

willjp · 2025-10-01T02:34:39 1759286079

This is the nerdiest thing I've ever seen, and I absolutely adore it.

gigatexal · 2025-10-01T02:39:28 1759286368

Same! As a data engineer I have long wished we could get away from airflow and move back to Unix pipelines. This is really cool.

reddit_clone · 2025-10-01T20:36:28 1759350988

Does the build work on Mac OSX?

uncletaco · 2025-09-30T15:10:20 1759245020

Hello. In English this makes me think of the phrase “dog shit”. Not sure if that’s intentional or not.

pentaphobe · 2025-09-30T15:22:35 1759245755

Second English speaker here who didn't make that connection at all

nasretdinov · 2025-09-30T15:25:14 1759245914

English is my third language and I can confirm I didn't even think about this

rirze · 2025-09-30T15:32:50 1759246370

Same english is my fourth language and it didn't even appear to me

DSpinellis · 2025-09-30T16:59:56 1759251596

Author of dgsh here. This is definitely not what I had in mind.

DonHopkins · 2025-10-02T18:17:35 1759429055

Yeah, and the Slack logo designers claim they didn't have penises or swastikas on their mind, but here we are.

https://www.buzzfeednews.com/article/nicolenguyen/slack-new-...

lucideer · 2025-09-30T18:53:53 1759258433

Another English speaker data point here & I actually read dogshit before I read dgsh.

DonHopkins · 2025-09-30T15:43:03 1759246983

That's what I think when I hear "bash".

dotnetcarpenter · 2025-09-30T20:15:18 1759263318

That's literally the word for shit in Norwegian: bæsj https://upload.wikimedia.org/wikipedia/commons/6/69/Nb-b%C3%...

cestith · 2025-10-01T13:29:48 1759325388

The word “dash” is a word for shit in English (as in dashboard - literally the board on a buggy or wagon to deflect the horse droppings). That doesn’t keep a shell from being named that. Of course, dash also means to move quickly so it’s not the only meaning. Moving quickly seems to be the inspiration for the shell’s name.

nasretdinov · 2025-10-01T11:01:35 1759316495

Well, bash as a shell is ok. Writing complex scripts on the other hand...

kbr2000 · 2025-09-30T19:25:24 1759260324

batshit?

AdieuToLogic · 2025-10-01T01:46:36 1759283196

> In English this makes me think of the phrase “dog shit”.

In English, this makes me think of the phrase "dig shell". I guess we just have different things on our minds...

:-p

DonHopkins · 2025-10-02T18:14:23 1759428863

Or shoes.

SoftTalker · 2025-10-01T02:59:52 1759287592

First thing I thought of also.

goldenCeasar · 2025-09-30T15:46:22 1759247182

Now I can't unconnect this, I hope OP was aware because now he wont forget too.

em-bee · 2025-09-30T18:15:13 1759256113

and no matter how much i try, i can't make the connection. best i can come up with is dogshell, and even that is a stretch. phew...