Someone at my old company basically did this and put it into production.
The first problem he encountered was that multiple connections couldn't use the database at the same time without clobbering each other. "No problem," he thought, this was a good use case for microservices. A service sitting on top would ensure that only one operation was performed at a time.
Next, his problem was that the database would sometimes get corrupted when something bad happened in the middle of writing the file. His solution was to wrap the entire JSON document inside a JSON string. If it could be parsed successfully, then he knew the whole file had been written. Then all he needed were "backup" files for each table, in case the current one was corrupt.
Next, his problem was that querying and iterating through a large table performed badly, since it required parsing the entire thing first. Querying several times required the whole file to be parsed every time. The solution was to move SOME of the tables over to JSON-inside-SQLite.
EDIT: Oh yeah, the next problem was how to structure the data inside SQLite. He decided to make a single table called "kitchen_sink" that held every JSON value. There was a column that said which "collection" it belonged to. There was another column that represented the row's primary key. So you could quickly query for a collection name and a primary key, and get the full JSON row.
So the next problem was that you couldn't query quickly for things that weren't the primary key. So new columns had to be added, called "opt_key1" and "opt_key2", where certain rows could put key values, and indexes could be added on those columns, so you could quickly query by its first optional key, or its second optional key.
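The "kitchen_sink" layout described above would look roughly like this in SQLite (the table and column names come from the comment; everything else is a guessed reconstruction, not the actual schema):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kitchen_sink (
        collection TEXT NOT NULL,   -- which "collection" the row belongs to
        pk         TEXT NOT NULL,   -- the row's primary key within its collection
        opt_key1   TEXT,            -- first optional key, indexed below
        opt_key2   TEXT,            -- second optional key, indexed below
        doc        TEXT NOT NULL,   -- the full JSON row as a string
        PRIMARY KEY (collection, pk)
    );
    CREATE INDEX idx_opt1 ON kitchen_sink (collection, opt_key1);
    CREATE INDEX idx_opt2 ON kitchen_sink (collection, opt_key2);
""")

user = {"id": "42", "email": "a@example.com", "name": "Ann"}
conn.execute(
    "INSERT INTO kitchen_sink (collection, pk, opt_key1, doc) VALUES (?, ?, ?, ?)",
    ("users", user["id"], user["email"], json.dumps(user)),
)

# Fast lookup by collection + primary key...
row = conn.execute(
    "SELECT doc FROM kitchen_sink WHERE collection = ? AND pk = ?",
    ("users", "42"),
).fetchone()
# ...and by the first optional key, thanks to idx_opt1.
by_email = conn.execute(
    "SELECT doc FROM kitchen_sink WHERE collection = ? AND opt_key1 = ?",
    ("users", "a@example.com"),
).fetchone()
```

This is essentially a hand-rolled document store; modern SQLite's `json_extract` plus expression indexes would make the opt_key columns unnecessary.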
> So the next problem was that you couldn't query quickly for things that weren't the primary key. So new columns had to be added, called "opt_key1" and "opt_key2", where certain rows could put key values, and indexes could be added on those columns, so you could quickly query by its first optional key, or its second optional key.
It's easy to get a little laugh from this, but congrats to the guy for exploring. Now he knows first-hand how inordinately hard the problem is and can describe it in detail, but more importantly avoid these hard-learned patterns later.
> but more importantly avoid these hard-learned patterns later.
Depends, I’ve known people who have gone through similar experiences and still pooh-pooh all those “unnecessarily bloated” solutions like a proper database.
I worked with someone that was keen to use reductionist logic and arguments...
we ended up with a lot of shitty solutions to problems that were hard to maintain, hard to extend, and hard to use because the more "complex" solution was really just a fancy version of a folder and some text files.
AFAICT (not being Node-fluent) this doesn't even use atomic file writing strategies :| So yeah, all of these are pretty likely to happen with this lib.
Just use SQLite, people. Even JSON-in-SQLite is still likely to be an improvement.
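For reference, the atomic file-writing strategy being alluded to is "write to a temp file, then rename over the old one"; a sketch in Python (the function name is mine, and this mirrors what Cocoa's `-writeToFile:atomically:` does, as mentioned later in the thread):

```python
import json
import os
import tempfile

def write_json_atomically(path, data):
    # Create the temp file in the same directory as the target, because
    # rename is only atomic within a single filesystem.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())    # push the bytes toward stable storage
        # Atomic swap: readers see the old file or the new one, never half.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

`os.replace` is atomic on both POSIX and Windows; a crash mid-write leaves the old file intact, which is exactly the corruption scenario described above.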
> Next, his problem was that the database would get corrupt sometimes when something bad happened in the middle of writing the file.
I'm not sure I understand how this can happen... unless you try to update the JSON in place (which is a very bad idea for any text-based format), what you do is encode/write the entire JSON from scratch. So either the file is written properly or it isn't written.
Honestly, from the entire message it doesn't sound like JSON was a bad idea so much as that your coworker didn't know what he was doing; if he'd been doing something else, he'd still have been making big mistakes.
I remember reading quite a bit about the (performance reducing) lengths filesystems go to in order to ensure consistency of directory entries even in case of a crash, and for example how "soft updates" were introduced to accomplish the same consistency with less of a performance degradation.
Looking at it from another angle, if you are running on top of a filesystem that cannot keep itself consistent, then you are SOL, there really isn't anything you can do to mitigate.
Just like we can't guarantee that we will be able to persist data that's in memory to disk if the OS is free to kill us at any time. "Best effort" it is, which means getting the data to disk as quickly as possible and not corrupting what is there.
They are if you perform the correct sequence of fsync operations on both the file and the directories, and use a file system which is correctly implemented.
The article linked above never explains that part, it only assumes that it will happen. From the code it sounds as if the crash can happen in the OS itself (but then the entire kernel will crash). At that point things are completely outside your control and you might as well be running on broken hardware.
Oh yeah, so basically had he pushed further, he would have realized that to avoid corruption he would need to implement a write-ahead log (WAL). And for the implementation of the WAL and other performance concerns, he would have realized that storing the JSON as a string is not the way to go; he'd need to implement some other binary data structure. Then he'd have realized that he had just invented another NoSQL DB.
Had he pushed further.....
Had he pushed further, he'd have raised funding for the newly invented NoSQL DB, and built a startup company on top of it.
I've been building an open-source alternative on mobile that's based on a similar concept (SQLite + FlatBuffers): https://dflat.io/ SQLite's own schema is already awesome, but this way you can have sum types, better schema upgrade guarantees, asynchronous index building, etc.
> Someone at my old company basically did this and put it into production.
This is the mentality that plagues the industry: that anything more than a few years old is obsolete, therefore experience is worthless, and therefore the wheel must be reinvented every time, because those old programmers must have been dumb; why else would they have used SQL? It's why real engineers don't take "software engineers" very seriously (and in turn why software engineers don't take webdevs seriously).
I used sets of flat JSON files as our "database" in the Wunderlist iOS and macOS clients.
Worked like a charm, never had a problem with it.
It was actually put in as a placeholder until we had time to think about a real storage solution, but it turned out we never needed anything more sophisticated, and ours were actually the fastest and most reliable clients we had. In fact, every time I encountered a performance problem I was hopeful that I would finally have a good reason to do that real implementation, but it invariably turned out to be a simple bug.
- Cocoa has -writeToFile:atomically:, which writes a new file and then renames, so no write-corruption
- We were lucky that lists had just the right granularity for a single file to be read/written atomically
- We likely wrote (quite) a bit more data than absolutely necessary, but I/O tends to have large fixed overheads so medium files tend to take around the same time as small files
- We did not do anything with the data on disk except read it, so not a DB
- We really did use files, not JSON strings inside SQLite
- We flushed to disk asynchronously, but as quickly as possible
I have built this into some software, and the reason is that when you are developing (and for very limited use cases outside of development) it is extremely convenient: clone my project and start messing around without provisioning a database. It reduced dependencies by hundreds of modules, too, because all of the connectivity libraries were abstracted into separate modules you'd only install if you wanted that particular type in production.
This is why I love SQLite. No provisioning required, and you have a flat portable file. But you also have the added bonus that it's highly performant, and there isn't much work to refactor your SQL from sqlite3 to most other RDBMSes.
I tried. I held a meeting to talk about the code. I found the problems hard to predict and hard to describe. It was decided that after the meeting he would work more on making his code less hacky and more production ready.
But the real answer is that our team was very siloed. No one knew what anyone else was doing. The other problem was that he was actually solving real world problems, and he was a very high performer. He got stuff done. Arguing to start over a project that's already working is a difficult position to hold when talking to management.
It's unfortunate that in this industry, on a lot of teams, "high performer" means "sloppy coder who lets his co-workers finish their project."
The problems he encountered with his dumbass solution were EASILY foreseeable by even a noob coder. What did he "get done"? How did writing his own shitty version of a database add value to the company? He is good at finishing his own pointless tasks quickly, maybe, but if I were in charge of the team he would be looking for a new job after this stunt.
Sick to death of these cowboys. Nothing is ever "done"; the majority of expense in software development comes during maintenance, not during initial implementation.
My interpretation of this as a manager is that this developer was probably a creative thinker with a decent track record who got stuck going down a bad path on this project, and nobody paid attention or intervened until it was too late. They were also probably pretty junior but perhaps had some past accomplishments that made them appear less likely to make this kind of mistake. Once it was in production, the developer very well may have been "stuck" with it (i.e., unable to get permission to scrap it and redo it, since it was technically working and solved some business problem).
Given the team dynamics and lack of involvement from this person's manager, I wouldn't move to fire them. I'd move to rethink the entire team, admonish the manager, and possibly remove them. The team itself wasn't working, and this was a symptom: someone had a bad idea, pursued it for too long, nobody did enough to stop it, and then they couldn't go back.
This is a classic consequence of a manager who has stopped paying attention to their own team. The team was most likely also overburdened with too many tasks, which is why everyone was working on something separate and independent and nobody knew what anyone else was doing. In reality this developer shouldn't have been given a project like this without being paired with a more senior engineer to supervise it, but that would cut down on the number of story points the team could get through and would thus be discouraged in a dysfunctional environment.
If he was on your team, it would be your fault. I think you just need a healthy balance of seniors/juniors on the same codebase.
As OP said, they were siloed from each other, and he definitely needed some mentorship. I've seen devs like that turn into incredible coders after just a couple of months of pair programming.
I can sympathize, but then it seems hard to argue with this developer's approach. If it met the needs of the company, particularly to the desired level at the time these features were requested, I don't think there's a valid critique of the developer's architecture beyond iT's NoT DoNe CoRrEcTlY. And still, there's a lot to be said for keeping your developers entertained so they stick around.
> there's a lot to be said for keeping your developers entertained so they stick around
Really? At the expense of everyone else who has to deal with this monstrosity for the foreseeable future, or worse yet replace it with an actual tool that can be reliably used.
This JSON-inside-sqlite-inside-JSON-inside-a-JSON-string beast should never have seen the light of day.
You're not paid to be entertained, sorry. You're paid to be productive. As productive as you can be, and to put the needs of the client and the long-term success of the company hopefully first, but certainly before any semblance of entertainment, if you're getting paid.
I can certainly say that that's exactly the way I felt complaining about it. I felt like I was an asshole attacking him, and I don't think he liked me very much because of it. The whole thing was very uncomfortable. I didn't throw a fit. I tried to be very understanding and make suggestions.
If it's any consolation, it fell on to me to maintain this code after he moved on to something else, which is why I know so much about how it works.
> it fell on to me to maintain this code after he moved on to something else
This has happened to me before. I disagreed with a technical direction, it was implemented anyway, and then I was left to maintain it. Very frustrating.
I've dealt with this before; it is a form of gaslighting. Some people are good at making everyone else into a bully when THEY are the actual bully. Like the kid who keeps splashing you in a pool, but runs off and cries and tells when you splash them back.
Standing up for yourself sometimes makes you look/feel like an asshole. That doesn't mean you are wrong or that you shouldn't do it.
edit: Check out the book "Radical Candor" if you regularly struggle with expressing negative feedback
> If it's any consolation, it fell on to me to maintain this code
That's not much consolation... If anything, it's all the more reason for you to be pissed off. He should have dropped the project, rolled out a future-proof tool, and been taught to do things differently next time.
Anything short of that just enables the dude's delusions of grandeur, and is therefore a mistake on everyone else's part...
If you're maintaining it, then I think you get a fair vote in its architecture going forward. Things that are plainly problematic now didn't seem that way to a different group of people, in a different context, before it was even created. Perhaps it was a cascade of poor choices, but regardless, identifying problems with the architecture in the context of today gives a huge advantage over those who were putting it together under who knows what conditions (at work or elsewhere).
Just like the never-ending "turn this Excel workbook into an app" stream of work, refactoring older apps will be a constant. Focusing today's conversations on yesterday's mistakes only detracts from the work left to do. Which is to say: if your architecture-change arguments are valid, there should be ways to justify implementing them today, beyond "it should've been done this way in the first place, because then we wouldn't have had those problems that are now solved anyway".
> If it met the needs of the company, particularly to the desired level at the times these features were requested, I don't think there's a valid critique of the developer's architecture beyond iT's NoT DoNe CoRrEcTlY.
I'm pretty sure the cascading series of "his next problem" sentences implies that there were plenty of problems with the architecture that weren't identified ahead of time, and that had to be encountered and then fixed as a series of bugs.
> And still, there's a lot to be said for keeping your developer's entertained so they stick around.
There's a difference between keeping your developers entertained and letting them infect production with ill-conceived projects that cause problems for all those that interact with them.
This project is reimplementing something already solved multiple times. There are many document stores, and JSON interfaces and add-ons to traditional RDBMSes, so what was being solved here, other than letting someone scratch an itch at the expense of the division he's working in? You're better off giving him 20% time for his own projects and calling it a day, if you really think entertaining your developers is important enough to warrant it.
There are times when rolling your own is useful, generally when there are some extreme requirements for space or performance, but even that becomes rare when the area is mature and explored thoroughly. A database, even a JSON document store of some sort, is so mature that it's almost impossible for one person rolling their own, when it seems to need all the common features (locking, remote access, different clients), to actually recoup the cost of building it (much less the future cost of troubleshooting and bug fixing), unless you've somehow hired a genius workaholic for peanuts.
I would say operator friendliness is actually the best reason to roll your own (which was clearly not the case here). If you have a system that is less complex, because it meets your use case only and not the competing use cases of every damn engineering outfit that can pay overpaid and underqualified devs to commit to an open-source codebase, and as a result requires less labor to manage (for example, not using Kubernetes for a 3-person startup), then you should roll your own.
Sure, but there are different levels of "roll your own". MySQL or Postgresql plus a text field and a microservice front end for access control and JSON validation (if you don't want to use the included components from those respective projects that handle those for you) is easier and friendlier most of the time than a microservice on top of SQLite on a local disk, which is friendlier than replacing SQLite with BDB, which is probably friendlier than rolling your own storage format.
Once you've abstracted it into a service, your API is what you and your client (should) care about, and many of the arguments for more specialized implementations no longer apply. Personally, I think the only reason I would go with something like SQLite instead of Postgres/MySQL behind a microservice is if I were baking the data into it with each release, so the SQLite data files are shipped with the version released. Even then, I'm not sure there's any reason I would do anything other than SQLite. Even if I had need of lots of JSON files, I would probably have my build procedure process them into an SQLite file I tested and shipped with, if only because I would then avoid having to deal with all the problems this guy encountered by trying to make his own database.
Oh yeah. Unless you're actually in the database biz, don't roll your own database. Those things are rock solid, usually easy (well, Postgres and MySQL/Maria are, anyway), and state is hard.
Rolling something yourself has the benefit that you don't need to teach yourself how the system works, because you built it. But once you want someone else to join the team, this backfires, because that person needs to be taught how your database works. If the database in question were widely used, then spending time on learning a new database would be worth the investment, but if it is only used internally, then it's just a waste of time.
I left my last company because one of my co-devs would always do crazy hack-job things, and when I complained to them or higher-ups, the excuse was:
< "Well all the work was already developed, and it would take too much time to rewrite it. You should have said something earlier"
> "When?" I asked, considering she had just put up the (big) PRs, and PRs ARE the time to review...
< "Check her commits as she pushes them to the repo" - as in her bugfix/feature branches, not master...
My jaw dropped. Especially since I was hired on as "Lead" and had all the accountability but no actual power.
It's incredibly frustrating because during code reviews I will request changes so it's not such a broken hack job, and the response will basically be "No, it's not worth changing". At which point I'm the one "holding up development". We wasted hundreds of development hours during the last project because of this person's "inventive" code, and nobody seems to understand what's going on.
It's hard to conjure more strength to push back out of thin air. I'd encourage you to try pushing for more detailed post-mortems (if you don't already have them), and to keep an eye on how much curtailed reviews cost the company. You also really want an advocate for code maintenance, and if you don't have one of those with a loud voice, there isn't really a feasible way to solve it except becoming that advocate yourself and earning the trust of those above you.
Two pieces of actual useful advice I can offer are:
1. A review style I picked up based off of RFC 2119[1]: the reviewing software we use allows us to mark particular comments as blocking or non-blocking, and I pair that with the usage of MAY/SHOULD/MUST within the comment language, e.g. "We're using the old `array()` syntax here instead of `[]`; we MAY wish to use the more modern syntax". This gives me some room to elevate necessary changes while keeping in the nitpicks I really want to throw in (and I do try to minimize them) without diluting the power of the strong comments. I've used MUST maybe three times, always for something incredibly terrible like pages not loading or unsafe DB migrations that cause data loss.
2. Agree on syntax and style rules, and enforce them. It's easier to get people to agree to rules once than to argue for them on each PR; anything like brace placement or line limits shouldn't come up repeatedly, since it wastes everyone's time and makes folks feel belittled.
This is great advice. Just have some minor thoughts to tack on.
Post-mortems are great for many reasons. For the case of GP, one particular advantage is that they align senior people's understanding: we shouldn't do X again. If you have a strong narrative for why a project failed, post-mortems are a formal setting in which you can present this narrative with concrete evidence to higher-ups.
In the future, when you see warning signs that a mistake is approaching repetition, you can raise the concern up the chain, invoking the memory of the post-mortem to motivate their intervention.
I also totally agree that a sincere and high-quality code review process is required for high quality code. Your 2119 recommendation is excellent. I'd also recommend doing some reading on commit message templates that smart people follow, they've improved my commit game, big-time.
At our company no commits get into the trunk without going by another set of eyes. We're probably creeping up to mid-sized right now so those eyes can vary in stringency and reliability more so than they would have when it was just a handful of devs, but I think mandatory code reviews are a good habit to get into - so long as you empower every reviewer to be critical and make it clear that both the reviewer and dev are owning the code and must ensure it is acceptable during the process.
We've had that process on for quite a while, and while there are some big weaknesses and holes in it, we've also adopted a principle to keep PRs as small as possible[1]. With those two tools we've had some pretty reasonable success, with a lot of our biggest incidents being related to times when we made large changes or a review was skimped on.
1. Even if that isn't measured in LoC - moving a dependency and updating references to it is something I'd count as a single action - but one I'd want isolated from any logic changes.
Many good companies enforce a no-origin-branches policy, with rare and well-justified exceptions. Because, used as you describe, a "feature branch" is just a future massive diff in disguise (when it's eventually merged), and massive diffs are a big no-no because they're a huge pain to iterate on via code review.
Doesn't every git repo have an origin branch? What is the alternative to creating a feature branch for developing something you don't want in production until it's ready?
Yep! Sorry, I meant that the only developed branch on origin is `master` (or whatever it’s called at your org). You can create branches locally, but pushing a local branch is strongly discouraged.
At this point, you submit the code for review, and upon approval the branch is merged into master and pushed. It’s not possible to push a commit hash to master that has not been reviewed.
If you have a feature that’s composed of many steps, you can “stack” multiple commits, and review/merge them in order.
If you want to develop the entire stack at once, you’re most likely doing something wrong (according to this culture). You can incrementally merge pieces of code to master in such a way that’s impossible for it to be deployed, and your final diff can be what makes it deployable.
Encouraging smaller changes isn't nearly as useful if those changes aren't isolated - if it's just half the picture then you can't accurately review it.
I hit a similar sort of issue recently - I've been incrementally developing a complex data migration, each change to the migration has worked on its own and been reviewed separately but I'm still going to go in and request a full review once the piece of logic is fully assembled. This is also happening on an integration branch on origin - we do try and keep these to a minimum but we're making a backwards incompatible change that would be quite expensive to do in a fully backwards compatible manner.
There are things that are infeasible to reasonably do without an integration branch (nothing is impossible technically, but it might be a huge waste of time), but even those things are pretty few and far between. If integration branches are commonplace at your company, it might be good to examine coding practices and see if you can slice up tickets to be smaller.
Yeah, organizing work in such a way that you can make isolated, incremental changes requires a nontrivial amount of creativity and discipline, and that takes time, like you say.
But, I do believe it pays off in the form of a higher quality end-product (fewer bugs, more testable/legible components, more extensible), which saves you time in the long run.
I disagree. It's coder malpractice. There is something to be said for a quick solution that just gets the job done. But each one of the updates described would take more work than just implementing SQLite or similar. Sure, at the outset, do something quick and dirty. By the second or third iteration, any legitimate developer should have switched to a database solution. Creating technical debt for no reason, or invalid reasons, is just a good way to set your company up for failure.
On the other hand, other sorts of devs would probably not be entertained having to maintain an in-house database some unchained dev decided to introduce into the stack one day for "reasons". That tech debt will compound until it becomes more of a liability... hopefully the product brings in enough money that the in-house database can continue to be supported, or be removed.
This sort of stuff is what deters me from being a developer sometimes. Fuck the salary, get me out of here.
It's not very agile friendly, but emphasizing design early in the process and having some "gate-keeping" protocol such as design review or code review can greatly reduce the chance of something going off the rails like this as it forces everyone to acknowledge what done looks like, as well as what the "missing" pieces will be.
The GateKeeper process isn't something you want to index on too heavily - but you also need a mechanism to counter-balance the possibility of a dev saying "I built a prototype last week that does 95% of the things we want" and 3 months of iteration later identifying that it only did 5%, and that getting the remaining use cases will require a re-write.
> Next, his problem was that the database would get corrupt sometimes when something bad happened in the middle of writing the file. His solution was to ...
As others have mentioned, there are a ton of off-the-shelf solutions that would have been more than adequate for this.
My question is, why didn't he go for any of the existing solutions when setting them up would've still been faster than rolling his own DB-in-a-JSON-file solution?
How did you go about porting the database code to something more sane? (just assuming you did)
I imagine if this database system is contained well enough, it shouldn't be so difficult to swap its internals with something else. Especially if it's all just JSON-like.
Maybe because it was tempting: JSON is fairly easy to handle, very portable, and when you look at a JSON document, it's straightforward to think about querying it, and thus using it as a DB, even though JSON is hierarchical and DBs are relational.
- bad and inconsistent formatting, which doesn't help with the
- huge if-else monstrosities.
- Also uses synchronous IO and asynchronous IO randomly.
- Uses try-catch liberally, doesn't check the caught errors, and just re-tries blindly forever in some cases.
If you do any parallel updates/inserts/removals with this "database" you're pretty much guaranteed to lose data. Updates are essentially: 1. read table, 2. make changes, 3. save table.
Which at least would work if it was all synchronous.
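That lost-update hazard can be shown deterministically by interleaving two writers by hand (a plain dict stands in for the JSON file here; no real threads are needed to see the bug):

```python
import json

db_file = {"users": []}  # stands in for the JSON table file on disk

def read_table():
    # Each "connection" parses its own independent snapshot of the file.
    return json.loads(json.dumps(db_file["users"]))

def save_table(rows):
    # Saving rewrites the whole table, clobbering anything written since.
    db_file["users"] = rows

# Two writers interleave: both read the same snapshot...
a_snapshot = read_table()
b_snapshot = read_table()

# ...each makes an independent change...
a_snapshot.append({"name": "alice"})
b_snapshot.append({"name": "bob"})

# ...and the second save silently erases the first: alice is gone.
save_table(a_snapshot)
save_table(b_snapshot)
```

A real database avoids this with locking or transactions; with "1. read table, 2. make changes, 3. save table" over a shared file, the last writer always wins.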
I know this is going to sound harsh, but building databases is hard for even the most experienced coders, and whoever wrote this is clearly at the other end of that spectrum.
It’s been a while since I’ve looked at JavaScript like this, but I would guess that the thread lock only exists as long as the read() call is running. That returns, and then the write is initiated. It doesn’t look like there is a lock held throughout the update callback. If that’s missing, then if you had two threads/processes operating on the same table, you’d all but guarantee that one of those update calls would be lost (in the best-case scenario).
Or maybe there's a different locking mechanism in place that my cursory look missed?
I did something vaguely similar to this recently, and I still maintain it was a good choice.
I volunteered to write a medical visit recording app for an NGO in a developing country (a friend works with the NGO and asked me if I would help), and they have almost no budget, no guarantees of internet connectivity when their folks are in the field, and the likelihood that they may be using this software for years.
So I wrote a C# app that uses Winforms, and stores all data as JSON files, the 'table' structure is basically directories in the file system.
It lets them share visit files by importing/exporting a zip of the JSON via sneakernet USB drives [super-naive last-record-written-wins], does not rely on an internet connection anywhere at all, ever, and all files are stored as plain JSON so that they can conceivably do some data analysis on it in the future. Their alternate plan was to continue using paper, or some terrible regular reconciliation of Excel spreadsheets.
Having said all that and defending my decision on this single use-basically-I-wanted-to-have-independent-JSON-instead-of-SQLite-so-in-the-future-maybe-have-a-web-function-to-sync solution,
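A naive "last record written wins" merge like the one described might look like this (the field names and the `{visit_id: record}` shape are assumptions; the real app's schema isn't shown):

```python
def merge_visits(local, imported):
    """Merge two {visit_id: record} dicts; the newer 'written_at' wins."""
    merged = dict(local)
    for visit_id, record in imported.items():
        existing = merged.get(visit_id)
        if existing is None or record["written_at"] > existing["written_at"]:
            merged[visit_id] = record
    return merged
```

Last-write-wins silently drops concurrent edits to the same visit, which is usually an acceptable trade-off for sneakernet sync of append-mostly records.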
I strongly approve of this sort of thing. Using dead-simple and human-readable formats is a big win for things like this, even if it isn't architecturally "correct". It sounds like your decision was a good one for the use case you were looking at.
It is great until your charity gets acquihired by a big think tank / bigger charity / international aid group, and most doctors/charity operators are not known for their talent at scaling software.
In this case, this NGO is not going to get acquihired. It's more likely that I'll get an email in 5 years from someone who I don't recognize asking me if I know anything about this program because my email has been attached to this thing they got gifted from a dead project, and have been using after all of the original people have moved on. :)
Short version - I worked in that country for a few years and still care deeply about it.
If you want the long version, or you are interested in ways you could also be involved in that kind of project (or know C# and WinForms and want to help??? :) ) my email is my hn username at gmail.
Sounds like you should use CouchDB. It's a database/webserver, so you could make a simple HTML form on localhost[0]. CouchDB is built with replication/sync (over HTTP) as one of its main features[1], and in the field, an offline-first webapp with PouchDB[2] and Service Workers[3] could have the exact same form.
Because then someone has to run and manage a webserver, and there is no guarantee that Service Workers will work the way they do now in 5 years, or on an ancient Windows 7 laptop running IE7.
I want this to be able to run for years without my intervention. :)
something that will work in 5 years on ancient windows7 and ie7?
couchdb
there's nothing to manage.
On Windows, you install couchdb.msi or whatever; it's installed as a Windows service and boots automatically at startup time.
Start IE7, go to localhost:5984/_utils, and you get the DB's UI. At that point, all you've done is the installation.
One click later, you've created the first db, called 'somedb'; a click later, you've created the first JSON doc, called 'somedoc'. Now you can access it at localhost:5984/somedb/somedoc.
For the HTML form: just after you've created 'somedoc', you can click on "add attachment" and upload someform.html, then go to localhost:5984/somedb/somedoc/someform.html from IE7. No need for anything fancy. After you're gone, someone with the most basic HTML knowledge can make changes if need be. No Internet required. It will work as long as the laptop works.
That is a little more complex than, "Here's a .exe, it will save files into your My Documents. Click Export to make a zip, import to read someone else's zip." Plus generating instructions on how to do that would have been tough, and this way should be much easier to spread the app around.
Basically, it's not a bad idea, and if I were a Couch expert, or weren't on the other side of the world, I might have chosen that. But I know C#/WinForms well enough, and if we went with the browser I would have to support mobile phones, which I don't want to do for this use case, for a lot of other reasons.
I understand; it was mostly for the in-the-future-maybe-have-a-web-function-to-sync part.
I also suggested Couch because outside of western countries, you seldom find laptops or desktops (outside of cities), but smartphones with a recent browser are ubiquitous, so if it worked on IE7 it would run anywhere, even in the most remote area with no/crappy network. And the first time I used couch, my programming "knowledge" was very basic HTML (no JS).
> if we went with the browser I would have to support mobile phones and I don't want to support mobile phones for this use case for a lot of other reasons.
Yeah... all in all I completely misinterpreted the requirement of your use-case.
I'm not trying to be a huge asshole here, but this has zero tests and just saves json files to disk. There's literally a full readme and contributors guide but zero tests for something that's supposed to store data for you?
For a community that loves it some Jepsen analysis, I can't for the life of me figure out why this has been up-voted so many times. This is just saving JSON file to disk. I'd argue this is harder than using Redis (flushing to disk) or (vomits in mouth) Mongo. Or shit, just use your filesystem and `jq`, you'll have something likely faster, safer, and more maintainable.
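For what it's worth, the "filesystem and `jq`" route really is tiny. Here is a rough Python equivalent of running `jq 'select(...)'` over one-JSON-file-per-record storage (the directory layout is invented for illustration):

```python
import json
import tempfile
from pathlib import Path

def query(table_dir, predicate):
    """Scan a directory of one-JSON-file-per-record 'rows' and yield
    matches -- the moral equivalent of `jq 'select(...)' table/*.json`."""
    for path in sorted(Path(table_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if predicate(record):
            yield record

# demo against a throwaway directory acting as one "table"
table = Path(tempfile.mkdtemp())
(table / "1.json").write_text(json.dumps({"id": 1, "name": "ada"}))
(table / "2.json").write_text(json.dumps({"id": 2, "name": "bob"}))

matches = list(query(table, lambda r: r["name"] == "ada"))
```

One file per record also sidesteps the "one giant file gets corrupted mid-write" failure mode, at the cost of a full scan per query.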
I can't tell if you're being serious but one of the main features of tests is they are automated. They should be able to run as part of the build.
And just because you're not using a library doesn't mean you shouldn't have assertions. All these "tests" do is log. What do I check the output for?
All you need to do to have a somewhat respectable build is uncomment those tests, make them clean up after themselves, change the console logging to be assertions instead, and make them run on GitHub.
And there's never a serious consequence to the ones who did it. By then they switched to a new position somewhere else. Like most prima donnas. And this shows that deep, deep down they know they are fake.
That sounds like it comes from some specific experience you had... but it's pretty uncalled for to apply it so confidently to someone you don't know. Don't be mean, right?
Actually, another view is that there's nothing wrong with tinkering and DIY. Perl, JS, Redis all came from people hacking their own solutions (as far as I know).
Also, many big software orgs build extensive internal tools themselves.
Plus, making your own stuff is a lot of fun. You should try it sometime (if you haven't already) :)
Being ignorant doesn't make you a prima donna though, as the comment above says.
Also, they have to have some clue about the domain, because the domain is their own problem and they're writing a solution for it. So I don't think we can really judge someone as not having any clue about their own engineering challenges... especially if they're writing working solutions to them.
Antirez literally said he didn't know about existing solutions when he went to write Redis, and he and Redis are awesome. Nothing bad about that.
I get your point that bad solutions are bad, but that's sort of a tautology; it doesn't add much value, and who are we to judge that someone else's solution is bad when we don't know everything about their use case?
Again... even if we can say that choosing someone else's technology for your problem is not a good solution, we still can't criticize the author, because what you choose is your own responsibility. So I just don't think it's valid to criticize the author.
> they have to have some clue about the domain, because the domain is their own problem
They can be lifelong experts on their problem, yet have no clue about writing a database engine and low-level programming in general.
> Antirez said literally he didn't know about existing solutions when he went to write redis
Nobody is born with knowledge. The difference is that Antirez studied previous solutions, studied how to do it, and then applied that knowledge right.
Instead, that person did the equivalent of building a bridge disregarding everything humans learnt about it since the Roman empire. It will not be a surprise if the bridge ends up collapsing.
So much pessimism! Not sure if the author is here, but it would be interesting to hear what makes this different from, say, Lowdb.
Also, the writing in the README feels sloppy, which doesn’t inspire confidence. For example, you might want to decide if it’s called jsonbase, JSON-base, JSONBASe, Json-Base, JSON-Base, json-base or jsonDB.
If the author doesn’t provide any explanation of why this exists or what motivated them to create it, what am I supposed to assume?
They’ve called it a database. They have said explicitly, “You can use this as a backend for your ReST APIs.” But it doesn’t meet the table stakes for a database, and encouraging folks to use it in a production environment is actively harmful.
I wish more folks were up front with the trade offs they make. I respect an OSS author a lot more when they are honest and upfront with what a thing is good at and where trade offs have been made.
When I don’t see that, I assume that either the author doesn’t know/care (red flag) or they can’t be bothered (annoying).
I don't believe the OP of this post is the same person as the author of the library. Someone publishing a project isn't harmful; you don't have to use it. If someone uses it and gets burned, that is their fault, not the author's. If you're making this project a dep, it's your job to vet it, especially if it's a database. Just because something is OSS doesn't mean it needs to be some polished stone that meets your standards.
I agree with you. For those same reasons it’s reasonable for HN commenters to be “pessimistic” about a library with no track record and no discernible take on why it deserves to be production ready.
The project is 2 months old and they say nothing about its readiness for production, simply that it _could_ be used for a REST backend or similar. They also say that it could be used for a quick PoC. I'm not quite understanding how either of those claims are wrong. Why are people torching a young project that someone is releasing publicly for free? Again, if you don't want to use the project, nobody is forcing you to.
I think it's wrong to push the responsibility for other people's choices onto the person who creates the thing. It's also very contradictory to another aspect of hacker/engineering culture: someone can create an amazing free security service, and that service can be used in deplorable ways by criminals, but almost nobody in this scene will admonish the creator, because they realize the creator is not responsible for how people choose to use that creation. Not to mention that exact sentiment is basically universally expressed in every license that exists. So I really think it's embarrassing how such supposed criticism passes on these forums without being, you know, immediately dismissed as ridiculous.
It is also impractical to expect the creator to anticipate all the use cases, and the potential benefits and pitfalls people might find in those different use cases, and spell them out.
Second, it's fundamentally a violation of a boundary about choices. The people who make the choice to adopt software or not are the ones responsible for the technical debt or credit they take on by making that choice.
Instead of criticizing creators for not adequately disclaiming their new products because of hypothetical or real harm incurred by people choosing them, you should criticize the people doing the selecting for being irresponsible with the projects they are responsible for.
If your evaluation of a project is simply based on reading the readme at a superficial level then it's nobody else's fault but yours if you end up with problems with the tech that you choose.
I'm not saying you're being mean here; I think this is just a misguided attempt to avoid technical debt, but it doesn't focus on an effective way to do that. What I find disappointing is how this sort of criticism is often leveled at new projects as a way to dismiss or unfairly criticize these creations and their authors, maybe as a form of "concern trolling," if I understand that term correctly.
Like, "don't use this new project in production" is sort of a tautology of "be careful that any tech you choose is suitable for your use case," which is pretty obvious and, I think, a low-value thing to say. But it's often said about new projects in a way that suggests "this project is terrible and the author is bad for suggesting that people even think about using it," which I think is very toxic to a culture of creation, invention, and tinkering, and disrespectful of the people who put in the effort to make something. It also encourages something harmful: the feeling that "I need to make this project perfect and bulletproof before I even think of releasing it." A lot of projects could have benefited from being appreciated at the small-flame level, but people may be discouraged from putting them out there because of this sort of misused criticism.
Even though I'm not really a fan of his, I think Paul Graham said something about this regarding startups: a startup is like an idea that's just been born, very fragile, so you have to protect it, but it can grow into something really amazing.
Whatever you try to do with JSON has probably been tried before with XML, including XML databases.
Now, I'll accept that XML databases have their uses (especially if it involves storing and transforming third-party XML), but I can't think of any good use for this when there are SO many better options.
Same here! I tried pretty hard at the time to work with XML databases. In the end, SQL is just more practical in most circumstances, and easier to reason about. The same will likely happen with this sort of effort.
I could see some niche uses for this. Anywhere you want a quick and dirty local db for demos and hacking. That said, I think you'll get more mileage out of SQLite. It is generally my go to for these use cases and far richer and more powerful.
This is what happens when someone with basically little technical experience joins a JavaScript coding school and has to build something in order to graduate.
That's a very strange idea: JSON is basically structured data, like XML; it's nice for documents with deeply nested structures.
The main issue is that, unlike in a DB, any modification shifts everything after it, so any indexing has to be corrected. I suppose that if the document were not stored as-is but instead broken up into pages (filesystems likely do that already, so piggybacking on that could help), then indexing could be improved; but then the storage starts to look like a regular DB rather than JSON.
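The paging idea can be sketched like this: with fixed-size slots, rewriting one record leaves the offsets of everything after it untouched. The slot size and record format here are invented for illustration; a real engine would use 4-8 KiB pages plus overflow handling:

```python
import json
import os
import tempfile

SLOT = 64  # fixed slot size in bytes; real pagers use 4-8 KiB pages

def write_slot(f, index, record):
    """Serialize a record into a fixed-size slot at a fixed offset."""
    data = json.dumps(record).encode()
    assert len(data) <= SLOT, "record too big for its slot"
    f.seek(index * SLOT)
    f.write(data.ljust(SLOT, b" "))  # pad so every slot is the same size

def read_slot(f, index):
    f.seek(index * SLOT)
    return json.loads(f.read(SLOT))  # json.loads ignores the padding

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "r+b") as f:
    write_slot(f, 0, {"id": 0, "name": "ada"})
    write_slot(f, 1, {"id": 1, "name": "bob"})
    # update slot 0 in place: slot 1's byte offset does not move,
    # so any index pointing at it stays valid
    write_slot(f, 0, {"id": 0, "name": "ada lovelace"})
    first, second = read_slot(f, 0), read_slot(f, 1)
os.remove(path)
```

Contrast with a single flat JSON document, where growing one record shifts every byte after it.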
Attributes like name and age are properties of a person entity; when they're placed in a JSON hierarchy, something else is happening: the one-dimensional relationships the things have between each other are also being saved into the structure.
That's dangerous, because relationships should be formed on read, not on write; otherwise you lock all future reads into whatever the relationship was at write time, and if you're particularly sloppy the data gets duplicated, which is even worse.
The solution is to normalise your data store and use relational algebra to reify relationships at runtime.
The problem with mainstream databases is that they don't force normalisation, and pulling off automatic indexing and attribute-level normalisation is unworkable performance-wise, so in most teams this doesn't work. But the idea does work; if you want to try it out, learn Datomic.
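The "reify relationships on read" point, as a minimal sketch (entity names are invented, and Datomic itself works quite differently under the hood):

```python
# normalised store: each entity lives in exactly one place, keyed by id,
# instead of being nested (and duplicated) inside other entities
people = {1: {"name": "ada", "city_id": 10},
          2: {"name": "bob", "city_id": 10}}
cities = {10: {"name": "london"}}

def person_with_city(person_id):
    """Reify the person -> city relationship at read time (a tiny join)."""
    person = people[person_id]
    return {**person, "city": cities[person["city_id"]]}

# because nothing was baked in at write time, an update to the city
# is visible to every subsequent read, with no duplicated copies to chase
cities[10]["name"] = "greater london"
view = person_with_city(1)
```

Had the city been denormalised into each person record at write time, the rename would have required touching every person.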
Specifically, for storing the results/state of a set of manually executed management scripts. The scripts needed to query the data from previous executions, do some stuff, and store the output. Think a poor man's version of Terraform.
Everything was dumped in a git repo that was shared across a few people. It was a quick and dirty solution to manage some alpha customers before the "real" system came online.
Didn't everyone do this at some point? Sort of like how everyone that started C++ in the 90s rolled their own string class. I remember doing this back when "JSON" hadn't yet found its acronym (shaking rake... get off my lawn!)
I think that, oddly enough, the point of that project is to use the JSON format as the DB storage format, not as an export option. Just from the look of it (I don't know either project), LokiJS will very likely always be faster.
I once built a toy document store using SQLite and Python with an almost identical idea: https://maxpert.tumblr.com/post/47494540287/a-document-store... If done correctly, the advantages of that approach IMHO are:
- ACID (Powered by SQLite)
- Complex and efficient Index (Powered by SQLite)
- CouchDB like API
I've been playing around with Rust recently; maybe I'll do a simple implementation in Rust, which would keep it memory-safe and efficient.
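A minimal sketch of that kind of SQLite-backed document store in Python. The schema and the expression index over `json_extract` are my assumptions rather than the linked post's exact design, and it needs an SQLite build with the JSON functions (which the one bundled with Python usually has):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# one row per document: which collection it belongs to, its id, raw JSON body
db.execute("CREATE TABLE docs (collection TEXT, id TEXT, body TEXT, "
           "PRIMARY KEY (collection, id))")
# "complex and efficient index": index an extracted JSON field directly
db.execute("CREATE INDEX docs_by_name ON docs (json_extract(body, '$.name'))")

def put(collection, doc_id, doc):
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
               (collection, doc_id, json.dumps(doc)))

def get(collection, doc_id):
    row = db.execute("SELECT body FROM docs WHERE collection=? AND id=?",
                     (collection, doc_id)).fetchone()
    return json.loads(row[0]) if row else None

def find_by_name(collection, name):
    # this query can use the docs_by_name expression index
    rows = db.execute("SELECT body FROM docs WHERE collection=? "
                      "AND json_extract(body, '$.name')=?",
                      (collection, name)).fetchall()
    return [json.loads(r[0]) for r in rows]

put("people", "p1", {"name": "ada", "age": 36})
put("people", "p2", {"name": "bob", "age": 41})
doc = get("people", "p1")
hits = find_by_name("people", "ada")
```

ACID comes for free from SQLite's transactions, which is exactly the property the hand-rolled JSON-file approach struggles to provide.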
There are many use cases for this for super simple ma-and-pa shops. It may be relegated 100% to shopping carts and checkout processes, but this would allow one to hand-edit her database, and even potentially let the server git-push live changes... kind of an interesting concept.
This is useful for a lot of projects. I've used a similar db library in a rust project I was working on for an example application. This way there are no heavy dependencies or even the need to say "you need sqlite".
I would prefer to use pouchdb as an in memory database. Then, if I outgrow that, use the pouchdb server. Then scale up to couchdb if I need that. It's much better architected and your front end code never has to change.
This brings me back sweet memories of saving my Pygame minigames state to clunky JSON files when I was basically clueless about databases. It worked surprisingly well until it became a huge spaghetti mess :-)
BIND does something like this (but not with JSON).
You have to run a “freeze” command before editing the database directly (so it can flush the current version of the database, and redirect writes to memory + log), and then “thaw” so it can read your changes and apply the log of updates to it.
Could be good for things that are often read and rarely written. As soon as multiple people try to update the same file, though, just use an established DB.
What a stupid idea. SQLite exists and you should use it. I took over a codebase at work that contained an ad hoc implementation of something like this and it was surely the most unprofessional thing I'd ever seen. What are they teaching kids in university these days?