Someone at my old company basically did this and put it into production.
The first problem he encountered was that multiple connections couldn't use the database at the same time without clobbering each other. "No problem," he thought, this was a good use case for microservices. A service sitting on top would ensure that only one operation was performed at a time.
Next, his problem was that the database would sometimes get corrupted when something bad happened in the middle of writing the file. His solution was to wrap the entire JSON document inside a JSON string. If it could be parsed successfully, then he knew the whole file had been written. Then all he needed were "backup" files for each table, in case the current one was corrupt.
Next, his problem was that querying and iterating through a large table performed badly, since it required parsing the entire thing first. Querying several times required the whole file to be parsed every time. The solution was to move SOME of the tables over to JSON-inside-SQLite.
EDIT: Oh yeah, the next problem was how to structure the data inside SQLite. He decided to make a single table called "kitchen_sink" that held every JSON value. There was a column that said which "collection" it belonged to. There was another column that represented the row's primary key. So you could quickly query for a collection name and a primary key, and get the full JSON row.
So the next problem was that you couldn't query quickly for things that weren't the primary key. So new columns had to be added, called "opt_key1" and "opt_key2", where certain rows could put key values, and indexes could be added on those columns, so you could quickly query by its first optional key, or its second optional key.
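The "kitchen_sink" layout described above would look roughly like this in SQLite (the table and column names come from the comment; everything else is a guessed reconstruction, not the actual schema):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kitchen_sink (
        collection TEXT NOT NULL,   -- which "collection" the row belongs to
        pk         TEXT NOT NULL,   -- the row's primary key within its collection
        opt_key1   TEXT,            -- first optional key, indexed below
        opt_key2   TEXT,            -- second optional key, indexed below
        doc        TEXT NOT NULL,   -- the full JSON row as a string
        PRIMARY KEY (collection, pk)
    );
    CREATE INDEX idx_opt1 ON kitchen_sink (collection, opt_key1);
    CREATE INDEX idx_opt2 ON kitchen_sink (collection, opt_key2);
""")

user = {"id": "42", "email": "a@example.com", "name": "Ann"}
conn.execute(
    "INSERT INTO kitchen_sink (collection, pk, opt_key1, doc) VALUES (?, ?, ?, ?)",
    ("users", user["id"], user["email"], json.dumps(user)),
)

# Fast lookup by collection + primary key...
row = conn.execute(
    "SELECT doc FROM kitchen_sink WHERE collection = ? AND pk = ?",
    ("users", "42"),
).fetchone()
# ...and by the first optional key, thanks to idx_opt1.
by_email = conn.execute(
    "SELECT doc FROM kitchen_sink WHERE collection = ? AND opt_key1 = ?",
    ("users", "a@example.com"),
).fetchone()
```

This is essentially a hand-rolled document store; modern SQLite's `json_extract` plus expression indexes would make the opt_key columns unnecessary.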
> So the next problem was that you couldn't query quickly for things that weren't the primary key. So new columns had to be added, called "opt_key1" and "opt_key2", where certain rows could put key values, and indexes could be added on those columns, so you could quickly query by its first optional key, or its second optional key.
It's easy to get a little laugh from this, but congrats to the guy for exploring. Now he knows first-hand how inordinately hard the problem is and can describe it in detail, but more importantly avoid these hard-learned patterns later.
> but more importantly avoid these hard-learned patterns later.
Depends, I’ve known people who have gone through similar experiences and still pooh-pooh all those “unnecessarily bloated” solutions like a proper database.
I worked with someone that was keen to use reductionist logic and arguments...
we ended up with a lot of shitty solutions to problems that were hard to maintain, hard to extend, and hard to use because the more "complex" solution was really just a fancy version of a folder and some text files.
AFAICT (not being Node-fluent) this doesn't even use atomic file writing strategies :| So yeah, all of these are pretty likely to happen with this lib.
Just use SQLite, people. Even JSON-in-SQLite is still likely to be an improvement.
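For reference, the atomic file-writing strategy being alluded to is "write to a temp file, then rename over the old one"; a sketch in Python (the function name is mine, and this mirrors what Cocoa's `-writeToFile:atomically:` does, as mentioned later in the thread):

```python
import json
import os
import tempfile

def write_json_atomically(path, data):
    # Create the temp file in the same directory as the target, because
    # rename is only atomic within a single filesystem.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())    # push the bytes toward stable storage
        # Atomic swap: readers see the old file or the new one, never half.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

`os.replace` is atomic on both POSIX and Windows; a crash mid-write leaves the old file intact, which is exactly the corruption scenario described above.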
> Next, his problem was that the database would get corrupt sometimes when something bad happened in the middle of writing the file.
I'm not sure I understand how this can happen... unless you try to update the JSON in place (which is a very bad idea for any text-based format), what you do is encode/write the entire JSON from scratch. So either the file is written properly or it isn't written.
Honestly, from the entire message it doesn't sound like JSON was a bad idea so much as that your coworker didn't know what he was doing; if he'd been doing something else, he'd still have been making big mistakes.
I remember reading quite a bit about the (performance reducing) lengths filesystems go to in order to ensure consistency of directory entries even in case of a crash, and for example how "soft updates" were introduced to accomplish the same consistency with less of a performance degradation.
Looking at it from another angle, if you are running on top of a filesystem that cannot keep itself consistent, then you are SOL, there really isn't anything you can do to mitigate.
Just like we can't guarantee that we will be able to persist data that's in memory to disk if the OS is free to kill us at any time. "Best effort" it is, which means getting the data to disk as quickly as possible and not corrupting what is there.
They are if you perform the correct sequence of fsync operations on both the file and the directories, and use a file system which is correctly implemented.
The article linked above never explains that part, it only assumes that it will happen. From the code it sounds as if the crash can happen in the OS itself (but then the entire kernel will crash). At that point things are completely outside your control and you might as well be running on broken hardware.
Oh yeah, so basically had he pushed further, he would have realized that to avoid corruption he would need to implement a write-ahead log (WAL). And for the implementation of the WAL and other performance concerns, he would have realized that storing the JSON as a string is not the way to go; he'd need to implement some other binary data structure. Then he'd have realized that he had just invented another NoSQL DB.
Had he pushed further.....
Had he pushed further, he'd have raised funding for the newly invented NoSQL DB, and built a startup company on top of it.
I've been building an open-source alternative on mobile that's based on a similar concept (SQLite + FlatBuffers): https://dflat.io/ SQLite's own schema is already awesome, but this way you can have sum types, better schema upgrade guarantees, asynchronous index building, etc.
> Someone at my old company basically did this and put it into production.
This is the mentality that plagues the industry: that anything more than a few years old is obsolete, therefore experience is worthless, and therefore the wheel must be reinvented every time, because those old programmers must have been dumb; why else would they have used SQL? It's why real engineers don't take "software engineers" very seriously (and in turn why software engineers don't take webdevs seriously).
I used sets of flat JSON files as our "database" in the Wunderlist iOS and macOS clients.
Worked like a charm, never had a problem with it.
It was actually put in as a placeholder until we had time to think about a real storage solution, but it turned out we never needed anything more sophisticated, and ours were actually the fastest and most reliable clients we had. In fact, every time I encountered a performance problem I was hopeful that I would finally have a good reason to do that real implementation, but it invariably turned out to be a simple bug.
- Cocoa has -writeToFile:atomically:, which writes a new file and then renames, so no write-corruption
- We were lucky that lists had just the right granularity for a single file to be read/written atomically
- We likely wrote (quite) a bit more data than absolutely necessary, but I/O tends to have large fixed overheads so medium files tend to take around the same time as small files
- We did not do anything with the data on disk except read it, so not a DB
- We really did use files, not JSON strings inside SQLite
- We flushed to disk asynchronously, but as quickly as possible
I have built this into some software, and the reason is that when you are developing (and for very limited use cases outside of development) it is extremely convenient: clone my project and start messing around without provisioning a database. It reduced dependencies by hundreds of modules, too, because all of the connectivity libraries were abstracted into separate modules you'd only install if you wanted that particular type in production.
This is why I love SQLite. No provisioning required, and you have a flat portable file. But you also have the added bonus that it's highly performant, and there isn't much work to refactor your SQL from sqlite3 to most other RDBMSes.
I tried. I held a meeting to talk about the code. I found the problems hard to predict and hard to describe. It was decided that after the meeting he would work more on making his code less hacky and more production ready.
But the real answer is that our team was very siloed. No one knew what anyone else was doing. The other problem was that he was actually solving real world problems, and he was a very high performer. He got stuff done. Arguing to start over a project that's already working is a difficult position to hold when talking to management.
It's unfortunate that in this industry, on a lot of teams, "high performer" means "sloppy coder who lets his co-workers finish their project."
The problems he encountered with his dumbass solution were EASILY foreseeable by even a noob coder. What did he "get done"? How did writing his own shitty version of a database add value to the company? He is good at finishing his own pointless tasks quickly, maybe, but if I were in charge of the team he would be looking for a new job after this stunt.
Sick to death of these cowboys. Nothing is ever "done"; the majority of expense in software development comes during maintenance, not during initial implementation.
My interpretation of this as a manager is that this developer was probably a creative thinker with a decent track record who got stuck going down a bad path on this project, and nobody paid attention or intervened until it was too late. They were also probably pretty junior but perhaps had some past accomplishments that made them appear less likely to make this kind of mistake. Once it was in production, the developer very well may have been "stuck" with it (i.e., unable to get permission to scrap it and redo it, since it was technically working and solved some business problem).
Given the team dynamics and lack of involvement from this person's manager, I wouldn't move to fire them. I'd move to rethink the entire team, admonish the manager, and possibly remove them. The team itself wasn't working, and this was a symptom: someone had a bad idea, pursued it for too long, nobody did enough to stop it, and then they couldn't go back.
This is a classic consequence of a manager who has stopped paying attention to their own team. The team was most likely also overburdened with too many tasks, which is why everyone was working on something separate and independent and nobody knew what anyone else was doing. In reality this developer shouldn't have been given a project like this without being paired with a more senior engineer to supervise it, but that would cut down on the number of story points the team could get through and would thus be discouraged in a dysfunctional environment.
If he was on your team, it would be your fault. I think you just need a healthy balance of seniors/juniors on the same codebase.
As OP said, they were siloed from each other, and he definitely needed some mentorship. I've seen devs like that turn into incredible coders after just a couple of months of pair programming.
I can sympathize, but then it seems hard to argue with this developer's approach. If it met the needs of the company, particularly to the desired level at the time these features were requested, I don't think there's a valid critique of the developer's architecture beyond iT's NoT DoNe CoRrEcTlY. And still, there's a lot to be said for keeping your developers entertained so they stick around.
> there's a lot to be said for keeping your developers entertained so they stick around
Really? At the expense of everyone else who has to deal with this monstrosity for the foreseeable future, or worse yet replace it with an actual tool that can be reliably used.
This JSON-inside-sqlite-inside-JSON-inside-a-JSON-string beast should never have seen the light of day.
You're not paid to be entertained, sorry. You're paid to be productive. As productive as you can be, and to put the needs of the client and the long-term success of the company hopefully first, but certainly before any semblance of entertainment, if you're getting paid.
I can certainly say that that's exactly the way I felt complaining about it. I felt like I was an asshole attacking him, and I don't think he liked me very much because of it. The whole thing was very uncomfortable. I didn't throw a fit. I tried to be very understanding and make suggestions.
If it's any consolation, it fell on to me to maintain this code after he moved on to something else, which is why I know so much about how it works.
> it fell on to me to maintain this code after he moved on to something else
This has happened to me before. I disagreed with a technical direction, it was implemented anyway, and then I was left to maintain it. Very frustrating.
I've dealt with this before; it is a form of gaslighting. Some people are good at making everyone else into a bully when THEY are the actual bully. Like the kid who keeps splashing you in a pool, but runs off and cries and tells when you splash them back.
Standing up for yourself sometimes makes you look/feel like an asshole. That doesn't mean you are wrong or that you shouldn't do it.
edit: Check out the book "Radical Candor" if you regularly struggle with expressing negative feedback
> If it's any consolation, it fell on to me to maintain this code
That's not much consolation... If anything, it's all the more reason for you to be pissed off. He should have dropped the project, rolled out a future-proof tool, and been taught to do things differently next time.
Anything short of that just enables the dude's delusions of grandeur, and is therefore a mistake on everyone else's part...
If you're maintaining it, then I think you get a fair vote in its architecture going forward. Things that are plainly problematic now didn't seem that way to a different group of people, in a different context, before it was even created. Perhaps it was a cascade of poor choices, but regardless, identifying problems with the architecture in the context of today gives a huge advantage over those who were putting it together under who knows what conditions (at work or elsewhere).
Just like the never-ending "turn this Excel workbook into an app" stream of work, refactoring older apps will be a constant. Focusing today's conversations on yesterday's mistakes only detracts from the work left to do. Which is to say: if your architecture-change arguments are valid, there should be ways to justify implementing them today, beyond "it should've been done this way in the first place, because then we wouldn't have had those problems that are now solved anyway".
> If it met the needs of the company, particularly to the desired level at the times these features were requested, I don't think there's a valid critique of the developer's architecture beyond iT's NoT DoNe CoRrEcTlY.
I'm pretty sure the cascading series of "his next problem" sentences implies that there were plenty of problems with the architecture that weren't identified ahead of time, and that had to be encountered and then fixed as a series of bugs.
> And still, there's a lot to be said for keeping your developer's entertained so they stick around.
There's a difference between keeping your developers entertained and letting them infect production with ill-conceived projects that cause problems for all those that interact with them.
This project is reimplementing something already solved multiple times. There are many document stores, and JSON interfaces and add-ons to traditional RDBMSes, so what was being solved here, other than letting someone scratch an itch at the expense of the division he's working in? You're better off giving him 20% time for his own projects and calling it a day, if you really think entertaining your developers is important enough to warrant it.
There are times when rolling your own is useful, generally when there are some extreme requirements for space or performance, but even that becomes rare when the area is mature and explored thoroughly. A database, even a JSON document store of some sort, is so mature that it's almost impossible for one person rolling their own, when it seems to need all the common features (locking, remote access, different clients), to actually recoup the cost of building it (much less the future cost of troubleshooting and bug fixing), unless you've somehow hired a genius workaholic for peanuts.
I would say operator friendliness is actually the best reason to roll your own (which was clearly not the case here). If you have a system that is less complex, because it meets your use case only and not the competing use cases of every damn engineering outfit that can pay overpaid and underqualified devs to commit to an open-source codebase, and as a result requires less labor to manage (for example, not using Kubernetes for a 3-person startup), then you should roll your own.
Sure, but there are different levels of "roll your own". MySQL or Postgresql plus a text field and a microservice front end for access control and JSON validation (if you don't want to use the included components from those respective projects that handle those for you) is easier and friendlier most of the time than a microservice on top of SQLite on a local disk, which is friendlier than replacing SQLite with BDB, which is probably friendlier than rolling your own storage format.
Once you've abstracted it into a service, your API is what you and your client (should) care about, and many of the arguments for more specialized implementations no longer apply. Personally, I think the only reason I would go with something like SQLite instead of Postgres/MySQL behind a microservice is if I were baking the data into it with each release, so the SQLite data files are shipped with the version released. Even then, I'm not sure there's any reason I would do anything other than SQLite. Even if I had need of lots of JSON files, I would probably have my build procedure process them into an SQLite file I tested and shipped with, if only because I would then avoid having to deal with all the problems this guy encountered by trying to make his own database.
Oh yeah. Unless you're actually in the database biz, don't roll your own database. Those things are rock solid, usually easy (well, Postgres and MySQL/Maria are, anyway), and state is hard.
Rolling something yourself has the benefit that you don't need to teach yourself how the system works, because you built it. But once you want someone else to join the team, this backfires, because that person needs to be taught how your database works. If the database in question were widely used, then spending time on learning a new database would be worth the investment, but if it is only used internally, then it's just a waste of time.
I left my last company because one of my co-devs would always do crazy hack-job things, and when I complained to them or higher-ups, the excuse was:
< "Well all the work was already developed, and it would take too much time to rewrite it. You should have said something earlier"
> "When?" I asked, considering she had just put up the (big) PRs, and PRs ARE the time to review...
< "Check her commits as she pushes them to the repo" - as in her bugfix/feature branches, not master...
My jaw dropped. Especially since I was hired on as "Lead" and had all the accountability but no actual power.
It's incredibly frustrating because during code reviews I will request changes so it's not such a broken hack job, and the response will basically be "No, it's not worth changing". At which point I'm the one "holding up development". We wasted hundreds of development hours during the last project because of this person's "inventive" code, and nobody seems to understand what's going on.
It's hard to conjure more strength to push back out of thin air. I'd encourage you to try pushing for more detailed post-mortems (if you don't already have them), and to keep an eye on how much curtailed reviews cost the company. You also really want an advocate for code maintenance, and if you don't have one of those with a loud voice, there isn't really a feasible way to solve it except becoming that advocate yourself and earning the trust of those above you.
Two pieces of actual useful advice I can offer are:
1. A review style I picked up based off of RFC 2119[1]: the reviewing software we use allows us to mark particular comments as blocking or non-blocking, and I pair that with the usage of MAY/SHOULD/MUST within the comment language, e.g. "We're using the old `array()` syntax here instead of `[]`; we MAY wish to use the more modern syntax". This gives me some room to elevate necessary changes while keeping in the nitpicks I really want to throw in (and I do try to minimize them) without diluting the power of the strong comments. I've used MUST maybe three times, always for something incredibly terrible like pages not loading or unsafe DB migrations that cause data loss.
2. Agree on syntax and style rules, and enforce them. It's easier to get people to agree to rules once than to argue for them on each PR; anything like brace placement or line limits shouldn't come up repeatedly, since it wastes everyone's time and makes folks feel belittled.
This is great advice. Just have some minor thoughts to tack on.
Post-mortems are great for many reasons. For the case of GP, one particular advantage is that they align senior people's understanding: we shouldn't do X again. If you have a strong narrative for why a project failed, post-mortems are a formal setting in which you can present this narrative with concrete evidence to higher-ups.
In the future, when you see warning signs that a mistake is approaching repetition, you can raise the concern up the chain, invoking the memory of the post-mortem to motivate their intervention.
I also totally agree that a sincere and high-quality code review process is required for high quality code. Your 2119 recommendation is excellent. I'd also recommend doing some reading on commit message templates that smart people follow, they've improved my commit game, big-time.
At our company no commits get into the trunk without going by another set of eyes. We're probably creeping up to mid-sized right now so those eyes can vary in stringency and reliability more so than they would have when it was just a handful of devs, but I think mandatory code reviews are a good habit to get into - so long as you empower every reviewer to be critical and make it clear that both the reviewer and dev are owning the code and must ensure it is acceptable during the process.
We've had that process on for quite a while, and while there are some big weaknesses and holes in it, we've also adopted a principle to keep PRs as small as possible[1]. With those two tools we've had some pretty reasonable success, with a lot of our biggest incidents being related to times when we made large changes or a review was skimped on.
1. Even if that isn't measured in LoC - moving a dependency and updating references to it is something I'd count as a single action - but one I'd want isolated from any logic changes.
Many good companies enforce a no-origin-branches policy, with rare and well-justified exceptions. Because, used as you describe, a "feature branch" is just a future massive diff in disguise (when it's eventually merged), and massive diffs are a big no-no because they're a huge pain to iterate on via code review.
Doesn't every git repo have an origin branch? What is the alternative to creating a feature branch for developing something you don't want in production until it's ready?
Yep! Sorry, I meant that the only developed branch on origin is `master` (or whatever it’s called at your org). You can create branches locally, but pushing a local branch is strongly discouraged.
At this point, you submit the code for review, and upon approval the branch is merged into master and pushed. It’s not possible to push a commit hash to master that has not been reviewed.
If you have a feature that’s composed of many steps, you can “stack” multiple commits, and review/merge them in order.
If you want to develop the entire stack at once, you’re most likely doing something wrong (according to this culture). You can incrementally merge pieces of code to master in such a way that’s impossible for it to be deployed, and your final diff can be what makes it deployable.
Encouraging smaller changes isn't nearly as useful if those changes aren't isolated - if it's just half the picture then you can't accurately review it.
I hit a similar sort of issue recently - I've been incrementally developing a complex data migration, each change to the migration has worked on its own and been reviewed separately but I'm still going to go in and request a full review once the piece of logic is fully assembled. This is also happening on an integration branch on origin - we do try and keep these to a minimum but we're making a backwards incompatible change that would be quite expensive to do in a fully backwards compatible manner.
There are things that are infeasible to reasonably do without an integration branch (nothing is impossible technically, but it might be a huge waste of time), but even those things are pretty few and far between. If integration branches are commonplace at your company, it might be good to examine coding practices and see if you can slice up tickets to be smaller.
Yeah, organizing work in such a way that you can make isolated, incremental changes requires a nontrivial amount of creativity and discipline, and that takes time, like you say.
But, I do believe it pays off in the form of a higher quality end-product (fewer bugs, more testable/legible components, more extensible), which saves you time in the long run.
I disagree. It's coder malpractice. There is something to be said for a quick solution that just gets the job done. But each one of the updates described would take more work than just implementing SQLite or similar. Sure, at the outset, do something quick and dirty. By the second or third iteration, any legitimate developer should have switched to a database solution. Creating technical debt for no reason, or invalid reasons, is just a good way to set your company up for failure.
On the other hand, other sorts of devs would probably not be entertained having to maintain an in-house database some unchained dev decided to introduce into the stack one day for "reasons". That tech debt will compound until it becomes more of a liability... hopefully the product brings in enough money that the in-house database can continue to be supported, or be removed.
This sort of stuff is what deters me from being a developer sometimes. Fuck the salary, get me out of here.
It's not very agile friendly, but emphasizing design early in the process and having some "gate-keeping" protocol such as design review or code review can greatly reduce the chance of something going off the rails like this as it forces everyone to acknowledge what done looks like, as well as what the "missing" pieces will be.
The GateKeeper process isn't something you want to index on too heavily - but you also need a mechanism to counter-balance the possibility of a dev saying "I built a prototype last week that does 95% of the things we want" and 3 months of iteration later identifying that it only did 5%, and that getting the remaining use cases will require a re-write.
> Next, his problem was that the database would get corrupt sometimes when something bad happened in the middle of writing the file. His solution was to ...
As others have mentioned, there are a ton of off-the-shelf solutions that would have been more than adequate for this.
My question is, why didn't he go for any of the existing solutions when setting them up would've still been faster than rolling his own DB-in-a-JSON-file solution?
How did you go about porting the database code to something more sane? (just assuming you did)
I imagine if this database system is contained well enough, it shouldn't be so difficult to swap its internals with something else. Especially if it's all just JSON-like.
Maybe because it was tempting: JSON is fairly easy to handle, very portable, and when you look at a JSON document, it's straightforward to think about querying it, and thus using it as a DB, even though JSON is hierarchical and DBs are relational.
- bad and inconsistent formatting, which doesn't help with the
- huge if-else monstrosities.
- Also uses synchronous IO and asynchronous IO randomly.
- Uses try-catch liberally, doesn't check the caught errors, and just re-tries blindly forever in some cases.
If you do any parallel updates/inserts/removals with this "database" you're pretty much guaranteed to lose data. Updates are essentially: 1. read table, 2. make changes, 3. save table.
Which at least would work if it was all synchronous.
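That lost-update hazard can be shown deterministically by interleaving two writers by hand (a plain dict stands in for the JSON file here; no real threads are needed to see the bug):

```python
import json

db_file = {"users": []}  # stands in for the JSON table file on disk

def read_table():
    # Each "connection" parses its own independent snapshot of the file.
    return json.loads(json.dumps(db_file["users"]))

def save_table(rows):
    # Saving rewrites the whole table, clobbering anything written since.
    db_file["users"] = rows

# Two writers interleave: both read the same snapshot...
a_snapshot = read_table()
b_snapshot = read_table()

# ...each makes an independent change...
a_snapshot.append({"name": "alice"})
b_snapshot.append({"name": "bob"})

# ...and the second save silently erases the first: alice is gone.
save_table(a_snapshot)
save_table(b_snapshot)
```

A real database avoids this with locking or transactions; with "1. read table, 2. make changes, 3. save table" over a shared file, the last writer always wins.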
I know this is going to sound harsh, but building databases is hard for even the most experienced coders, and whoever wrote this is clearly at the other end of that spectrum.
It’s been a while since I’ve looked at JavaScript like this, but I would guess that the thread lock only exists as long as the read() call is running. That returns, and then the write is initiated. It doesn’t look like there is a lock held throughout the update callback. If that’s missing, then if you had two threads/processes operating on the same table, you’d all but guarantee that one of those update calls would be lost (in the best-case scenario).
Or maybe there's a different locking mechanism in place that my cursory look missed?
I did something vaguely similar to this recently, and I still maintain it was a good choice.
I volunteered to write a medical visit recording app for an NGO in a developing country (a friend works with the NGO and asked me if I would help), and they have almost no budget, no guarantees of internet connectivity when their folks are in the field, and the likelihood that they may be using this software for years.
So I wrote a C# app that uses Winforms, and stores all data as JSON files, the 'table' structure is basically directories in the file system.
It lets them share visit files by importing/exporting a zip of the JSON via sneakernet USB drives [super-naive last-record-written-wins], does not rely on an internet connection anywhere at all, ever, and all files are stored as plain JSON so that they can conceivably do some data analysis on it in the future. Their alternate plan was to continue using paper, or some terrible regular reconciliation of Excel spreadsheets.
Having said all that and defending my decision on this single use-basically-I-wanted-to-have-independent-JSON-instead-of-SQLite-so-in-the-future-maybe-have-a-web-function-to-sync solution,
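A naive "last record written wins" merge like the one described might look like this (the field names and the `{visit_id: record}` shape are assumptions; the real app's schema isn't shown):

```python
def merge_visits(local, imported):
    """Merge two {visit_id: record} dicts; the newer 'written_at' wins."""
    merged = dict(local)
    for visit_id, record in imported.items():
        existing = merged.get(visit_id)
        if existing is None or record["written_at"] > existing["written_at"]:
            merged[visit_id] = record
    return merged
```

Last-write-wins silently drops concurrent edits to the same visit, which is usually an acceptable trade-off for sneakernet sync of append-mostly records.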
I strongly approve of this sort of thing. Using dead-simple and human-readable formats is a big win for things like this, even if it isn't architecturally "correct". It sounds like your decision was a good one for the use case you were looking at.
It is great until your charity gets acquihired by a big think tank / bigger charity / international aid group, and most doctors/charity operators are not known for their talent at scaling software.
In this case, this NGO is not going to get acquihired. It's more likely that I'll get an email in 5 years from someone who I don't recognize asking me if I know anything about this program because my email has been attached to this thing they got gifted from a dead project, and have been using after all of the original people have moved on. :)
Short version - I worked in that country for a few years and still care deeply about it.
If you want the long version, or you are interested in ways you could also be involved in that kind of project (or know C# and WinForms and want to help??? :) ) my email is my hn username at gmail.
Sounds like you should use CouchDB. It's a database/webserver, so you could make a simple HTML form on localhost[0]. CouchDB is built with replication/sync (over HTTP) as one of its main features[1], and in the field, an offline-first webapp with PouchDB[2] and Service Workers[3] could have the exact same form.
Because then someone has to run and manage a webserver, and there is no guarantee that Service Workers will work the way they do now in 5 years, or on an ancient Windows 7 laptop running IE7.
I want this to be able to run for years without my intervention. :)
something that will work in 5 years on ancient windows7 and ie7?
couchdb
there's nothing to manage.
On Windows, you install couchdb.msi or whatever; it's installed as a Windows service and boots automatically at startup time.
Start IE7, go to localhost:5984/_utils, and you get the DB's UI. At that point, all you've done is the installation.
One click later, you've created the first db, called 'somedb'; a click later, you've created the first JSON doc, called 'somedoc'. Now you can access it at localhost:5984/somedb/somedoc.
For the HTML form: just after you've created 'somedoc', you can click on "add attachment" and upload someform.html, then go to localhost:5984/somedb/somedoc/someform.html from IE7. No need for anything fancy. After you're gone, someone with the most basic HTML knowledge can make changes if need be. No Internet required. It will work as long as the laptop works.
That is a little more complex than, "Here's a .exe, it will save files into your My Documents. Click Export to make a zip, import to read someone else's zip." Plus generating instructions on how to do that would have been tough, and this way should be much easier to spread the app around.
Basically, it's not a bad idea, and if I were a Couch expert, or weren't on the other side of the world, I might have chosen that. But I know C#/WinForms well enough, and if we went with the browser I would have to support mobile phones, which I don't want to do for this use case, for a lot of other reasons.
I understand; it was mostly for the in-the-future-maybe-have-a-web-function-to-sync part.
I also suggested Couch because outside of western countries, you seldom find laptops or desktops (outside of cities), but smartphones with a recent browser are ubiquitous, so if it worked on IE7 it would run anywhere, even in the most remote area with no/crappy network. And the first time I used couch, my programming "knowledge" was very basic HTML (no JS).
> if we went with the browser I would have to support mobile phones and I don't want to support mobile phones for this use case for a lot of other reasons.
Yeah... all in all I completely misinterpreted the requirement of your use-case.
I'm not trying to be a huge asshole here, but this has zero tests and just saves json files to disk. There's literally a full readme and contributors guide but zero tests for something that's supposed to store data for you?
For a community that loves it some Jepsen analysis, I can't for the life of me figure out why this has been up-voted so many times. This is just saving JSON file to disk. I'd argue this is harder than using Redis (flushing to disk) or (vomits in mouth) Mongo. Or shit, just use your filesystem and `jq`, you'll have something likely faster, safer, and more maintainable.
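For what it's worth, the "filesystem and `jq`" route really is tiny. Here is a rough Python equivalent of running `jq 'select(...)'` over one-JSON-file-per-record storage (the directory layout is invented for illustration):

```python
import json
import tempfile
from pathlib import Path

def query(table_dir, predicate):
    """Scan a directory of one-JSON-file-per-record 'rows' and yield
    matches -- the moral equivalent of `jq 'select(...)' table/*.json`."""
    for path in sorted(Path(table_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if predicate(record):
            yield record

# demo against a throwaway directory acting as one "table"
table = Path(tempfile.mkdtemp())
(table / "1.json").write_text(json.dumps({"id": 1, "name": "ada"}))
(table / "2.json").write_text(json.dumps({"id": 2, "name": "bob"}))

matches = list(query(table, lambda r: r["name"] == "ada"))
```

One file per record also sidesteps the "one giant file gets corrupted mid-write" failure mode, at the cost of a full scan per query.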
I can't tell if you're being serious but one of the main features of tests is they are automated. They should be able to run as part of the build.
And just because you're not using a library doesn't mean you shouldn't have assertions. All these "tests" do is log. What do I check the output for?
All you need to do to have a somewhat respectable build is uncomment those tests, make them clean up after themselves, change the console logging to be assertions instead, and make them run on GitHub.
And there's never a serious consequence to the ones who did it. By then they switched to a new position somewhere else. Like most prima donnas. And this shows that deep, deep down they know they are fake.
That sounds like it comes from some specific experience you had... but it's pretty uncalled for to apply it so confidently to someone you don't know. Don't be mean, right?
Actually, another view is that there's nothing wrong with tinkering and DIY. Perl, JS, Redis all came from people hacking their own solutions (as far as I know).
Also, many big software orgs build extensive internal tools themselves.
Plus, making your own stuff is a lot of fun. You should try it sometime (if you haven't already) :)
Being ignorant doesn't make you a prima donna though, as the comment above says.
Also, they have to have some clue about the domain, because the domain is their own problem and they're writing a solution for it. So I don't think we can really judge someone as not having any clue about their own engineering challenges... especially if they're writing working solutions to them.
Antirez literally said he didn't know about existing solutions when he went to write Redis, and he and Redis are awesome. Nothing bad about that.
I get your point that bad solutions are bad, but that's sort of a tautology; it doesn't add much value, and who are we to judge that someone else's solution is bad when we don't know everything about their use case?
Again... even if we can say that choosing someone else's technology for your problem is not a good solution, we still can't criticize the author, because what you choose is your own responsibility. So I just don't think it's valid to criticize the author.
> they have to have some clue about the domain, because the domain is their own problem
They can be lifelong experts on their problem, yet have no clue about writing a database engine and low-level programming in general.
> Antirez said literally he didn't know about existing solutions when he went to write redis
Nobody is born with knowledge. The difference is that Antirez studied previous solutions, studied how to do it, and then applied that knowledge right.
Instead, that person did the equivalent of building a bridge disregarding everything humans learnt about it since the Roman empire. It will not be a surprise if the bridge ends up collapsing.
So much pessimism! Not sure if the author is here, but it would be interesting to hear what makes this different from, say, Lowdb.
Also, the writing in the README feels sloppy, which doesn’t inspire confidence. For example, you might want to decide if it’s called jsonbase, JSON-base, JSONBASe, Json-Base, JSON-Base, json-base or jsonDB.
If the author doesn’t provide any explanation of why this exists or what motivated them to create it, what am I supposed to assume?
They’ve called it a database. They have said explicitly, “You can use this as a backend for your ReST APIs.” But it doesn’t meet the table stakes for a database, and encouraging folks to use it in a production environment is actively harmful.
I wish more folks were up front with the trade offs they make. I respect an OSS author a lot more when they are honest and upfront with what a thing is good at and where trade offs have been made.
When I don’t see that, I assume that either the author doesn’t know/care (red flag) or they can’t be bothered (annoying).
I don't believe the OP of this post is the same person as the author of the library. Someone publishing a project isn't harmful; you don't have to use it. If someone uses it and gets burned, that is their fault, not the author's. If you're making this project a dep, it's your job to vet it, especially if it's a database. Just because something is OSS doesn't mean it needs to be some polished stone that meets your standards.
I agree with you. For those same reasons it’s reasonable for HN commenters to be “pessimistic” about a library with no track record and no discernible take on why it deserves to be production ready.
The project is 2 months old and they say nothing about its readiness for production, simply that it _could_ be used for a REST backend or similar. They also say that it could be used for a quick PoC. I'm not quite understanding how either of those claims are wrong. Why are people torching a young project that someone is releasing publicly for free? Again, if you don't want to use the project, nobody is forcing you to.
I think it's wrong to push the responsibility for other people's choices onto the person who creates the thing. It's also very contradictory to another aspect of hacker/engineering culture: someone can create an amazing free security service, and that service can be used in deplorable ways by criminals, but almost nobody in this scene will admonish the creator, because they realize the creator is not responsible for how people choose to use that creation. Not to mention that exact sentiment is basically universally expressed in every license that exists. So I really think it's embarrassing how such supposed criticism passes on these forums without being, you know, immediately dismissed as ridiculous.
It is also impractical to expect the creator to anticipate all the use cases, and the potential benefits and pitfalls people might find in those different use cases, and spell them out.
Second, it's fundamentally a violation of a boundary about choices. The people who make the choice to adopt software or not are the ones responsible for the technical debt or credit they take on by making that choice.
Instead of criticizing creators for not adequately disclaiming their new products because of hypothetical or real harm incurred by people choosing them, you should criticize the people doing the selecting for being irresponsible with the projects they are responsible for.
If your evaluation of a project is simply based on reading the readme at a superficial level then it's nobody else's fault but yours if you end up with problems with the tech that you choose.
I'm not saying you're being mean here; I think this is just a misguided attempt to avoid technical debt, but it doesn't focus on an effective way to do that. What I find disappointing is how this sort of criticism is often leveled at new projects as a way to dismiss or unfairly criticize these creations and their authors, maybe as a form of "concern trolling," if I understand that term correctly.
Like, "don't use this new project in production" is sort of a tautology of "be careful that any tech you choose is suitable for your use case," which is pretty obvious and, I think, a low-value thing to say. But it's often said about new projects in a way that suggests "this project is terrible and the author is bad for suggesting that people even think about using it," which I think is very toxic to a culture of creation, invention, and tinkering, and disrespectful of the people who put in the effort to make something. It also encourages something harmful: the feeling that "I need to make this project perfect and bulletproof before I even think of releasing it." A lot of projects could have benefited from being appreciated at the small-flame level, but people may be discouraged from putting them out there because of this sort of misused criticism.
Even though I'm not really a fan of his, I think Paul Graham said something about this regarding startups: a startup is like an idea that's just been born, very fragile, so you have to protect it, but it can grow into something really amazing.
Whatever you try to do with JSON has probably been tried before with XML, including XML databases.
Now, I'll accept that XML databases have their uses (especially if it involves storing and transforming third-party XML), but I can't think of any good use for this when there are SO many better options.
Same here! I tried pretty hard at the time to work with XML databases. In the end, SQL is just more practical in most circumstances, and easier to reason about. The same will likely happen with this sort of effort.
I could see some niche uses for this. Anywhere you want a quick and dirty local db for demos and hacking. That said, I think you'll get more mileage out of SQLite. It is generally my go to for these use cases and far richer and more powerful.
This is what happens when someone with basically little technical experience joins a JavaScript coding school and has to build something in order to graduate.
That's a very strange idea: JSON is basically structured data, like XML; it's nice for documents with deeply nested structures.
The main issue is that, unlike in a DB, any modification shifts everything after it, so any indexing has to be corrected. I suppose that if the document were not stored as-is but instead broken up into pages (filesystems likely do that already, so piggybacking on that could help), then indexing could be improved; but then the storage starts to look like a regular DB rather than JSON.
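The paging idea can be sketched like this: with fixed-size slots, rewriting one record leaves the offsets of everything after it untouched. The slot size and record format here are invented for illustration; a real engine would use 4-8 KiB pages plus overflow handling:

```python
import json
import os
import tempfile

SLOT = 64  # fixed slot size in bytes; real pagers use 4-8 KiB pages

def write_slot(f, index, record):
    """Serialize a record into a fixed-size slot at a fixed offset."""
    data = json.dumps(record).encode()
    assert len(data) <= SLOT, "record too big for its slot"
    f.seek(index * SLOT)
    f.write(data.ljust(SLOT, b" "))  # pad so every slot is the same size

def read_slot(f, index):
    f.seek(index * SLOT)
    return json.loads(f.read(SLOT))  # json.loads ignores the padding

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "r+b") as f:
    write_slot(f, 0, {"id": 0, "name": "ada"})
    write_slot(f, 1, {"id": 1, "name": "bob"})
    # update slot 0 in place: slot 1's byte offset does not move,
    # so any index pointing at it stays valid
    write_slot(f, 0, {"id": 0, "name": "ada lovelace"})
    first, second = read_slot(f, 0), read_slot(f, 1)
os.remove(path)
```

Contrast with a single flat JSON document, where growing one record shifts every byte after it.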
Attributes like name and age are properties of a person entity; when they're placed in a JSON hierarchy, something else is happening: the one-dimensional relationships the things have between each other are also being saved into the structure.
That's dangerous, because relationships should be formed on read, not on write; otherwise you lock all future reads into whatever the relationship was at write time, and if you're particularly sloppy the data gets duplicated, which is even worse.
The solution is to normalise your data store and use relational algebra to reify relationships at runtime.
The problem with mainstream databases is that they don't force normalisation, and pulling off automatic indexing and attribute-level normalisation is unworkable performance-wise, so in most teams this doesn't work. But the idea does work; if you want to try it out, learn Datomic.
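The "reify relationships on read" point, as a minimal sketch (entity names are invented, and Datomic itself works quite differently under the hood):

```python
# normalised store: each entity lives in exactly one place, keyed by id,
# instead of being nested (and duplicated) inside other entities
people = {1: {"name": "ada", "city_id": 10},
          2: {"name": "bob", "city_id": 10}}
cities = {10: {"name": "london"}}

def person_with_city(person_id):
    """Reify the person -> city relationship at read time (a tiny join)."""
    person = people[person_id]
    return {**person, "city": cities[person["city_id"]]}

# because nothing was baked in at write time, an update to the city
# is visible to every subsequent read, with no duplicated copies to chase
cities[10]["name"] = "greater london"
view = person_with_city(1)
```

Had the city been denormalised into each person record at write time, the rename would have required touching every person.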
Specifically, for storing the results/state of a set of manually executed management scripts. The scripts needed to query the data from previous executions, do some stuff, and store the output. Think a poor man's version of Terraform.
Everything was dumped in a git repo that was shared across a few people. It was a quick and dirty solution to manage some alpha customers before the "real" system came online.
Didn't everyone do this at some point? Sort of like how everyone that started C++ in the 90s rolled their own string class. I remember doing this back when "JSON" hadn't yet found its acronym (shaking rake... get off my lawn!)
I think that, oddly enough, the point of that project is to use the JSON format as the DB storage format, not as an export option. Just from the look of it (I don't know either project), LokiJS will very likely always be faster.
I once built a toy document store using SQLite and Python with an almost identical idea: https://maxpert.tumblr.com/post/47494540287/a-document-store... If done correctly, the advantages of that approach IMHO are:
- ACID (Powered by SQLite)
- Complex and efficient Index (Powered by SQLite)
- CouchDB like API
I've been playing around with Rust recently; maybe I'll do a simple implementation in Rust, which would keep it memory-safe and efficient.
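A minimal sketch of that kind of SQLite-backed document store in Python. The schema and the expression index over `json_extract` are my assumptions rather than the linked post's exact design, and it needs an SQLite build with the JSON functions (which the one bundled with Python usually has):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# one row per document: which collection it belongs to, its id, raw JSON body
db.execute("CREATE TABLE docs (collection TEXT, id TEXT, body TEXT, "
           "PRIMARY KEY (collection, id))")
# "complex and efficient index": index an extracted JSON field directly
db.execute("CREATE INDEX docs_by_name ON docs (json_extract(body, '$.name'))")

def put(collection, doc_id, doc):
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
               (collection, doc_id, json.dumps(doc)))

def get(collection, doc_id):
    row = db.execute("SELECT body FROM docs WHERE collection=? AND id=?",
                     (collection, doc_id)).fetchone()
    return json.loads(row[0]) if row else None

def find_by_name(collection, name):
    # this query can use the docs_by_name expression index
    rows = db.execute("SELECT body FROM docs WHERE collection=? "
                      "AND json_extract(body, '$.name')=?",
                      (collection, name)).fetchall()
    return [json.loads(r[0]) for r in rows]

put("people", "p1", {"name": "ada", "age": 36})
put("people", "p2", {"name": "bob", "age": 41})
doc = get("people", "p1")
hits = find_by_name("people", "ada")
```

ACID comes for free from SQLite's transactions, which is exactly the property the hand-rolled JSON-file approach struggles to provide.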
There are many use cases for this for super simple ma-and-pa shops. It may be relegated 100% to shopping carts and checkout processes, but this would allow one to hand-edit her database, and even potentially let the server git-push live changes... kind of an interesting concept.
This is useful for a lot of projects. I've used a similar db library in a rust project I was working on for an example application. This way there are no heavy dependencies or even the need to say "you need sqlite".
I would prefer to use pouchdb as an in memory database. Then, if I outgrow that, use the pouchdb server. Then scale up to couchdb if I need that. It's much better architected and your front end code never has to change.
This brings me back sweet memories of saving my Pygame minigames state to clunky JSON files when I was basically clueless about databases. It worked surprisingly well until it became a huge spaghetti mess :-)
BIND does something like this (but not with JSON).
You have to run a “freeze” command before editing the database directly (so it can flush the current version of the database, and redirect writes to memory + log), and then “thaw” so it can read your changes and apply the log of updates to it.
Could be good for things that are often read and rarely written. As soon as multiple people try to update the same file, though, just use an established DB.
What a stupid idea. SQLite exists and you should use it. I took over a codebase at work that contained an ad hoc implementation of something like this and it was surely the most unprofessional thing I'd ever seen. What are they teaching kids in university these days?