Hacker News

> Some data belongs in Postgres, some in DynamoDB, some in JSON files. Now, how do we do reporting?

One of the key concepts in microservice architecture is data sovereignty. It doesn't matter how or where the data is stored; the only thing that cares about the details of the data storage is the service itself. If you need some of the data the service operates on for reporting purposes, make an API that gets you this data and make it part of the service. You can architect layers around it: maybe write a separate service that aggregates data from multiple other services into a central analytics database and do reporting from there, or keep requests real-time but introduce a caching layer or whatever. But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.



Sorry, but "making an API that gets you this data" is the wrong answer.

Most APIs are glorified wrappers around individual record-level operations, like "get me this user", or constrained searches that return a portion of the data, maybe paginated. Reporting needs to see all the data. This is a completely different query and service delivery pattern.

What happens to your API service written in a memory-managed/garbage-collected language when you ask it to pull all the data from its bespoke database, pass it through its memory space, then send it back down to the caller? It goes into GC hell, is what.

What happens to your API service when it issues queries for a consistent view of all the data and winds up forcing the database to lock tables? It stops working for users, is what.

There are so many ways to fail when your microservice starts pretending it is a database. It is not. Databases are dedicated services, not libraries, for a reason.

It is also true that analysts should not be given access to service databases, because the schema and semantics are likely to change out from under them.

The least bad solution? The engineering team is responsible for delivering either semantic events or doing the batch transformation themselves into a model that the data team can consume. It's a data delivery format, not an API.
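For illustration, a minimal Python sketch of such a delivery format, assuming a hypothetical `OrderPlaced` event: the record describes a business fact and carries an explicit `schema_version`, rather than mirroring the service's internal tables. The event name, fields, and JSON encoding are all assumptions, not any particular team's format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical semantic event published for the data team.

    The fields describe what happened in business terms, not how the
    service stores it, so the storage can change without breaking consumers.
    """
    order_id: str
    customer_id: str
    total_cents: int
    placed_at: str  # ISO-8601 timestamp in UTC
    schema_version: int = 1  # bumped whenever the delivered shape changes


def to_delivery_record(event: OrderPlaced) -> str:
    # Tag the record with its event type and sort keys for stable output.
    return json.dumps({"type": type(event).__name__, **asdict(event)}, sort_keys=True)


event = OrderPlaced(
    order_id="o-123",
    customer_id="c-42",
    total_cents=1999,
    placed_at=datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
)
print(to_delivery_record(event))
```

The data team consumes these records (from a queue, or as batch files) and never sees the service's schema.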


>It is also true that analysts should not be given access to service databases, because the schema and semantics are likely to change out from under them.

It's not perfect, but what we do is create a bunch of table views that represent each of the core data types in the system. We can then do all of the complex joins to collect the data analysts want into an easy-to-query table, while trying to keep the views consistent even as the DB changes.
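A rough sketch of that approach, using an in-memory SQLite database as a stand-in; the `users`/`orders` tables and the `report_customer_orders` view are hypothetical. Analysts query only the view, so when the underlying tables change, only the view definition has to be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Internal tables owned by the service; their layout may change.
    CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT, created TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         amount_cents INTEGER);

    -- Stable, pre-joined view for analysts. If the tables are renamed or
    -- split, only this definition changes, not every report built on it.
    CREATE VIEW report_customer_orders AS
        SELECT u.id                            AS customer_id,
               u.display_name                  AS customer_name,
               COUNT(o.id)                     AS order_count,
               COALESCE(SUM(o.amount_cents), 0) AS revenue_cents
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id, u.display_name;
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "Ada", "2021-01-01"), (2, "Bob", "2021-02-01")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 500), (11, 1, 700)])

rows = conn.execute(
    "SELECT customer_name, order_count, revenue_cents "
    "FROM report_customer_orders ORDER BY customer_id"
).fetchall()
print(rows)  # [('Ada', 2, 1200), ('Bob', 0, 0)]
```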


So you have a single database for all your microservices?


>It goes into GC hell

Can you expand on this a little? Or point me to a paper I can read?


The service will need to read all its data and put it into objects, then extract the data from the objects to report it, then garbage collect all of that. For every single record in its entire data set.

You could say but oh, why not just return the underlying data without making objects? Well now you are exposing the underlying data format, which is what we’re trying to avoid by giving this job to the service.
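To put a rough number on the memory side of this, here is a small Python sketch comparing a report that materializes every record as an object first with one that handles a record at a time. `Record` and `fetch_rows` are hypothetical stand-ins for a domain object and a database cursor. CPython reclaims most of these objects by reference counting rather than a tracing collector, so this illustrates the peak-memory side of the argument rather than GC pauses.

```python
import tracemalloc
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical domain object built from one database row.
    id: int
    payload: str


def fetch_rows(n):
    # Stand-in for a database cursor: yields raw rows one at a time.
    for i in range(n):
        yield (i, f"payload-{i}")


def report_materialized(n):
    # Builds every domain object up front, then serializes them all.
    records = [Record(*row) for row in fetch_rows(n)]
    return sum(len(f"{r.id},{r.payload}\n") for r in records)


def report_streamed(n):
    # Builds one object at a time; the previous one becomes garbage immediately.
    total = 0
    for row in fetch_rows(n):
        r = Record(*row)
        total += len(f"{r.id},{r.payload}\n")
    return total


def peak_bytes(fn, n):
    # Peak traced allocation while fn(n) runs.
    tracemalloc.start()
    fn(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


print(peak_bytes(report_materialized, 50_000), peak_bytes(report_streamed, 50_000))
```

Both produce the same output size, but the materialized version's peak grows with the row count while the streamed version's stays roughly flat.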


And thus such patterns lead to the absurdity where 90% of enterprise apps do little actual computations beside serializing and deserializing JSON (or XML if a "legacy" app).


It's remarkable what you can do with just functions and nested data structures. Used to be big on the whole OOP thing, data roles, so much effort for so little.

Now I try to think about problems as "I have input data of shape X, I need shape Y" and fractally break it down into smaller shape-changes. I am kinda starting to get what those functional programmers are yammering on about.


The parent comment said “is asked for all records... GC hell”.

Since a micro service deals with only its own data and reporting is then across services, we’d need to query across services to get data and make sense of it. If we’d ever need to query all records, then such records would become domain objects in the micro services first before being passed along. A large number of domain objects would require a large amount of memory. Processing and releasing domain objects will result in GC on the released objects.


Wait, I would assume that the people in need of reporting would have a pretty good idea of what those reports should look like. That means you know exactly what data needs to be read from a data store optimized for reporting. Each micro-service contributes their share of data to a data store optimized for reading. This is a text-book use case for a non-relational document store. I'm really not seeing what's so difficult about building such a process.
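A minimal sketch of that process in Python, assuming each service hands over a list of dicts keyed by a shared `customer_id` (the export names and fields are hypothetical). Each service contributes its fields to one document per customer:

```python
from collections import defaultdict

# Hypothetical per-service exports; in practice these would arrive as
# events or batch files, never via direct access to the service databases.
billing_export = [
    {"customer_id": "c1", "invoiced_cents": 5000},
    {"customer_id": "c2", "invoiced_cents": 1200},
]
support_export = [
    {"customer_id": "c1", "open_tickets": 3},
]


def build_read_model(*exports):
    """Merge each service's contribution into one document per customer."""
    docs = defaultdict(dict)
    for export in exports:
        for row in export:
            docs[row["customer_id"]].update(row)
    return dict(docs)


read_model = build_read_model(billing_export, support_export)
print(read_model["c1"])
```

As the replies point out, the catch is that only fields someone decided to export end up in the document, so a report nobody anticipated may need a change in every contributing service.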


Reporting and non-relational are like oil and water, coming from experience working with people who make reports.

It’s not like they come up with every report they think they might need while the micro service is being architected. They come up with a new report long after engineers have moved on. If it’s a SQL database, no problem. If it’s some silly resumeware data store, then what?


Yeah, this.

If you can't ask questions you didn't think of in advance, you didn't collect data.


Real question:

Why pull it into memory like that? Why not just pump it through a stream?


A stream would be the correct way to handle that problem, so backpressure can be used to prevent too much memory churn.
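A minimal Python sketch of backpressure using a bounded `queue.Queue`: the producer (a stand-in for a database cursor) blocks on `put()` whenever `maxsize` rows are waiting, so memory stays bounded no matter how large the table is. The row shape and the line-per-row output are assumptions; a real service would do this with its database driver's streaming cursor and a chunked response.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def produce(q, n):
    # Stand-in for a database cursor. put() blocks while the queue is full,
    # so the producer can never get more than maxsize rows ahead.
    for i in range(n):
        q.put({"id": i})
    q.put(_DONE)


def stream_report(n, maxsize=100):
    q = queue.Queue(maxsize=maxsize)
    t = threading.Thread(target=produce, args=(q, n), daemon=True)
    t.start()
    bytes_sent = 0
    max_backlog = 0
    while True:
        max_backlog = max(max_backlog, q.qsize())
        row = q.get()
        if row is _DONE:
            break
        # Stand-in for writing one line to the HTTP response or socket.
        bytes_sent += len(f"{row['id']}\n")
    t.join()
    return bytes_sent, max_backlog


print(stream_report(10_000, maxsize=50))
```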


I came here simply to echo this statement! Design a reporting solution that is responsible for ingesting data from these micro services' persistence layers. Analysts should only ever be querying this reporting solution and should not be allowed to connect directly to any micro service persistence layer or API.

We have a whole industry around Analytics and Data and the tools and processes to build this reporting layer is well established and proven.

Nothing will give you as many nightmares as letting your analysts loose on your micro service persistence layers!


This is seriously why my company still has monoliths.

Our databases are open to way too many people. What's worse, they are multi tenant making refactoring really hard.


Having more than one schema owner is practically a death sentence for development and engineering...

We used to have a few of those, especially on Exadata clusters. Finally carted them out of the local DC after moving to RDS Aurora databases with strict policies. Might have caused 3 or 4 people to quit, but totally worth it for the 500+ people that stayed who can now own their data, schema and development (and be held responsible for it! -- another issue of multi-DB access, it's always someone else's fault). Went from deploying once a day with a 'heads up' message to no-message deploying multiple times per hour.


Why monoliths? Everyone still wants to have OLAP and OLTP systems, with analytics done on the OLAP side. With this separation you can pull data from multiple sources into your analytics.

I can't imagine many people not doing that and actually needing stats in real time. For most shopping/banking stuff you can get away with dumps once every 24 hours, and then analytics can be done on those.


> But you do not simply go and poke your reporting fingers into individual service databases.

This is why I distrust all of the monolith folks. Yes, it's easier to get your data, but in the long run you create unmaintainable spaghetti that can't ever change without breaking things you can't easily surface.

Monoliths are undisciplined and encourage unhealthy and unsustainable engineering. Microservices enforce separation of concerns and data ownership. It can be done wrong, but when executed correctly results in something you can easily make sense of.


You're saying "monoliths encourage unhealthy engineering" and then in the next sentence say "when executed correctly" for microservices. That sounds like a having/eating cake type situation.


Not exactly. It's hard to tell from the outside if a monolith was architected well or is about to fall over.

In a microservice architecture it's harder to pretend you're doing it right.


> In a microservice architecture it's harder to pretend you're doing it right.

After seeing a few of them, I'd say: "it's less embarrassingly obvious that you're doing it wrong."

But dig into the code for a few endpoints and it usually doesn't take long to find the crazy spaghetti and the breaches of poorly carved-out separation of responsibilities.


I disagree, "doing it wrong" just looks different there.


The argument (which I sort of buy) is that microservices provide rails that keep people from doing certain stupid things like N clients depending on the data schema (making the schema a de-facto public interface).

The trick with microservices is that the ecosystem is maturing and there are still lots of ways to screw up other things that are harder to screw up with monoliths. In time 95% of those will go away (my specific prediction is that one day we will write programs that express concurrency and the compiler/toolchain will work out distributing the program across Cloud services--although "Cloud" will be an antiquated term by then--including stitching together the relevant logs, etc and possibly even a coherent distributed debugger experience).


You basically just described Erlang/OTP there.


To be fair, this is how I've seen tech decisions presented at most big tech companies.


Quit talking about what is behind the curtain


> It can be done wrong, but when executed correctly [...]

Quite the self-fulfilling prophecy there.

> Yes, it's easier to get your data, but in the long run [...]

Systems can and should be evolved and adapted over time, e.g. by deploying components of the monolith as separate services. You can't easily predict what the requirements for your software are going to be in, say, 10 years.

And depending on the stage a company is, easy access to data for business decisions outweighs engineering idealism.


> easy access to data for business decisions outweighs engineering idealism

I think there are different levels of sophistication of "engineering idealism". GP talks about "data ownership", and I get the desire to keep the data a microservice is responsible for locked in tightly with it. But let's be precise why it's good: because isolating responsibility reduces complexity. Not because code has some innate right to privacy.

In my own engineering idealism, there's no internal data privacy in the system. Things should be instrumentable, observable in principle. If an analyst wants to take your carefully designed internal NoSQL document structure and plug it into an OLAP cube for some reason, there must be a path to doing that; if that's an expected part of the business, the service needs to have it on the feature list, that this should be doable without degrading the service.

Software needs to be in boxes because otherwise we can't handle it mentally, but the boxes really shouldn't be that black.


Isolating responsibility reduces complexity for that piece of code. It increases complexity for assembling the whole thing into a holistic package, which is usually analytics' primary need.

YMMV, but the tradeoff is less complexity at the SWE/prod department, and more at the analytics team.


> But let's be precise why it's good: because isolating responsibility reduces complexity.

The thing is, it just shifts complexity around. Once you have microservices, you have to deal with a bunch of new failure modes, plus a bunch of extra code whose only purpose is to provide an interface to other services. And in terms of separating data, the worst part is that you've prevented access to this data together with other data within the same transaction.


> Quite the self-fulfilling prophecy there.

Microservices require your organization to have an engineering culture. I would be afraid of introducing them at, say, Home Depot where (I've heard) your average programmer doesn't even write tests.

If you have engineering talent within a small multiplicative factor of Google (say 0.5), then you can pull off Microservices at your org.

Edit: I'm being downvoted, but I don't think it's a dangerous assumption or point to make that it takes a certain amount of discipline and experience to implement microservices correctly. When you have that technical capacity and the project calls for it, the benefit is tremendous.


I think you're being downvoted because you're implying monoliths don't require an engineering culture and that microservices are a silver bullet in getting systems built correctly.

I've seen good and bad in each approach. It's certainly possible to enforce good SOCs and proper boundaries in monorepos, and also possible to plough a system into the ground with microservices.

They're all just tools in your toolbox and both have a part to play in modern development.


You’d be surprised how sophisticated Home Depot is. They switched their monolith to microservices using Spinnaker and even contributed back to Spinnaker.


My client is doing a lot of it wrong. To be fair, they got sold a lot of really horrible and ridiculous advice from IBM consultants (is there another kind?), but they also have people in charge (organizationally and technically) who aren't great decision-makers.

As the article says though, you can't fix a people problem (bad engineering practices and discipline) by going from one technology to another (monolith to microservices).


Only when done by folks that never learned how to write modular code and package libraries.

The same folks aren't going to magically learn how to do distributed computing properly, rather they will implement unmaintainable spaghetti network calls with all the distributed computing issues on top.


And untangling a monolith tends to be much less problematic than untangling a bunch of microservices. For one thing, you can refactor/untangle it all offline, do your testing, and do a single release with the updated version, as opposed to trying to coordinate releases of a bunch of services whose interfaces/boundaries were poorly defined.


DDD enforces separation too.

It's about code quality: microservices are easily replaceable. Modules are too.

With both systems, the core part (e.g. mesh, infrastructure, ...) is crucial.

I think experienced developers can see this: the ones that actually delivered products and went through big code changes, the ones that handled their "legacy" code.

Microservices are just one way to enforce it; there are others. None is perfect or bad; each has its use case.


I do not claim expertise here, but it would seem like microservices would add significant performance costs. Stitching together a bunch of results from different microservices is going to be a LOT more expensive than running a query with joins.


Humans are the most expensive part of the system. You have to make it easy for humans to understand and change the system, and at the end of the day that's the number one thing to optimize for. This is why microservices are compelling.

But to speak directly to your concern, you have to think about service boundaries and granularity correctly. Nobody is saying make a microservice out of every conceivable table. Think about the bigger picture, at a systems level. Wherever you can draw boxes you might have a service boundary.

Why would you need to join payment data to session and login data?

Do you need to compare employee roles and ACLs against product shipping data?

These things belong in different systems. If you keep them in the same monolith, there's the danger that people will write code that intertwines the model in ways it shouldn't. Deploying and ownership become hard problems.

The goal is to keep things that are highly functionally related together in a microservice and expose an API where the different microservices in your ecosystem are required to interact. (Eg, your employees will login.)

When the data analytics folks want to do advanced reporting on the joins of these systems (typically offline behavior), you can expose a feed that exports your data. But don't expose an internal view of it to them or they'll find ways of turning you into a monolith.


In my experience it is a lot more difficult to navigate around all the different microservices to understand what needs to be done compared to being in a monolith where you can jump from file to file.

Also, what then happens is that microservices get written in different languages, which adds a lot of complexity to understanding what is going on at the big-picture level.

And code gets repeated a lot more. If there is change in a microservices or update everyone will need to figure out what services depend on and how they will have to adapt. With monolith you can just use your IDE to see what will break if you make a change. So much repeated business logic. Creating a new feature involves having to have many meetings to figure out what services in which way have to be updated.

It is crazy mess in my opinion.

I have been with a company that had monolith application which they split up to more than 15 services (some python, some js, Scala, Java, etc...). Monolith still is used for some parts that are not migrated. I was working on single service having no idea how the whole system worked together. Then I had to do something in the old parts and I very quickly got an understanding how everything works together.


>And code gets repeated a lot more. If there is change in a microservices or update everyone will need to figure out what services depend on and how they will have to adapt. With monolith you can just use your IDE to see what will break if you make a change. So much repeated business logic. Creating a new feature involves having to have many meetings to figure out what services in which way have to be updated.

This is what people mean when they say "distributed monolith" vs. microservices.


I work on a monolith with a team experimenting in microservices and good lord do I hate it. The microservice represents a required step in our user flow, and due to the way we're set up I have to spin up my own private copy. Very often there have been configuration or API changes that were not communicated to me, so for the past few months that service has been broken and I've managed to avoid it for the most part. When I can't, I find it is faster to simply re-assign existing database records or simply bullshit them in a database editor rather than deal with the "why isn't the XXXXXXXXXXXX service working for me again?" flavor of the day.

And holy fuck is debugging that stuff difficult. HUUUUUGE waste of time, but management looooooooooves their blasted microservices...


Programming complexity is changed to devops complexity.

With microservices, without good documentation of how it all connects, it's going to leave a very bad impression.


Having to have that documentation, finding, reading, understanding and trusting it already adds so much overhead.

It is still nowhere close to being able to jump around in an IDE.

It might be in a different language, different design patterns and to get to the details you have to check out that project anyway because you can't document absolutely everything out of code base. And if you do you will end up with multiple sources of truth.

It is so much more likely that for every little issue, one you might otherwise answer yourself very easily, you will have to contact the team owning that microservice.

It is not only mentally exhausting. It is time consuming, it requires so much back and forth. It creates so much dependence on other people because figuring out how things are related is so much more difficult.

Sometimes I have 8 or more different IDE windows open to understand what is going on.


> you have to think about service boundaries and granularity correctly.

This is the hardest part. I'd argue that it's almost impossible to do correctly without significant domain modeling experience. Also, microservices by nature make these boundaries hard to refactor (compared to monoliths, where you'd get compile-time feedback).

I prefer to make a structured monolith first (basically multiple services with isolated data, squished together into a single deployable) and pull them out only if I really need to. It also helps keep microservice sprawl under control.
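A rough sketch of what that could look like, assuming two hypothetical modules, `CustomersModule` and `OrdersModule`, running in one process. Each owns its own data, and the orders module only goes through the customers module's public methods, so pulling it out into a service later would mostly mean turning `exists` into a network call. Python only enforces underscore-prefix privacy by convention, so in practice this relies on code review or a lint rule.

```python
class CustomersModule:
    """Owns customer data; nothing outside this class touches _rows."""

    def __init__(self):
        self._rows = {}  # private storage; could become its own schema later

    def register(self, customer_id, name):
        self._rows[customer_id] = {"name": name}

    def exists(self, customer_id):
        # Public interface other modules are allowed to call.
        return customer_id in self._rows


class OrdersModule:
    """Owns orders; depends on customers only through its public methods."""

    def __init__(self, customers):
        self._customers = customers
        self._orders = []

    def place(self, customer_id, amount_cents):
        if not self._customers.exists(customer_id):
            raise ValueError(f"unknown customer {customer_id}")
        self._orders.append((customer_id, amount_cents))
        return len(self._orders)  # simple sequential order number


# Single deployable: both modules live in one process and are wired here.
customers = CustomersModule()
orders = OrdersModule(customers)
customers.register("c1", "Ada")
print(orders.place("c1", 500))  # 1
```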


If you already can't serve your requests from one DB, and you already want to factor out the analytics stuff, the long running background queries, modularize the spaghetti, scale the maintenance load, CI build + testing time, etc...

That's what SOA and microservices is supposed to solve.

At that scale you do reporting from a purpose-built service.

Allegedly.


We do a lot of reporting that way. Then the users are unhappy that the data is slightly "stale". It serves some purposes, but not all.


That wouldn't be a microservice then.

There's going to be a relationship between data in your services, but it shouldn't be directly referential.


> enforce separation of concerns and data ownership

You can enforce separation of concerns and data ownership in a monolith just as much as you can not enforce these two characteristics in a micro service architecture. Microservices and monoliths are a discussion about deployment artifacts, full stop.


> you create unmaintainable spaghetti that can't ever change without breaking things you can't easily surface.

How does creating a tangle of microservices (effectively distributed objects) really solve the problem?


Microservices provide an abstraction. That is kind of the point. If you feel like the data your service operates on would be better off stored in Redis instead of an RDBMS, you can rewrite your persistence layer, test, and roll out the new version of the service. As long as your APIs do not change, nobody cares how you produce responses to requests. In a monolith, this would be a nightmare. You don't have a single persistence layer to change; you have to go through every module, find all the places where this specific table or tables are being accessed, and change retrieval and storage functionality everywhere.
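A minimal in-process sketch of that kind of swap, with a plain dict standing in for Redis and in-memory SQLite for the RDBMS (both stand-ins are assumptions; `UserStore`, `DictStore`, `SqliteStore` and `UserService` are hypothetical names). The service only sees the `UserStore` protocol, so either backend can be plugged in without callers noticing.

```python
import json
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The persistence contract the service codes against."""
    def get(self, user_id: str) -> Optional[dict]: ...
    def put(self, user_id: str, user: dict) -> None: ...


class DictStore:
    """Key-value backend (a stand-in for something like Redis)."""
    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, user_id: str) -> Optional[dict]:
        return self._data.get(user_id)

    def put(self, user_id: str, user: dict) -> None:
        self._data[user_id] = dict(user)


class SqliteStore:
    """Relational backend using an in-memory SQLite database."""
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, body TEXT)")

    def get(self, user_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM users WHERE id = ?", (user_id,)).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, user_id: str, user: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO users (id, body) VALUES (?, ?)",
            (user_id, json.dumps(user)))


class UserService:
    """Public behaviour stays the same whichever store is plugged in."""
    def __init__(self, store: UserStore) -> None:
        self._store = store

    def rename(self, user_id: str, name: str) -> dict:
        user = self._store.get(user_id) or {"id": user_id}
        user["name"] = name
        self._store.put(user_id, user)
        return user
```

Note the boundary here is an ordinary interface rather than a network call, which is also how it would look inside a well-factored monolith.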


> Microservices provide an abstraction.

So do modules/classes/interfaces etc. You don't need a layer of HTTP in between components to have abstraction.

In addition, it feels like microservices solve a problem that very few people really have. I've never run into a case where I thought "boy, I'd sure like to have a different database for this one chunk of code". If that did happen, then sure, split it out, but I can hardly believe that splitting your entire code base into microservices has a net benefit. The real problem in nearly every project I've worked on is the complexity of business logic. A monolith is much easier to refactor, and you can change the entire architecture if you need to without having to coordinate releases of many different applications.


Seems to me you are talking about a database access layer instead of microservices.

My understanding of microservices is a bunch of loosely connected services that can be changed with minimal impact on the others.

The problem with the ideal is that in reality it never works: as complexity grows, the spaghetti code moves to spaghetti infrastructure. (Done a network map of a large k8s/Istio deployment lately?)


The impact would be minimal only if the API of the microservice didn't change. But in the same codebase too, if you have a module whose API doesn't change the changes from refactoring it would likewise be minimal.


Well constructed, it's ravioli, not spaghetti.


But that's true of a well-constructed monolith, too. And it has far fewer failure modes and less complexity in general.


>Microservices enforce separation of concerns

Depending on where you work, it can be a problem, because the separation is not always appropriate, and can for political reasons be much harder to revert when visible at the service level (for example because the architect doesn't understand consistency, or because your manager tells you that the distributed architecture documentation has been sent to the client so it cannot be modified).

In case of undue separation, reworking the internals of the enclosing monolith should have less chance to cause frictions.


In practice, splitting your code into consumable libraries/modules works equally as well.

Then your monolith is just all the modules glued together.


Splitting code into libraries works better; it's simpler and faster. The only thing microservices bring to the table is being able to deploy updates independently (although this is also possible with libraries). If you don't need to deploy independently, then microservices are useless complexity; if you can't deploy independently, then you've got a distributed monolith.


A co-worker had a smart solution for this: your service's representation in a reporting system (a data warehouse for example) is part of its API. Your team should document it, and should ensure that when that representation changes information about the changes is available to the people who need to know it.

This really makes sense to me. I love the idea that part of a microservice team's responsibility is ensuring that a sensible subset of the data is copied over to the reporting systems in such a way that it can be used for analysis without risk of other teams writing queries that depend on undocumented internal details.
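One way to make that responsibility checkable is to keep the documented reporting representation in code and validate exports against it in CI, so a schema change fails the build until the contract is updated and announced. A minimal sketch, assuming a hypothetical `REPORTING_CONTRACT` and column names; it is a plain schema check, not a consumer-driven contract tool.

```python
# Hypothetical documented contract for the service's reporting export.
REPORTING_CONTRACT = {
    "version": 2,
    "columns": {
        "order_id": str,
        "customer_id": str,
        "total_cents": int,
        "placed_at": str,
    },
}


def validate_export(rows, contract=REPORTING_CONTRACT):
    """Return human-readable violations; an empty list means the export matches."""
    problems = []
    expected = contract["columns"]
    for i, row in enumerate(rows):
        missing = expected.keys() - row.keys()
        extra = row.keys() - expected.keys()
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        if extra:
            # Undocumented columns are how internal details leak into reports.
            problems.append(f"row {i}: undocumented {sorted(extra)}")
        for col, typ in expected.items():
            if col in row and not isinstance(row[col], typ):
                problems.append(f"row {i}: {col} should be {typ.__name__}")
    return problems
```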


> your service's representation in a reporting system

At what point in time?


At the beginning: https://docs.pact.io/


> But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.

I agree. In a monolith architecture, though, you CAN do that (and many shops do.) That's where their pains come from when they migrate from monolith to microservice: development is easier, but reports are way, way harder.


> when they migrate from monolith to microservice: development is easier [...]

Not even that -- that idea is still highly debatable.

I would argue that it absolutely isn't easier, and the stepping-back-in-time of developer experience is one of the biggest problems with microservices.

Microservices in general are way, way harder.


What you are describing certainly isn't unique to microservices.


> you do not simply go and poke your reporting fingers into individual service databases

Side point: This is a needlessly hostile and unprofessional way to refer to a colleague. Remember that you and the reporting/analytics people at your company are working towards the same goals (the company's business goals). You are collaborators, not combatants.

You can express your same point by saying something like "The habit of directly accessing database resources and building out reporting code on this is likely to lead to some very serious problems when the schemas change. This is tantamount to relying upon a private API." etc.

We can all achieve much more when we endeavor to treat one another with respect and assume good intentions.


This is an incredible overreaction to an entirely innocuous comment.


I've noticed reporting/analytics people going extinct around my workplace as microservices make monitoring easier. There might be some pent-up hostility towards the technology side.


If you think telling colleagues not to "simply go and poke your reporting fingers into" things won't insult them or put them on a defensive footing, I encourage you to try it and closely note the reception you receive. In my experience, people do not appreciate being spoken to like that.


They didn't tell their colleagues to do that, they made a slightly humorous comment on a hacker news thread.


We’re colleagues by virtue of the fact that we’re members of the same profession.

Anyway, what’s the reason not to treat people on hackernews with the same respect you’d treat a coworker with?


I think I prefer the poking around analogy. I can immediately visualize why it's bad, and it doesn't have the word "tantamount".



