The four pillars of data observability: metrics, metadata, lineage, and logs (metaplane.dev)
155 points by kzh_ on Aug 14, 2022 | 27 comments


We're good at logging text, but how do you handle logging assets (images, audio - anything non-textual but generated) and associating them with your logs?

For example an image processing pipeline. You don't always want to log (it'd never scale) but as part of a trace you might want to keep the intermediate files so you can track down where the problem is. You've already got text logging for each step, recording metrics like duration and which filters were involved. I have saved files and referenced them in the logfile, but no log viewers I've seen understand anything beyond text. So I then have to build my own UI or open the images in turn.

Is there a pattern to handle this?


+1 on existing log viewers being particularly well suited for text over non-textual assets. My experience here is limited but I believe Grafana has a dynamic image plugin if you store a link to an asset in blob storage or Base64 encode it.

I've also heard of people storing those links in a database like Snowflake then creating displays on top using Tableau or Looker, to avoid having to build a web app from scratch.


It might be interesting to have something like "statistical logging", which saves the intermediate image files 1% of the time and discards them after 30 days.
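A minimal sketch of that sampling idea, assuming intermediate files live on local disk and that file modification time is good enough for expiry. The names `maybe_save_asset` and `purge_expired` are made up for illustration; the 1% and 30 days come from the comment above.

```python
import random
import time
from pathlib import Path
from typing import Optional

SAMPLE_RATE = 0.01                      # keep roughly 1% of intermediate files
RETENTION_SECONDS = 30 * 24 * 60 * 60   # discard after 30 days


def maybe_save_asset(data: bytes, name: str, directory: Path,
                     sample_rate: float = SAMPLE_RATE,
                     rng: Optional[random.Random] = None) -> Optional[Path]:
    """Write the asset with probability sample_rate; return its path or None."""
    draw = (rng or random).random()     # uniform float in [0.0, 1.0)
    if draw >= sample_rate:
        return None                     # not sampled: nothing is stored
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_bytes(data)
    return path


def purge_expired(directory: Path, now: Optional[float] = None) -> int:
    """Delete files older than the retention window; return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    for path in directory.glob("*"):
        if path.is_file() and now - path.stat().st_mtime > RETENTION_SECONDS:
            path.unlink()
            removed += 1
    return removed
```

The returned path (or None) can go straight into the text log line for that step, so the log still says whether an asset was kept. In practice you would probably sample per trace rather than per file, so that a sampled trace keeps all of its intermediate steps.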


Maybe wild idea: generate a unique text identifier for the image + an s3 url, log that identifier rather than the image. I guess it's logging metadata rather than the actual data.
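Roughly what that could look like, as a sketch only: the bucket name, key layout, and function names below are assumptions for illustration, and the actual upload is left to whatever storage client you already use.

```python
import json
import logging
import uuid

logger = logging.getLogger("pipeline")

BUCKET = "example-pipeline-assets"  # hypothetical bucket name


def asset_reference(trace_id: str, step: str, extension: str = "png") -> dict:
    """Build the metadata record that stands in for the asset in the log."""
    asset_id = uuid.uuid4().hex  # unique text identifier for this asset
    key = f"traces/{trace_id}/{step}/{asset_id}.{extension}"
    return {
        "asset_id": asset_id,
        "trace_id": trace_id,
        "step": step,
        "url": f"s3://{BUCKET}/{key}",
    }


def log_asset(ref: dict) -> str:
    """Emit the reference as one JSON log line and return that line."""
    line = json.dumps(ref, sort_keys=True)
    logger.info(line)
    return line
```

Because the record is plain JSON text, any existing log viewer can search it by `trace_id` or `asset_id`; only opening the image itself needs something that can follow the URL.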


- unused memory is wasted, you may be able to store the raw image.
- if your process is deterministic store a hash.
- store a low resolution image.
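For the hash option, a small sketch assuming each pipeline step's output is available as bytes and that you have hashes from a known-good run to compare against. SHA-256 and the helper names are my choice here, not something from the thread.

```python
import hashlib
from typing import Optional, Sequence


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of a step's output; small enough to put in a log line."""
    return hashlib.sha256(data).hexdigest()


def first_divergence(reference: Sequence[str],
                     observed: Sequence[str]) -> Optional[int]:
    """Index of the first step whose hash differs from the reference run, or None."""
    for index, (ref, obs) in enumerate(zip(reference, observed)):
        if ref != obs:
            return index
    return None
```

This only helps when the pipeline really is deterministic: with identical inputs and filters, the first step whose hash differs from the reference run is where to start looking, without keeping any image at all.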


Seems like the key pillars are: freshness, volume, schema, distribution, and lineage.

Makes more sense this way, I think...

If you think about metrics, traces, and logs (software observability pillars) as three distinct things, it's hard to view metadata as separate from metrics, lineage, or logs. Metadata is kind of the glue that holds everything together.

This article has more relevant sources, IMO, even if it is from a SaaS vendor.

https://www.montecarlodata.com/blog-what-is-data-observabili...


Off topic: there seems to be a growing trend at HN of posts reaching the homepage with a reasonable number of upvotes yet without comments.

I don't know how to proceed with these posts (and this one), yet the temptation of mentally flagging these as friendly upvotes or point hoarders is strong, and I must admit that such posts receive less attention and more suspicion from me.

YMMV.


OP here, I posted this a few days ago and was surprised to see it on the front page this morning. Not sure why it says I submitted 4 hours ago when I wasn’t awake, maybe the second-chance pool (https://news.ycombinator.com/item?id=26998308)?

But I’m also generally skeptical of high upvote/comment ratios, because as a long-time HNer too I also want to read things that are genuinely interesting. In this case, I can promise you neither I nor anyone on the team is soliciting upvotes for this post.

On that note, if anyone has any comments about the content itself, happy to discuss further.


Thanks for commenting constructively. As you have understood, my intention was never to point fingers at you or your article, but rather to use it as a suitable context to raise the question with the HN crowd.

Thanks for having seen this from the start :)


Edit: I went and read TFA, and must say there were some red flags. CS people who add "PhD" beside their name are not only pretentious, but are trying to throw their academic weight around instead of letting their ideas and presentation stand on their own. The article is filled with more marketing fluff than useful information. Ugh.

I'm siding with you on this. I've "undowned" you and upvoted instead; Sorry xcamber!

--

If you're really concerned, email dang (hn@ycombinator.com) and ask him to look into it. As a sidenote, if you actually flag out of suspicion of a voting ring or other feelings without real evidence, it is abusing the power you've been entrusted with. Threads like this one are also way off topic; it seems more considerate to submit an "Ask HN" post rather than hijack the story discussion.

The group dynamics are often surprising on HN.


OP here, I only try to write and share things that I find personally interesting, so if it came across as marketing fluff that was the opposite of what I was aiming for :/. But I do appreciate you reading the whole thing. FWIW I also thought including PhD might be pretentious.


Hi OP! It would be nice to have some examples to go with the article. Some set of minimum data and sample "lineage" etc.

This has broadened my perception of data though, I never linked this with the good old thermodynamic principles.


Thanks for reading! Including examples is a great point, because otherwise the article can be kind of abstract, especially because each person has a different mental model of data. I'll add some later on.

Maybe thermodynamics is a hammer that makes all things seem like nails, but the connections pop up all over the place. Entropy is another highly applicable concept to data systems.


I would say including CEO is far more pretentious. A PhD at least means something more substantial, because it requires an external certification.


Adding “PhD” and other credentialed titles is standard SEO practice these days.

The thought is Google sees the article as from a “credible source” and ranks you higher.


You could try reading the article and commenting with your own original thoughts.


What makes you assume I did not read?

How shocking is it that, on some article and topic, I do not have a comment I deem interesting enough to share?

Besides, if you do not appreciate my specifically off-topic contribution, then so be it.


> What makes you assume I did not read?

OP didn't say you didn't read it. They said that you didn't read it AND attempt to start a meaningful discussion about it, which is exactly what you were complaining about, right?


If you mean meaningful with regards to the topic, I'd agree with you and knew it from the start, hence the "off-topic" warning.

My intention was to compare my experience and behavior (towards certain categories of posts with a high upvote/comment ratio) with those of the rest of the HN crowd, in a contextualized environment where it applies.

I'm sorry that the conversation now revolves around my own comment. Kinda ironic.


All you need are logs.

Also, https://honeycomb.io looks pretty dope.


The honeycomb-dot-io case was bad, I attended their conference once and it was like I'd been suckered into a neverending timeshare sales tour. Thankfully I haven't noticed the domain on the frontpage lately.

Compare:

https://news.ycombinator.com/from?site=honeycomb.io

https://news.ycombinator.com/from?site=metaplane.dev

Both look like they want to game HN pretty hard. If only they'd publish actual novel or interesting information instead of thinly veiled SaaS marketing!


My guess is that people are interested in this topic and want to read discussions from other people, but don't really have anything to add right now. Sometimes I upvote topics for that reason.


Fair point. Although I must admit it affects (people like?) me directly - I make it a point to always go through the comments before I read the article as a matter of habit.


I often upvote articles because I have a passing interest in the topic and want to see the content/author get intellectually flogged by HN users.


There's not enough information shared about what metrics people monitor and why. Accurately measuring complex system performance and utilization over time is hard work! Observability platforms such as DataDog and New Relic are very expensive. Engineering teams repeat mistakes, recording too much information, getting the bill, and rushing high-priority pruning exercises through engineering to stop the financial bleeding. I encourage everyone to share their monitoring setups!


We use Atatus, which helps with application performance troubleshooting and optimization, including identifying slow database queries and optimizing query performance.


nobody needs this new saas stuff. I prefer the traditional pillars of: 1) emails from users, 2) live chat feature in product where users shame you publicly if something is wrong, 3) twitter search 'is X down', and 4) having laptop open in passenger seat on commute, tethered to blackberry, and periodically hitting F5 on the page which hits the most APIs



