My preference in these situations is always to start with buy, and then try to j...

snewman · on April 13, 2020

> Their profit margin is usually in their efficiency in doing it better than you

This is a very nice way of framing the question.

I think there may be at least two scenarios to consider. For some products, cost is primarily driven by R&D; for others, cost of operating the service (COGS) dominates. Of course it's a continuum, but I suspect there are more services near the ends than the middle.

For an R&D-dominated service, if your needs are nontrivial and there is an off-the-shelf product that is a decent fit, you probably want to go off-the-shelf. A commercial provider can amortize the R&D over a large customer base.

For a COGS-dominated service, the commercial provider may still be more efficient but it is less of a slam dunk. Log management (e.g. Splunk) involves a substantial operational component, because you're dealing with large volumes of data.

Disclaimer: I am the founder of Scalyr, a Splunk competitor. Early on we realized that, even though log management is COGS-heavy, we could achieve huge economies of scale, because query workloads are highly bursty. By carefully managing a central pool of resources shared by many clients, we're able to run much more efficiently than a single-tenant homegrown solution [1]. This is one of the key points that we highlight when having build vs. buy discussions with potential customers.

[1] https://www.scalyr.com/blog/searching-1tb-sec-systems-engine...

texasbigdata · on April 13, 2020

Seeing how an enterprise software chooses to articulate and differentiate itself in product and sales from the internal perspective is super interesting.

Are there other examples of "we chose to go to market like this (x, y, z reasons) and here's why"?

snisarenko · on April 14, 2020

Hi snewman,

I am currently using logdna as my splunk alternative, but it's not very good.

Why should I try scalyr next, over the other alternatives ?

My main constraint is the solution should be at least half as good as splunk, and cost less than $50 a month (I generate less than 1G a day).

Your pricing seems to be reasonable. But how do you compare to splunk on features ?

Here are the other tools I am considering trying

https://www.humio.com/

https://www.sumologic.com/

https://sematext.com/logsene/

https://www.datadoghq.com/log-management/

https://logz.io/pricing/

https://www.chaossearch.io/

https://www.papertrail.com/

https://www.loggly.com/

https://www.graylog.org/products/enterprise

snewman · on April 14, 2020

Thanks for reaching out. It really depends on what you're looking for, and I don't want to hijack this thread with a long discussion, so feel free to reach out at https://scalyr.com/contact. If you're focused on the power of the query language, then Splunk is the undisputed leader, but we are relatively strong as well; see https://app.scalyr.com/help/power-queries. This iteration of our query language was launched last year and was a lot of fun to build -- the team allowed me to write a fair amount of the code. :-)

You can find a more general overview at https://scalyr.com/product.

snisarenko · on April 14, 2020

Mostly looking for decent query language, decent charting, and API to grab chart data.

Seems like you guys fit the bill. I'll do the 30 day trial, and see how it goes. Thanks for the response.

pianoben · on April 14, 2020

I was an early Scalyr user circa 2014 or so, and have used Splunk at various gigs before and after. My current company is not a Scalyr customer and that makes me pretty sad - even back then, they were snappy and responsive in ways that Splunk has never matched. Despite (maybe due to?) being substantially cheaper, the service we received from the founders and operators was second to none; for example, I remember that someone even hand-wrote a Clojure EDN log-parser for us at no extra charge, just to make sure we were happy.

Steve says elsewhere in the thread that Splunk has the more-flexible query language, and while that may be true I as a product engineer never noticed the delta myself.

TL;DR I think you'd be doing yourself a favor to try Scalyr, and I have no affiliation other than being a happy former customer.

snisarenko · on April 15, 2020

Thanks. This is helpful information.

cthuen · on April 14, 2020

Hey snisarenko, you could use the Gravwell Community Edition which is free up to 2GB/day. 4GB/day if you participate in the current alpha testing program. https://www.gravwell.io/download

Disclaimer: I'm one of the founders. We built it to be a Splunk alternative and I think it does a great job from a data ingest, scalability, and data querying perspective. We're lacking some of the out-of-the-box capability, but not for long. Kits (our "apps") release is coming this quarter.

hinkley · on April 13, 2020

It's Pareto principle for me. >80% of our code should be off the shelf. The special sauce is 10% of the code. What should we spend that on? What will make us stand out?

If you can answer that, it's all the justification you need for building it. Anything you build that doesn't differentiate you can be bought by a competitor.

BiteCode_dev · on April 13, 2020

As usual, it depends. I know some friends that have profitable websites because they run on a private server and they handcrafted everything. If they had to use cloud services for serving stuff, host the db, provide search and cache things, it would cost too much and they couldn't make a living out of it.

rconti · on April 13, 2020

To me, this shows that the actual service/content they're providing has relatively low value, and so they're "making" money not just on the content but also on the services side; eg, they're a cheaper sysadmin than an outsourced one. And they're also possibly ignoring risk; eg, they don't have the data durability that a cloud provider does, but it Probably Won't Matter (TM).

There's absolutely nothing wrong with this way of making money; the lower the margins, the more watching your costs is crucially important, and the difference between making a living and not even breaking even!

IanCal · on April 13, 2020

Possibly, it may also be that the cloud providers are offering things they don't need. Maybe that's uptime guarantees, potential scalability or integration with other services.

BiteCode_dev · on April 13, 2020

Sure.

I just mention it because most people in the world have a job that make them less money than my friends web services.

And I'm assuming most startups, unless heavily founded, may have to think about this trade off as well.

The cost in infra can easily be a factor of ten.

aledalgrande · on April 13, 2020

If you take SaaS, they are skewed towards buy, because their margin is ridiculously high. Better to hit the market now by using stable products than optimizing on spend.

The bigger you are and the more you can think about optimizing spend.

aprdm · on April 14, 2020

I think it is quite the opposite. Having worked in startups that were VC funded it seems they consider VC money free money and just spend a lot of it in infrastructure without the need to.

I know for a fact our bill was around 15-20k USD per month for running a webapp that could be ran in digital ocean 40 dollars/month.

For a lifestyle business without VC money to burn it makes all the difference.

aledalgrande · on April 14, 2020

Oh I've definitely seen that too. My example requires responsible (and smart) leadership. My thought is that optimizing costs engineering hours, which are much more expensive than paying for the service bills.

turbinerneiter · on April 13, 2020

I agree with the idea, but I also think there are limits.

Let's say you build some kind of product based on some Azure/AWS product. The negative reading of it is: you spend money on innovation and have to pay Azure/AWS for infrastructure. Your product will constantly have to improve to keep up with the competitors, Azure/AWS keeps getting your money. And if your product becomes stable enough, they will make it part of their offering.

Another example would be Apps, with Apple later folding their functionality into the OS and kicking your app off the store.

But judging by my own examples, this migth be more of a platform problem.

NomDePlum · on April 13, 2020

Surely that very much depends on whether your product fits within the current or near future offering of Azure/AWS.

The other side of that coin is that you use a build horizontal integration strategy to replace elements of the products you buy from Azure/AWS if (and only if) it can be done in a way that increase your profit/value.

It can work both ways. Moving from buy to build also has elements of de-risking and economies of scale if you do it at the point where there is measurable success/product to market fit.

Building something you can buy too early runs the risk of premature optimisation if not done for the right reasons.

DanHulton · on April 13, 2020

Absolutely. This is largely what I'm betting on with my latest project, that the time you spend spinning up all the "standard stuff" for a SaaS project is duplicated - and thus wasted - effort throughout the industry. Why write another user login system?

Though it sets up an interesting situation for me, where the "special sauce" of _my product_ is the 80% "off the shelf" for my customers.

mumblemumble · on April 13, 2020

Another reason I like to start with buy: It makes scope creep a lot harder. You have a specific feature set, and it lets you do certain things, and not others. You may eventually find there are certain things it has that you can't live without, at which point you may still have to build, but that's offset by the fact that you'll probably also discover there are features you only thought you needed.

By contrast, whenever I've been on a project where the company went straight to building their own, it ended up being a quagmire of scope creep, and the development team would end up stuck on an endless treadmill of implementing features that stakeholders vehemently insist on, but never actually use.

Even for dev tooling and libraries. Some of the worst piles of technical debt I've encountered happened when some rockstar decided that they knew better than everyone else, and dived straight into building what they thought would be a better mouse trap. It's not that the in-house option isn't any more or less likely to solve the problem well (no comment), so much as that the in-house option is going to end up being more deeply tangled with the rest of the codebase. So you're stuck with it, because, even if it doesn't turn out to be all it's cracked up to be, migrating away from it may be next to impossible. Homegrown ORMs seem to always turn out this way.

The one exception is when you know that a quick-and-dirty DIY solution will be cheaper and easier, because it needs to do only one specific thing while the off-the-shelf option has to take on a bunch of extra complexity in order to be general enough for everyone's needs. But even there, I'm only likely to trust that argument if it's being made by someone who's been burned by the complexities of the off-the-shelf solution in the past, and who has a proven track record of obeying the KISS principle.

hyperpallium · on April 13, 2020

Modularity by purchase. Information hiding by ownership. Interfaces by corporation.

Brooks' software system products (9x the work) is usually done this way in practice. In most usage cases, I would also include open source projects.

It's so hard to get interfaces right and it's helpful to have many users. But I considered your point, of it helping to enforce boundaries.

weinzierl · on April 13, 2020

> Except in the case of Splunk. Their pricing is ridiculous.

Their pricing is ridiculous and what is even stranger is that with the ELK stack there is a good and free alternative. Now Splunk is good, no doubt, but I still wonder how they can be that successful with that pricing.

sethammons · on April 13, 2020

We ran the largest elastic search cluster on the west coast some number of years ago. If you are small enough, elastic search can go a long way. Our data teams and operations folks celebrated the day that that system was turned off. Elastic search does not even hold a candle to the way we leverage splunk at our org, but that could be because we are bigger than some and deal with scale that few others deal with. Splunk costs us a fortune but enables amazing data analysis. It is easily our most expensive service we pay for and worth it. It would be great for our bottom line if it were cheaper, but they can charge what they do because the do it so well.

aprdm · on April 14, 2020

Can you give us a sense of scale and what issues you faced by elastic?

dmitriid · on April 13, 2020

Splunk is insanely good. To the point that people forget that other tools exist, or that it makes sense to build specialized solutions when needed.

So people start running anything — from dashboards to analytics — from Splunk alone. Their ability to combine and extract data from almost anything in very small amounts of time is unparalleled IMO.

m0xte · on April 13, 2020

Lock in and one trick pony trained staff. I'm going to kill it in the next 12 months in our org if it doesn't kill me first.

jeremyjh · on April 13, 2020

Splunk is incredibly powerful though, and almost all of that power is available at query time. Replicating the functionality in ELK often means indexing changes, and so when you have a question that isn't answered by the index, you'll forgo the answer unless you really really need it. A very simple example is the 'transaction' command in Splunk, which I absolutely could not live without and often surprise myself with the keys I end up using to research a particular topic.

Cthulhu_ · on April 14, 2020

I think people are easy to underestimate the cost of building and maintaining a new piece of software. I mean yeah you can set up an ELK stack (just as an example) on your own hardware / cloud easily enough, but what will it cost in the long run? How many people will spend how much time on it? Or how many people will end up having to spread their attention between the core objective of your business and managing the ELK stack and other services you decided to build / operate yourself?

zmmmmm · on April 14, 2020

> My preference in these situations is always to start with buy, and then try to justify build

What if there are significant costs associated with that? Integration, training, friction to move to a new solution later, etc?

When things are commoditized it works OK but for anything complex and strategic, the cost of implementing a potentially bad solution can be absolutely crippling, especially if dependencies on that solution grow rapidly within the organisation.

jedberg · on April 14, 2020

Then those are all great justifications in the build column!

dzimine · on April 13, 2020

A story of Netflix adopting StackStorm for autoremediation is a nice illustration of this approach. They began to build a tool, learned enough to be dangerous, discovered StackStorm and used it, experimented with it like with a breadboard, until finally figured out exactly what works - and reimplemented it. All these time learning the system was working and delivering vale.

Part of this story is here https://netflixtechblog.com/introducing-winston-event-driven...

Disclamer: StackStorm founder here (but not pitching).

GiorgioG · on April 13, 2020

And Splunk is horrendously slow in my experience.

sethammons · on April 13, 2020

We self host splunk and it can plow through petabytes of high cardinality data pretty dang fast. If the fields are not indexed and the search is complex, it can take minutes or hours. But usually, I can get live and historic data in a few seconds.

As an example, we have a pipeline of services. I can compute the time spent in each service with multiple levels of percentiles and group that data by high cardinality fields (as in, hundreds of thousands or more values). I just did a search for 4 hours of data across thousands of nodes for half a dozen or so services with multiple eval statements all piped to a timechart doing over a dozen stats operations. Half a billion events. It got done in under a minute.

Splunk charges so much because they are just so dang powerful.

wenc · on April 13, 2020

This has not been true in my experience. I run a Splunk server in production and at my data volumes it has been very performant. It's also much easier to setup and maintain than ELK clusters.

In the early days Splunk pricing was exorbitant (we evaluated Splunk 7 years ago and dismissed it), but licensing has changed in recent years and it is now priced by volume ingested (the pricing is transparent and listed on their website now). At low volumes, the pricing is similar to Sumologic, and is pretty accessible now to smaller dev shops. Open-source collectors like fluentd also help to intelligently reduce the ingest volume.

At high volumes, the TCO changes quite a bit.

GiorgioG · on April 13, 2020

My experience is solely at work where we use Splunk Cloud and it's slow as molasses.

dzimine · on April 13, 2020

Does the speed matter? How exactly? I am genuinely curious: at Scalyr we _can_ be very fast but it is a balance with the cost that we want to pass on as price savings. Same with self-hosted Elastic: one can fine-tune it to be fast but minding the cost constraints gets it slower. WDYT?

m0xte · on April 13, 2020

Same experience. Also splunk forwarders appear to be universally unreliable.

sethammons · on April 13, 2020

we have extra monitoring for splunk forwarders, and even then we fail to notice them fail from time to time. Sigh.....

m0xte · on April 14, 2020

Yes same here. I actually monitor the throughput of the network interfaces on our forwarder with prometheus/statsd_exporter and if outbound is smaller than inbound it sets off alerts!

wenc · on April 13, 2020

Ours is self-hosted and is plenty fast. This might be something to bring up with the Splunk folks. Maybe their cloud side of things needs tuning.