
I work for a big AI consultancy. Most of the time we build ETLs for the data-engineering side, in a client-driven capacity-building effort. We do this because our focus is on data science, not data engineering, and we often work in situations where the client doesn't have an existing data science platform. An ETL is simpler to build, to hand over, and later to maintain.

In projects where the client already has a mature engineering and data science department, we bring the big guns! The scope is usually much larger, with several workstreams, and involves production-ready deployments. In this situation we might build upon what the client already has (ETLs), or initiate a full event-driven transformation with a "backbone" team responsible for creating a platform and several use cases built upon it. In the usual scenario, a team wants to start large computations or simulations upon receiving a trigger event from a monitoring system (model drift) or a human operator ("what would be the impact in € of a small decrease in parameter X over the next 7 days of forecasted sales?").
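The trigger pattern above can be sketched as follows. This is a minimal, hypothetical illustration: the event kinds (`model_drift`, `what_if_query`), the `EventBus` class, and the handler names are all invented for this sketch; a real "backbone" would sit on a durable broker such as Kafka rather than an in-memory dispatcher.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A trigger event: emitted by a monitoring system or a human operator."""
    kind: str
    payload: dict


class EventBus:
    """Minimal in-memory stand-in for a real event backbone (e.g. Kafka)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._handlers.get(event.kind, []):
            handler(event)


results: List[str] = []


def run_forecast_simulation(event: Event) -> None:
    # Placeholder for the expensive computation a workstream would launch;
    # field names in the payload are illustrative assumptions.
    delta = event.payload["parameter_delta"]
    horizon = event.payload["horizon_days"]
    results.append(f"simulating {delta:+} change over {horizon} days")


bus = EventBus()
bus.subscribe("what_if_query", run_forecast_simulation)

# A human operator asks: impact of a small decrease in parameter X over 7 days.
bus.publish(Event("what_if_query", {"parameter_delta": -0.05, "horizon_days": 7}))
print(results[0])  # simulating -0.05 change over 7 days
```

The point of the pattern is that the publisher (monitoring system or operator UI) never knows which workstreams consume the event, so new use cases can subscribe to the backbone without touching existing ones.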

Event-driven systems are much more robust than traditional ETLs feeding a central data warehouse, but they are also much more complex to understand and operate. In the end, we rarely deploy them because they cost us way too much engineering time compared to the benefits. That's mostly because we spend >70% of our time dealing with "security teams" and "access issues". Seriously.


