Well, that's more clever than a company I did due diligence for.
Their strategy was to have a pool of API keys attached to new accounts that would take advantage of the Google Maps API free tier, and monitor its usage. As the free tier usage would run out, the system would roll over to a new API key automatically.
Wrote that one up in big red marker in my report...
Genuine question: wouldn't this be considered wire fraud?
Last time I asked this question [0] on a different story [1], the responses I got were that it definitely is wire fraud, but this is so mind-blowing that I would like to ask again to confirm.
A snippet from the article:
A federal court in Washington, DC, has ruled that violating a website's terms of service isn't a crime under the Computer Fraud and Abuse Act, America's primary anti-hacking law. The lawsuit was initiated by a group of academics and journalists with the support of the American Civil Liberties Union.
Sometimes my simplest comments get misunderstood the most. If someone wanted to take this to criminal court, it would be idiotic to argue violation of the terms of service, especially given the link you provide; but as I said in my post, someone might want to try it as theft of services.
Did you think my use of the word services was somehow related to the terms thereof? Because no. To quote Wikipedia, since it was the first thing that came up when I googled it: "Theft of services is the legal term for a crime which is committed when a person obtains valuable services — as opposed to goods — by deception, force, threat or other unlawful means, i.e., without lawfully compensating the provider for these services". Do you see how someone might argue that swapping out the API key could be seen as a form of deception?
I am not saying that I would think it right that someone bring this to criminal court (I figured I better put that out there as even the simplest of comments can be misunderstood, so who knows what several paragraphs together might lead to), I am not saying that they would even win, I am not saying anyone who did it would be doing so for the purest of motives. But I am saying it does seem something like theft of services by using deception.
on edit: the theft and new api keys refers several ancestors back to this anecdote "Their strategy was to have a pool of API keys attached to new accounts that would take advantage of the Google Maps API free tier, and monitor its usage. As the free tier usage would run out, the system would roll over to a new API key automatically."
What contract? You don't sign a contract when you create a Google account (which is basically what you need to create an API key with access to the free tier)
These terms[0] are, in general, legally binding (especially as you're a business signing them and not just a person), and it's obvious bad-faith to do this, making any sort of lawsuit hard to fight. While they most likely won't actually take you to court over this, you risk suspension of your main GCP account.
> 3.3 Restrictions.
> Customer will not, and will not allow third parties under its control to: ... (d) create multiple Applications, Accounts, or Projects to simulate or act as a single Application, Account, or Project (respectively) or otherwise access the Services in a manner intended to avoid incurring Fees or exceed usage limits or quotas;
It's not clever, it's just much simpler. Using linear interpolation for the time between two stops will have low accuracy, because in this particular situation time might not be linear with position and distance. Traffic incidents can also happen.
Going with a pool of free keys would be much more dependable, even if somewhat more complicated to manage and easier to break.
This company was shockingly deep into their lifecycle to still be using this approach. And yeah, they'd cycle IPs as needed too. I think the thought was that Google isn't doing a ton of fraud analysis for this particular modality of fraud. Still though...
You don't have many options if you need high accuracy: you either pay a lot, or you try to trick Google, which may be both immoral and against the law, and is certainly tricky, hard to maintain, and not something you can count on in the long run.
Let's hope there will be alternatives to Google-provided traffic data. For now they seem to have monopolized it by offering it for free, at a loss, to discourage competition.
> You don't have many options if you need high accuracy: you have to pay a lot
What happened to actually trying to solve problems with programming?
Interpolation is one solution. Caching is another. Temporal analysis. Put everything together.
You don't need to query the magic Google box for every small update you make (and they might get that info from the transit providers, which in my experience are not always that great).
Linear interpolation gets you 9X% of the way there for cheap, though. You can then come up with strategies for attacking the last 10% at somewhat higher cost instead of committing the entire stack to the high-cost strategy.
For example, if you ascertain that a bus is more than 2 minutes late, switch to polling that bus more often until it makes it to its next stop. And then switch back to linear interpolation once it gets to that stop. But you'll pay a little bit more for the added accuracy.
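Something like this, as a rough sketch; the constants and helper names are illustrative, not anyone's real API:

    POLL_NORMAL = 300      # seconds between Directions refreshes when on schedule
    POLL_LATE = 60         # poll more often once the bus falls behind
    LATE_THRESHOLD = 120   # seconds of lateness that triggers the fast path

    def interpolated_position(prev_stop, next_stop, depart_ts, eta_ts, now):
        """Linear interpolation between two stops based on elapsed time."""
        frac = (now - depart_ts) / max(eta_ts - depart_ts, 1)
        frac = min(max(frac, 0.0), 1.0)
        lat = prev_stop[0] + frac * (next_stop[0] - prev_stop[0])
        lon = prev_stop[1] + frac * (next_stop[1] - prev_stop[1])
        return lat, lon

    def next_poll_interval(scheduled_arrival, predicted_arrival):
        """Fall back to frequent polling only while the bus is running late."""
        lateness = predicted_arrival - scheduled_arrival
        return POLL_LATE if lateness > LATE_THRESHOLD else POLL_NORMAL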
Moral of the story: if you want high-resolution real-time data, you (and your customers) have to pay for it, as that shit ain't easy.
> So, we ensure a maximum of 20 meters between two location coordinates to improve accuracy of information.
Hardly "low accuracy".
The key change is modelling the problem as one of routes rather than journeys, since a route can be "polled" at a certain resolution and reused regardless of the number of journeys on it.
> This approach made the API calls independent of the number of vehicles and dependent only on our stops, which helped us in scaling up our fleet with no additional cost.
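A minimal sketch of that route-centric idea as I read it, with a stand-in fetch function rather than the article's actual code: cache one travel time per pair of adjacent stops, refresh it on a fixed cadence, and let every bus on the route reuse it.

    import time

    class RouteEtaCache:
        """Per-segment travel times keyed by (stop_a, stop_b), shared by all buses."""
        def __init__(self, fetch_segment_eta, ttl=300):
            self.fetch = fetch_segment_eta   # callable that hits the Directions API
            self.ttl = ttl                   # refresh interval in seconds
            self.cache = {}                  # (stop_a, stop_b) -> (eta_seconds, fetched_at)

        def segment_eta(self, stop_a, stop_b):
            now = time.time()
            entry = self.cache.get((stop_a, stop_b))
            if entry is None or now - entry[1] > self.ttl:
                entry = (self.fetch(stop_a, stop_b), now)
                self.cache[(stop_a, stop_b)] = entry
            return entry[0]

        def eta_to_last_stop(self, remaining_stops):
            """ETA for any bus is the sum over the segments still ahead of it."""
            return sum(self.segment_eta(a, b)
                       for a, b in zip(remaining_stops, remaining_stops[1:]))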
From what I understand (I have no insider info/hints or anything related), this sort of thing was precisely why Google radically changed the pricing structure to make the free tier much smaller for Maps API recently.
I.e. Google knew there was rampant abuse by people like this (this example is not the only thing like this I have heard of...) so Google fixed the glitch and in the process ruined it for all the people genuinely using the service's free tier.
This is why we can't have nice things :) I guess we are lucky that Google didn't decide to just cut their losses and close the whole shebang down - that would be a shame as Google maps is really useful IMO.
I discovered that our preproduction servers use the free tier while trying to work on some CI/CD issues. Normally we never get anywhere near the limit, unless someone (say, me) is trying to work on the test suite during a time of day when a lot of pushes are happening. Had a few enforced breaks there for a little while.
Needless to say, that's very much against the TOS and it's a matter of time until they get blocked. Surely at some point it'd be easier and cheaper just to pony up for a license?
The whole chain of replies by this user in this thread sounds like the obnoxious vegan stereotype of the tech world.
How do you know someone is not using Google products out of some moral grandstanding principle? Don't worry, they will let you know, even if it is just some thread that is only tangentially related to the topic (and their username will likely tell you as well).
I did the same years ago. We were providing realtime suburb data for a fleet of trains. Each train received a GPS coordinate once per minute, we took this and displayed the suburb. So 1440 updates per day per train. For the fleet it was going to be over $100 a day in API costs.
We were going to not display suburb data because of cost. In the end I found a creative commons placename database (geonames.org). For placenames with >500 people it's ~10MB of data, and that covers the entire planet (surprisingly small). I then wrote a KD-tree based library to look up the nearest point in this table extremely efficiently (log(N) time).
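For anyone wanting to reproduce that, here is a rough sketch using scipy's cKDTree instead of a hand-rolled tree; converting lat/lon to unit vectors first keeps nearest-neighbour queries sane near the poles and the date line (parsing the geonames dump is left out):

    import numpy as np
    from scipy.spatial import cKDTree

    def to_unit_xyz(lat_deg, lon_deg):
        """Map lat/lon onto the unit sphere so Euclidean NN matches great-circle NN."""
        lat, lon = np.radians(lat_deg), np.radians(lon_deg)
        return np.column_stack((np.cos(lat) * np.cos(lon),
                                np.cos(lat) * np.sin(lon),
                                np.sin(lat)))

    class PlaceIndex:
        def __init__(self, names, lats, lons):
            self.names = names
            self.tree = cKDTree(to_unit_xyz(np.asarray(lats), np.asarray(lons)))

        def nearest(self, lat, lon):
            _, idx = self.tree.query(to_unit_xyz(np.array([lat]), np.array([lon])))
            return self.names[int(idx[0])]

    # index = PlaceIndex(names, lats, lons)   # arrays parsed from the geonames dump
    # print(index.nearest(-33.86, 151.21))    # nearest place name to a GPS fix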
At a previous employer I tried to convince my managers to let me do this for months. They always balked. Their loss.
PostGIS would be worth it if there were a lot more to do on that project, and full respect to that project. I remember looking at it and it felt like a sledgehammer for that particular job. You still have to write a fair amount of code to import the dataset you have and then extract it again, not to mention the maintenance of the whole dependency chain now installed on your server. This was a single-engineer project.
The above solution is a copy-pasteable set of classes with zero dependencies that will output the nearest place. Sometimes a single-purpose solution is perfect, and I'm really not kidding when I say I haven't maintained or even looked at it in over 5 years, yet it's still running fine as part of a larger application.
You still have the same problem. The input is a GPS coordinate. You can just return the same value as in the last X minutes, but that obviously sucks, since you could be X minutes off about the real-time suburb of a fast-moving train. You could find the nearest cached point, but then you might as well just be finding the nearest entry in a list of places again, which is what I did.
I didn't realize the train coordinates were what the API was for. Were these public trains? How does google have their gps coordinates and no one else does?
Right... but that would mean that what you are querying google for doesn't change. The coordinates of the trains would even only be along their route. Once you have done a reverse lookup on enough points along the route, you should have a pretty dense idea of where the neighborhoods change, right?
I am not sure if I'm missing something big, but it really seems like there wouldn't be an ongoing need to query Google's APIs in this case. Finding the boundaries of the neighborhoods, and especially labeling stops, should be easy enough to do if it would save so much money.
If you track the vehicles every day and collect the location data, you can easily augment the Open Source Routing Machine to give you traffic-aware estimates [0]. Combined with some Kalman filters you'd get almost perfect estimates when live.
Of course, this is for a use case where you have similar routes every day, this allows you to really tune the Kalman filters.
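As a toy illustration of the second part, a scalar Kalman filter tracking one segment's travel time; the noise values here are made up and would need tuning to your fleet:

    class TravelTimeKalman:
        """Scalar Kalman filter tracking the expected travel time on one road segment."""
        def __init__(self, initial_seconds, process_var=25.0, measurement_var=400.0):
            self.x = initial_seconds   # current estimate (seconds)
            self.p = 1000.0            # estimate variance
            self.q = process_var       # how quickly the true travel time drifts
            self.r = measurement_var   # how noisy each observed run is

        def update(self, observed_seconds):
            self.p += self.q                  # predict: uncertainty grows between runs
            k = self.p / (self.p + self.r)    # Kalman gain
            self.x += k * (observed_seconds - self.x)
            self.p *= (1 - k)
            return self.x

    # f = TravelTimeKalman(initial_seconds=180)
    # for run in [175, 190, 240, 230]:   # runs observed by your own vehicles
    #     print(f.update(run))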
My understanding is that Google does real-time traffic reporting so well because it's constantly pinging the location of Android devices. There have been several write-ups on how to spoof it, but historical data is never going to be a match for a feed like that.
I find it insane that the fact that Google is constantly harvesting device locations is blatantly obvious from things like this, and yet if you make that claim in other contexts people will be extremely skeptical and demand extensive proof.
Those phones all had Google Maps app open. That Google is harvesting device locations while you have a Google app open showing your GPS location should not be a surprise.
What would be a surprise is if Google forced your GPS on to gather your location in other contexts.
It is not a surprise, it is already happening. You can turn it off, but AFAIK if you don't it collects this information even if the Google maps application is not open.
From my personal experience, I have an iPhone and about once a week it asks me whether it is okay for Google Maps to keep using my location in the background. I always tell it no. I didn't know I could turn it off directly in the app, thank you for the information.
Google absolutely harvests phone GPS data to enhance things like AGPS (mapping of WiFi MAC addresses or cell towers to location coordinates), and it does this regardless of the location setting in the system UI.
GPS is definitely the best way of ascertaining a device's location but far from the only one. Google's own location API lets you access fine location (GPS) or coarse (triangulating via cell towers).
After public outcry and legal action, Google stopped drive-by data harvesting by Street View vehicles, but it remains unclear if they stopped collecting SSIDs and MAC addresses (I guess if you squint hard enough you could say that these are publicly available data points).
It doesn't take too much tin foil to think that android may be phoning home when it detects an SSID, the physical location of which is already known.
And Apple too - I am constantly seeing this (traffic/congestion flags) on side and residential streets with no traffic monitoring devices in my Apple Maps.
It sounds like you could use Google's live traffic info to augment your own predictions. You should be fine with just a few API calls. This would be pretty cheap - perhaps even within the free tier.
Which part of the ToS (that isn't getting violated already)?
The real problem with this approach is that you will never know when the real-time data conflicts with the historical data unless you're calling the API constantly anyway.
> (c) No Creating Content From Google Maps Content. Customer will not create content based on Google Maps Content.
> (d) No Re-Creating Google Products or Features. Customer will not use the Services to create a product or service with features that are substantially similar to or that re-create the features of another Google product or service.
For the use case described in the article it sounds just fine. The content part is a bit vague; if read very broadly it would be super prohibitive. Perhaps it is?
If they have their own fleet they can generate their own (historic) traffic data e.g. via Map Matching and use an open source routing engine like GraphHopper with OpenStreetMap data. (disclaimer: I'm one of the developers of GraphHopper.)
PSA: Open Source Routing Machine (OSRM) was largely abandoned by its maintainers. Several of us are working to reboot it, so if you enjoy map data and/or graph theory and have C++ skills, this would be a great project to work on.
OSRM is astonishingly fast (it uses the Contraction Hierarchies routing algorithm, or alternatively Multi-Level Dijkstra). This is compelling for draggable routing UIs, and for large matrix calculations used in the Vehicle Routing Problem (Travelling Salesman). It also makes it easy to customise your routing weightings through Lua 'profile' scripts.
The principal downside is that the routing graph takes a lot of time and memory to prepare; runtime RAM usage is also high, though not so much. I think there's some potential for reducing its memory footprint.
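If you haven't tried it, the HTTP API is pleasantly simple; a minimal sketch against a locally running osrm-routed instance (coordinates go in lon,lat order, and the public demo server is rate-limited):

    import requests

    def osrm_route_duration(coords, base_url="http://localhost:5000"):
        """coords: list of (lon, lat) tuples; returns driving duration in seconds."""
        path = ";".join(f"{lon},{lat}" for lon, lat in coords)
        resp = requests.get(f"{base_url}/route/v1/driving/{path}",
                            params={"overview": "false"})
        resp.raise_for_status()
        return resp.json()["routes"][0]["duration"]

    # print(osrm_route_duration([(13.38886, 52.51704), (13.39763, 52.52941)]))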
Valhalla builds on the older A* algorithm, so it's not so fast (or memory-hungry), though it does make some use of hierarchies to shorten query time. Graphhopper is another, featureful routing engine designed for use with OSM data (written in Java).
I'm a big fan of OpenTripPlanner [0]. I mainly use it for accurate isochrones of public transit networks (super handy for figuring out where to live) but it has support for foot, bike and car routes as well. There was some PoC code to support traffic data [1] but it was removed [2], I believe because nobody was interested in maintaining it. If someone wants to add it back and maintain it I doubt they'd object.
+1. I've made my own geocoding/reverse geocoding on OpenStreetMap data; it worked 10x faster than Google, and the server essentially pays for itself using just 20% of its resources. At our level of usage we would burn through the free tier in several hours.
Unfortunately no, this was entirely custom closed-source code, customized for one specific use. But you can try making your own map server; once you have the data in PostGIS and search around a little, reverse geocoding is rather easy. Just find geometries with specific tags near your point, then select the nearest street number, street name, and administrative zone names. I tested first on one country (low space requirements); importing the whole of Europe can take several days even on a multicore server with SSDs.
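A hedged sketch of the sort of query that makes this easy, assuming an osm2pgsql-style import (the planet_osm_point table and its way/place columns are the default osm2pgsql schema; adjust to whatever your import produced):

    import psycopg2

    # Nearest named place to a point, using PostGIS's KNN index operator (<->).
    QUERY = """
        SELECT name
        FROM planet_osm_point
        WHERE place IS NOT NULL
        ORDER BY way <-> ST_Transform(ST_SetSRID(ST_MakePoint(%s, %s), 4326), 3857)
        LIMIT 1;
    """

    def nearest_place(conn, lon, lat):
        with conn.cursor() as cur:
            cur.execute(QUERY, (lon, lat))
            row = cur.fetchone()
            return row[0] if row else None

    # conn = psycopg2.connect("dbname=gis")
    # print(nearest_place(conn, 4.9041, 52.3676))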
I think real time traffic awareness is exactly what the poster wanted, but yeah it seems like people who don’t need that would be better off using OSS where possible.
If they run a lot of buses they have some traffic awareness built into the system. I dare say not as good as Google's, but knowing that a bus already on a segment which another bus is due to enter is delayed surely allows you to adjust the predicted arrival time of both buses.
(a) No Scraping. Customer will not export, extract, or otherwise scrape Google Maps Content for use outside the Services. For example, Customer will not: (i) pre-fetch, index, store, reshare, or rehost Google Maps Content outside the services; (ii) bulk download Google Maps tiles, Street View images, geocodes, directions, distance matrix results, roads information, places information, elevation values, and time zone details; (iii) copy and save business names, addresses, or user reviews; or (iv) use Google Maps Content with text-to-speech services.
(b) No Caching. Customer will not cache Google Maps Content except as expressly permitted under the Maps Service Specific Terms.
"Customer can temporarily cache latitude (lat) and longitude (lng) values from the Directions API for up to 30 consecutive calendar days, after which Customer must delete the cached latitude and longitude values."
The conclusion I draw from this is not to plan anything around "free" features promoted by SaaS providers, because that can hit you hard in the future. Either plan around something paid and covered by a contract, or try to come up with another solution: maybe an in-house one, maybe a free solution with a failover to another provider.
At a cloud storage company I worked for, we had the most cost-intensive part of our infrastructure, the object storage, ready to be switched over from Azure to AWS, just in case.
Random thought, but I think it's funny how about 40 years ago people would write about their tricks to save program memory, and now people are talking about how to shave the cost of API calls.
Meanwhile, the old maxim "All programming is ultimately an exercise in caching" seems to have been forgotten entirely. This whole thread is just surreal.
Yeah, it is amusing. Given the small set of fixed routes, this problem is just crying out for "local" caching of the expensive data. I guess what is old is new again...
It would certainly be interesting if they had compared the quality of Mapbox’s traffic data to Google’s. Perhaps the authors tried this and the data wasn’t as good? But it’s not mentioned.
OpenStreetMap itself is just the base map, not a provider of traffic data or routing.
Indeed, and it will be very hard to get anywhere near as good as Google's without owning Android, which nobody else could possibly do.
I still think location data collection for the public good should be run by someone who is not Google / Facebook / some other company that already tracks everything about you. I'd be happy to send my real-time location to a company like Mozilla or some mapping company (just in the tiny Netherlands I know of AND, Geodan, OsmAnd... I'd be fine if any of them organized this).
We also have some open data via the government, based on detection loops in the road, but if I remember correctly it's only (or mainly) on highways.
Since they have actual vehicles on the ground, they could collect data on actual travel times under various traffic conditions and apply a bit of ML to predict how long it's going to take under current conditions. No need for any Google API there.
But now that their Google bill is only ~$50/day, it might not be worth building their own prediction system.
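For a sense of how little "ML" that can take: even a per-segment, per-hour median of your own observed runs gets you a workable baseline. The names here are illustrative, not anyone's production code:

    from collections import defaultdict
    from statistics import median

    class SegmentHistory:
        """Median travel time per (segment, hour of day) from a fleet's own GPS traces."""
        def __init__(self):
            self.runs = defaultdict(list)   # (segment_id, hour) -> [seconds, ...]

        def record(self, segment_id, hour, seconds):
            self.runs[(segment_id, hour)].append(seconds)

        def predict(self, segment_id, hour, fallback=None):
            samples = self.runs.get((segment_id, hour))
            return median(samples) if samples else fallback

    # hist = SegmentHistory()
    # hist.record("stopA-stopB", 8, 310)   # an 8am run took 310 s
    # print(hist.predict("stopA-stopB", 8, fallback=280))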
I guess you are reacting to private smug conversations with people who eulogize ML but are clueless about what they are talking about. Usually the amount of conviction they add to their comments and advice is inversely proportional to their knowledge. I am tired of those too, to the point of being allergic in spite of being an ML practitioner myself.
The parent comment could have been worded better but I think you will agree that there is scope for using ML or statistical modeling of some kind to optimize the use case both in API cost and accuracy. Making some minimal number of API calls would very likely be part of such a system.
Taking this approach to its logical conclusion, one could sample Google to infer a congestion heat map for all the main routes in an area, and calculate the transit time estimates directly from the inferred heat map.
IME congestion is also often modal; there are probably reductions in sampling effort you could make by noticing patterns in how commuters route, whether school is on break, and what the latest roadworks are.
So could increasing pricing, under some circumstances, help make your customers more creative, more efficient and more ecologically sustainable, by making them consume less of your computing power and bandwidth?
Why not save the route, decode the encoded polyline into its coordinates and use that to find overlapping road segments?
This would work for all of Google Maps, also for bikes or walking.
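For reference, Directions responses carry an encoded overview polyline; a rough sketch of decoding it with the third-party polyline package (and mind the caching terms quoted elsewhere in this thread):

    import polyline  # pip install polyline

    def route_points(directions_response):
        """Extract (lat, lon) points from a Directions API JSON response."""
        encoded = directions_response["routes"][0]["overview_polyline"]["points"]
        return polyline.decode(encoded)

    # pts = route_points(api_json)   # api_json: parsed Directions response
    # print(pts[:3])                 # e.g. [(19.076, 72.8777), ...]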
That's a pretty cynical interpretation. To me it seems more like they took an easy engineering approach when the cost per API call was low, then when the price went up they re-designed their software to use API calls more frugally by using some domain specific knowledge (that google couldn't know about so wouldn't be able to build in to their pricing).
The data for traffic is usually needed in real time, not just dumping historical data in one call and sitting on it. With that in mind, the most obvious pricing policy seems to be to either charge per API call or to have subscription tiers where you have a cap on maximum number of API calls per hour or per day or whatever other time interval they choose.
Sorry, I meant the price it sets for the api calls. The article calls out $10/1000 so the current price is a penny per call. I’m interested in how they set that rate.
We also saw an increased Maps bill after the new pricing model, from each map load on the detail page for a specific real estate object. The map is now disabled by default and can be loaded via a "show map" button placed over a static cached map image.
Only if the goal is to stop getting people to use it. If the pricing of the API was meant to be reflective of the cost of running and to encourage customers to use it more efficiently, then increasing the price beyond what they have done is maybe not a good idea.
I wonder if they simply tried calling Google and negotiating a volume based discount. I realize Google is notoriously impersonal, but I always prefer if you can solve a problem without code.
I worked with a major automotive company that was upgrading its head units to be more user friendly. Their #1 customer request at the time was to have it use Google Maps, so they tried to negotiate a discount with Google. I can't share the discount amount, but it was laughably small and the company would have been shelling out millions of dollars a month to Google for the integration. Needless to say, they stuck with a different mapping provider.
I was actually kind of surprised that my spouse's new company car did not come with either Google Maps or OpenStreetMap but instead credits the map data to some mapping company from eastern Europe if I remember correctly -- the data looks like whitelabeled TomTom or HERE data.
I just wish they would either augment their data with OSM or augment OSM with their data: now, with proprietary data, the data was already old when we got the car, and the fixes I contributed to OSM are obviously not in there.
Instead of contracting some middle man for inferior map data, they could have used completely free (as in beer) data for the regions where OSM is more up to date and complete than any commercial provider (empirically, this is most of the land mass on earth plus countries with a lot of mapping enthusiasts and/or free government data, like Germany and the Netherlands).
For what it's worth, traditional (i.e. not Tesla) OEMs generally had a "no open source anything" policy for a long time, for a variety of reasons I don't really agree with. That's changing a little bit these days, but not a whole lot.
I've build my own tile server map with mapbox gl spec, thx to them, with simple osm features, data about transportation is fair enough to get some basic calculation features, i mean it's a good start, Google maps pricing is insane, if u plan to support thousands of users like us this gonna kill your profitability, we have no money to invest for that part and imo u should be vendor less if u want to develop ur activity til u buisness model works at least
Hey, just a heads up: the "thx", "u", and spelling mistakes make your comment less attractive to read. I'm not saying your English has to be flawless, but things like using "u" instead of "you" is a choice.
Does the API not give timing info for parts of the route like the website does? If it does, then they could just query for first stop to last stop and infer all subtrips.
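If memory serves, with stopover waypoints the Directions API returns one leg per intermediate stop, each with its own duration. A rough sketch (the key and stop list are placeholders, and there's a cap on waypoints per request):

    import requests

    DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

    def leg_durations(stops, api_key):
        """stops: list of 'lat,lng' strings; returns seconds for each stop-to-stop leg."""
        params = {
            "origin": stops[0],
            "destination": stops[-1],
            "waypoints": "|".join(stops[1:-1]),   # stopovers, so each becomes a leg
            "departure_time": "now",              # ask for traffic-aware durations
            "key": api_key,
        }
        resp = requests.get(DIRECTIONS_URL, params=params).json()
        legs = resp["routes"][0]["legs"]
        return [leg.get("duration_in_traffic", leg["duration"])["value"] for leg in legs]

    # print(leg_durations(["19.076,72.877", "19.080,72.880", "19.090,72.890"], "YOUR_KEY"))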
Also given they seem to own the buses, they could give the drivers a rooted Android phone and get the data by reading the memory (or network traffic after reading the TLS key from memory) of their running instance of Google Maps that they are using to drive, which should be undetectable by Google.
While that seems like a good start it seems like you would want to supplement that approach with the GTFS and GTFS-realtime feeds from transit agencies where available.
Having worked for a company that provided transit software I am well aware that GTFS-realtime data isn’t always accurate (especially the ETAs). But it seems like combining their ETA estimate with yours and the scheduled arrival time would be a reasonable approach. Or if you just started building a model of the average speeds of the bus on each link using the GTFS-r speed and location data you could probably build up a pretty good historical model of the traffic flow between 2 stops at any given point during the day.
Yes, I am aware that scheduled times in transit can be meaningless. There are a lot of factors that contribute to that from traffic to poor schedule planning. They can however be a useful baseline estimate.
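A minimal sketch of reading vehicle positions from a GTFS-realtime feed with the gtfs-realtime-bindings package; the feed URL is a placeholder, since every agency publishes its own:

    import requests
    from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

    FEED_URL = "https://example-agency.org/gtfs-rt/vehicle-positions"  # placeholder

    def vehicle_positions():
        """Yield (trip_id, lat, lon, timestamp) for every vehicle in the feed."""
        feed = gtfs_realtime_pb2.FeedMessage()
        feed.ParseFromString(requests.get(FEED_URL).content)
        for entity in feed.entity:
            if entity.HasField("vehicle"):
                v = entity.vehicle
                yield v.trip.trip_id, v.position.latitude, v.position.longitude, v.timestamp

    # for trip_id, lat, lon, ts in vehicle_positions():
    #     print(trip_id, lat, lon, ts)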
I like the assumption made while the bus is between two stops but that could be very inaccurate in a geography like Mumbai. I have seen your posters at one of the metro stations and assume that you operate in the city. I feel the biggest choke point in the city is the number of signals - around the metro stations - and hence the uncertainties around timing before and after crossing the signal. Traveling in a cab, this is where Google maps timing fails too by about 5-10 minutes. Not a big deal as a customer, just a mildly interesting titbit in my daily travel (before Corona)
One thing to look at, given you would have everyday data, is whether there is a pattern (if you haven't already done that). Then you can suggest that commuters take the bus now and save commute time versus taking it 15 minutes later.
Why? That would entirely depend on how Google implements this. Ideally the result would match up exactly. If not you can certainly approximate the error introduced by this method well enough. They already needed to have something in place anyways to account for the bus actually stopping at all the stops. The API doesn't know about this in their old approach.
This is a pretty nifty solution! To nitpick, I'm curious how much the ETA between successive stops fluctuates.
For example, if you have 2 buses on the route `A-B-C-D-E-F-G-H`, using the same value of T(F-H) for the bus starting off at A and another one already at D, might not be quite right?
Hmm, could you please expand? What I'm trying to get at is that the Bus at D and the bus at A expect to reach stop F at different times. So, the ETA between F and H is never quite the same for them, as the traffic conditions could change.
When google maps gives me an ETA, I assume they account for it (or their ML model does from the vast troves of past data).
I gather they compare the prediction from Maps with the actual time for a few stops/buses, and if the difference is big they make a new API request, making Google wealthier.
FYI, we at NetToolKit offer 1,000 free queries per day via our development key, and the cost beyond that is very affordable at $10 for 100,000 queries.
How does this take traffic into account? It is clever and I enjoyed the read, but it is essentially doing some of the work GMaps is doing for you yourself, losing some accuracy in the process. That's probably a great trade-off in this case but not always.
As mentioned in the article, they still get estimates from the Directions API, which takes traffic into account. However, instead of getting the ETA for each bus they get it for the path, apply some clever interpolation, and refresh the base data from the API once in a while.
Point to note: "It was our route design that essentially helped us decouple the bus location updates and travel time computation for each bus, thereby reducing any redundant calculations"
I usually try to stay away, but I'll bite this time. I have no affiliation with Google and don't own their stock, nor ever have, but these kinds of responses bug me.
Working in data science, I keep thinking about how I would solve a particular problem. Regarding real-time traffic: how would you solve something like this? Clearly one has to have a lot of anonymized data one can trust. Now the question is how to get that data.
Google builds its business on providing a useful service, like Google Maps, for free, and on using the data from the users of that service in aggregate to sell useful information to businesses that need it for whatever they are working on, like the company in the article. Google is fairly transparent about how they use your data when you use their service, and they seem to try their best to anonymize it; you hardly ever get access to even anonymous data outside of Google, but rather to information computed from that data. So I see no objective reason for being so angry. What am I missing here? In the end, if you ever want a service like "ETA at destination", someone will need to acquire the data anyway, right?
These comments are so boring and predictable. Of course, Google makes their traffic data available to end-users too, in the Maps app and website. Yes, quite likely they're deriving it by aggregating driving data, but it's rather transparent as far as it goes. It's harder to say that about other Google services.
> Like in all of Google's "free" apps, the consumer is the product. They gather your data through Waze and Google maps and sell it for nice money to paying customers.
Huh? This article is talking about a Google PAID product and how to avoid paying for such product.
Why are you complaining about Google here? Shouldn't you be complaining about the company that's looking for workarounds to avoid paying for a product, essentially breeding the culture where selling consumers is the only viable business model?
It looks to me like Google hindered competition by providing the data for free while losing money on it. Once they were sure nobody could afford to compete with them, they started charging money.
Had there been competition between traffic data providers, small companies would have choices.
I believe there's another way to look at it. More from the consumer's POV and less as pure technology per se.
First, is the fluid ETA absolutely necessary? Surely the bus has a schedule. Is it on time, or not? If not, how late might it be? That's not the same as ETA. Certainly, they've collected and keep collecting enough data to infer such things. That is, 2 mins late at Stop B translates to what at Stop F and Stop N?
The consumer doesn't need ETA per se. They need to know if they're going to be on time to their destination or not.
Like putting a mirror next to an elevator to shorten the perceived wait, there might be other opportunities to solve this problem.
If you're interested in such things, this book from a year or so ago was intriguing.
Rory Sutherland
Alchemy: The Dark Art and Curious Science of Creating Magic in Brands, Business, and Life
We at nb.ai help companies with similar needs get off of Google solutions, and are able to achieve similar reductions in API calls along with, typically, an IMPROVEMENT in accuracy. We find that building good machine learning, traffic and routing models, along with OSRM and GraphHopper, goes really far.