Well, that's more clever than a company I did due diligence for.
Their strategy was to have a pool of API keys attached to new accounts that would take advantage of the Google Maps API free tier, and monitor its usage. As the free tier usage would run out, the system would roll over to a new API key automatically.
Wrote that one up in big red marker in my report...
Genuine question: wouldn't this be considered wire fraud?
Last time I asked this question [0] on a different story [1], the responses I got were that it definitely is wire fraud, but this is so mind-blowing that I would like to ask again to confirm.
A snippet from the article:
A federal court in Washington, DC, has ruled that violating a website's terms of service isn't a crime under the Computer Fraud and Abuse Act, America's primary anti-hacking law. The lawsuit was initiated by a group of academics and journalists with the support of the American Civil Liberties Union.
Sometimes my simplest comments get misunderstood the most. If someone wanted to take this to criminal court, it would be idiotic to argue violation of the terms of service, especially given the link you provide; but as I said in my post, someone might want to try it as theft of services.
Did you think my use of the word services was somehow related to the terms thereof? Because no. To quote Wikipedia, since it was the first thing that came up when I googled it: "Theft of services is the legal term for a crime which is committed when a person obtains valuable services — as opposed to goods — by deception, force, threat or other unlawful means, i.e., without lawfully compensating the provider for these services". Do you see how someone might argue that swapping out the API key could be seen as a form of deception?
I am not saying that I would think it right that someone bring this to criminal court (I figured I better put that out there as even the simplest of comments can be misunderstood, so who knows what several paragraphs together might lead to), I am not saying that they would even win, I am not saying anyone who did it would be doing so for the purest of motives. But I am saying it does seem something like theft of services by using deception.
on edit: the theft and new api keys refers several ancestors back to this anecdote "Their strategy was to have a pool of API keys attached to new accounts that would take advantage of the Google Maps API free tier, and monitor its usage. As the free tier usage would run out, the system would roll over to a new API key automatically."
What contract? You don't sign a contract when you create a Google account (which is basically what you need to create an API key with access to the free tier)
These terms[0] are, in general, legally binding (especially as you're a business signing them and not just a person), and it's obvious bad-faith to do this, making any sort of lawsuit hard to fight. While they most likely won't actually take you to court over this, you risk suspension of your main GCP account.
> 3.3 Restrictions.
> Customer will not, and will not allow third parties under its control to: ... (d) create multiple Applications, Accounts, or Projects to simulate or act as a single Application, Account, or Project (respectively) or otherwise access the Services in a manner intended to avoid incurring Fees or exceed usage limits or quotas;
It's not clever, it's just much simpler. Using linear interpolation for the time between two stops will have low accuracy, because in this particular situation time might not be linear with position and distance. Traffic incidents can also happen.
Going with a pool of free keys would be much more dependable, even if somewhat more complicated to manage and easier to break.
This company was shockingly deep into their lifecycle to still be using this approach. And yeah, they'd cycle IPs as needed too. I think the thought was that Google isn't doing a ton of fraud analysis for this particular modality of fraud. Still though...
You don't have many options if you need high accuracy: you either pay a lot, or you try to trick Google, which may be both immoral and against the law, and is certainly tricky, hard to maintain, and not something you can count on in the long run.
Let's hope there will be alternatives to Google-provided traffic data. For now they seem to have monopolized it by offering it for free, at a loss, to discourage competition.
> You don't have many options if you need high accuracy: you have to pay a lot
What happened to actually trying to solve problems with programming?
Interpolation is one solution. Caching is another. Temporal analysis. Put everything together.
You don't need to query the magic Google box for every small update you make (and they might get that info from the transit providers, which in my experience are not always that great).
Linear interpolation gets you 9X% of the way there for cheap, though. You can then come up with strategies for attacking the last 10% at somewhat higher cost instead of committing the entire stack to the high-cost strategy.
For example, if you ascertain that a bus is more than 2 minutes late, switch to polling that bus more often until it makes it to its next stop. And then switch back to linear interpolation once it gets to that stop. But you'll pay a little bit more for the added accuracy.
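Something like this, as a rough sketch; the constants and helper names are illustrative, not anyone's real API:

    POLL_NORMAL = 300      # seconds between Directions refreshes when on schedule
    POLL_LATE = 60         # poll more often once the bus falls behind
    LATE_THRESHOLD = 120   # seconds of lateness that triggers the fast path

    def interpolated_position(prev_stop, next_stop, depart_ts, eta_ts, now):
        """Linear interpolation between two stops based on elapsed time."""
        frac = (now - depart_ts) / max(eta_ts - depart_ts, 1)
        frac = min(max(frac, 0.0), 1.0)
        lat = prev_stop[0] + frac * (next_stop[0] - prev_stop[0])
        lon = prev_stop[1] + frac * (next_stop[1] - prev_stop[1])
        return lat, lon

    def next_poll_interval(scheduled_arrival, predicted_arrival):
        """Fall back to frequent polling only while the bus is running late."""
        lateness = predicted_arrival - scheduled_arrival
        return POLL_LATE if lateness > LATE_THRESHOLD else POLL_NORMAL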
Moral of the story: if you want high-resolution real-time data, you (and your customers) have to pay for it, as that shit ain't easy.
> So, we ensure a maximum of 20 meters between two location coordinates to improve accuracy of information.
Hardly "low accuracy".
The key change is modelling the problem as one of routes rather than journeys, since a route can be "polled" at a certain resolution and reused regardless of the number of journeys on it.
> This approach made the API calls independent of the number of vehicles and dependent only on our stops, which helped us in scaling up our fleet with no additional cost.
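A minimal sketch of that route-centric idea as I read it, with a stand-in fetch function rather than the article's actual code: cache one travel time per pair of adjacent stops, refresh it on a fixed cadence, and let every bus on the route reuse it.

    import time

    class RouteEtaCache:
        """Per-segment travel times keyed by (stop_a, stop_b), shared by all buses."""
        def __init__(self, fetch_segment_eta, ttl=300):
            self.fetch = fetch_segment_eta   # callable that hits the Directions API
            self.ttl = ttl                   # refresh interval in seconds
            self.cache = {}                  # (stop_a, stop_b) -> (eta_seconds, fetched_at)

        def segment_eta(self, stop_a, stop_b):
            now = time.time()
            entry = self.cache.get((stop_a, stop_b))
            if entry is None or now - entry[1] > self.ttl:
                entry = (self.fetch(stop_a, stop_b), now)
                self.cache[(stop_a, stop_b)] = entry
            return entry[0]

        def eta_to_last_stop(self, remaining_stops):
            """ETA for any bus is the sum over the segments still ahead of it."""
            return sum(self.segment_eta(a, b)
                       for a, b in zip(remaining_stops, remaining_stops[1:]))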
From what I understand (I have no insider info/hints or anything related), this sort of thing was precisely why Google radically changed the pricing structure to make the free tier much smaller for Maps API recently.
I.e. Google knew there was rampant abuse by people like this (this example is not the only thing like this I have heard of...) so Google fixed the glitch and in the process ruined it for all the people genuinely using the service's free tier.
This is why we can't have nice things :) I guess we are lucky that Google didn't decide to just cut their losses and close the whole shebang down - that would be a shame as Google maps is really useful IMO.
I discovered that our preproduction servers use the free tier while trying to work on some CI/CD issues. Normally we never get anywhere near the limit, unless someone (say, me) is trying to work on the test suite during a time of day when a lot of pushes are happening. Had a few enforced breaks there for a little while.
Needless to say, that's very much against the TOS and it's a matter of time until they get blocked. Surely at some point it'd be easier and cheaper just to pony up for a license?
The whole chain of replies by this user in this thread sounds like the obnoxious vegan stereotype of the tech world.
How do you know someone is not using Google products out of some moral grandstanding principle? Don't worry, they will let you know, even if it is just some thread that is only tangentially related to the topic (and their username will likely tell you as well).
I did the same years ago. We were providing realtime suburb data for a fleet of trains. Each train received a GPS coordinate once per minute, we took this and displayed the suburb. So 1440 updates per day per train. For the fleet it was going to be over $100 a day in API costs.
We were going to not display suburb data because of cost. In the end I found a creative commons placename database (geonames.org). For placenames with >500 people it's ~10MB of data, and that covers the entire planet (surprisingly small). I then wrote a KD-tree based library to look up the nearest point in this table extremely efficiently (log(N) time).
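For anyone wanting to reproduce that, here is a rough sketch using scipy's cKDTree instead of a hand-rolled tree; converting lat/lon to unit vectors first keeps nearest-neighbour queries sane near the poles and the date line (parsing the geonames dump is left out):

    import numpy as np
    from scipy.spatial import cKDTree

    def to_unit_xyz(lat_deg, lon_deg):
        """Map lat/lon onto the unit sphere so Euclidean NN matches great-circle NN."""
        lat, lon = np.radians(lat_deg), np.radians(lon_deg)
        return np.column_stack((np.cos(lat) * np.cos(lon),
                                np.cos(lat) * np.sin(lon),
                                np.sin(lat)))

    class PlaceIndex:
        def __init__(self, names, lats, lons):
            self.names = names
            self.tree = cKDTree(to_unit_xyz(np.asarray(lats), np.asarray(lons)))

        def nearest(self, lat, lon):
            _, idx = self.tree.query(to_unit_xyz(np.array([lat]), np.array([lon])))
            return self.names[int(idx[0])]

    # index = PlaceIndex(names, lats, lons)   # arrays parsed from the geonames dump
    # print(index.nearest(-33.86, 151.21))    # nearest place name to a GPS fix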
At a previous employer I tried to convince my managers to let me do this for months. They always balked. Their loss.
PostGIS would be worth it if there were a lot more to do on that project, and full respect to that project. I remember looking at it and it felt like a sledgehammer for that particular job. You still have to write a fair amount of code to import the dataset you have and then extract it again, not to mention the maintenance of the whole dependency chain now installed on your server. This was a single-engineer project.
The above solution is a copy-pasteable set of classes with zero dependencies that will output the nearest place. Sometimes a single-purpose solution is perfect, and I'm really not kidding when I say I haven't maintained or even looked at it in over 5 years, yet it's still running fine as part of a larger application.
You still have the same problem. The input is a GPS coordinate. You can just return the same value as in the last X minutes, but that obviously sucks, since you could be X minutes off about the real-time suburb of a fast-moving train. You could find the nearest cached point, but then you might as well just be finding the nearest entry in a list of places again, which is what I did.
I didn't realize the train coordinates were what the API was for. Were these public trains? How does google have their gps coordinates and no one else does?
Right... but that would mean that what you are querying google for doesn't change. The coordinates of the trains would even only be along their route. Once you have done a reverse lookup on enough points along the route, you should have a pretty dense idea of where the neighborhoods change, right?
I am not sure if I'm missing something big, but it really seems like there wouldn't be an ongoing need to query Google's APIs in this case. Finding the boundaries of the neighborhoods, and especially labeling stops, should be easy enough to do if it would save so much money.
If you track the vehicles every day and collect the location data, you can easily augment the Open Source Routing Machine to give you traffic-aware estimates [0]. Combined with some Kalman filters you'd get almost perfect estimates when live.
Of course, this is for a use case where you have similar routes every day, this allows you to really tune the Kalman filters.
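As a toy illustration of the second part, a scalar Kalman filter tracking one segment's travel time; the noise values here are made up and would need tuning to your fleet:

    class TravelTimeKalman:
        """Scalar Kalman filter tracking the expected travel time on one road segment."""
        def __init__(self, initial_seconds, process_var=25.0, measurement_var=400.0):
            self.x = initial_seconds   # current estimate (seconds)
            self.p = 1000.0            # estimate variance
            self.q = process_var       # how quickly the true travel time drifts
            self.r = measurement_var   # how noisy each observed run is

        def update(self, observed_seconds):
            self.p += self.q                  # predict: uncertainty grows between runs
            k = self.p / (self.p + self.r)    # Kalman gain
            self.x += k * (observed_seconds - self.x)
            self.p *= (1 - k)
            return self.x

    # f = TravelTimeKalman(initial_seconds=180)
    # for run in [175, 190, 240, 230]:   # runs observed by your own vehicles
    #     print(f.update(run))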
My understanding is that Google does real-time traffic reporting so well because it's constantly pinging the location of Android devices. There have been several write-ups on how to spoof it, but historical data is never going to be a match for a feed like that.
I find it insane that the fact that Google is constantly harvesting device locations is blatantly obvious from things like this, and yet if you make that claim in other contexts people will be extremely skeptical and demand extensive proof.
Those phones all had Google Maps app open. That Google is harvesting device locations while you have a Google app open showing your GPS location should not be a surprise.
What would be a surprise is if Google forced your GPS on to gather your location in other contexts.
It is not a surprise, it is already happening. You can turn it off, but AFAIK if you don't it collects this information even if the Google maps application is not open.
From my personal experience, I have an iPhone and about once a week it asks me whether it is okay for Google Maps to keep using my location in the background. I always tell it no. I didn't know I could turn it off directly in the app, thank you for the information.
Google absolutely harvests phone GPS data to enhance things like AGPS (mapping of WiFi MAC addresses or cell towers to location coordinates), and it does this regardless of the location setting in the system UI.
GPS is definitely the best way of ascertaining a device's location but far from the only one. Google's own location API lets you access fine location (GPS) or coarse (triangulating via cell towers).
After public outcry and legal action, Google stopped drive-by data harvesting by Street View vehicles, but it remains unclear if they stopped collecting SSIDs and MAC addresses (I guess if you squint hard enough you could say that these are publicly available data points).
It doesn't take too much tin foil to think that android may be phoning home when it detects an SSID, the physical location of which is already known.
And Apple too - I am constantly seeing this (traffic/congestion flags) on side and residential streets with no traffic monitoring devices in my Apple Maps.
It sounds like you could use Google's live traffic info to augment your own predictions. You should be fine with just a few API calls. This would be pretty cheap - perhaps even within the free tier.
Which part of the ToS (that isn't getting violated already)?
The real problem with this approach is that you will never know when the real-time data conflicts with the historical data unless you're calling the API constantly anyway.
> (c) No Creating Content From Google Maps Content. Customer will not create content based on Google Maps Content.
> (d) No Re-Creating Google Products or Features. Customer will not use the Services to create a product or service with features that are substantially similar to or that re-create the features of another Google product or service.
For the use case described in the article it sounds just fine. The content part is a bit vague; if read very broadly it would be super prohibitive. Perhaps it is?
If they have their own fleet they can generate their own (historic) traffic data e.g. via Map Matching and use an open source routing engine like GraphHopper with OpenStreetMap data. (disclaimer: I'm one of the developers of GraphHopper.)
PSA: Open Source Routing Machine (OSRM) was largely abandoned by its maintainers. Several of us are working to reboot it, so if you enjoy map data and/or graph theory and have C++ skills, this would be a great project to work on.
OSRM is astonishingly fast (it uses the Contraction Hierarchies routing algorithm, or alternatively Multi-Level Dijkstra). This is compelling for draggable routing UIs, and for large matrix calculations used in the Vehicle Routing Problem (Travelling Salesman). It also makes it easy to customise your routing weightings through Lua 'profile' scripts.
The principal downside is that the routing graph takes a lot of time and memory to prepare; runtime RAM usage is also high, though not so much. I think there's some potential for reducing its memory footprint.
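If you haven't tried it, the HTTP API is pleasantly simple; a minimal sketch against a locally running osrm-routed instance (coordinates go in lon,lat order, and the public demo server is rate-limited):

    import requests

    def osrm_route_duration(coords, base_url="http://localhost:5000"):
        """coords: list of (lon, lat) tuples; returns driving duration in seconds."""
        path = ";".join(f"{lon},{lat}" for lon, lat in coords)
        resp = requests.get(f"{base_url}/route/v1/driving/{path}",
                            params={"overview": "false"})
        resp.raise_for_status()
        return resp.json()["routes"][0]["duration"]

    # print(osrm_route_duration([(13.38886, 52.51704), (13.39763, 52.52941)]))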
Valhalla builds on the older A* algorithm, so it's not so fast (or memory-hungry), though it does make some use of hierarchies to shorten query time. Graphhopper is another, featureful routing engine designed for use with OSM data (written in Java).
I'm a big fan of OpenTripPlanner [0]. I mainly use it for accurate isochrones of public transit networks (super handy for figuring out where to live) but it has support for foot, bike and car routes as well. There was some PoC code to support traffic data [1] but it was removed [2], I believe because nobody was interested in maintaining it. If someone wants to add it back and maintain it I doubt they'd object.
+1. I've made my own geocoding/reverse geocoding on OpenStreetMap data; it worked 10x faster than Google, and the server essentially pays for itself using just 20% of its resources. At our level of usage we would burn through the free tier in several hours.
Unfortunately no, this was entirely custom closed-source code, customized for one specific use. But you can try making your own map server; once you have the data in PostGIS and search around a little, reverse geocoding is rather easy. Just find geometries with specific tags near your point, then select the nearest street number, street name, and administrative zone names. I tested first on one country (low space requirements); importing the whole of Europe can take several days even on a multicore server with SSDs.
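A hedged sketch of the sort of query that makes this easy, assuming an osm2pgsql-style import (the planet_osm_point table and its way/place columns are the default osm2pgsql schema; adjust to whatever your import produced):

    import psycopg2

    # Nearest named place to a point, using PostGIS's KNN index operator (<->).
    QUERY = """
        SELECT name
        FROM planet_osm_point
        WHERE place IS NOT NULL
        ORDER BY way <-> ST_Transform(ST_SetSRID(ST_MakePoint(%s, %s), 4326), 3857)
        LIMIT 1;
    """

    def nearest_place(conn, lon, lat):
        with conn.cursor() as cur:
            cur.execute(QUERY, (lon, lat))
            row = cur.fetchone()
            return row[0] if row else None

    # conn = psycopg2.connect("dbname=gis")
    # print(nearest_place(conn, 4.9041, 52.3676))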
I think real time traffic awareness is exactly what the poster wanted, but yeah it seems like people who don’t need that would be better off using OSS where possible.
If they run a lot of buses they have some traffic awareness built into the system. I dare say not as good as Google's, but knowing that a bus already on a segment which another bus is due to enter is delayed surely allows you to adjust the predicted arrival time of both buses.
(a) No Scraping. Customer will not export, extract, or otherwise scrape Google Maps Content for use outside the Services. For example, Customer will not: (i) pre-fetch, index, store, reshare, or rehost Google Maps Content outside the services; (ii) bulk download Google Maps tiles, Street View images, geocodes, directions, distance matrix results, roads information, places information, elevation values, and time zone details; (iii) copy and save business names, addresses, or user reviews; or (iv) use Google Maps Content with text-to-speech services.
(b) No Caching. Customer will not cache Google Maps Content except as expressly permitted under the Maps Service Specific Terms.
"Customer can temporarily cache latitude (lat) and longitude (lng) values from the Directions API for up to 30 consecutive calendar days, after which Customer must delete the cached latitude and longitude values."
The conclusion I draw from this is not to plan anything around "free" features promoted by SaaS providers, because that can hit you hard in the future. Either plan around something paid and covered by a contract, or try to come up with another solution: maybe an in-house one, maybe a free solution with a failover to another provider.
At a cloud storage company I worked for, we had the most cost-intensive part of our infrastructure, the object storage, ready to be switched over from Azure to AWS, just in case.
Random thought, but I think it's funny how about 40 years ago people would write about their tricks to save program memory, and now people are talking about how to shave the cost of API calls.
Meanwhile, the old maxim "All programming is ultimately an exercise in caching" seems to have been forgotten entirely. This whole thread is just surreal.
Yeah, it is amusing. Given the small set of fixed routes, this problem is just crying out for "local" caching of the expensive data. I guess what is old is new again...
It would certainly be interesting if they had compared the quality of Mapbox’s traffic data to Google’s. Perhaps the authors tried this and the data wasn’t as good? But it’s not mentioned.
OpenStreetMap itself is just the base map, not a provider of traffic data or routing.
Indeed, and it will be very hard to get anywhere near as good as Google's without owning Android, which nobody else could possibly do.
I still think location data collection for the public good should be run by someone who is not Google / Facebook / some other company that already tracks everything about you. I'd be happy to send my real-time location to a company like Mozilla or some mapping company (just in the tiny Netherlands I know of AND, Geodan, OsmAnd... I'd be fine if any of them organized this).
We also have some open data via the government, based on detection loops in the road, but if I remember correctly it's only (or mainly) on highways.
Since they have actual vehicles on the ground, they could collect data on actual travel times under various traffic conditions and apply a bit of ML to predict how long it's going to take under current conditions. No need for any Google API there.
But now that their Google bill is only ~$50/day, it might not be worth building their own prediction system.
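For a sense of how little "ML" that can take: even a per-segment, per-hour median of your own observed runs gets you a workable baseline. The names here are illustrative, not anyone's production code:

    from collections import defaultdict
    from statistics import median

    class SegmentHistory:
        """Median travel time per (segment, hour of day) from a fleet's own GPS traces."""
        def __init__(self):
            self.runs = defaultdict(list)   # (segment_id, hour) -> [seconds, ...]

        def record(self, segment_id, hour, seconds):
            self.runs[(segment_id, hour)].append(seconds)

        def predict(self, segment_id, hour, fallback=None):
            samples = self.runs.get((segment_id, hour))
            return median(samples) if samples else fallback

    # hist = SegmentHistory()
    # hist.record("stopA-stopB", 8, 310)   # an 8am run took 310 s
    # print(hist.predict("stopA-stopB", 8, fallback=280))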
I guess you are reacting to private smug conversations with people who eulogize ML but are clueless about what they are talking about. Usually the amount of conviction they add to their comments and advice is inversely proportional to their knowledge. I am tired of those too, to the point of being allergic in spite of being an ML practitioner myself.
The parent comment could have been worded better but I think you will agree that there is scope for using ML or statistical modeling of some kind to optimize the use case both in API cost and accuracy. Making some minimal number of API calls would very likely be part of such a system.
Taking this approach to its logical conclusion, one could sample Google to infer a congestion heat map for all the main routes in an area, and calculate the transit time estimates directly from the inferred heat map.
IME congestion is also often modal; there are probably reductions in sampling effort you could make by noticing patterns in how commuters route, whether school is on break, and what the latest roadworks are.
So could increasing pricing, under some circumstances, help make your customers more creative, more efficient and more ecologically sustainable, by making them consume less of your computing power and bandwidth?
Why not save the route, decode the encoded polyline into its coordinates and use that to find overlapping road segments?
This would work for all of Google Maps, also for bikes or walking.
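For reference, Directions responses carry an encoded overview polyline; a rough sketch of decoding it with the third-party polyline package (and mind the caching terms quoted elsewhere in this thread):

    import polyline  # pip install polyline

    def route_points(directions_response):
        """Extract (lat, lon) points from a Directions API JSON response."""
        encoded = directions_response["routes"][0]["overview_polyline"]["points"]
        return polyline.decode(encoded)

    # pts = route_points(api_json)   # api_json: parsed Directions response
    # print(pts[:3])                 # e.g. [(19.076, 72.8777), ...]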
That's a pretty cynical interpretation. To me it seems more like they took an easy engineering approach when the cost per API call was low, then when the price went up they re-designed their software to use API calls more frugally by using some domain specific knowledge (that google couldn't know about so wouldn't be able to build in to their pricing).
The data for traffic is usually needed in real time, not just dumping historical data in one call and sitting on it. With that in mind, the most obvious pricing policy seems to be to either charge per API call or to have subscription tiers where you have a cap on maximum number of API calls per hour or per day or whatever other time interval they choose.
Sorry, I meant the price it sets for the api calls. The article calls out $10/1000 so the current price is a penny per call. I’m interested in how they set that rate.
We also saw an increased Maps bill after the new pricing model, from each map load on the detail page for a specific real estate object. The map is now disabled by default and can be loaded via a "show map" button placed over a static cached map image.
Only if the goal is to stop getting people to use it. If the pricing of the API was meant to be reflective of the cost of running and to encourage customers to use it more efficiently, then increasing the price beyond what they have done is maybe not a good idea.
I wonder if they simply tried calling Google and negotiating a volume based discount. I realize Google is notoriously impersonal, but I always prefer if you can solve a problem without code.
I worked with a major automotive company that was upgrading its head units to be more user friendly. Their #1 customer request at the time was to have it use Google Maps, so they tried to negotiate a discount with Google. I can't share the discount amount, but it was laughably small and the company would have been shelling out millions of dollars a month to Google for the integration. Needless to say, they stuck with a different mapping provider.
I was actually kind of surprised that my spouse's new company car did not come with either Google Maps or OpenStreetMap but instead credits the map data to some mapping company from eastern Europe if I remember correctly -- the data looks like whitelabeled TomTom or HERE data.
I just wish they would either augment their data with OSM or augment OSM with their data: now, with proprietary data, the data was already old when we got the car, and the fixes I contributed to OSM are obviously not in there.
Instead of contracting some middle man for inferior map data, they could have used completely free (as in beer) data for the regions where OSM is more up to date and complete than any commercial provider (empirically, this is most of the land mass on earth plus countries with a lot of mapping enthusiasts and/or free government data, like Germany and the Netherlands).
For what it's worth, traditional (i.e. not Tesla) OEMs generally had a "no open source anything" policy for a long time, for a variety of reasons I don't really agree with. That's changing a little bit these days, but not a whole lot.
I've build my own tile server map with mapbox gl spec, thx to them, with simple osm features, data about transportation is fair enough to get some basic calculation features, i mean it's a good start, Google maps pricing is insane, if u plan to support thousands of users like us this gonna kill your profitability, we have no money to invest for that part and imo u should be vendor less if u want to develop ur activity til u buisness model works at least
Hey, just a heads up: the "thx", "u", and spelling mistakes make your comment less attractive to read. I'm not saying your English has to be flawless, but things like using "u" instead of "you" is a choice.
Does the API not give timing info for parts of the route like the website does? If it does, then they could just query for first stop to last stop and infer all subtrips.
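If memory serves, with stopover waypoints the Directions API returns one leg per intermediate stop, each with its own duration. A rough sketch (the key and stop list are placeholders, and there's a cap on waypoints per request):

    import requests

    DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

    def leg_durations(stops, api_key):
        """stops: list of 'lat,lng' strings; returns seconds for each stop-to-stop leg."""
        params = {
            "origin": stops[0],
            "destination": stops[-1],
            "waypoints": "|".join(stops[1:-1]),   # stopovers, so each becomes a leg
            "departure_time": "now",              # ask for traffic-aware durations
            "key": api_key,
        }
        resp = requests.get(DIRECTIONS_URL, params=params).json()
        legs = resp["routes"][0]["legs"]
        return [leg.get("duration_in_traffic", leg["duration"])["value"] for leg in legs]

    # print(leg_durations(["19.076,72.877", "19.080,72.880", "19.090,72.890"], "YOUR_KEY"))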
Also given they seem to own the buses, they could give the drivers a rooted Android phone and get the data by reading the memory (or network traffic after reading the TLS key from memory) of their running instance of Google Maps that they are using to drive, which should be undetectable by Google.
While that seems like a good start it seems like you would want to supplement that approach with the GTFS and GTFS-realtime feeds from transit agencies where available.
Having worked for a company that provided transit software I am well aware that GTFS-realtime data isn’t always accurate (especially the ETAs). But it seems like combining their ETA estimate with yours and the scheduled arrival time would be a reasonable approach. Or if you just started building a model of the average speeds of the bus on each link using the GTFS-r speed and location data you could probably build up a pretty good historical model of the traffic flow between 2 stops at any given point during the day.
Yes, I am aware that scheduled times in transit can be meaningless. There are a lot of factors that contribute to that from traffic to poor schedule planning. They can however be a useful baseline estimate.
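A minimal sketch of reading vehicle positions from a GTFS-realtime feed with the gtfs-realtime-bindings package; the feed URL is a placeholder, since every agency publishes its own:

    import requests
    from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

    FEED_URL = "https://example-agency.org/gtfs-rt/vehicle-positions"  # placeholder

    def vehicle_positions():
        """Yield (trip_id, lat, lon, timestamp) for every vehicle in the feed."""
        feed = gtfs_realtime_pb2.FeedMessage()
        feed.ParseFromString(requests.get(FEED_URL).content)
        for entity in feed.entity:
            if entity.HasField("vehicle"):
                v = entity.vehicle
                yield v.trip.trip_id, v.position.latitude, v.position.longitude, v.timestamp

    # for trip_id, lat, lon, ts in vehicle_positions():
    #     print(trip_id, lat, lon, ts)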
I like the assumption made while the bus is between two stops but that could be very inaccurate in a geography like Mumbai. I have seen your posters at one of the metro stations and assume that you operate in the city. I feel the biggest choke point in the city is the number of signals - around the metro stations - and hence the uncertainties around timing before and after crossing the signal. Traveling in a cab, this is where Google maps timing fails too by about 5-10 minutes. Not a big deal as a customer, just a mildly interesting titbit in my daily travel (before Corona)
One thing to look at, given you would have everyday data, is whether there is a pattern (if you haven't already done that). Then you can suggest that commuters take the bus now and save commute time versus taking it 15 minutes later.
Why? That would entirely depend on how Google implements this. Ideally the result would match up exactly. If not you can certainly approximate the error introduced by this method well enough. They already needed to have something in place anyways to account for the bus actually stopping at all the stops. The API doesn't know about this in their old approach.
This is a pretty nifty solution! To nitpick, I'm curious how much the ETA between successive stops fluctuates.
For example, if you have 2 buses on the route `A-B-C-D-E-F-G-H`, using the same value of T(F-H) for the bus starting off at A and another one already at D, might not be quite right?
Hmm, could you please expand? What I'm trying to get at is that the Bus at D and the bus at A expect to reach stop F at different times. So, the ETA between F and H is never quite the same for them, as the traffic conditions could change.
When google maps gives me an ETA, I assume they account for it (or their ML model does from the vast troves of past data).
I gather they compare the prediction from Maps with the actual time for a few stops/buses, and if the difference is big they make a new API request, making Google wealthier.
FYI, we at NetToolKit offer 1,000 free queries per day via our development key, and the cost beyond that is very affordable at $10 for 100,000 queries.
How does this take traffic into account? It is clever and I enjoyed the read, but it is essentially doing some of the work GMaps is doing for you yourself, losing some accuracy in the process. That's probably a great trade-off in this case but not always.
As mentioned in the article, they still get estimates from the Directions API, which takes traffic into account. However, instead of getting the ETA for each bus they get it for the path, apply some clever interpolation, and refresh the base data from the API once in a while.
Point to note: "It was our route design that essentially helped us decouple the bus location updates and travel time computation for each bus, thereby reducing any redundant calculations"
I usually try to stay away, but I'll bite this time. I have no affiliation with Google and don't own their stock, nor ever have, but these kinds of responses bug me.
Working in data science, I keep thinking about how I would solve a particular problem. Regarding real-time traffic: how would you solve something like this? Clearly one has to have a lot of anonymized data one can trust. Now the question is how to get that data.
Google builds its business on providing a useful service, like Google Maps, for free, and on using the data from the users of that service in aggregate to sell useful information to businesses that need it for whatever they are working on, like the company in the article. Google is fairly transparent about how they use your data when you use their service, and they seem to try their best to anonymize it; you hardly ever get access to even anonymous data outside of Google, but rather to information computed from that data. So I see no objective reason for being so angry. What am I missing here? In the end, if you ever want a service like "ETA at destination", someone will need to acquire the data anyway, right?
These comments are so boring and predictable. Of course, Google makes their traffic data available to end-users too, in the Maps app and website. Yes, quite likely they're deriving it by aggregating driving data, but it's rather transparent as far as it goes. It's harder to say that about other Google services.
> Like in all of Google's "free" apps, the consumer is the product. They gather your data through Waze and Google maps and sell it for nice money to paying customers.
Huh? This article is talking about a Google PAID product and how to avoid paying for such product.
Why are you complaining about Google here? Shouldn't you be complaining about the company that's looking for workarounds to avoid paying for a product, essentially breeding the culture where selling consumers is the only viable business model?
It looks to me like Google hindered competition by providing the data for free while losing money on it. Once they were sure nobody could afford to compete with them, they started charging money.
Had there been competition between traffic data providers, small companies would have choices.
I believe there's another way to look at it. More from the consumer's POV and less as pure technology per se.
First, is the fluid ETA absolutely necessary? Surely the bus has a schedule. Is it on time, or not? If not, how late might it be? That's not the same as ETA. Certainly, they've collected and keep collecting enough data to infer such things. That is, 2 mins late at Stop B translates to what at Stop F and Stop N?
The consumer doesn't need ETA per se. They need to know if they're going to be on time to their destination or not.
Like putting a mirror next to an elevator to shorten the perceived wait, there might be other opportunities to solve this problem.
If you're interested in such things, this book from a year or so ago was intriguing.
Rory Sutherland
Alchemy: The Dark Art and Curious Science of Creating Magic in Brands, Business, and Life
We at nb.ai help companies with similar needs get off of Google solutions, and are able to achieve similar reductions in API calls along with, typically, an IMPROVEMENT in accuracy. We find that building good machine learning, traffic and routing models, along with OSRM and GraphHopper, goes really far.