9. So How Can Hive Help?

The most dimensionally rich reports were generally scheduled reports, since it didn't make sense to present such voluminous data in a web UI.

Our initial solution was therefore bifurcated: We still cubes a limited set of dimensions for UI presentation purposes, while sending full log files to Hadoop for scheduled reports. A batched report doesn't need to have a sub-second response time.

This meant we could actually expand the dimensions available in our scheduled reports, since Hadoop gets the raw log files.

It also meant we could perform free-form analytics on our log data in future, using Hive as an exploratory tool. Maybe we could ask even better questions of our data in future, which could in turn suggest better scheduled reports to present to our customers.

If our traffic volume increased to the point where even our cubed data with limited dimensions are taking too long to load, we could have gone to a model where we updated the UI-facing relational tables from our Hive database.

Too bad the company died and none of that happened. But, uh, it was a good idea.