Five Great Blog Posts on Dealing with Hive for Hadoop SQL: Date and Time, Numeric, String, Conditional, and Collection Functions

I currently work using SQL for Hive for Hadoop. Hive SQL is a little eccentric. If you have worked in other forms of SQL, it takes some getting used to.

Below are five blog posts from folkstalk.com which have helped me immensely.

  1. Date and Time Functions
  2. Numeric functions
  3. String functions
  4. Conditional functions
  5. Collection functions

Why are these posts any good?

  1. Date and Time Functions – Dates and Times in Hadoop are stored as strings in the format ‘YYYY-MM-DD HH:MM:SS.T’ (T is tenth of a second), not a date/time datatype or an integer. It gives you a good sense of how to deal with this.
  2. Numeric functions – It’s pretty much the same as other forms of SQL. Given how nerve rattling Hive SQL can be at first, it’s reassuring to know that some things are still the same.
  3. String functions – Check out the CONCAT function. It’s an actual function in this case and not a symbol like ‘&’.
  4. Conditional functions – Check out the COALESCE function. This one gets used quite a bit when one is trying to clean data by combining multiple fields. All it does is return the first non NULL value from the fields listed. It’s surprisingly useful.
  5. Collection functions – Check out the CAST function. This is how you change a value from one type to another. For example, to change the field ‘beers_drunk’ from a string to an integer would be CAST(beers_drunk AS int)