10. How Else Can Perl Help Hive?

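Remember that Hadoop can store data as text files. (It often will, if you're sending it web log files.) Perl is good at manipulating those. Hive's TRANSFORM clause can stream rows through an external program. Hive writes each input row to the program's standard input as one line of tab-separated fields, and it reads each line the program prints to standard output back as an output row.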

    ADD FILE ${env:HOME}/lib/Utils/Hive.pm;
    ADD FILE ${env:HOME}/bin/file.pl;

    SELECT
        TRANSFORM (infield1, infield2, ...)
        USING "file.pl"
        AS (outfield1, outfield2, ...)
    FROM table1;
    

The input and output column lists may have different lengths, and the two schemas need not match in any way.
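The slide doesn't show file.pl itself. Here is a minimal sketch of what such a script could look like, assuming two hypothetical input columns (a user ID and a URL) and three output columns. It relies only on Hive's default TRANSFORM text format: one row per line, tab-separated fields, and \N for NULL.

```perl
#!/usr/bin/env perl
# A minimal, hypothetical file.pl for a TRANSFORM query.
# Hive writes each input row to STDIN as one tab-separated line and reads
# each tab-separated line printed to STDOUT back as one output row.
use strict;
use warnings;

# Two columns in (user_id, url), three columns out (user_id, host, path).
sub transform_row {
    my ($user_id, $url) = @_;
    $user_id //= '\N';    # '\N' is Hive's default text encoding of NULL
    return ($user_id, '\N', '\N') if !defined $url || $url eq '\N';

    # Split an absolute URL into host and path; unparseable URLs become NULLs.
    my ($host, $path) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#]+)([^?#]*)}i
        or return ($user_id, '\N', '\N');
    return ($user_id, lc($host), length($path) ? $path : '/');
}

sub run {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields instead of dropping them.
        my @fields = split /\t/, $line, -1;
        print {$out} join("\t", transform_row(@fields[0, 1])), "\n";
    }
}

# Read STDIN only when run as a script (not when loaded with require).
run(\*STDIN, \*STDOUT) unless caller;
```

With this script, the query would look something like this (column names are again hypothetical):

    SELECT
        TRANSFORM (user_id, url)
        USING "file.pl"
        AS (user_id, host, path)
    FROM table1;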

This allows you to extend Hive's fairly limited set of built-in functions using a language you know and love, or even other, less lovely languages.
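For instance, here is a sketch (not something we run at Nami) that turns raw web server logs, the kind of text file mentioned at the top of this section, into columns with a single Perl regex. It assumes a hypothetical source table whose only column holds raw lines in Apache's Combined Log Format; Common Log Format lines also parse, with NULL referer and user agent.

```perl
#!/usr/bin/env perl
# Hypothetical TRANSFORM script: one raw web log line in, eight columns out.
use strict;
use warnings;

# Apache Combined Log Format; the trailing referer/user-agent pair is optional
# so Common Log Format lines also parse.
my $COMBINED = qr{
    ^ (\S+) \s+ \S+ \s+ \S+ \s+            # client host; ident and user skipped
    \[ ([^\]]+) \] \s+                     # timestamp
    " (\S+) \s+ (\S+) [^"]* " \s+          # request method and path
    (\d{3}) \s+ (\d+|-)                    # status and response size
    (?: \s+ " ([^"]*) " \s+ " ([^"]*) " )? # referer and user agent
}x;

# Returns (host, time, method, path, status, bytes, referer, agent),
# or an empty list if the line doesn't look like a log entry.
sub parse_line {
    my ($line) = @_;
    my @f = $line =~ $COMBINED or return;
    # Map "-" and absent fields to '\N', Hive's default NULL marker.
    return map { !defined($_) || $_ eq '-' ? '\N' : $_ } @f;
}

sub run {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my @row = parse_line($line) or next;    # skip unparseable lines
        print {$out} join("\t", @row), "\n";
    }
}

# Read STDIN only when run as a script (not when loaded with require).
run(\*STDIN, \*STDOUT) unless caller;
```

Because unparseable lines are skipped, the script can return fewer rows than it receives; TRANSFORM does not require one output row per input row.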

We haven't used this capability at Nami yet, but we may in future.

This is similar in spirit to Hadoop Streaming in general, but this talk is about Perl and Hive, so FDSN.