Hive and Hadoop work much better with small numbers of large files than with large numbers of small files.
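One way to keep file counts down is to let Hive itself compact its output. A minimal sketch, assuming Hive's built-in merge settings are available in your version (the table name is illustrative, and the exact defaults vary):

```sql
-- Ask Hive to merge small output files at the end of a job.
SET hive.merge.mapfiles = true;      -- merge output of map-only jobs
SET hive.merge.mapredfiles = true;   -- merge output of map-reduce jobs
SET hive.merge.smallfiles.avgsize = 134217728;  -- run a merge pass when the
                                                -- average output file is
                                                -- smaller than ~128 MB

-- Rewriting an existing table through Hive compacts its many small
-- files into fewer large ones (web_logs is a hypothetical table):
INSERT OVERWRITE TABLE web_logs
SELECT * FROM web_logs;
```

The self-overwrite trick forces a full read and rewrite of the table, so reserve it for tables that have accumulated many small files.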
Partitions are your friends. If you frequently query data along one specific partitioning scheme (by day, by hour, by account, etc.), then load your data files into subdirectories accordingly. If your data volume justifies the use of Hive at all, you'll be glad you did.
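As a sketch of day-based partitioning (all table, column, and path names here are hypothetical):

```sql
-- Table partitioned by day; each partition becomes its own subdirectory.
CREATE TABLE page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING);

-- Each load lands under its partition directory, e.g. .../dt=2012-01-15/
LOAD DATA INPATH '/staging/page_views/2012-01-15'
INTO TABLE page_views PARTITION (dt = '2012-01-15');

-- A query that filters on the partition column scans only that
-- subdirectory instead of the whole table:
SELECT COUNT(*) FROM page_views WHERE dt = '2012-01-15';
```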
If your cluster has CPU cycles to spare, compress your files in HDFS: you trade CPU time for reduced disk and network I/O. If CPU is your bottleneck and disk bandwidth is plentiful, don't.
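Compression can be enabled per session. A sketch using common Hive and Hadoop properties (the exact property names vary with Hadoop version, so treat these as assumptions to verify against your cluster):

```sql
-- Compress final job output:
SET hive.exec.compress.output = true;
SET mapred.output.compression.codec = org.apache.hadoop.io.compress.GzipCodec;

-- Compress intermediate map output too, which mainly saves network
-- and disk I/O during the shuffle:
SET hive.exec.compress.intermediate = true;
```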
Try to avoid loading your data into HDFS directly from a node that is itself functioning as a DataNode. When the writer runs on a DataNode, HDFS places the first replica of every block on that local node, so it fills up and is loaded unevenly, causing a dramatic performance decrease, especially if it is also your NameNode. Load from an edge or gateway node instead, so block replicas are spread across the cluster.