Essentially the same Spark caveats apply: lazy evaluation and the immutability of cached data. Neither is a big deal on small datasets, but a mistake on either with a large dataset can cost a lot of time or cause real confusion.
Then there are the massive shuffle reads and writes that result in 50GB of I/O, which is not great for SSDs.
Out of curiosity, what kinks did you find?