Also: I was under the impression that single precision is fine for most deep learning applications, and that double precision doesn't even have good support in most libraries, but I guess it depends on the use case.
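The memory side of the precision choice is easy to see with a quick NumPy sketch (the 10M-parameter figure is just an illustrative assumption): float64 doubles the footprint of every weight and activation, which matters a lot when you're trying to fit a model into a few GB of GPU memory.

```python
import numpy as np

# Hypothetical model weights: 10 million parameters.
n_params = 10_000_000

w32 = np.zeros(n_params, dtype=np.float32)  # single precision: 4 bytes/param
w64 = np.zeros(n_params, dtype=np.float64)  # double precision: 8 bytes/param

print(w32.nbytes)  # 40000000 (~40 MB)
print(w64.nbytes)  # 80000000 (~80 MB), twice the memory for the same model
```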
FLOP-wise that makes sense. But for deep learning, the big deal is the 12 GB of GPU-local memory, which has enormous bandwidth (and can hold more of your dataset/parameters at once). The biggest concern with GPU processing is keeping the GPU adequately fed with data, and avoiding round-trips of blobs of data to and from the CPU helps a lot.
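A rough back-of-envelope calculation shows why those round-trips hurt. The bandwidth numbers below are approximate assumptions (Titan X memory bandwidth around 336 GB/s, PCIe 3.0 x16 peaking around 16 GB/s), but the order-of-magnitude gap is the point:

```python
# Back-of-envelope: moving a hypothetical 1 GB blob over PCIe vs
# reading it from GPU-local memory. Bandwidth figures are rough
# assumptions, not measured values.
blob_gb = 1.0

pcie_gbps = 16.0      # host <-> GPU, theoretical PCIe 3.0 x16 peak
gpu_mem_gbps = 336.0  # GPU-local memory bandwidth (Titan X class)

t_pcie = blob_gb / pcie_gbps     # time to ship the blob each way
t_gpu = blob_gb / gpu_mem_gbps   # time to read it from GPU memory

print(f"PCIe transfer:  {t_pcie * 1000:.1f} ms")   # ~62.5 ms
print(f"GPU-local read: {t_gpu * 1000:.1f} ms")    # ~3.0 ms
print(f"ratio: {t_pcie / t_gpu:.0f}x")             # ~21x slower over PCIe
```

So every avoided round-trip buys back time that would otherwise dwarf the on-GPU compute it was feeding.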
Oh, I agree, and the article talks plenty about that topic as well. For me the temptation with the Titan X is primarily the "laziness" of a) not having to manually parallelize across AWS instances and b) not needing to squeeze models into 4-6 GB.