Layers

From this HN thread:

So you have LLM-based English prompts as an interop layer to Python + PySpark, which is itself an interop layer onto the Spark core. Also, the generated Spark SQL strings embedded in the DataFrame API go through their own little compiler into Spark operations.
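
To make the layering concrete, here is a minimal sketch, with a made-up dataset path and column names, of a Spark SQL string riding inside a DataFrame API call, where that string gets its own parse-and-compile pass separate from the Python code around it:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("layers-demo").getOrCreate()

# Layer: Python + PySpark. The DataFrame API proxies these calls over
# Py4J to the JVM-side Spark core. (Dataset path is hypothetical.)
orders = spark.read.parquet("s3://example-bucket/orders")

# Layer: Spark SQL strings embedded in DataFrame API calls. Each string
# is parsed by Spark SQL and compiled by the Catalyst optimizer into a
# logical and then physical plan, a separate compilation step from the
# Python code surrounding it.
daily = (
    orders
    .withColumn("order_day", expr("date_trunc('day', order_ts)"))
    .groupBy("order_day")
    .agg(expr("sum(amount) AS total_amount"))
)

# Layer: the Spark core, which schedules the physical plan as stages
# and tasks across the cluster once an action like show() runs.
daily.show()
```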

When Databricks wrote PySpark, it was because many programmers knew Python but weren't willing to learn Scala just to use Spark. Now they are offering a way for programmers not to bother learning the PySpark APIs at all, and to leverage the interop layers all the way down, starting from English prompts.

This makes perfect sense when you zoom out and think about what their goal is -- to get your data workflows running on their cluster runtime. But it does make a programmer like me -- who developed a lot of systems while Spark was growing up -- wonder just how many layers future programmers will be forced to debug through when things go wrong.

Debugging PySpark code is hard enough, even when you know Python, the PySpark APIs, and the underlying Spark core architecture well. But if all the PySpark code I had ever written had started from English prompts, debugging those inevitable job crashes would likely have been even more bewildering.
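
To show what that cross-layer bewilderment can look like, here is a contrived sketch, with an invented dataset and UDF, where a plain Python bug only surfaces after it has crossed the Python-worker and JVM boundaries:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("layers-crash-demo").getOrCreate()

# Invented data with one bad row the prompt-to-code step didn't anticipate.
df = spark.createDataFrame(
    [("a", "1.5"), ("b", "oops")],
    ["id", "raw_amount"],
)

# A Python UDF: the function body runs in Python worker processes,
# but it is invoked from JVM-side executor tasks.
@udf(returnType=DoubleType())
def parse_amount(s):
    return float(s)  # raises ValueError on "oops"

# When the action runs, the ValueError happens in a Python worker, fails
# the JVM-side task, and typically resurfaces on the driver wrapped in
# Spark/Py4J exception layers, so the stack trace spans Python, the JVM
# scheduler, and Python again.
df.withColumn("amount", parse_amount("raw_amount")).show()
```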

I haven't, in this description, mentioned the "usual" programming layers we have to contend with, like Python's interpreter, the JVM, the underlying operating system, cloud APIs, and so on.

If I were to take a guess, programmers of the future are going to need more help debugging across programming language abstractions, system abstraction layers, and various code-data boundaries than the tools they currently make do with can provide.