Spark – Data Processing

11 Min to complete

What are some key metrics to check when processing data with Spark? This lesson also includes links to more information and examples.
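It’s highly recommended to set up Scylla Monitoring and use its dashboards and Advisor. Things to watch include the overall load, the load per shard, latencies (high latencies may point to large partitions), and coordination (if many requests are coordinated by a shard that does not own the data, consider switching to a shard-aware driver).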
Also, use the Spark Executors and Stages views to understand how well tasks are distributed, and enable the Spark History Server.
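The same task-distribution check can be scripted against Spark's monitoring REST API, which serves the data behind the Executors view. The sketch below is a minimal illustration and not part of the lesson: the helper names `fetch_executors` and `task_skew` and the sample data are hypothetical, and it assumes the API is reachable at the driver UI (port 4040 by default) or the history server (port 18080 by default).

```python
import json
from statistics import mean
from urllib.request import urlopen


def fetch_executors(base_url, app_id):
    """Fetch executor summaries from Spark's monitoring REST API.

    base_url is an assumption, e.g. "http://localhost:4040" for a running
    application or "http://localhost:18080" for the history server.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:
        return json.load(resp)


def task_skew(executors):
    """Return (busiest/mean task-count ratio, busiest executor id).

    The driver entry is ignored. A ratio near 1.0 suggests tasks are
    evenly spread; a much larger ratio suggests one executor is doing
    most of the work.
    """
    workers = [e for e in executors if e["id"] != "driver"]
    if not workers:
        raise ValueError("no executors found")
    avg = mean(e["totalTasks"] for e in workers)
    busiest = max(workers, key=lambda e: e["totalTasks"])
    ratio = busiest["totalTasks"] / avg if avg else 0.0
    return ratio, busiest["id"]


# Hypothetical sample shaped like the API response (only the fields used here).
sample = [
    {"id": "driver", "totalTasks": 0},
    {"id": "1", "totalTasks": 100},
    {"id": "2", "totalTasks": 100},
    {"id": "3", "totalTasks": 400},
]
ratio, busiest = task_skew(sample)
print(f"busiest executor {busiest} ran {ratio:.1f}x the mean task count")
```

For a finished application to appear in the history server, it must have written event logs, which is controlled by `spark.eventLog.enabled` and `spark.eventLog.dir`.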

 
