Workspaces

A workspace is similar to a folder structure in Windows and allows you to organize all your work on Databricks. It lets you save notebooks and libraries and share them with other users. Workspaces are not connected to data and are not used to store data; they are used to work on data.

Notebooks

Notebooks are sets of cells that allow you to execute commands. In a cell you may write code in the language of your choice: Python, Scala, SQL, R, or Markdown. You can even mix languages in a single notebook by putting %[language name], for instance %sql, at the top of a cell. To execute commands, a notebook must be attached to a cluster, but it is not permanently tied to one, which makes notebooks easy to share. Notebooks can also be scheduled as jobs to run a data pipeline, update a machine learning model, or refresh a dashboard.
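For example, in a notebook whose default language is Python, one cell can build a temporary view and the next cell can query it in SQL via the %sql magic command. This is only a sketch; the data and the view name "people" are made up for illustration:

    # Python cell (the notebook's default language): create a temporary view
    df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])
    df.createOrReplaceTempView("people")

    %sql
    -- A separate SQL cell in the same notebook, enabled by the %sql magic
    SELECT name, age FROM people WHERE age > 30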

Clusters

Clusters are groups of nodes that act as a single machine and allow you to execute code from notebooks or libraries on a set of data. That data may be raw data located in cloud storage or structured data uploaded as a table to the cluster. Access controls are provided to manage who can use each cluster.
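As a minimal sketch, a notebook attached to a cluster can read both kinds of data through the pre-configured spark session; the storage path and table name below are assumptions:

    # Raw data read directly from cloud storage (hypothetical path)
    raw_df = spark.read.option("header", "true").csv("dbfs:/mnt/raw/sales.csv")

    # Structured data previously uploaded as a table (hypothetical name)
    table_df = spark.table("sales")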

Libraries

Libraries are packages of custom-written code that can be uploaded manually or installed via package-management utilities such as Maven or PyPI. They may be Scala or Java JARs, Python eggs, or custom packages.
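For instance, a notebook-scoped Python library can be installed from PyPI with the %pip magic command; the requests package is just an example:

    %pip install requests

    # In a later cell, the installed package is importable as usual
    import requests
    print(requests.__version__)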

Dashboards

Dashboards are created from notebooks and are used to display the output of selected cells.
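As a sketch, any result rendered with the notebook's display() function produces output that can be added to a dashboard from the notebook UI; the query below assumes the hypothetical people table from earlier:

    # Render a result; the cell's output can then be pinned to a dashboard
    summary = spark.sql("SELECT name, COUNT(*) AS n FROM people GROUP BY name")
    display(summary)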

Tables

Tables hold structured data used for analysis. A table can be stored in multiple locations, such as cloud storage or on the cluster, or it can be cached in memory.
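A minimal sketch of both options, assuming a small example DataFrame and the hypothetical table name "people":

    # Persist a DataFrame as a table in the metastore
    df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])
    df.write.mode("overwrite").saveAsTable("people")

    # Cache the table in memory for faster repeated queries
    spark.sql("CACHE TABLE people")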

Jobs

Jobs are the tool for scheduling execution, either on an existing cluster or on a cluster created for the job itself. A job can run a notebook, a JAR, or a Python script.
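As a hedged sketch, a notebook can be scheduled as a job through the Databricks Jobs REST API; the workspace URL, token, cluster ID, notebook path, and cron expression below are all placeholders:

    import requests

    # Placeholders: substitute your workspace URL, personal access token,
    # existing cluster ID, and notebook path
    resp = requests.post(
        "https://<databricks-instance>/api/2.1/jobs/create",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={
            "name": "nightly-pipeline",
            "tasks": [{
                "task_key": "run_notebook",
                "existing_cluster_id": "<cluster-id>",
                "notebook_task": {"notebook_path": "/Users/me/pipeline"},
            }],
            "schedule": {
                "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 2 AM
                "timezone_id": "UTC",
            },
        },
    )
    print(resp.json())  # returns the new job's job_id on success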

Apps

Apps are third-party integrations with Databricks, such as Tableau.

To read about Azure Databricks, see the article "Azure Databricks Overview".
