Zeppelin

Zeppelin is a notebook system, somewhat similar to Jupyter. If you want to use Scala, it's probably the better choice. It may also have a wider variety of graphics options for Python. WARNING: Development of Zeppelin seems to have slowed, and it's not clear whether support is going to continue. While it has advantages over Jupyter, you might still want to start by looking at Jupyter.

The URL is https://zeppelin.cs.rutgers.edu.

NOTE: To log in, use your computer science username and password. Most computer science web applications use a University login, but that won't work for Zeppelin.

You may also want to look at the Zeppelin project's own documentation.

Zeppelin is a "notebook," a web interface that makes it easier to use Spark-related technologies. It supports Scala, Python, and R. Of course, you can use it for these languages even if you're not interested in Spark, if you prefer a notebook interface to a command line. If you're not interested in Spark, please use %python.ipython for Python, %spark for Scala, and %r.ir for R.
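As a quick reference, the paragraph types named on this page can be summarized as a small lookup table. This is only an illustrative sketch in plain Python: the table and the helper name paragraph_type are made up for this example and are not part of Zeppelin.

```python
# Paragraph annotations mentioned on this page, keyed by (language, uses_spark).
# The table and the helper are illustrative only; they are not a Zeppelin API.
PARAGRAPH_TYPES = {
    ("scala", True): "%spark",
    ("python", True): "%spark.pyspark",
    ("r", True): "%spark.ir",
    ("scala", False): "%spark",  # this page lists %spark for Scala either way
    ("python", False): "%python.ipython",
    ("r", False): "%r.ir",
}


def paragraph_type(language, uses_spark):
    """Return the annotation this page suggests for a language."""
    return PARAGRAPH_TYPES[(language.lower(), uses_spark)]


print(paragraph_type("Python", False))  # prints %python.ipython
print(paragraph_type("R", True))        # prints %spark.ir
```

The value returned is what you would type as the first line of a paragraph.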

We actually have two notebook systems, Zeppelin and JupyterHub. Zeppelin is newer and might have issues, but you may prefer its design, particularly its support for graphical output. Make sure you look at the section below on "If things go wrong." You should generally be able to recover from problems by restarting your interpreter.

Zeppelin is version 0.10.1. It uses Scala 2.12, Python 3.9, and Spark 3.2.1. R is 3.6.3, because version 4 doesn't currently work with Zeppelin. Java is 1.8, but that should be invisible, because there isn't a Java notebook type. You'd use Scala instead.
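If you want to confirm which Python version a paragraph is actually running, a short check like the following should work in any Python paragraph. It is a minimal sketch using only the standard library; the helper name python_version_string is made up for this example.

```python
import platform
import sys


def python_version_string():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


# Show the version and implementation of the interpreter running this paragraph.
print("Python", python_version_string())
print("Implementation:", platform.python_implementation())
```

On this instance you would expect the version to start with 3.9, per the versions listed above.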

After you've logged into https://zeppelin.cs.rutgers.edu, you'll see a list of notebooks in the left column, and a list of help documents in the second column. The notebooks include both your own and notebooks from others that you are allowed to access. You may be able to look at other people's notebooks, but not change them or run programs in them.

We suggest that you start by looking at some of the tutorials. If you want to run the examples in the tutorials, you'll have to make your own copy. At the top, just to the right of the title, you'll see a set of icons. The 5th icon is "duplicate." It will make a copy of the current document.

On the main page, right above the list of current notebooks, you'll see a "create new notebook" link.

You can also import files saved from another Zeppelin instance, or .ipynb files from a Jupyter notebook. Zeppelin also allows you to export notebooks into Jupyter's format. HOWEVER, the import function doesn't work for me on Chrome. Firefox works.

I do not recommend using Safari on the Macintosh. I haven't tried it extensively on the iPhone, but that version seems better.

WARNING: If you log in to Zeppelin but don't run anything for 8 hours, your Kerberos ticket will expire. Weird failures will occur. Simply log out and log back in.

If your Kerberos credentials have expired, you will get an error the next time you try to run a paragraph. Log out and log back in. The ticket should not expire as long as you have an active interpreter.

The Zeppelin project has excellent documentation, and the Tutorials give lots of examples, so we're not going to document the functionality here, just the specifics of our copy.

There are Tutorials for Python, R, and Spark (Scala, Python, and R). These are set up so that you can't run the code. Either copy and paste the cells into your own notebook, or make your own copy of the notebook using the Duplicate icon (the 5th from the left next to the title). Not all features in the tutorials are actually present.

Notebook Contents

There's only one kind of notebook. You can use any of the languages or facilities in any notebook. However, when you create a notebook you specify a default paragraph type, so that creates some differences. A notebook is made up of "paragraphs," regions of the screen that use a specific language. An annotation at the top of each paragraph shows what language it uses. E.g. "%spark" says it's Spark code written in Scala. You don't need the annotation for paragraphs using the notebook's default type.
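The annotation rule can be pictured with a small plain-Python model. This is a simplified sketch of the rule as described here, not Zeppelin's actual parser; the function name paragraph_language is made up for this example, and it ignores any extra options that may follow an annotation.

```python
def paragraph_language(paragraph_text, notebook_default):
    """Return the annotation a paragraph would run under.

    If the first non-blank line starts with '%', its first word is taken
    as the annotation; otherwise the notebook's default type applies.
    """
    for line in paragraph_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith("%"):
            return stripped.split()[0]
        break  # first real line is code, so there is no annotation
    return notebook_default


# A Markdown paragraph inside a notebook whose default type is %spark.
print(paragraph_language("%md\n# Notes", "%spark"))  # prints %md

# No annotation, so the notebook default is used.
print(paragraph_language("val x = 1", "%spark"))     # prints %spark
```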

WARNING: This is a default Zeppelin setup. It comes with interpreters for several languages that we don't use. We expect it to be used for Spark with Scala, Spark with Python, standalone Python, and possibly R (though R hasn't been tested very well).

Here are the types of paragraph:

  • %spark. Spark is version 3.2.1, Python is 3.9. %spark is Scala, and %spark.pyspark is Python. %spark.r and %spark.ir are R 3.6.3. %spark.pyspark has the following set up for you:
  • %python. Python 3.9.5, without Spark. To use Spark with Python, use a Spark notebook with a %pyspark cell. %python has some of the Zeppelin extensions, but you may have to use z.z to get access to data shared with other interpreters such as Angular. Use this if you want to use Python without Spark.
  • %r and %ir. R without Spark. To use Spark, use %spark.ir.
  • %sh - commands in this paragraph are sent to your default shell
  • %md - intended for documentation and explanations. Uses Markdown
  • %angular - intended for graphics. Angular (a JavaScript-based system) code can be used to display data. The data is stored in the global variable z, which is a ZeppelinContext. z is shared by Angular and the %spark variants. You'll need to put the data to be displayed into z using z.put(attribute, value), where attribute is a string. When I tried this with %spark.r, I initially got a null pointer error. I had to first do a z.put in Python or Scala to create the structure. Then z.put and z.get worked in R.
  • %kotlin. This is a Java-like language. Although it is installed, we have done virtually no testing with it.
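The z.put and z.get calls mentioned under %angular can be sketched as follows for a %spark.pyspark paragraph. In Zeppelin, z is predefined for you. The stand-in class below is hypothetical and exists only so that the sketch also runs outside Zeppelin. The attribute name "row_count" is likewise just an example.

```python
# In a real %spark.pyspark paragraph, Zeppelin predefines `z` (the ZeppelinContext).
# The stand-in class below exists only so this sketch also runs outside Zeppelin.
try:
    z
except NameError:
    class _StandInContext:
        """Hypothetical minimal substitute: stores values in a dict."""

        def __init__(self):
            self._store = {}

        def put(self, attribute, value):
            self._store[attribute] = value

        def get(self, attribute):
            return self._store[attribute]

    z = _StandInContext()

# Store a value under a string attribute name.
z.put("row_count", 42)

# Read it back, here or in another paragraph that shares z.
print(z.get("row_count"))
```

As noted above, doing the first z.put from Python or Scala may be needed before z.put and z.get work in a %spark.r paragraph.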

    For examples, see the Zeppelin Tutorial notebook and the other introductory notebooks that you'll find when you log in. We've run the examples, so the output you see was generated on this instance.

    Examples using the following tools don't work because we don't have the corresponding software installed:

    COMPLETION: In ipython, spark (Scala), and ir, you can type part of a variable name and press control-period or tab. That will show you all of the possibilities beginning with what you typed. In some cases (depending on context) you can press control-period right after a ".", to show the available properties and methods. This functionality appears not to be in any of the documentation, so make sure to tell your students.

    There are many different types of interpreters available. You can see them in Zeppelin's documentation. If a class needs an interpreter we don't have, please contact help@cs.rutgers.edu. We're willing to install additional types, but we don't want to set up and test software that no one is going to use.