Lucidworks Big Data & Oozie Workflow With VizOozie

In this post we will discuss how to create a visualized workflow graph for Oozie. Oozie is a workflow management system for Hadoop jobs. Oozie workflow jobs are DAGs (Directed Acyclic Graphs) of actions: http://oozie.apache.org

At Lucidworks we use Oozie in our Lucidworks Big Data product. The workflows we provide with the platform are configured and run with Oozie. Developers write workflow.xml files (Oozie workflow definitions) and deploy them to Hadoop. A good explanation of how this works is provided here: http://www.infoq.com/articles/oozieexample

Some workflows get complicated quickly and may include subworkflows, forks, joins and other actions that are hard to follow in XML. A visualization tool helps streamline workflow design and makes it easy to grasp the gist of what a workflow does.

VizOozie is an open source tool that converts your static XML workflow definitions into dot files, which the Graphviz dot program can render as PDF or other formats: http://www.graphviz.org/

You will need a Unix-like environment with Python and Graphviz dot installed to run it.

Check it out from GitHub and run:

    python vizoozie/vizoozie.py example/workflow.xml example/workflow.dot

or use your own Oozie workflow XML file.

This generates a dot file, which dot can convert to PDF:

    dot -Tpdf example/workflow.dot -o example/workflow.pdf
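If you want to script the rendering step, a minimal Python sketch can wrap the same dot invocation with subprocess. It assumes the Graphviz `dot` binary is on your PATH; the helper names `dot_command` and `render` are hypothetical and not part of VizOozie. Changing the format string (for example `png` or `svg`) selects one of the other output formats Graphviz supports.

```python
import shutil
import subprocess


def dot_command(dot_path, out_path, fmt="pdf"):
    """Build the argument list for: dot -T<fmt> <dot_path> -o <out_path>."""
    return ["dot", "-T" + fmt, dot_path, "-o", out_path]


def render(dot_path, out_path, fmt="pdf"):
    """Run Graphviz dot (hypothetical helper); raise if dot is missing or fails."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(dot_command(dot_path, out_path, fmt), check=True)
```

For example, `render("example/workflow.dot", "example/workflow.pdf")` runs the same command shown above.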

[Figure: the example workflow rendered by dot]

Standard workflow shapes are used for the start, end, process, join, fork and decision nodes. Action node fill colors are configurable in the vizoozie.properties file (e.g. java actions are drawn in blue).
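The actual format of vizoozie.properties is not shown in this post, so the following is only a sketch under an assumed format: one `action_type=color` pair per line, with `#` comments. The helper names `load_colors` and `action_node`, the `shape=box` choice and the `white` default are hypothetical. It illustrates how a per-action fill color could be turned into dot node attributes.

```python
def load_colors(text):
    """Parse hypothetical 'action_type=color' lines into a dict."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        colors[key.strip()] = value.strip()
    return colors


def action_node(name, action_type, colors, default="white"):
    """Emit a filled box node for an action, colored by its action type."""
    color = colors.get(action_type, default)
    # Unquoted dot IDs may not contain '-', so normalize to '_'
    node_id = name.replace("-", "_")
    return "%s [shape=box, style=filled, fillcolor=%s];" % (node_id, color)
```

With `colors = load_colors("java=blue")`, calling `action_node("complex-math", "java", colors)` yields a blue-filled `complex_math` node; action types missing from the file fall back to the default color.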

The code is simple: for each node type, it reads the XML with xml.dom.minidom, converts the nodes to dot statements, and writes them out. For example, given this XML snippet:

  <fork name="post-process">
    <path start="complex-math" />
    <path start="more-complex" />
    <path start="geek-candy-process" />
  </fork>

the code for a fork node looks like this:

    def processFork(self, doc):
        # One octagon-shaped node per <fork>, plus an edge to each <path> target
        output = ''
        for node in doc.getElementsByTagName("fork"):
            # Unquoted dot IDs may not contain '-', so normalize to '_'
            name = self.getName(node).replace('-', '_')
            output += name + " [shape=octagon];\n"
            for path in node.getElementsByTagName("path"):
                start = path.getAttribute("start").replace('-', '_')
                output += name + " -> " + start + ";\n"
        return output

This method normalizes node names with name.replace('-', '_') and assigns the fork its node shape (shape=octagon). It then looks up the fork's start paths, such as <path start="complex-math" />. For the example above, it produces this output:

    post_process [shape=octagon];
    post_process -> complex_math;
    post_process -> more_complex;
    post_process -> geek_candy_process;
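To experiment with the fork conversion outside the VizOozie class, here is a standalone sketch of the same logic as a plain function. It reads the `name` attribute directly instead of calling the class's `getName` helper (an assumption about what that helper returns), and the function name `process_fork` is hypothetical. It parses the example snippet with xml.dom.minidom and prints the four dot statements listed above.

```python
from xml.dom import minidom


def process_fork(doc):
    """Standalone fork conversion: one octagon node plus one edge per path."""
    output = ""
    for node in doc.getElementsByTagName("fork"):
        # Unquoted dot IDs may not contain '-', so normalize to '_'
        name = node.getAttribute("name").replace("-", "_")
        output += name + " [shape=octagon];\n"
        for path in node.getElementsByTagName("path"):
            start = path.getAttribute("start").replace("-", "_")
            output += name + " -> " + start + ";\n"
    return output


SNIPPET = """<fork name="post-process">
  <path start="complex-math" />
  <path start="more-complex" />
  <path start="geek-candy-process" />
</fork>"""

print(process_fork(minidom.parseString(SNIPPET)), end="")
```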

When rendered with dot, this produces a fork node with three child nodes. I hope you find this explanation useful.
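The same pattern extends to other node types. The sketch below is an illustration, not VizOozie's code: it handles start, end, fork, join and action nodes using the Oozie workflow elements (`<start to>`, `<join to>`, and an action's `<ok to>`/`<error to>` transitions), and wraps the statements in a `digraph` block so the result can be passed straight to dot. The function name `workflow_to_dot`, the shape chosen for each node type, and drawing error transitions as dashed edges are assumptions made for this example; decision nodes are not handled.

```python
from xml.dom import minidom


def _id(name):
    # Unquoted dot IDs may not contain '-', so normalize to '_'
    return name.replace("-", "_")


def workflow_to_dot(xml_text):
    """Convert an Oozie workflow definition into a complete dot digraph."""
    doc = minidom.parseString(xml_text)
    lines = []
    for node in doc.getElementsByTagName("start"):
        lines.append("start [shape=circle];")
        lines.append("start -> %s;" % _id(node.getAttribute("to")))
    for node in doc.getElementsByTagName("end"):
        lines.append("%s [shape=doublecircle];" % _id(node.getAttribute("name")))
    for node in doc.getElementsByTagName("fork"):
        name = _id(node.getAttribute("name"))
        lines.append("%s [shape=octagon];" % name)
        for path in node.getElementsByTagName("path"):
            lines.append("%s -> %s;" % (name, _id(path.getAttribute("start"))))
    for node in doc.getElementsByTagName("join"):
        name = _id(node.getAttribute("name"))
        lines.append("%s [shape=invtriangle];" % name)
        lines.append("%s -> %s;" % (name, _id(node.getAttribute("to"))))
    for node in doc.getElementsByTagName("action"):
        name = _id(node.getAttribute("name"))
        lines.append("%s [shape=box];" % name)
        # <ok to="..."/> is the success transition, <error to="..."/> the failure one
        for ok in node.getElementsByTagName("ok"):
            lines.append("%s -> %s;" % (name, _id(ok.getAttribute("to"))))
        for err in node.getElementsByTagName("error"):
            lines.append("%s -> %s [style=dashed];" % (name, _id(err.getAttribute("to"))))
    body = "\n".join("  " + line for line in lines)
    return "digraph workflow {\n" + body + "\n}\n"
```

Writing the returned string to a .dot file and running `dot -Tpdf` on it renders the whole workflow. Targets without their own node statement (such as a kill node) are still drawn, because dot creates nodes implicitly from edges.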
