Developing with Operators


The EnOS Stream Processing Service provides a set of underlying packaged operators for developers to develop customized stream data processing jobs to meet the requirements of complex business scenarios.

Overview

EnOS Stream Processing Service provides a user-friendly drag-and-drop UI for designing data processing pipelines. You can quickly configure pipelines by adding operators (stages) to the pipeline, thus completing data ingestion, filtering, processing, and storage tasks without any programming.


A data processing pipeline usually consists of multiple stages that are connected by arrows that define the data stream, though which the data sequentially flows. Each stage represents a read-and-write or processing operation to the data. This kind of process forms a stream data processing job. A pipeline can include the following stages.

  • Origin

    The stage that specifies the data source and passes the output data to later stages, such as the Kafka Consumer stage.

  • Processor

    The stage for data conversion, where the input data is normalized or changed (such as data filtering, transforming, calculation, etc.).

  • Destination

    The stage for storing the processed data in the target storage or sending data for further processing.

Designing a Pipeline


Prerequisites

The EnOS Stream Processing Service provides multiple versions of operator libraries. Before designing stream data processing jobs, you need to install the needed version of operator library. For more information, see Installing an Operator Library or Template.


Develop a Stream Data Processing Pipeline with Operators

  1. Log in to the EnOS Management Console, select Stream Processing > Pipeline Designer, and click the + icon above the list of stream processing pipelines.

  2. On the New Stream window, select New to create the stream processing pipeline. You can also choose to import a configuration file to create the job quickly.

  3. Enter the name and description of the stream processing pipeline.

  4. From the Template drop-down list, select Origin Pipeline.

  5. From the Operator Version drop-down list, select the installed operator library version.

  6. For Message Channel, select the source of data to be processed:

    • If the data is ingested from connected devices, select Real-Time.

    • If the data is integrated through the offline message channel, select Offline.

  7. Click OK to create the stream processing pipeline with the basic settings above.

    _images/creating_streamsets_pipeline.png

Designing the Stream Processing Pipeline

Follow the steps below to design the stream processing pipeline with stages.

  1. On the pipeline designing canvas, select a stage you want to use (like the Point Selector stage) from the Stage Library in the upper right corner of the page to add it to the canvas.

    _images/streamsets_stage_library.png


  2. Delete the arrow connecting the Kafka DataSource and Kafka Producer stages and connect the Data Source stage to the new stage by clicking the output point of the Data Source stage and dragging it to the input point of the new stage. Do the same to connect the new stage to the Producer stage to complete adding the new stage to the pipeline. Click on the new stage and complete the parameter configuration.

    _images/streamsets_add_stage.png


  3. Repeat steps 1 and 2 to add more stages to the pipeline and complete the parameter configuration of the added stages.

  4. Click Save in the tool bar to save the configuration of the pipeline.

  5. Click the Validate icon icon_validate in the tool bar to check the parameter configuration of the stages. If the validation fails, update the configuration accordingly.

    _images/streamsets_validation.png


For more information about designing stream processing pipelines, see StreamSets User Guide.

Publishing and Running the Pipeline

If the validation is successful, you can publish the pipelien online and start it.

  1. Click Release in the tool bar to publish the pipeline.

  2. Open the Stream Processing > Pipeline Operation page, view the published pipeline, whose status is PUBLISHED by default.

  3. Complete the running resource configuration and alarm settings for the pipeline, ensure that the required system pipelines are running, and click the Start icon icon_start to start running the pipeline.


For more information about pipeline operations, see Maintaining Stream Processing Pipelines.

Operator Documentation

For detailed information about the function, parameter configuration, and output of the available operators, see the Operator Documentation.

Tutorial

To learn how to develop a stream data processing pipeline for a more complex business scenario, see Developing with Operators.