Enriching data with BigConnect

BigConnect uses a complex pipeline to enrich the data stored in the system: named entity extraction on text, object detection, machine learning, OCR, Speech2Text and custom plugins.

January 14, 2019

BigConnect has a complex data processing pipeline that can accommodate even the toughest tasks. We call it the Data Worker pipeline, and it uses… Data Workers :)

Data Workers are plugins that run, in no particular order, whenever something changes in the data. They can run on a single machine or, for large deployments and resource-intensive tasks, on hundreds of machines. A Data Worker can do three things:

Every change to a data element or property inside BigConnect notifies the Data Worker pipeline, which in turn hands the change to all available Data Workers, passing along the changed element and, where relevant, the property that changed. The pipeline is built on queues, so it can easily be distributed across any number of machines: some Data Workers can run on one set of machines and others on another, depending on their resource consumption and volume requirements. It’s entirely up to the developer to architect the pipeline setup properly.
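To make the queue-based design concrete, here is a minimal in-process sketch in Java. It is an illustration under assumptions, not BigConnect code: a java.util.concurrent.BlockingQueue stands in for whatever queue the pipeline actually uses, and ChangeEvent, POISON and run are hypothetical names. Change notifications are published to one shared queue and several consumer threads drain it; in a distributed setup the consumers would be processes on separate machines reading from a shared broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not BigConnect's API: an in-process queue drained by several consumers.
class QueuePipelineSketch {
    /** A change notification: the element that changed and, optionally, the property. */
    record ChangeEvent(String elementId, String propertyName) {}

    /** Sentinel telling a consumer to stop; compared by identity. */
    private static final ChangeEvent POISON = new ChangeEvent(null, null);

    /** Publishes events to one shared queue, drains it with `consumers` threads, returns handled ids. */
    static Set<String> run(List<ChangeEvent> events, int consumers) {
        BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();
        Set<String> handled = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (ChangeEvent e = queue.take(); e != POISON; e = queue.take()) {
                        // A real consumer would hand the event to its Data Workers here.
                        handled.add(e.elementId());
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            for (ChangeEvent e : events) queue.put(e);
            for (int i = 0; i < consumers; i++) queue.put(POISON); // one stop signal per consumer
            for (Thread t : threads) t.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running pipeline", ex);
        }
        return handled;
    }
}
```

Because every consumer takes from the same queue, adding consumers spreads the load without changing the producer, which is the property that lets a queue-based pipeline scale out.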

Data Workers don’t run in a particular order. For each change, the pipeline first calls a Data Worker’s isHandled method to check whether it can process that change; if it can, the pipeline then calls its execute method. It’s entirely up to the developer to decide when a Data Worker actually executes on a specific element. By not enforcing a specific order, BigConnect gives developers maximum flexibility.
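A minimal sketch of that two-step contract in Java follows. The Worker interface, the Element class and the two example workers are hypothetical stand-ins written for illustration; they are not the real com.mware.core.ingest.dataworker.DataWorker API, whose exact method signatures are not reproduced here. The dispatcher shuffles the workers to model the absence of any ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the com.mware.core.ingest.dataworker.DataWorker API:
// each worker is asked isHandled(...) first, and execute(...) runs only if it returns true.
class DispatchSketch {
    /** Minimal stand-in for a graph element: an id plus string properties. */
    static final class Element {
        final String id;
        final Map<String, String> props = new HashMap<>();
        Element(String id) { this.id = id; }
    }

    /** Hypothetical two-step contract mirroring the isHandled/execute pattern. */
    interface Worker {
        boolean isHandled(Element element, String changedProperty);
        void execute(Element element, String changedProperty);
    }

    /** Counts words whenever the "text" property changes. */
    static final class WordCountWorker implements Worker {
        public boolean isHandled(Element e, String p) { return "text".equals(p); }
        public void execute(Element e, String p) {
            String text = e.props.getOrDefault("text", "").trim();
            int words = text.isEmpty() ? 0 : text.split("\\s+").length;
            e.props.put("wordCount", Integer.toString(words));
        }
    }

    /** Lower-cases the title whenever the "title" property changes. */
    static final class TitleNormalizerWorker implements Worker {
        public boolean isHandled(Element e, String p) { return "title".equals(p); }
        public void execute(Element e, String p) {
            e.props.computeIfPresent("title", (k, v) -> v.toLowerCase());
        }
    }

    /** Offers one change to every worker in shuffled order; returns how many workers executed. */
    static int dispatch(List<Worker> workers, Element element, String changedProperty) {
        List<Worker> order = new ArrayList<>(workers);
        Collections.shuffle(order); // no ordering guarantee, so workers must not depend on each other
        int executed = 0;
        for (Worker w : order) {
            if (w.isHandled(element, changedProperty)) {
                w.execute(element, changedProperty);
                executed++;
            }
        }
        return executed;
    }
}
```

Because the dispatcher shuffles the workers, each one has to be written so its result does not depend on another worker having run first. And since every change notifies the pipeline, properties written by execute (such as wordCount here) would in a real deployment produce further change notifications, so a selective isHandled check also helps avoid processing loops.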

We provide quite a few Data Workers out of the box to get you started. The most notable ones are:

You can easily develop your own Data Workers for specific tasks by extending the com.mware.core.ingest.dataworker.DataWorker class. Package your class in a jar file along with a META-INF/services/com.mware.core.ingest.dataworker.DataWorker service file and place the jar in the $BIGCONNECT_DIR/lib/ext folder.
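The service file itself is plain text. It follows the standard java.util.ServiceLoader provider-configuration format: one fully qualified class name per line (for example a hypothetical com.example.dataworkers.MyDataWorker), with blank lines ignored and '#' starting a comment. The Java sketch below parses that format and instantiates the listed classes reflectively, roughly what ServiceLoader does. Whether BigConnect uses ServiceLoader itself or its own loader is an assumption not verified here, and the Worker interface and HelloWorker class are hypothetical stand-ins for a real Data Worker.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the java.util.ServiceLoader provider-configuration file format
// (one fully qualified class name per line, '#' starts a comment, blank lines ignored).
// The worker type and class names are hypothetical; this is not BigConnect's loader.
class ProviderFileSketch {
    /** Hypothetical stand-in for the Data Worker base type. */
    interface Worker { String name(); }

    /** Example implementation that a service file could list. */
    static class HelloWorker implements Worker {
        public String name() { return "hello"; }
    }

    /** Extracts class names from the text of a META-INF/services file. */
    static List<String> parseProviderFile(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).strip();
            if (!name.isEmpty()) names.add(name);
        }
        return names;
    }

    /** Loads each listed class reflectively and creates it with its no-arg constructor. */
    static List<Worker> instantiate(List<String> classNames) {
        List<Worker> workers = new ArrayList<>();
        for (String cn : classNames) {
            try {
                Class<? extends Worker> cls = Class.forName(cn).asSubclass(Worker.class);
                workers.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load provider " + cn, e);
            }
        }
        return workers;
    }
}
```

A jar whose service file lists a class that is missing or not a subclass of the expected type fails at load time, so it is worth checking that every listed name matches a class actually packaged in the jar.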

Check out the source code of the provided Data Workers to learn how to create your own.
