A lightweight, powerful design and execution engine that streams data in real time. Use Data Pipelines to route and process data in your data streams.
To define the flow of data, you design a pipeline in DataFabric Lab. A pipeline consists of stages that represent the origin and destination of the pipeline, and any additional processing that you want to perform. After you design the pipeline, you click Start and DataFabric goes to work.
DataFabric processes data when it arrives at the origin and waits quietly when not needed. You can view real-time statistics about your data, inspect data as it passes through the pipeline, or take a close look at a snapshot of data.
Data passes through the pipeline in batches. This is how it works:
The origin creates a batch as it reads data from the origin system or as data arrives from the origin system, noting the offset. The offset is the location where the origin stops reading.
The origin sends the batch when the batch is full or when the batch wait time limit elapses. The batch moves through the pipeline from processor to processor until it reaches pipeline destinations.
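The batching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DataFabric's implementation: the function `read_batches`, its parameters `max_batch_size` and `max_wait_seconds`, and the convention that the source yields `None` when no data is available are all assumptions made for the example. The sketch also skips empty batches, which the real engine may handle differently.

```python
import time
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple


def read_batches(
    source: Iterable[Optional[Any]],
    max_batch_size: int,
    max_wait_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[List[Any], int]]:
    """Yield (batch, offset) pairs from a hypothetical origin.

    `source` yields records, or None when nothing is available right now.
    `offset` counts the records read so far, i.e. where reading stopped.
    """
    batch: List[Any] = []
    offset = 0
    started = clock()
    for item in source:
        if item is not None:
            batch.append(item)
            offset += 1
        full = len(batch) >= max_batch_size
        timed_out = clock() - started >= max_wait_seconds
        if full or timed_out:
            if batch:
                # Send the batch when it is full or the wait time elapsed.
                yield batch, offset
                batch = []
            started = clock()
    if batch:
        # Flush whatever is left when the source ends.
        yield batch, offset


if __name__ == "__main__":
    for records, position in read_batches("abcde", 2, 60.0):
        print(records, position)
```

Passing the clock as a parameter keeps the timing rule testable without real waiting.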
A visual environment to develop, debug, test and run data pipelines in a way that's simple, intuitive and easy.
When you configure a pipeline, you define how you want data to be treated: Do you want to prevent the loss of data or the duplication of data?
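One generic way to picture this trade-off is by when the offset is committed relative to writing the batch. The sketch below is an assumption-laden illustration, not DataFabric's code: `OffsetStore`, `process`, and the `commit_before_write` flag are hypothetical names. Committing before the write means a failed batch is never re-read (possible loss, no duplication); committing after the write means a failed batch is re-read on restart (no loss, possible duplication).

```python
from typing import Any, Callable, Iterable, List, Tuple


class OffsetStore:
    """Hypothetical holder for the last committed offset."""

    def __init__(self) -> None:
        self.committed = 0


def process(
    batches: Iterable[Tuple[List[Any], int]],
    write: Callable[[List[Any]], None],
    store: OffsetStore,
    commit_before_write: bool,
) -> None:
    """Write each batch, committing its offset before or after the write."""
    for batch, offset in batches:
        if commit_before_write:
            # Offset moves first: a failed write is not retried after restart.
            store.committed = offset
            write(batch)
        else:
            # Offset moves last: a failed write is re-read after restart.
            write(batch)
            store.committed = offset


if __name__ == "__main__":
    def flaky_write(batch: List[Any]) -> None:
        # Simulated destination that fails on the second batch.
        if "c" in batch:
            raise IOError("destination unavailable")

    data = [(["a", "b"], 2), (["c"], 3)]
    for before in (True, False):
        store = OffsetStore()
        try:
            process(data, flaky_write, store, commit_before_write=before)
        except IOError:
            pass
        print(before, store.committed)
```

After the simulated failure, the committed offset shows where a restarted pipeline would resume reading in each mode.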
The Delivery Guarantee pipeline property offers the following choices:
The information above describes a standard single-threaded pipeline: the origin creates a batch and passes it through the pipeline, creating a new batch only after processing the previous batch.
Some origins can generate multiple threads to enable parallel processing in multithreaded pipelines. In a multithreaded pipeline, you configure the origin to create the number of threads or amount of concurrency that you want to use. Data Collector then creates a number of pipeline runners, based on the pipeline Max Runners property, to perform pipeline processing. Each thread connects to the origin system, creates a batch of data, and passes the batch to an available pipeline runner.
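The hand-off between origin threads and pipeline runners can be sketched with a small pool, as below. This is a simplified model under stated assumptions, not the engine's implementation: `PipelineRunner`, `run_multithreaded`, the `partitions` input (one list of records per origin thread), and the uppercase "processing" step are all invented for the example. Only `max_runners` mirrors the Max Runners property named above.

```python
import queue
import threading
from typing import List


class PipelineRunner:
    """Hypothetical runner that processes one batch at a time."""

    def __init__(self, runner_id: int) -> None:
        self.runner_id = runner_id
        self.processed: List[List[str]] = []

    def process(self, batch: List[str]) -> None:
        # Stand-in for the pipeline's processors and destinations.
        self.processed.append([record.upper() for record in batch])


def run_multithreaded(
    partitions: List[List[str]], max_runners: int, batch_size: int = 2
) -> List[PipelineRunner]:
    """Start one origin thread per partition, sharing max_runners runners."""
    runners = [PipelineRunner(i) for i in range(max_runners)]
    pool: "queue.Queue[PipelineRunner]" = queue.Queue()
    for runner in runners:
        pool.put(runner)

    def origin_thread(partition: List[str]) -> None:
        for start in range(0, len(partition), batch_size):
            batch = partition[start:start + batch_size]
            runner = pool.get()  # blocks until a runner is available
            try:
                runner.process(batch)
            finally:
                pool.put(runner)  # return the runner to the pool

    threads = [
        threading.Thread(target=origin_thread, args=(p,)) for p in partitions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return runners


if __name__ == "__main__":
    result = run_multithreaded([["a", "b", "c"], ["d", "e"], ["f"]], max_runners=2)
    for runner in result:
        print(runner.runner_id, runner.processed)
```

Because a runner is taken out of the queue while it works, each runner handles only one batch at a time, and origin threads wait when all runners are busy.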