The number of data-producing devices in the world is increasing, and data-heavy organizations are under growing pressure to reduce the time it takes to extract value and insight from their data. Traditional Extract-Transform-Load (ETL) pipelines allow this content to be stored blindly in data lakes, but they typically become bottlenecks as queries begin taking many hours to complete. The need to process data in-stream is apparent, and the recent popularity and success of stream-processing platforms (Spark et al.) further supports the strategy.

Watchful is a massive event processor that makes stream processing easy and fast. It allows users to identify, filter, and route data in real time based on content rather than on inflexible headers or schemas. Under the hood, Watchful is powered by a sophisticated, distributed, non-backtracking regular expression engine and a coordination layer designed to provide a turnkey experience for end users.

In this talk, we will discuss Watchful’s high-level architecture, the regex evaluation strategies bundled with the engine, and the concurrency guarantees and scaling patterns it is designed for. We will also touch on our experiment with a team from A-1 (led by Geoffrey Fairchild and Sara Del Valle) on the heterogeneous cluster Darwin, which aimed to predict and classify known events from Twitter chatter, using Watchful to front-load all pattern matching for more than 8 billion tweets and over 1000 concurrent regular expressions.

Host: Geoffrey Fairchild