Center for Nonlinear Studies

Thursday, August 24, 2017
11:30 AM - 12:30 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Real Time Text Matching at Scale

Shayan Mohanty
Co-Founder and CEO of Watchful.io

The total number of data-producing devices in the world is increasing, and data-heavy organizations are under growing pressure to reduce the time it takes to extract value and insight from their data. Traditional Extract-Transform-Load (ETL) pipelines allow this content to be stored blindly in data lakes, but they typically become bottlenecks once queries start taking many hours to complete. The need to process data in-stream is apparent, and the efficacy of this strategy is further supported by the recent popularity and success of stream-processing platforms (Spark et al.). Watchful is a large-scale event processor that makes stream processing easy and fast. It allows users to identify, filter, and route data in real time based on its content rather than on inflexible headers or schemas. Under the hood, Watchful is powered by a sophisticated distributed, non-backtracking regular expression engine and a coordination layer designed to provide a turn-key experience for end users. In this talk, we will discuss Watchful’s high-level architecture, the regex evaluation strategies bundled with the engine, and the concurrency guarantees and scaling patterns it is designed for. We will also touch on our experiment with a team from A-1 (led by Geoffrey Fairchild and Sara Del Valle) on Darwin, a heterogeneous cluster, which used Watchful to front-load all pattern matching over more than 8 billion tweets and over 1000 concurrent regular expressions in order to predict and classify known events from Twitter chatter.
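The abstract names a non-backtracking regular expression engine that routes events by content against many concurrent patterns, but it does not describe the engine's internals. As an illustration only, here is a minimal sketch of the general idea: a Thompson-style simulation that tracks the set of live automaton states instead of backtracking, with every rule advanced in lockstep over a single pass of each event. It assumes a tiny hypothetical regex subset (literal characters, `.`, and postfix `*` on a single atom). The names `compile_pattern`, `closure`, `step`, `search`, `route`, and the example rules are hypothetical and are not Watchful's engine or API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tiny regex subset (illustration only): literal characters,
# '.' (any character), and postfix '*' on a single atom. No escapes, groups,
# or alternation. This is NOT Watchful's engine.
Token = Tuple[str, bool]  # (atom, starred)


def compile_pattern(pattern: str) -> List[Token]:
    """Split a pattern into (atom, starred) tokens; token i is NFA state i."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(tokens: List[Token], states: Set[int]) -> Set[int]:
    """Add states reachable without input: a starred token may be skipped."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(tokens) and tokens[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def step(tokens: List[Token], states: Set[int], ch: str) -> Set[int]:
    """Advance all live states on one character at once (no backtracking)."""
    nxt: Set[int] = set()
    for s in states:
        if s < len(tokens):
            atom, starred = tokens[s]
            if atom == "." or atom == ch:
                # A starred atom loops on itself; a plain atom moves forward.
                nxt.add(s if starred else s + 1)
    return closure(tokens, nxt)


def search(tokens: List[Token], text: str) -> bool:
    """Unanchored search: True if any substring of text matches."""
    start = closure(tokens, {0})
    states = start
    if len(tokens) in states:
        return True
    for ch in text:
        # Re-seed the start state at every position instead of backtracking.
        states = step(tokens, states, ch) | start
        if len(tokens) in states:
            return True
    return False


def route(event: str, rules: Dict[str, List[Token]]) -> List[str]:
    """Return the sorted names of all rules matching the event, in one pass."""
    start = {name: closure(toks, {0}) for name, toks in rules.items()}
    active = dict(start)
    matched = {name for name, toks in rules.items() if len(toks) in start[name]}
    for ch in event:
        for name, toks in rules.items():
            if name in matched:
                continue  # this rule already fired; stop advancing it
            active[name] = step(toks, active[name], ch) | start[name]
            if len(toks) in active[name]:
                matched.add(name)
    return sorted(matched)


# Example usage with hypothetical rules.
rules = {
    "flu": compile_pattern("flu"),
    "fever": compile_pattern("fe.er"),
    "cough_sick": compile_pattern("cough.*sick"),
}
print(route("fever and flu", rules))  # ['fever', 'flu']
```

Because each rule's state set holds at most one entry per token, the work per input character is bounded by the total size of the patterns, so matching time grows linearly with the length of the event rather than exploding on adversarial inputs the way a backtracking matcher can. A production engine handling 1000+ patterns would more likely compile them into a combined automaton; this sketch keeps them separate for readability.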

Host: Geoffrey Fairchild