Nimbus: Tuning Filters Service on Tweet Streams
- Resource Type
- Conference
- Authors
- Lai, Chien-An; Donahue, Jim; Musaev, Aibek; Pu, Calton
- Source
- 2015 IEEE International Congress on Big Data (BigData Congress), pp. 623-630, Jun 2015
- Subject
- Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
General Topics for Engineers
Sparks
Databases
Twitter
Time factors
Web servers
Tuning
- Language
- ISSN
- 2379-7703
With hundreds of millions of tweets generated by Twitter users every day, tweet analysis has drawn considerable attention for event detection and for tracking trending sentiment. The problem is finding the few important tweets in this huge volume of traffic. A number of systems let applications filter a complete or partial Twitter stream by keywords and/or text properties to separate the relevant tweets from the noise. Designing a filter that produces useful results can be extremely difficult. For instance, consider the problem of finding tweets related to the Target Corporation or Guess USA. Just scanning the text of tweets for "target" or "guess" is likely to generate many hits but few relevant tweets. Nimbus is a service for tuning filters on tweet streams. It builds a database of tweets from a Twitter stream (which need not be the full Twitter firehose) and provides an API for testing filters against that database; filters are written in the PowerTrack language and evaluated with Spark. The important feature of Nimbus is that it allows repeatable testing of filter expressions against real Twitter data using the same filter language that can be used against live Twitter streams. Users of the service can therefore tune their filters before putting them into production use.
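The tuning loop described in the abstract can be illustrated with a minimal Python sketch. It is not the Nimbus API, and it uses neither PowerTrack nor Spark: the `Tweet` and `KeywordFilter` types, the `evaluate` function, the relevance labels, and the sample tweets are all hypothetical, chosen only to show how a filter can be re-run against a fixed set of stored tweets and refined until fewer irrelevant hits remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    text: str
    relevant: bool  # hypothetical hand-assigned label, used only for scoring


@dataclass(frozen=True)
class KeywordFilter:
    """Hypothetical stand-in for a filter expression (not PowerTrack syntax)."""
    any_of: tuple = ()    # match if at least one of these words is present
    none_of: tuple = ()   # reject if any of these words is present

    def matches(self, text: str) -> bool:
        words = set(text.lower().split())
        if not any(k in words for k in self.any_of):
            return False
        return not any(k in words for k in self.none_of)


def evaluate(flt: KeywordFilter, tweets) -> tuple:
    """Run a filter over a fixed tweet set; return (hit count, precision)."""
    hits = [t for t in tweets if flt.matches(t.text)]
    relevant_hits = sum(t.relevant for t in hits)
    precision = relevant_hits / len(hits) if hits else 0.0
    return len(hits), precision


# A fixed, stored sample plays the role of the tweet database: the same
# input on every run makes each test of a filter repeatable.
SAMPLE = (
    Tweet("picked up groceries at target today", True),
    Tweet("target store near me has a sale", True),
    Tweet("moving target practice at the range", False),
    Tweet("our sales target for q3 is ambitious", False),
)

naive = KeywordFilter(any_of=("target",))
tuned = KeywordFilter(any_of=("target",), none_of=("practice", "sales"))

print(evaluate(naive, SAMPLE))  # every tweet matches, half are relevant
print(evaluate(tuned, SAMPLE))  # exclusions remove the two irrelevant hits
```

Because the sample never changes between runs, any difference in the two results comes from the filter alone, which is the property that makes tuning against a stored tweet database practical.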