Kafka provides an extremely high-throughput distributed publish/subscribe messaging system. It also supports relatively long-term persistence of messages to serve a wide variety of consumers, partitioning of the message stream across servers and consumers, and functionality for loading data into Apache Hadoop for offline batch processing.
Kafka is written in Scala and depends on Apache ZooKeeper for coordination among its producers, brokers, and consumers.
Kafka was developed internally at LinkedIn to meet our particular use cases, but it will be useful to many organizations facing a similar need to reliably process large amounts of streaming data. Therefore, we would like to share it with the ASF and begin developing a community of developers and users within Apache.