Architecture
Event processing: There’s usually a need to respond to certain incoming interactions within milliseconds, e.g., to flag possible fraud, to bid on an auction, to respond to a routing request, or to make a recommendation. These responses are typically computed quickly from a model that was previously built in a cluster. In high-volume applications, serving them often requires horizontally scaling out a database for reading and writing state, which has been the raison d’être for NoSQL databases. Some applications also need more advanced correlation among events, which has led to the development of complex event processing systems.
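As a minimal sketch of this read-and-write pattern, the Python below scores one incoming event against a profile that a batch cluster is assumed to have precomputed. A plain dictionary stands in for the horizontally scaled NoSQL store. The names (`Profile`, `profile_store`, `event_counts`, `handle_event`), the scoring rule, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Profile:
    """Hypothetical per-device profile, assumed to be precomputed in batch."""
    avg_amount: float
    fraud_rate: float


# In-memory stand-ins for a scaled-out NoSQL store (assumption, not a real client).
profile_store: Dict[str, Profile] = {
    "device-1": Profile(avg_amount=40.0, fraud_rate=0.01),
    "device-2": Profile(avg_amount=25.0, fraud_rate=0.20),
}
event_counts: Dict[str, int] = {}


def handle_event(device_id: str, amount: float, threshold: float = 0.5) -> bool:
    """Return True if the event should be flagged as possible fraud."""
    # Write path: record that another event arrived for this device.
    event_counts[device_id] = event_counts.get(device_id, 0) + 1

    # Read path: a single key lookup keeps the response fast.
    profile = profile_store.get(device_id)
    if profile is None:
        # No precomputed profile yet; flag for review (illustrative policy).
        return True

    # Illustrative score: prior fraud rate plus a penalty for unusually large amounts.
    unusual = profile.avg_amount > 0 and amount > 3 * profile.avg_amount
    score = profile.fraud_rate + (0.5 if unusual else 0.0)
    return score >= threshold
```

In this sketch, the expensive modeling that produces `fraud_rate` and `avg_amount` is assumed to happen offline. The per-event path only performs a key lookup and a small state update.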
Batch processing: To respond effectively in near real time, it’s important to apply analytics in advance by crunching large amounts of data. This is where scale-out clusters, such as those built on Hadoop MapReduce, really shine. Most immediately, the cluster runs the production cycle, which updates profiles for items (cookies, placements, content, places, devices, etc.) that can in turn be pushed out for real-time event response and for fast analytics. The cluster also supports a science cycle, an ongoing process of investigation used to improve the production cycle: new approaches are typically simulated in the cluster and, when they appear promising, A/B tested.
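The production cycle’s profile update maps naturally onto MapReduce. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine rather than calling Hadoop. The event shape `(item_id, amount)` and the profile fields `count` and `avg_amount` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, float]  # (item_id, amount): hypothetical event shape


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Emit (key, (count, sum)) for every event."""
    for item_id, amount in events:
        yield item_id, (1, amount)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, float]]]) -> Dict[str, List[Tuple[int, float]]]:
    """Group mapped values by key, as the framework does between phases."""
    grouped: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[Tuple[int, float]]) -> Tuple[str, Dict[str, float]]:
    """Combine all values for one key into an updated profile."""
    count = sum(c for c, _ in values)
    total = sum(s for _, s in values)
    return key, {"count": count, "avg_amount": total / count}


def update_profiles(events: Iterable[Event]) -> Dict[str, Dict[str, float]]:
    grouped = shuffle(map_phase(events))
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Each reducer only sees the values for one key, so the same logic can be spread across many machines. The resulting profiles are what would then be pushed to the serving store.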
Fast analytics: Both data scientists and business analysts need access to summarized calculations of common values to explore and visualize data and to make decisions. Some of these values must be available quickly to support faster iteration and quick decision making (e.g., for reporting and common decision-support needs). This analytic information is also typically precomputed in batch on the cluster and then exported to a low-latency database (whether relational or NoSQL) that feeds BI tools. Low-latency analytic databases that can also serve as sources for MapReduce calculations are especially valuable, because they support both fast lookups and more comprehensive ad hoc analysis queries.
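To illustrate the precompute-then-export pattern, the sketch below aggregates click counts in a batch step and loads them into an in-memory SQLite database. SQLite stands in here for the low-latency relational store a BI tool would query. The table name `daily_clicks`, its columns, and the row format are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (day, placement, clicks): hypothetical raw-event shape


def precompute_daily_clicks(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Batch step: sum clicks per (day, placement)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for day, placement, clicks in rows:
        totals[(day, placement)] += clicks
    return totals


def export_summary(conn: sqlite3.Connection, totals: Dict[Tuple[str, str], int]) -> None:
    """Export step: upsert the precomputed totals into the serving table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks ("
        "day TEXT, placement TEXT, clicks INTEGER, PRIMARY KEY (day, placement))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_clicks (day, placement, clicks) VALUES (?, ?, ?)",
        [(day, placement, clicks) for (day, placement), clicks in totals.items()],
    )
    conn.commit()


def lookup_clicks(conn: sqlite3.Connection, day: str, placement: str) -> int:
    """Serving step: a primary-key lookup, as a BI tool might issue."""
    row = conn.execute(
        "SELECT clicks FROM daily_clicks WHERE day = ? AND placement = ?",
        (day, placement),
    ).fetchone()
    return row[0] if row else 0
```

Keeping the heavy aggregation in batch means the serving database only answers small indexed lookups, which is what keeps these queries fast.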
Copyright © 2015 MISONE.COM.TR | MİSONE HABERLEŞME TEKNOLOJİ VE YAZILIM A.Ş.