Hadoop: The Definitive Guide by Tom White

Author: Tom White
Language: eng
Format: epub, azw3, mobi, pdf
ISBN: 9781491901632
Publisher: O'Reilly Media
Published: 2015-03-24T22:00:00+00:00


Batching

For efficiency, Flume tries to process events in batches for each transaction, where possible, rather than one by one. Batching helps file channel performance in particular, since every transaction results in a local disk write and an fsync call, and a larger batch spreads that fixed cost over more events.

The batch size is determined by the component in question and is configurable in many cases. For example, the spooling directory source reads files in batches of 100 lines by default. (This can be changed by setting the batchSize property.) Similarly, the Avro sink (discussed in Distribution: Agent Tiers) will try to read 100 events from the channel before sending them over RPC, although it won’t block if fewer are available.
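As a rough illustration of where these settings live, the following agent configuration sketch sets the batch size explicitly on a spooling directory source and on an Avro sink, with a file channel between them. The agent and component names (agent1, source1, sink1, channel1), the directory path, the hostname, the port, and the value 500 are all placeholders assumed for this example, not values taken from the text. Note that the source property is spelled batchSize, whereas the Avro sink's equivalent is batch-size.

```properties
# Hypothetical agent wiring: the names, path, host, port, and sizes below
# are illustrative assumptions only.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

# Spooling directory source: lines read per batch (100 if left unset).
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir
agent1.sources.source1.batchSize = 500

# Avro sink: events taken from the channel per RPC call (100 if left unset).
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = localhost
agent1.sinks.sink1.port = 10000
agent1.sinks.sink1.batch-size = 500

# File channel: each transaction ends in a local disk write and fsync,
# so larger batches mean fewer of them for the same number of events.
agent1.channels.channel1.type = file
```

When raising a batch size like this, it is worth checking that the channel's transactionCapacity is at least as large as the biggest batch, since a batch has to fit inside a single channel transaction.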


