I/O buffering
Input/output (I/O) buffering is a mechanism that improves the throughput of input and output operations. It is implemented directly in hardware and in the corresponding device drivers (the block devices of Unix-like systems, for example, are accessed through kernel buffers), and it is also ubiquitous in the standard libraries of programming languages.
I/O operations often have high latencies: the time between the initiation of an I/O request and its completion may be millions of processor clock cycles. Most of this latency is due to the hardware itself. For example, data cannot be read from or written to a hard disk until the rotating platter brings the target sectors under the read/write head. On a 7200 RPM hard drive, one revolution takes 60/7200 s, or about 8.3 milliseconds, so this wait alone can take up to about 8 milliseconds. When the I/O device is a network interface, the latency is usually greater still.
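The 8 millisecond figure follows directly from the rotation speed: in the worst case the target sector has just passed the head and the platter must complete a full revolution, and on average it must complete half of one. The sketch below works through that arithmetic; the function name is illustrative, seek time is ignored, and the 3 GHz clock used to convert the wait into processor cycles is an assumed round figure.

```python
def rotational_latency_ms(rpm: float) -> tuple[float, float]:
    """Return (worst-case, average) rotational latency in milliseconds.

    Worst case: the target sector has just passed the head, so the platter
    must turn one full revolution; on average it must turn half of one.
    Seek time (moving the head between tracks) is not included.
    """
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute / revolutions per minute
    return ms_per_revolution, ms_per_revolution / 2


worst, average = rotational_latency_ms(7200)
print(f"worst {worst:.2f} ms, average {average:.2f} ms")  # worst 8.33 ms, average 4.17 ms

# Assumed 3 GHz CPU: clock cycles that elapse during the worst-case wait.
cycles = worst / 1000 * 3e9
print(f"{cycles:,.0f} cycles")  # 25,000,000 cycles
```

At that assumed clock rate, a single worst-case rotational wait spans about 25 million cycles, consistent with the "millions of processor clock cycles" above.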
This is alleviated by associating one or more input and output buffers with each device. Even if a program requests only one block of data from a device, the driver might fetch that block plus several of the blocks immediately following it on the disk and cache them in memory (a technique known as read-ahead). Programs often access the disk sequentially, so the next block a program requests is likely to be the next physical block on the disk. When the program does request it, the driver can return the cached copy from memory instead of performing another physical read, which reduces latency dramatically. When writes to disk are requested, the driver may likewise hold the data in memory until it has accumulated enough blocks, and then write them all at once; writing out the buffered data is called flushing the output buffer, or syncing. The driver normally also provides a means to request that data be flushed immediately rather than cached. This must be done, for example, before the device is removed from the system (if it is removable media such as an optical disc) or when the system is shutting down.
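The driver behavior described above (read-ahead on a miss, write-back until enough blocks accumulate, and an explicit flush) can be sketched as a toy simulation. `Disk` and `CachingDriver` are hypothetical names, the "disk" is just an in-memory list of blocks, and each batched physical operation is assumed to cost one access; real drivers and kernel caches are far more involved.

```python
class Disk:
    """Hypothetical block device: an in-memory list of fixed-size blocks that
    counts physical operations (each would pay seek and rotational latency)."""

    def __init__(self, num_blocks: int, block_size: int = 512):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.physical_reads = 0
        self.physical_writes = 0

    def read_blocks(self, start: int, count: int) -> list[bytes]:
        self.physical_reads += 1  # one operation for a contiguous run of blocks
        return self.blocks[start:start + count]

    def write_blocks(self, pending: dict[int, bytes]) -> None:
        self.physical_writes += 1  # assume the whole batch costs one pass
        for index, data in pending.items():
            self.blocks[index] = data


class CachingDriver:
    """Hypothetical driver with read-ahead and write-back buffering."""

    def __init__(self, disk: Disk, read_ahead: int = 4, write_threshold: int = 8):
        self.disk = disk
        self.read_ahead = read_ahead            # extra blocks fetched after a miss
        self.write_threshold = write_threshold  # dirty blocks that trigger a flush
        self.cache: dict[int, bytes] = {}
        self.dirty: dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        if not 0 <= index < len(self.disk.blocks):
            raise IndexError(index)
        if index not in self.cache:
            # Miss: fetch the requested block plus the blocks that follow it.
            count = min(1 + self.read_ahead, len(self.disk.blocks) - index)
            for offset, data in enumerate(self.disk.read_blocks(index, count)):
                # setdefault keeps newer, not-yet-flushed data already cached.
                self.cache.setdefault(index + offset, data)
        return self.cache[index]

    def write(self, index: int, data: bytes) -> None:
        self.cache[index] = data  # later reads see the new data immediately
        self.dirty[index] = data
        if len(self.dirty) >= self.write_threshold:
            self.flush()

    def flush(self) -> None:
        """Write all buffered blocks at once (also needed before removal or shutdown)."""
        if self.dirty:
            self.disk.write_blocks(self.dirty)
            self.dirty = {}


disk = Disk(num_blocks=64)
driver = CachingDriver(disk, read_ahead=7, write_threshold=8)
for i in range(16):  # a sequential scan of 16 blocks
    driver.read(i)
print(disk.physical_reads)  # 2: blocks 0-7, then blocks 8-15

for i in range(20):
    driver.write(i, b"x" * 512)
print(disk.physical_writes)  # 2: two batches of 8; 4 blocks still buffered
driver.flush()               # explicit flush, e.g. before shutdown
print(disk.physical_writes)  # 3
```

Sixteen sequential block reads cost only two physical reads, and twenty block writes cost three physical writes, the last of which happens only because of the explicit flush.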
On a multitasking operating system, hardware devices are controlled by the kernel, and user-space applications may not access them directly. Performing I/O therefore requires system calls, which introduce overhead (for example, switching from user mode to kernel mode and back, and copying data between user and kernel memory). This overhead is typically on the order of microseconds rather than milliseconds, so buffering at this level is not crucial for programs that perform relatively little I/O, but it makes a big difference for applications that are I/O bound, meaning that they spend most of their time waiting for I/O to complete rather than performing computation that a faster CPU would speed up.
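To see why microseconds matter in aggregate, here is a back-of-the-envelope model of system-call overhead when writing a large amount of data in chunks of different sizes. The 1 µs cost per call is an assumed round figure, not a measurement, the function name is illustrative, and the model counts only call overhead, not the time the device itself takes.

```python
def syscall_overhead_seconds(total_bytes: int, chunk_size: int,
                             cost_per_call_us: float = 1.0) -> float:
    """Estimated time spent purely on system-call overhead when writing
    total_bytes with one write() call per chunk_size bytes.

    cost_per_call_us is an assumed per-call cost in microseconds.
    """
    calls = -(-total_bytes // chunk_size)  # ceiling division: number of calls
    return calls * cost_per_call_us / 1_000_000


MiB = 1024 * 1024
for chunk in (1, 4096, 65536):
    seconds = syscall_overhead_seconds(100 * MiB, chunk)
    print(f"{chunk:>6}-byte writes: {seconds:.4f} s of overhead")
```

Under these assumptions, writing 100 MiB one byte at a time spends about 105 s on call overhead alone, while 4 KiB chunks cut that to about 26 ms; this is the gap that user-space buffering closes by batching small writes into large ones.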
Thus, nearly every program written in a high-level programming language has its own I/O buffers (typically one input buffer for each file or device the program reads from, and one output buffer for each it writes to). These buffers may be much larger than the ones maintained by the low-level drivers, and they exist at a higher level of abstraction: they are associated with file handles or file descriptors (abstractions provided by the operating system) rather than with actual hardware. When a program reads from a file, the library first checks whether anything is left in that file's input buffer; if so, it returns data from the buffer. Only when the buffer is exhausted, or when a seek is performed (which makes the buffer's contents useless), does it perform a system call to read more data from the file, often more than the program has currently requested. Likewise, each output (write) routine simply appends data to the buffer until the buffer is full, at which point its contents are passed to the operating system. All of this is performed behind the scenes, in the implementation of the standard library's input and output routines, which again usually expose some means to flush the output buffers manually. (The Unix convention is to use the term flush at the application level, as in the C standard library's fflush, and sync at the device level, as in the fsync and sync functions declared in unistd.h.)
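The library-level read and write paths described above can be sketched in Python on top of raw file descriptors. `SimpleReader` and `SimpleWriter` are hypothetical classes written for illustration, not the implementation of any real standard library (Python's own `io.BufferedReader` and `io.BufferedWriter` are considerably more elaborate). `os.read`, `os.write`, `os.lseek` and `os.fsync` are thin wrappers over the corresponding POSIX calls, and each invocation is counted here as one system call.

```python
import os
import tempfile


class SimpleReader:
    """Hypothetical input buffer over a raw file descriptor."""

    def __init__(self, fd: int, buffer_size: int = 4096):
        self.fd = fd
        self.buffer_size = buffer_size
        self.buf = b""     # data fetched from the OS but not yet returned
        self.syscalls = 0  # os.read / os.lseek calls issued so far

    def read(self, n: int) -> bytes:
        # Serve from the buffer; refill only when it holds fewer than n bytes.
        while len(self.buf) < n:
            chunk = os.read(self.fd, self.buffer_size)  # usually more than n
            self.syscalls += 1
            if not chunk:  # end of file: return whatever is left
                break
            self.buf += chunk
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

    def seek(self, offset: int) -> None:
        self.buf = b""  # buffered bytes no longer follow the new position
        os.lseek(self.fd, offset, os.SEEK_SET)
        self.syscalls += 1


class SimpleWriter:
    """Hypothetical output buffer over a raw file descriptor."""

    def __init__(self, fd: int, buffer_size: int = 4096):
        self.fd = fd
        self.buffer_size = buffer_size
        self.buf = bytearray()  # data written by the program, not yet sent
        self.syscalls = 0       # os.write calls issued so far

    def write(self, data: bytes) -> None:
        self.buf += data
        if len(self.buf) >= self.buffer_size:  # buffer full: hand it to the OS
            self.flush()

    def flush(self) -> None:
        while self.buf:
            written = os.write(self.fd, self.buf)  # may be a partial write
            self.syscalls += 1
            del self.buf[:written]


with tempfile.TemporaryFile() as f:
    writer = SimpleWriter(f.fileno())
    for _ in range(10_000):
        writer.write(b"x")  # 10,000 one-byte writes
    writer.flush()          # application-level flush: program buffer -> kernel
    os.fsync(f.fileno())    # device-level sync: kernel cache -> storage device
    print(writer.syscalls)  # 3

    reader = SimpleReader(f.fileno())
    reader.seek(0)
    data = b"".join(reader.read(1) for _ in range(10_000))
    print(len(data), reader.syscalls)  # 10000 4
```

Ten thousand one-byte writes become three write calls (two full 4096-byte buffers, then the remaining 1808 bytes on the explicit flush), and ten thousand one-byte reads become four calls (one lseek and three reads). The explicit `flush()` only moves data from the program's buffer into the kernel; the subsequent `os.fsync()` is what asks the kernel to push its own cached copy to the device, mirroring the flush/sync distinction above.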