Setting configuration parameters that affect how mysql bin-log is flushed to disk

There are 2 parameters that affect how bin-log is flushed to disk:
max_binlog_cache_size
If a multiple-statement transaction requires more than this many bytes of memory, the server generates a Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage error. The minimum value is 4096. The maximum and default values are 4GB on 32-bit platforms and 16PB (petabytes) on 64-bit platforms
This affects only multi-statement transaction to be written to bin-log.
sync_binlog
If the value of this variable is greater than 0, the MySQL server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log. There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise. The default value of sync_binlog is 0, which does no synchronizing to disk. A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the binary log. However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).
If the value of sync_binlog is 0 (the default), no extra flushing is done. The server relies on the operating system to flush the file contents occasionally as for any other file.
With setting of 1 we lose atmost 1 transaction in case of OS or mysql crash which is the safest one and slowest one But with batter backed drive cache the performance can be improved little bit. Any number greater than 1 would direct mysql to do sync_binlog number of writes to log cache before flushing to bin-log file. With crash you would lose sync_binlog number of writes. With bin-log=0 there is no synchronization and mysql leaves it to OS to do flushes. Setting of 0 achieved 20 to 30% performance improvement over setting of 1.
with setting of sync_binlog=0, mysql allows flushing of bin-log to bin-log file to linux. Linux default setting for flush is 5 secs.
Linux usually writes data out of the page cache using a process called pdflush. At any moment, between 2 and 8 pdflush threads are running on the system. You can monitor how many are active by looking at /proc/sys/vm/nr_pdflush_threads. Whenever all existing pdflush threads are busy for at least one second, an additional pdflush daemon is spawned. The new ones try to write back data to device queues that are not congested, aiming to have each device that's active get its own thread flushing data to that device. Each time a second has passed without any pdflush activity, one of the threads is removed. There are tunables for adjusting the minimum and maximum number of pdflush processes, but it's very rare they need to be adjusted.
pdflush tunables
Exactly what each pdflush thread does is controlled by a series of parameters in /proc/sys/vm:
/proc/sys/vm/dirty_writeback_centisecs (default 500): In hundredths of a second, this is how often pdflush wakes up to write data to disk. The default wakes up the two (or more) active threads every five seconds.
There can be undocumented behavior that thwarts attempts to decrease dirty_writeback_centisecs in an attempt to make pdflush more aggressive. For example, in early 2.6 kernels, the Linux mm/page-writeback.c code includes logic that's described as "if a writeback event takes longer than a dirty_writeback_centisecs interval, then leave a one-second gap". In general, this "congestion" logic in the kernel is documented only by the kernel source itself, and how it operates can vary considerably depending on which kernel you are running. Because of all this, it's unlikely you'll gain much benefit from lowering the writeback time; the thread spawning code assures that they will automatically run themselves as often as is practical to try and meet the other requirements.
The first thing pdflush works on is writing pages that have been dirty for longer than it deems acceptable. This is controlled by:
/proc/sys/vm/dirty_expire_centiseconds (default 3000): In hundredths of a second, how long data can be in the page cache before it's considered expired and must be written at the next opportunity. Note that this default is very long: a full 30 seconds. That means that under normal circumstances, unless you write enough to trigger the other pdflush method, Linux won't actually commit anything you write until 30 seconds later.
The second thing pdflush will work on is writing pages if memory is low. This is controlled by:
/proc/sys/vm/dirty_background_ratio (default 10): Maximum percentage of active that can be filled with dirty pages before pdflush begins to write them
Note that some kernel versions may internally put a lower bound on this value at 5%.
Most of the documentation you'll find about this parameter suggests it's in terms of total memory, but a look at the source code shows this isn't true. In terms of the meminfo output, the code actually looks at
MemFree + Cached - Mapped
So on the system above, where this figure gives 2.5GB, with the default of 10% the system actually begins writing when the total for Dirty pages is slightly less than 250MB--not the 400MB you'd expect based on the total memory figure.
Summary: when does pdflush write?
In the default configuration, then, data written to disk will sit in memory until either a) they're more than 30 seconds old, or b) the dirty pages have consumed more than 10% of the active, working memory. If you are writing heavily, once you reach the dirty_background_ratio driven figure worth of dirty memory, you may find that all your writes are driven by that limit. It's fairly easy to get in a situation where pages are always being written out by that mechanism well before they are considered expired by the dirty_expire_centiseconds mechanism.
There is another parameter involved though that can spill over into management of user processes:
/proc/sys/vm/dirty_ratio (default 40): Maximum percentage of total memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to do more writes.
Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as an unfair behavior where one "write-hog" process can block all I/O on the system. The classic way to trigger this behavior is to execute a script that does "dd if=/dev/zero of=hog" and watch what happens. See Kernel Korner: I/O Schedulers for examples showing this behavior.
Tuning Recommendations for write-heavy operations
The usual issue that people who are writing heavily encouter is that Linux buffers too much information at once, in its attempt to improve efficiency. This is particularly troublesome for operations that require synchronizing the filesystem using system calls like fsync. If there is a lot of data in the buffer cace when this call is made, the system can freeze for quite some time to process the sync.
Another common issue is that because so much must be written before any phyiscal writes start, the I/O appears more bursty than would seem optimal. You'll have long periods where no physical writes happen at all, as the large page cache is filled, followed by writes at the highest speed the device can achieve once one of the pdflush triggers is tripped.
dirty_background_ratio: Primary tunable to adjust, probably downward. If your goal is to reduce the amount of data Linux keeps cached in memory, so that it writes it more consistently to the disk rather than in a batch, lowering dirty_background_ratio is the most effective way to do that. It is more likely the default is too large in situations where the system has large amounts of memory and/or slow physical I/O.
dirty_ratio: Secondary tunable to adjust only for some workloads. Applications that can cope with their writes being blocked altogether might benefit from substantially lowering this value. See "Warnings" below before adjusting.
dirty_expire_centisecs: Test lowering, but not to extremely low levels. Attempting to speed how long pages sit dirty in memory can be accomplished here, but this will considerably slow average I/O speed because of how much less efficient this is. This is particularly true on systems with slow physical I/O to disk. Because of the way the dirty page writing mechanism works, trying to lower this value to be very quick (less than a few seconds) is unlikely to work well. Constantly trying to write dirty pages out will just trigger the I/O congestion code more frequently.
dirty_writeback_centisecs: Leave alone. The timing of pdflush threads set by this parameter is so complicated by rules in the kernel code for things like write congestion that adjusting this tunable is unlikely to cause any real effect. It's generally advisable to keep it at the default so that this internal timing tuning matches the frequency at which pdflush runs.

Labels

Setting configuration parameters that affect how mysql bin-log is flushed to disk

Anonymous

0 comments

Daily Views

About me

Search This Blog

Blog Archive

Popular Posts

Contact Form

Translate

Wikipedia

Labels

Setting configuration parameters that affect how mysql bin-log is flushed to disk

Share This Story

Anonymous

You Might Also Like

0 comments

Daily Views

About me

Search This Blog

Blog Archive

Popular Posts

Contact Form

Translate

Wikipedia