|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectcom.sleepycat.je.cleaner.UtilizationCalculator
public class UtilizationCalculator
Contains methods for calculating utilization and for selecting files to clean. In most cases the methods in this class are called by FileSelector methods, and synchronization order is always FileSelector then UtilizationCalculator. In some cases methods in this class are called directly by FileProcessor, and such methods must take care not to call FileSelector methods. This class maintains the size correction factor for LNs whose size is not counted, and related info, as explained in detail below. [#18633] (Historical note: Originally in JE 5, the corrected average LN size was used to adjust utilization. This was changed to a correction factor since different log files may have different average LN sizes. [#21106]) LN size correction is necessary because the obsolete size of LNs may be unknown when we delete or update an LN that is not resident in cache. We do not wish to fetch the LN just to obtain the size of the previous version. Therefore we record in the FileSummary that an LN became obsolete (by incrementing obsoleteLNCount) and also that its size was not counted (by not incrementing obsoleteLNSizeCounted and not adding a size to obsoleteLNSize). Later we estimate utilization by assuming that such obsolete LNs have sizes that are roughly the average LN size for obsolete LNs whose size was not counted in the file, as calculated by FileSummary.getObsoleteLNSize and getAverageObsoleteLNSizeNotCounted. This is fairly accurate when the sizes of obsolete and non-obsolete LNs in a file are uniformly distributed. But it can be inaccurate when they are not. If obsolete LNs are generally smaller than non-obsolete LNs, the average will be larger than the true size and estimated utilization will be lower then true utilization; in this case there is danger of over-cleaning and wasted resources. If obsolete LNs are generally larger, the average will be smaller than the true size and estimated utilization will be higher than true utilization; in this case there is danger of unbounded log growth and a "disk space leak". To correct the inaccurate estimate we take advantage of the fact that the true LN sizes and true average LN size can be determined when the FileProcessor reads a log file. We save the true average LN size as the result of a FileProcessor run in the adjustUtilization method. The true average LN size is then used to calculate a correction factor -- a factor that, when multiplied by the estimated average LN size, results in the true average LN size. The correction factor is used when determining the utilization of other log files; this is done by the getBestFile method. To guard against over-cleaning (the first case described above), the FileProcessor will naturally select a file for cleaning because estimated utilization is lower than true utilization. When the file is cleaned, the LN size correction factor will be determined. However, to guard against unbounded log growth (the second case above), this mechanism is not sufficient because estimated utilization is higher than true utilization and a file may not be selected for cleaning. To handle this case, a log file must sometimes be read by the FileProcessor, even though estimated utilization is above the minUtilization threshold, to "probe" the true utilization and determine the true average LN size. Probing is a read-only operation and does not migrate the active LNs in the log file or attempt to delete the file. Probing a log file is extra work and we strive to do it as infrequently as possible. The following algorithm is used. 1) We determine the number of new log files N that have been written since the last time adjustUtilization was called. Recall that adjustUtilization is called when FileProcessor reads a file, for cleaning a file or probing it. N is determined by the shouldPerformProbe method, which is called via FileSelector.selectFileForCorrection by the FileProcessor when the FileProcessor is woken to do cleaning but no file is selected for cleaning. 2) shouldPerformProbe returns true only if N is greater than a threshold value. The threshold has a high value (20 by default) if the minimum number of initial adjustments (files cleaned or probed, 5 by default) has not been reached, and has a lower value (5 by default) after the initial adjustment minimum has been reached. In other words, we do probes more frequently when we haven't collected a lot of input data. 3) If shouldPerformProbe returns true, then selectFileForCorrection in the FileSelector calls the getBestFile method here, passing true for the isProbe argument. In this mode, getBestFile will select a file based on worst case (lowest possible) utilization. The maximum LN size in the file is used, rather than the average LN size, for this worst case calculation. See the FileSummary getMaxObsoleteLNSize method. If no file is selected using worst case utilization, then probing is not needed because total utilization cannot be lower than the configured minUtilization. 4) If FileSelector.selectFileForCorrection returns a non-null file number, then FileProcessor will read the file in calcUtilizationOnly mode. True utilization will be determined and adjustUtilization will be called, but active LNs will not be migrated and the file will not be deleted. To summarize, a file is read in probing mode only if both of the following conditions are satisfied: a) The number of new files written since the last correction, N, is greater than the threshold, which is 5 or 20 depending on the whether the last correction is greater or less than 0.9. b) Using a worst case utilization calculation that uses the maximum obsolete LN size in a file rather than the average, the total log utilization is under the configured minUtilization. To determine the LN size correction factor that is applied when estimating utilization, rather than using the size for a single log file, the sizes for the last several log files are saved and the average value is used. The average over the last 10 adjustments is computed when adjustUtilization is called. If the mix of LN sizes for obsolete and non-obsolete LNs is roughly the same for all log files, and this mix does not change over time, then using a running average would not be necessary and a single size would be sufficient. However, some applications have access patterns that are not uniform over time, and the running average size will help to compensate for this. It is important to avoid premature use of the LN size correction factor in selecting files for cleaning, before enough input data has been collected to make the computed value meaningful. To accomplish this, the correction factor is not computed or used until the average size data for a minimum number of LNs with uncounted sizes (1000 by default) is available. Because it is expensive to compute, the recently computed values are stored persistently as part of the CheckpointEnd log entry. Flushing this info once per checkpoint is sufficient, and no special handling is necessary after a crash; losing the new info since the last checkpoint is acceptable.
Nested Class Summary | |
---|---|
(package private) static class |
UtilizationCalculator.AverageSize
Bundles a count of LNs and their total size, for use in calculating a running average. |
(package private) static class |
UtilizationCalculator.TestAdjustment
For passing adjustment information to a test hook. |
Constructor Summary | |
---|---|
UtilizationCalculator(EnvironmentImpl env,
Cleaner cleaner)
|
Method Summary | |
---|---|
(package private) void |
adjustUtilization(long fileNum,
long endFileNum,
FileSummary estimatedFileSummary,
FileSummary trueFileSummary)
Saves the true average LN size for use in calculating utilization. |
(package private) Long |
getBestFile(SortedMap<Long,FileSummary> fileSummaryMap,
boolean forceCleaning,
Set<Long> lowUtilizationFiles,
boolean isBacklog,
boolean isProbe)
Returns the best file that qualifies for cleaning or probing, or null if no file qualifies. |
(package private) Long |
getCheapestFileToClean(SortedMap<Long,FileSummary> fileSummaryMap,
SortedSet<Long> candidateFiles)
Returns the cheapest file to clean from the given list of files. |
float |
getLNSizeCorrectionFactor()
Returns the factor to be multiplied by the average LN size (for LNs with uncounted sizes) to correct for differences between obsolete and active LN sizes. |
CleanerLogSummary |
getLogSummary()
Returns log summary info that should be saved persistently. |
(package private) boolean |
isCorrectionEstablished()
Returns whether enough adjustments have been made to conclude that the the LN size correction factor has been established, or at least unnecessary because very few LN sizes are uncounted. |
(package private) void |
setAdjustmentHook(TestHook<UtilizationCalculator.TestAdjustment> hook)
See UtilizationCorrectionTest. |
void |
setLogSummary(CleanerLogSummary logSummary)
Restores log summary info that was read from persistent storage. |
(package private) void |
setProtectedFiles()
Determine which files are protectd from deletion, which influences which files are cleaned. |
(package private) boolean |
shouldPerformProbe(long endFileNum)
Returns whether a correction probe should be attempted, if worst case utilization also indicates that cleaning may be needed. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
UtilizationCalculator(EnvironmentImpl env, Cleaner cleaner)
Method Detail |
---|
public CleanerLogSummary getLogSummary()
public void setLogSummary(CleanerLogSummary logSummary)
void setAdjustmentHook(TestHook<UtilizationCalculator.TestAdjustment> hook)
void setProtectedFiles()
public float getLNSizeCorrectionFactor()
Long getBestFile(SortedMap<Long,FileSummary> fileSummaryMap, boolean forceCleaning, Set<Long> lowUtilizationFiles, boolean isBacklog, boolean isProbe)
fileSummaryMap
- the map containing file summary info.forceCleaning
- is true to always select a file, even if its
utilization is above the minimum utilization threshold.lowUtilizationFiles
- is a returned set of files that are below the
minimum utilization threshold.isBacklog
- is true if there is currently a backlog, in which case
FilesToMigrate won't be used to return a file.isProbe
- is true to use the maximum LN obsolete size to determine
file utilization. It should be false when selecting a file cleaning a
file normally, to use the average LN size for uncounted sizes along with
correction factor. It should be true when selecting a file to calculate
utilization without cleaning, to determine the worst case (lowest
possible) utilization and to ignore the correction factor.Long getCheapestFileToClean(SortedMap<Long,FileSummary> fileSummaryMap, SortedSet<Long> candidateFiles)
void adjustUtilization(long fileNum, long endFileNum, FileSummary estimatedFileSummary, FileSummary trueFileSummary)
boolean isCorrectionEstablished()
boolean shouldPerformProbe(long endFileNum)
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |