Introduction

This post summarizes TensorFlow Serving's model loading policies and its model serving policies.

Model loading policies

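TF Serving supports multiple platforms, multiple models, and multiple versions per model, and lets you specify a loading policy for each model.
The policy is configured in serving-1.15.0/tensorflow_serving/config/model_server_config.proto through the following field: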

// Version policy for the model indicating which version(s) of the model to
// load and make available for serving simultaneously.
// The default option is to serve only the latest version of the model.
//
// (This can be changed once a model is in serving.)
FileSystemStoragePathSourceConfig.ServableVersionPolicy model_version_policy = 7;

The type of this field is defined in serving-1.15.0/tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto:

// A policy that dictates which version(s) of a servable should be served.
message ServableVersionPolicy {
    // Serve the latest versions (i.e. the ones with the highest version
    // numbers), among those found on disk.
    //
    // This is the default policy, with the default number of versions as 1.
    message Latest {
        // Number of latest versions to serve. (The default is 1.)
        uint32 num_versions = 1;
    }

    // Serve all versions found on disk.
    message All {
    }

    // Serve a specific version (or set of versions).
    //
    // This policy is useful for rolling back to a specific version, or for
    // canarying a specific version while still serving a separate stable
    // version.
    message Specific {
        // The version numbers to serve.
        repeated int64 versions = 1;
    }

    oneof policy_choice {
        Latest latest = 100;
        All all = 101;
        Specific specific = 102;
    }
}

The policy_choice oneof at the end determines which policy is in effect:

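latest: load the newest num_versions versions (by default only the single newest one)
all: load every version found on disk
specific: load the listed versions (more than one may be given)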
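
The three choices can be illustrated with a small standalone sketch. This is not the TF Serving implementation: VersionPolicy and SelectVersions are hypothetical names introduced here, and treating num_versions == 0 as 1 is an assumption based on the proto comment that the default is 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for the ServableVersionPolicy oneof
// (not the generated protobuf class).
struct VersionPolicy {
    enum class Kind { kLatest, kAll, kSpecific };
    Kind kind = Kind::kLatest;
    uint32_t num_versions = 1;   // used by kLatest only
    std::set<int64_t> specific;  // used by kSpecific only
};

// Returns the versions that should be loaded, highest first,
// given the version numbers found on disk.
std::vector<int64_t> SelectVersions(const VersionPolicy& policy,
                                    const std::set<int64_t>& on_disk) {
    // std::set is ordered ascending, so reverse iteration yields newest first.
    const std::vector<int64_t> sorted(on_disk.rbegin(), on_disk.rend());
    std::vector<int64_t> result;
    switch (policy.kind) {
        case VersionPolicy::Kind::kLatest: {
            // Assumption: an unset (zero) num_versions means the default of 1.
            const std::size_t wanted = std::max<std::size_t>(policy.num_versions, 1);
            const std::size_t n = std::min(wanted, sorted.size());
            result.assign(sorted.begin(), sorted.begin() + n);
            break;
        }
        case VersionPolicy::Kind::kAll:
            result = sorted;
            break;
        case VersionPolicy::Kind::kSpecific:
            // Only versions that are both requested and present on disk.
            for (int64_t v : sorted) {
                if (policy.specific.count(v) > 0) result.push_back(v);
            }
            break;
    }
    assert(result.size() <= on_disk.size());
    return result;
}
```

In this sketch, a specific version that is missing from disk is silently skipped, which is why a request for that version would later fail.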

In addition, the AspiredVersionsManager class supports two version-transition policies:

AvailabilityPreservingPolicy: load the new version first, then unload the old one
ResourcePreservingPolicy: unload the old version first, then load the new one
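
The difference between the two policies is only the order of actions during a version transition. A minimal sketch of that ordering follows; TransitionPolicy and PlanTransition are hypothetical names for illustration, not part of the TF Serving API, and the real policies handle more cases than a single old/new pair.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical enum mirroring the two policy names from the text.
enum class TransitionPolicy { kAvailabilityPreserving, kResourcePreserving };

// Returns the ordered actions for replacing old_version with new_version.
std::vector<std::string> PlanTransition(TransitionPolicy policy,
                                        int64_t old_version,
                                        int64_t new_version) {
    assert(old_version != new_version);
    const std::string load = "load " + std::to_string(new_version);
    const std::string unload = "unload " + std::to_string(old_version);
    if (policy == TransitionPolicy::kAvailabilityPreserving) {
        // Both versions are resident for a while, so peak memory is higher,
        // but some version is always available to serve.
        return {load, unload};
    }
    // Memory never holds both versions, but there is a window in which
    // no version of the model is loaded.
    return {unload, load};
}
```

The trade-off is therefore availability versus peak resource usage: the first ordering suits latency-sensitive serving, the second suits models too large to hold twice in memory.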

Model serving policies

The serving policy is built on top of the loading policy.
Each model's configuration has a version_labels field, defined as follows:

// String labels to associate with versions of the model, allowing inference
// queries to refer to versions by label instead of number. Multiple labels
// can map to the same version, but not vice-versa.
//
// An envisioned use-case for these labels is canarying tentative versions.
// For example, one can assign labels "stable" and "canary" to two specific
// versions. Perhaps initially "stable" is assigned to version 0 and "canary"
// to version 1. Once version 1 passes canary, one can shift the "stable"
// label to refer to version 1 (at that point both labels map to the same
// version -- version 1 -- which is fine). Later once version 2 is ready to
// canary one can move the "canary" label to version 2. And so on.
map<string, int64> version_labels = 8;

This field assigns labels to versions.

The model_spec field of a request can then select either a version number or a label:

// Metadata for an inference request such as the model name and version.
message ModelSpec {
    // Required servable name.
    string name = 1;

    // Optional choice of which version of the model to use.
    //
    // Recommended to be left unset in the common case. Should be specified only
    // when there is a strong version consistency requirement.
    //
    // When left unspecified, the system will serve the best available version.
    // This is typically the latest version, though during version transitions,
    // notably when serving on a fleet of instances, may be either the previous or
    // new version.
    oneof version_choice {
        // Use this specific version number.
        google.protobuf.Int64Value version = 2;

        // Use the version associated with the given label.
        string version_label = 4;
    }

    // A named signature to evaluate. If unspecified, the default signature will
    // be used.
    string signature_name = 3;
}

When handling a request, the server picks the version according to version_choice; if it is unset, the latest version is used. The code is as follows:

switch (model_spec.version_choice_case()) {
    case ModelSpec::kVersion: {
        *servable_request = ServableRequest::Specific(model_spec.name(), model_spec.version().value());
        break;
    }
    case ModelSpec::kVersionLabel: {
        if (!options_.allow_version_labels) {
            return errors::InvalidArgument("ModelSpec has 'version_label' set, but it is not currently allowed by the server.");
        }
        int64 version;
        TF_RETURN_IF_ERROR(GetModelVersionForLabel(model_spec.name(), model_spec.version_label(), &version));
        *servable_request = ServableRequest::Specific(model_spec.name(), version);
        break;
    }
    case ModelSpec::VERSION_CHOICE_NOT_SET: {
        *servable_request = ServableRequest::Latest(model_spec.name());
        break;
    }
}

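This version selection has to match what the loading policy actually loaded. If only one version is loaded and a request names a different version or a label, the lookup may find no matching model.
The server then returns an error saying the label does not exist or the version was not found, and the request fails; there is no fallback to another version.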
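
The interaction between loaded versions and the request's version choice can be sketched as follows. This is a simplified standalone model, not the server code: VersionChoice, Resolution, and Resolve are hypothetical names, the error strings are invented for illustration, and the label table is assumed to be a plain map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified view of ModelSpec.version_choice:
// at most one of version / version_label is set.
struct VersionChoice {
    bool has_version = false;
    int64_t version = 0;
    std::string version_label;  // empty means "not set"
};

struct Resolution {
    bool ok = false;
    int64_t version = 0;
    std::string error;  // illustrative message, not the server's wording
};

// Resolves a request against the set of loaded versions and the label table.
Resolution Resolve(const VersionChoice& choice,
                   const std::set<int64_t>& loaded,
                   const std::map<std::string, int64_t>& labels) {
    // Models the oneof: both fields cannot be set at once.
    assert(!(choice.has_version && !choice.version_label.empty()));
    Resolution r;
    if (loaded.empty()) {
        r.error = "no version loaded";
        return r;
    }
    int64_t wanted = 0;
    if (choice.has_version) {
        wanted = choice.version;
    } else if (!choice.version_label.empty()) {
        const auto it = labels.find(choice.version_label);
        if (it == labels.end()) {
            r.error = "unknown label: " + choice.version_label;
            return r;
        }
        wanted = it->second;
    } else {
        wanted = *loaded.rbegin();  // unset choice: highest loaded version
    }
    if (loaded.count(wanted) == 0) {
        // No fallback: a missing version is an error, not a redirect.
        r.error = "version not loaded: " + std::to_string(wanted);
        return r;
    }
    r.ok = true;
    r.version = wanted;
    return r;
}
```

With only version 3 loaded, a request that leaves the choice unset resolves to 3, while a request for version 2 or for a label pointing at an unloaded version fails. If clients pin versions or labels, the loading policy has to keep those versions loaded, for example with specific or with latest and a larger num_versions.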