TensorFlow has long been a popular framework for image classification, object detection, and other computer vision tasks. Hugging Face is best known for natural language processing, but its Model Hub has grown into a general home for machine learning models, including vision models. Deploying a trained TensorFlow vision model can seem daunting, but TensorFlow Serving simplifies the process by exposing the model over REST or gRPC interfaces.
Preparing the Vision Model for Deployment
Before deploying, ensure your model is properly trained and exported. TensorFlow vision models are typically built with the Keras API, often starting from pretrained architectures in tf.keras.applications or the TensorFlow Model Garden's vision library. Suppose you've already trained an image classifier with tf.keras on a dataset like CIFAR-10 or on your own custom data.
Save your completed model in the SavedModel format, which is compatible with TensorFlow Serving:
model.save('export/1/')
The directory path is crucial because TensorFlow Serving uses folder-based versioning, where each model version must be saved in a numbered directory. This exported model includes the architecture, weights, and necessary assets for serving.
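To make the export step concrete, here is a minimal sketch; the tiny architecture is purely illustrative and not prescribed by this workflow, and any Keras vision model exports the same way:

import tensorflow as tf

# Toy classifier for 32x32 RGB inputs (CIFAR-10-style), for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # train as usual

# TensorFlow 2 with Keras 2: saving to a bare directory writes a SavedModel.
model.save("export/1/")
# With Keras 3, use model.export("export/1/") to produce a SavedModel instead.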
While Hugging Face doesn’t host TensorFlow models for live serving, it allows you to share models via the Model Hub, enabling others to download and reuse them. The key is to use Hugging Face for distribution and versioning and TensorFlow Serving for live application serving.
Setting Up TensorFlow Serving
TensorFlow Serving is a production model server designed specifically for TensorFlow models, exposing both REST and gRPC endpoints for flexibility and performance. The simplest way to set it up is with Docker.
First, pull the TensorFlow Serving Docker image:
docker pull tensorflow/serving
Then run the container, bind-mounting your exported model:
docker run -p 8501:8501 --name=tf_model_serving \
--mount type=bind,source=$(pwd)/export,target=/models/vision_model \
-e MODEL_NAME=vision_model -t tensorflow/serving
The model is now served on port 8501 via REST:
http://localhost:8501/v1/models/vision_model:predict
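Before sending predictions, you can confirm the model loaded by issuing a GET request to TensorFlow Serving's model-status endpoint:

http://localhost:8501/v1/models/vision_model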
You can send a POST request with an image (preprocessed to match the input shape) in JSON format. Note that preprocessing remains the client's responsibility unless it is built into the model with tf.keras.layers.Rescaling or similar layers.
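As a sketch of such a client request, assuming a 32x32 RGB input and using Pillow and requests for illustration (both are assumptions on the client side, not requirements of TensorFlow Serving):

import json
import numpy as np
import requests
from PIL import Image

# Client-side preprocessing: match the model's expected input shape
# (assumed 32x32x3 here) and dtype; rescale pixels only if the model
# does not do so internally.
image = Image.open("example.jpg").convert("RGB").resize((32, 32))
batch = np.expand_dims(np.asarray(image, dtype=np.float32), axis=0)

# TensorFlow Serving's REST API expects a JSON body with an "instances" key
# and returns the model outputs under "predictions".
payload = json.dumps({"instances": batch.tolist()})
response = requests.post(
    "http://localhost:8501/v1/models/vision_model:predict",
    data=payload,
)
print(response.json()["predictions"])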
Hosting and Sharing the Model on Hugging Face
Hugging Face’s Model Hub supports various model formats, including TensorFlow’s SavedModel, making it an excellent platform to host your vision model post-training.
Arrange your local SavedModel directory as a Hugging Face model repository. Although the Hub's tooling is optimized for transformers-style repositories, it accepts arbitrary files, so a TensorFlow SavedModel uploads without trouble. Use the huggingface_hub Python library to upload:
from huggingface_hub import create_repo, upload_folder

# Requires authentication first, e.g. `huggingface-cli login` or an HF_TOKEN env var.
create_repo("username/my-tf-vision-model", private=True, exist_ok=True)

upload_folder(
    repo_id="username/my-tf-vision-model",
    folder_path="export",
    repo_type="model",
)
Include a README with model details and usage examples. Once uploaded, others can fetch the model with the huggingface_hub library or a plain git clone.
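For example, a consumer could pull the whole repository with snapshot_download; the repo id below is the hypothetical one from the upload step:

from huggingface_hub import snapshot_download

# Downloads every file in the repo and returns the local path; since the
# uploaded folder contains the numbered version directory (1/), that path
# can be bind-mounted directly into a TensorFlow Serving container.
local_dir = snapshot_download(repo_id="username/my-tf-vision-model")
print(local_dir)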
To serve a downloaded copy live, replicate the Docker setup above with TensorFlow Serving, mounting the downloaded directory. Hugging Face's hosted inference is geared toward models in the transformers ecosystem rather than arbitrary TensorFlow SavedModels, so TensorFlow Serving remains essential for live usage.
Handling Model Updates and Versioning
Models need updating as data drifts or better architectures emerge. TensorFlow Serving handles this through its versioned directory layout; deploy a new version simply by adding a numbered subdirectory:
export/
├── 1/
├── 2/
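TensorFlow Serving watches the base directory and loads new versions as they appear. As a sketch of the standard REST routes (the version number here is illustrative), the default route and a version-pinned route look like this:

http://localhost:8501/v1/models/vision_model:predict
http://localhost:8501/v1/models/vision_model/versions/1:predict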
By default, TensorFlow Serving loads and serves the highest-numbered version; pinning an older version per request, as in the second route above, requires that version to still be loaded (configurable through a model_version_policy in the serving config). Hugging Face supports versioning as well: push updates to the same repository with clear commit messages and README updates for transparency.
This workflow keeps local serving (via TF Serving) and global sharing (via Hugging Face) coordinated yet separate, enabling efficient experimentation and deployment without confusion. The Hugging Face Model Hub acts as the canonical source for your TensorFlow vision model, aiding developers in finding references or models to fine-tune.
Conclusion
Deploying TensorFlow vision models with TensorFlow Serving, while using the Hugging Face Model Hub for distribution, offers both live inference capabilities and collaborative reach. This modular approach balances performance with openness, whether you are building a computer vision API or sharing work with a broader community. By combining these tools, you simplify both deployment and sharing without adding unnecessary overhead.