Operating a large-scale recommendation system is a complex undertaking: it requires high availability and throughput, involves many services and teams, and the environment of the recommender system changes every second. For example, new members may join and new items may be added to the service at any time, and new code and new ML models are deployed to production frequently. One question we need to address at Netf