Why Wallaroo Moved From Pony To Rust

Wallaroo.AI
Oct 19, 2021

The Background

Wallaroo delivers push-button productionization of ML models today, but that wasn’t always the case. Before our pivot, our small team of engineers set out to build a next-generation stream processing framework. Our goals were extreme scalability, high performance, and resilient in-memory state. From experience, we felt that commonly used solutions such as Apache Storm and Apache Spark were not optimized for many use cases, such as processing very large volumes of data or meeting very low latency requirements. Furthermore, the existing Apache tools depended on Java, and specifically on the JVM, where it is really hard to get predictable, very low latency results. They were also not well suited to modern data science algorithms and workflows. We believed we could do better: process more data, faster, and at far lower cost. We wanted a system that could perform millions of computations a second, with very low (and consistent) latency and a very small infrastructure footprint, while being simpler to use and manage.

The Wallaroo team was clear that building another distributed computing framework in Java was not the answer. We built an initial prototype in Python to test some ideas, but we knew before we even began that Python was out of the question for the actual framework, for all sorts of performance, flexibility, and scalability reasons.

So if not Java, and not Python, what language should we choose? We knew that we’d be writing performance-intensive, highly concurrent code. We considered C++, Pony, and Rust. From a purely performance perspective, C or C++ would have been a good choice. However, from past experience we knew that building highly distributed data processing applications in C/C++ was no easy task. We ruled out C++ because we wanted better safety guarantees around memory and concurrency.

Rust 1.0 had just been released, but we knew that we would need to build our own runtime for handling concurrency if we chose it. Pony, on the other hand, was a language with strong memory and concurrency safety guarantees that also had a high-performance, actor-based runtime. We decided it would be the best way to move quickly and build a reliable, performant system. Our bet paid off for early iterations of our software. Though, like Rust, Pony has a learning curve, we found that once you got through it, the language made concurrent programming easy, and eliminated whole classes of bugs that we’ve dealt with in the past. And because we didn’t have to build a concurrent runtime, we were able to very quickly focus on our core domain. Pony also provided an actor-based concurrent garbage collector that was designed to avoid long “stop the world” pauses. This was critical since we were aiming for very low, predictable tail latencies.

It turns out Pony was a great language for our initial goals around Wallaroo. It helped us solve hard problems quickly without having to build a concurrent runtime from scratch, and it enabled us to meet ambitious performance targets.

However, in the meantime data science and machine learning continued to explode, and we found that customer interest in deploying machine learning models far outstripped demand for more conventional stream processing data algorithms. With the increasing focus on MLOps, Rust became a better fit. Like Pony, it also gave us safety and performance, but the larger library ecosystem was critical for our MLOps work.

The Switch to Rust

By the time we started focusing on MLOps, the situation had changed. On one hand, Rust had come a long way in a short period of time. For a concurrent runtime, for example, we now had Tokio. On the other hand, we found that we needed the more robust ecosystem of existing libraries available for Rust. Pony continues to make progress, but has a smaller community, and as a small startup we were better off not having to solve problems outside our core domain, particularly around low-level integrations. And there has been a lot of development on the ML side in Rust, which has been the most valuable for us since moving into this space. Hiring for Rust was also an easier proposition, which was important as we planned to rapidly scale up the company.

We started the transition with some experiments to get a feel for how we would solve some of our existing problems with Rust. One of our first experiments was recreating a special-purpose HTTP server that we had built for one of our customers. This allowed us to test out the ergonomics of the language. Coming from Pony, a language built around actors, we were especially interested in learning about Rust’s support for asynchronous programming.

Pleased with the results of this early work, the engineering team began to dive into learning Rust and to architect our platform around its paradigms. Having come from Pony we were already used to thinking closely about memory safety, so we were well prepared to deal with Rust’s borrow checker, which can be a daunting challenge to many new Rustaceans.

Having used Rust for a while now, we’re very pleased with the results. Because Rust, like Pony, is a type-safe language, we have kept some important advantages. We can move forward quickly with our codebase, knowing that the compiler will catch a whole class of errors before they become runtime issues. The borrow checker saves us from spending hours or days hunting down data races.
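As a small illustration of the data race point, here is a minimal sketch that uses only the Rust standard library. It is not code from the Wallaroo platform, and the function name `count_in_parallel` is hypothetical. Several threads increment a shared counter. The compiler accepts the program only because the counter is wrapped in `Arc<Mutex<...>>`. If the closure tried to mutate a plain `usize` owned by the parent thread instead, the program would be rejected at compile time rather than racing at runtime.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawns `workers` threads that each add
/// `increments_per_worker` to one shared counter, then returns the total.
fn count_in_parallel(workers: usize, increments_per_worker: usize) -> usize {
    // Arc provides shared ownership across threads; Mutex provides
    // exclusive access to the value while it is being modified.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_worker {
                    // The lock guard is released at the end of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_in_parallel(4, 1_000));
}
```

The design choice here is deliberate: the unsynchronized version of this program is not something to find in testing, because it does not compile in the first place.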

Rust Benefits

Our new Rust-based platform recently served millions of inferences a second for a sophisticated ad-targeting model, with a median latency of 5 milliseconds and a yearly infrastructure cost of $130K. From a customer perspective, that’s a resounding success.

What’s more, our desire to have access to a larger library ecosystem has been met. We are currently using open-source libraries for TensorFlow and ONNX to run ML models. These libraries use Rust’s FFI to wrap the underlying C implementations. It would have taken us weeks of development time just to get basic wrappers around these systems if we had written them ourselves, and months to get them to a robust state where we were comfortable sending them to customers. And there is a wealth of other libraries at our disposal should the need arise. This has allowed us to stay focused on the business goal of delivering a platform that makes it simple to rapidly deploy and run machine learning models in production.
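To give a feel for what wrapping a C implementation through Rust’s FFI involves, here is a minimal sketch. It does not use the TensorFlow or ONNX bindings. Instead it wraps `strlen` from the C standard library, and the wrapper name `c_string_length` is hypothetical. The pattern is the general one: declare the foreign function, then hide the `unsafe` call behind a safe Rust function that upholds the C side’s requirements. The `unsafe extern` syntax assumes Rust 1.82 or newer; on older toolchains the block is written as plain `extern "C"`.

```rust
use std::ffi::{c_char, CString};

// Declaration of a function implemented in the C standard library.
// Rust cannot check this signature, so every call to it is unsafe.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: returns the byte length of `text` as C sees it,
/// or `None` if `text` contains an interior NUL byte and therefore cannot be
/// represented as a C string.
fn c_string_length(text: &str) -> Option<usize> {
    // CString appends the trailing NUL terminator that strlen relies on.
    let c_text = CString::new(text).ok()?;
    // Safe to call: the pointer is valid and NUL-terminated, and `c_text`
    // stays alive until after the call returns.
    Some(unsafe { strlen(c_text.as_ptr()) })
}

fn main() {
    println!("{:?}", c_string_length("wallaroo"));
}
```

Production-grade bindings for a large ML runtime repeat this pattern across many functions and add ownership handling, error mapping, and tests, which is why getting them ready-made saves so much time.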

In addition to the wealth of available libraries, we also benefit from Rust’s maturity and high adoption rate. We know that we are building on a well-designed and well-tested language, and we have access to a large community that will guarantee ongoing future support. This lets us focus all of our engineering energy on helping our customers solve their problems. The community has created a wide variety of tools, including IDEs, alternative compilers, analysis tools, and new target platforms. All of these tools make it easier to develop high quality software.

We have adapted our onboarding process to include reading “The Rust Programming Language”, “The Rustonomicon”, and articles about Rust’s asynchronous runtimes. Even though Rust is more widely used than Pony, there are still plenty of great programmers who have not learned it yet, so we want to make sure we have a way of getting them up to speed as part of their onboarding.

We are happy that we had an opportunity to work with Pony. It is a great language that helped us get to where we are today. We would encourage anyone reading this who is interested in new programming ideas to spend some time with it, because we think you will learn a thing or two even if you never use it in production. One of our engineering values at Wallaroo is “The Right Tool for the Job”, and at this stage Rust is the tool that does what we need. We are excited about using it and are rapidly enhancing and evolving our product.

Some further notes about Rust and Pony

  • Rust provides most of the same advantages as Pony (designed for very high performance systems programming, memory and data race safety at compile time, the ability to use C/C++ libraries, compiles to native code, etc.)
  • Rust has a robust ecosystem of libraries. We get ready-made support for a wide variety of integrations.
  • There is a much larger pool of engineers who are eager to learn and work in Rust, or who already have significant Rust experience.
  • There are more resources available for learning Rust, and more opportunities for participation in conferences, etc.

Further Reading

https://blog.wallaroolabs.com/2017/10/why-we-used-pony-to-write-wallaroo/

https://www.ponylang.io/discover/#what-is-pony

https://www.rust-lang.org/
