Rust vs Go vs C: Database and IoT Application Performance Benchmarks

Rust is designed to be both safe and fast [1], and in recent years it has been expected to gain ground as a language for professional, commercial use [3][8]. However, the 2021 survey [3] also shows that, although the share of respondents using Rust at work rose sharply from 42% to 59%, insufficient adoption in industry was the most frequently cited concern (38%) about Rust's future.

To assess Rust's practicality, this article compares Rust implementations with implementations of the same specifications in other programming languages, such as C and Go. We prepared two target applications, a database (Redis) and an IoT protocol (ECHONET Lite), to evaluate the implementation efficiency and performance of Rust in practice.

In summary, from a Better C perspective, I would rank Go (on par with Objective-C) as the best successor to C, followed by Rust and then C++. Rust offers safety and speed but is limited in productivity, interoperability, and programming flexibility. Go, by contrast, stands out for its implementation efficiency and stable performance, making it a safe choice for general-purpose applications.

Evaluation 1 - Database Application (Redis)

This evaluation compares C, Rust, and Go implementations of the same database application, the Redis [19] specification. The official Redis implementation [19] is written in C, while the Rust and Go implementations are unofficial subset implementations. The Rust implementation is mini-redis [21], released as a learning resource for the Tokio [20] library, and the Go implementation is go-redis-server, a sample server built on go-redis [23], my framework for Redis-compatible databases.

Because mini-redis [21] and go-redis [23] implement only a subset of the official Redis [19], their implementation efficiency cannot be compared by LOC (lines of code); only their performance is evaluated.

Evaluation Benchmarks

The benchmarks were run with redis-benchmark, the official Redis benchmark tool. Rather than the full suite, only the SET and GET commands were run, with 10,000 requests each and the default 50 parallel clients.

The benchmark is limited to the basic SET/GET commands because mini-redis [21] implements only a minimal set of commands such as SET and GET, and the parameters follow the mini-redis [21] benchmark settings [22]. The latest version of mini-redis [21] at the time of evaluation was used, and the environment was "Mac mini (2018) + macOS 12.6".

Performance Evaluation - C > Go > Rust

The results of running the SET/GET commands with redis-benchmark are shown below. The 99th percentile value, commonly used as an indicator for database operations, is shown in the rightmost column of the table, p99 (ratio), together with the performance ratio relative to C.

In this benchmark, the official C redis-server [19] was the fastest, followed by go-redis [23] in Go and then mini-redis [21] in Rust. A graph of the above table is shown below.

The highly optimized C redis-server [19] was about three times faster than the Go and Rust sample implementations. Both Go's go-redis [23] and Rust's mini-redis [21] are sample implementations with room for improvement. With that in mind, the results for each language are briefly reviewed below.

Rust

Compared to the official C redis-server [19], mini-redis achieved only 28% (SET) and 41% (GET) of its performance. Compared to the Go implementation go-redis [23], it achieved 78% (=2.879/3.663) for SET and 88% (=2.167/2.455) for GET.

Because mini-redis [21] was released as a learning resource for Tokio [20], it presumably has not been fully optimized. Judging from its performance comparison with the official Redis implementation [19][22], however, it does appear to have been optimized to some extent.

As a Tokio [20] learning resource, mini-redis is built mainly on Tokio, and its network layer is based on the Tokio TCP server (tokio::net::TcpListener). The database's key-value store, however, is implemented with the standard (std) library HashMap, with the std Mutex for exclusive access, just as in the Evaluation 2 implementation. Its performance issues may therefore stem from the same std library performance characteristics [18], such as those of HashMap [17], discussed in Evaluation 2.
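To make that structure concrete, here is a minimal sketch of a key-value store built the same way: a std HashMap behind a std Mutex, shared through Arc. It uses plain std threads instead of Tokio so that it stays self-contained, and the names (Db, set, get) are hypothetical; this is an illustration of the pattern, not mini-redis's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared key-value store: a std HashMap guarded by a std Mutex.
/// Cloning `Db` only clones the Arc, so every clone sees the same map.
#[derive(Clone, Default)]
struct Db {
    entries: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Db {
    /// SET: the lock is held only until the end of this statement.
    fn set(&self, key: &str, value: &[u8]) {
        self.entries
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_vec());
    }

    /// GET: the value is cloned out, and the lock is released on return.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let db = Db::default();
    // Several clients issuing SET concurrently all contend for one lock.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let db = db.clone();
            thread::spawn(move || db.set(&format!("key:{}", i), b"value"))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("{:?}", db.get("key:3"));
}
```

Every SET and GET goes through the same single lock, which is simple and correct but serializes all clients; that is one place profiling would look first.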

Go

Compared to the official C redis-server [19], go-redis achieved only 35% (SET) and 46% (GET) of its performance. Compared to Rust's mini-redis [21], it achieved 127% (=3.663/2.879) for SET and 113% (=2.455/2.167) for GET, that is, about 27% and 13% faster.
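The Rust-versus-Go percentages in this section are reciprocals of the same two pairs of figures (2.879/3.663 for SET and 2.167/2.455 for GET), which is easy to lose track of when "78%" and "127%" appear side by side. A small sketch of the arithmetic, assuming, as the text does, that a larger figure means better performance:

```rust
/// Performance of `a` relative to `b`, in percent.
/// Assumes a larger figure means better performance.
fn relative_percent(a: f64, b: f64) -> f64 {
    a / b * 100.0
}

fn main() {
    // Figures quoted in the text: (command, Rust mini-redis, Go go-redis).
    let cases = [("SET", 2.879, 3.663), ("GET", 2.167, 2.455)];
    for &(cmd, rust, go) in cases.iter() {
        println!(
            "{}: Rust = {:.1}% of Go, Go = {:.1}% of Rust",
            cmd,
            relative_percent(rust, go),
            relative_percent(go, rust)
        );
    }
}
```

The percentages quoted in the text are these values truncated to whole percent.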

go-redis [23] is a framework for implementing Redis-compatible databases, and, as with mini-redis [21], the program evaluated here is a simple sample implementation. Because I had assumed there was little left to optimize, the size of the gap with the official implementation was unexpected.

go-redis [23] and its sample server go-redis-server use only the Go standard library, and the database's key-value store is implemented with sync.Map. Explaining the gap with the official implementation will require studying that implementation in parallel with identifying bottlenecks through profiling.

Evaluation 2 - IoT Applications (ECHONET Lite)

This evaluation targets implementations of the client and server functions of ECHONET Lite [9], a communication protocol in the IoT field. All implementations share the same requirements specification (the ECHONET Lite [9] specification), the same design, and the same implementer.

Evaluation Frameworks

The Rust implementation [12] of ECHONET Lite [9] is compared with implementations in C [11], Go [13], and Python [14]. The basic ECHONET Lite [9] functions follow the same design as far as possible, but the implemented functions differ somewhat between languages. An overview of the differences in implemented functions is shown in the figure below.

The ECHONET Lite [9] frameworks evaluated here were written in the order C, Go, Python, and Rust, and all follow essentially the same design.

All four languages implement the basic functions described in the ECHONET Lite [9] specification: the Device function, the Controller function for operating devices, and the database of standard devices [10] defined by ECHONET Lite.

For the transport layer of the ECHONET Lite [9] specification, however, only the mandatory UDP communication is implemented in all languages; the optional TCP communication is implemented only in C [11] and Go [13] and is omitted from Rust [12] and Python [14].

Implementation Efficiency (≠ Effort) Evaluation - Python > Rust ≈ Go > C

Implementation efficiency was evaluated in terms of lines of code (LOC), counted with Tokei [16] excluding comments and blank lines. The LOC results are shown in the figure below.

The Auto category is the automatically generated code that defines the ECHONET Lite standard devices [10]; for it, only source files were counted, excluding the C header files. A graph of the above table is shown below.

Taking the scope of implemented functionality into account, the LOC evaluation ranks the languages, from least code to most, as Python > Rust ≈ Go > C. Each language is briefly reviewed below.

Rust

Based on the LOC results (although TCP functionality is not implemented), Rust's implementation efficiency is equivalent to Go's, at 53% of the LOC of C. Apart from the UdpSocket issue described below, this Rust implementation uses only the standard (std) library, including Mutex.

Rust generally requires a learning period before developers feel productive [5], so in Rust "less LOC" does not simply mean "less implementation effort." The extremely high LOC of the automatically generated (Auto) code is caused by the default maximum line width (max_width = 100) of Rust's formatter (rustfmt), so that code is excluded from the evaluation.

Rust's move semantics and lifetimes also impose constraints that can make it hard to carry over a design shared with the other languages, raising design costs. In this implementation, parts had to be redesigned because of limitations in applying the Observer pattern to trait objects and because the locked interval is tied to the lifetime of the Mutex guard.
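The Observer constraint can be illustrated with a minimal sketch. If observers are notified while the MutexGuard over the observer list is still alive, any observer that calls back into the subject tries to lock the same Mutex again, which std does not allow (the std documentation says such a call will not return; it may deadlock or panic). One common redesign, assumed here rather than taken from the actual implementation, is to clone the list of Arc handles under the lock and notify after the guard has been dropped. The trait and type names are hypothetical.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical observer notified when a property (EPC) changes.
trait Observer: Send + Sync {
    fn on_property_changed(&self, epc: u8, value: &[u8]);
}

struct Subject {
    observers: Mutex<Vec<Arc<dyn Observer>>>,
}

impl Subject {
    fn new() -> Self {
        Subject { observers: Mutex::new(Vec::new()) }
    }

    fn add_observer(&self, observer: Arc<dyn Observer>) {
        self.observers.lock().unwrap().push(observer);
    }

    fn observer_count(&self) -> usize {
        self.observers.lock().unwrap().len()
    }

    fn notify(&self, epc: u8, value: &[u8]) {
        // Clone the Arc handles while the guard is alive; the temporary
        // guard is dropped at the end of this statement.
        let snapshot: Vec<Arc<dyn Observer>> = self.observers.lock().unwrap().clone();
        // No lock is held here, so observers may call back into `self`.
        for observer in &snapshot {
            observer.on_property_changed(epc, value);
        }
    }
}

/// An observer that calls back into the subject from inside the callback.
struct CountingObserver {
    subject: Weak<Subject>,
    seen: Mutex<usize>,
}

impl Observer for CountingObserver {
    fn on_property_changed(&self, _epc: u8, _value: &[u8]) {
        if let Some(subject) = self.subject.upgrade() {
            // Locks `observers` again: safe only because notify() released it.
            *self.seen.lock().unwrap() = subject.observer_count();
        }
    }
}

fn main() {
    let subject = Arc::new(Subject::new());
    let observer = Arc::new(CountingObserver {
        subject: Arc::downgrade(&subject),
        seen: Mutex::new(0),
    });
    subject.add_observer(observer.clone());
    subject.notify(0x80, &[0x30]); // e.g. operation status = ON
    println!("observers seen in callback: {}", *observer.seen.lock().unwrap());
}
```

Holding a Weak reference back to the subject avoids an Arc cycle; the cost of this design is one Vec allocation per notification, which is the kind of copy the Mutex-lifetime constraint tends to force.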

C

The C implementation had the largest LOC. This is because it uses no external libraries and includes homegrown libraries (e.g., strings, lists, and object-oriented structures) that are unnecessary in the other languages, as well as portability wrappers (e.g., for mutexes and threads). The homegrown libraries are shared with my other open-source C projects, so the amount of code actually written for this application is smaller than the LOC figures suggest.

Go

The LOC results show implementation efficiency equivalent to Rust's, at 63% of the source-code LOC of C. Whereas Rust ran into issues with parts of its standard library, the Go implementation uses only the standard library.

The Go implementation essentially follows the C implementation. Unlike Rust, Go is flexible enough that the C design could be translated directly. Its standard library is also far more extensive than C's, so for Go, "less LOC = less implementation effort" holds fairly directly.

Python

The Python implementation had the lowest LOC in this evaluation. Support for implementation inheritance, which Rust lacks, also helped reduce LOC.

Python is also flexible enough to translate designs from C or Go directly. It may be the best prototyping language, apart from the execution speed issue described below.

Performance Evaluation - Go > C > Rust > Python

Execution performance was compared across implementations offering the same level of ECHONET Lite [9] functionality with the same basic design. The benchmark program is a node that implements both an ECHONET Lite [9] controller and objects.

Evaluation Benchmarks

In the benchmark program, the ECHONET Lite [9] controller acts as the UDP client and the object as the UDP server. Each iteration of the main loop, the basic sequence of the benchmark, consists of 12 UDP requests from the controller and 12 UDP responses from the object.

In terms of the ECHONET Lite [9] specification, the node containing the controller is discovered with a UDP multicast request (ESV: 0x62), and the controller then requests the 12 implementation-required properties [10] of the node profile object (0x0EF001) of the discovered node. The benchmark repeats this request (ESV: 0x62) and response (ESV: 0x72) exchange over UDP 10,000 times. No optimization options were specified when building the code in any environment. The environment was "Mac mini (2018) + macOS 12.6", and the details of the evaluation scripts are available at [15].
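For reference, each request and response in this benchmark is an ECHONET Lite format-1 frame: EHD1 (0x10), EHD2 (0x81), a 2-byte TID, SEOJ and DEOJ (3 bytes each), ESV, OPC, and then OPC property entries, each consisting of EPC, PDC, and PDC bytes of EDT. A Get request (ESV 0x62) carries PDC = 0 for each requested property, and the Get response (ESV 0x72) returns the values. The sketch below builds and parses such frames. The function names are hypothetical, the TID is encoded big-endian by assumption, the controller object (0x05FF01) is assumed as SEOJ, and the EPCs in the example are illustrative rather than the benchmark's exact 12 properties.

```rust
const EHD1: u8 = 0x10; // ECHONET Lite
const EHD2: u8 = 0x81; // format 1 (specified message format)
const ESV_GET: u8 = 0x62;
const ESV_GET_RES: u8 = 0x72;
const HEADER_LEN: usize = 12; // EHD1 EHD2 TID(2) SEOJ(3) DEOJ(3) ESV OPC

/// Builds a Get request: one (EPC, PDC = 0) entry per requested property.
fn build_get(tid: u16, seoj: [u8; 3], deoj: [u8; 3], epcs: &[u8]) -> Vec<u8> {
    let mut frame = vec![EHD1, EHD2];
    frame.extend_from_slice(&tid.to_be_bytes()); // byte order: an assumption
    frame.extend_from_slice(&seoj);
    frame.extend_from_slice(&deoj);
    frame.push(ESV_GET);
    frame.push(epcs.len() as u8); // OPC (at most 255 properties)
    for &epc in epcs {
        frame.extend_from_slice(&[epc, 0x00]); // EPC, PDC = 0 (no EDT)
    }
    frame
}

/// Parses the ESV and the (EPC, EDT) list; returns None on a malformed frame.
fn parse_frame(frame: &[u8]) -> Option<(u8, Vec<(u8, Vec<u8>)>)> {
    if frame.len() < HEADER_LEN || frame[0] != EHD1 || frame[1] != EHD2 {
        return None;
    }
    let esv = frame[10];
    let opc = frame[11] as usize;
    let mut props = Vec::with_capacity(opc);
    let mut i = HEADER_LEN;
    for _ in 0..opc {
        let epc = *frame.get(i)?;
        let pdc = *frame.get(i + 1)? as usize;
        let edt = frame.get(i + 2..i + 2 + pdc)?.to_vec();
        props.push((epc, edt));
        i += 2 + pdc;
    }
    Some((esv, props))
}

fn main() {
    let controller = [0x05, 0xFF, 0x01]; // controller object (assumed SEOJ)
    let node_profile = [0x0E, 0xF0, 0x01];
    // Illustrative EPCs only, not the benchmark's exact property list.
    let request = build_get(1, controller, node_profile, &[0x80, 0x82, 0x83]);
    println!("request: {:02X?}", request);

    // A hand-written Get response: operation status (0x80) = ON (0x30).
    let response = [
        0x10, 0x81, 0x00, 0x01, 0x0E, 0xF0, 0x01, 0x05, 0xFF, 0x01,
        ESV_GET_RES, 0x01, 0x80, 0x01, 0x30,
    ];
    if let Some((esv, props)) = parse_frame(&response) {
        println!("ESV {:#04X}, properties {:02X?}", esv, props);
    }
}
```

Bounds are checked with slice get() so that a truncated datagram yields None instead of a panic.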

Performance evaluation results

Performance was measured with the time command as the execution time of 10,000 iterations of the basic sequence described in the evaluation benchmarks. The results are shown below, together with an overview of the compilation conditions and implementation techniques for each programming language.

In this benchmark, Go was the fastest implementation, followed by C, Rust, and Python. Because UDP communication is asynchronous, each implementation pairs a request with its response using its language's condition-variable or channel mechanisms. A graph of the above table is shown below.
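As a rough illustration of that pairing in Rust, the sketch below keeps a table of in-flight requests keyed by the ECHONET Lite TID, each entry holding the std::sync::mpsc sender that wakes the waiting caller, and the receive loop routes every incoming frame by its TID. The names (Pending, register, dispatch) are hypothetical, a thread handing over a fixed frame stands in for the real UDP receive loop, and a Condvar-based design would work equally well.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// In-flight requests: TID -> sender that wakes the caller waiting on it.
type Pending = Arc<Mutex<HashMap<u16, Sender<Vec<u8>>>>>;

/// Called before sending a request; the caller then waits on the receiver.
fn register(pending: &Pending, tid: u16) -> Receiver<Vec<u8>> {
    let (tx, rx) = channel();
    pending.lock().unwrap().insert(tid, tx);
    rx
}

/// Called by the receive loop for each incoming frame.
/// Returns true if the frame matched a waiting request.
fn dispatch(pending: &Pending, frame: Vec<u8>) -> bool {
    if frame.len() < 4 {
        return false;
    }
    let tid = u16::from_be_bytes([frame[2], frame[3]]); // TID at bytes 2..4
    // Take the sender out under the lock; the guard is dropped before send().
    let sender = pending.lock().unwrap().remove(&tid);
    match sender {
        Some(tx) => tx.send(frame).is_ok(),
        None => false,
    }
}

fn main() {
    let pending: Pending = Arc::default();
    let rx = register(&pending, 7);

    // Stand-in for the UDP receive thread (no real socket in this sketch).
    let table = Arc::clone(&pending);
    let receiver = thread::spawn(move || {
        let response = vec![0x10, 0x81, 0x00, 0x07, 0x0E, 0xF0, 0x01, 0x05, 0xFF, 0x01, 0x72, 0x00];
        dispatch(&table, response)
    });

    match rx.recv_timeout(Duration::from_secs(1)) {
        Ok(frame) => println!("response for TID 7: {} bytes", frame.len()),
        Err(e) => println!("no response: {}", e),
    }
    receiver.join().unwrap();
}
```

Removing the entry before sending means a late duplicate response with the same TID is simply dropped rather than delivered twice.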

To make all the results, including Python's, easier to compare, the figure shows each language's performance as a ratio to C, and the results for each language are briefly reviewed below.

Rust

The Rust implementation performed worse than the Go and C implementations, reaching only 33% of the performance of C and about 20% (=33/160) of the performance of Go. While its system (sys) time is comparable to Go's and shorter than C's, its user time is longer than both C's and Go's.

The cause still needs to be investigated, but according to "The Rust Performance Book" [17] and "The Rust Language FAQ" [18], some standard (std) library components, such as HashMap, can be slow in certain circumstances.
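One mitigation that The Rust Performance Book [17] describes is replacing HashMap's default hasher (currently SipHash 1-3, which resists HashDoS attacks but is comparatively slow for short keys such as integers) with a faster one. Crates such as rustc-hash or fnv are the usual choice; to stay dependency-free, the sketch below plugs a hand-written FNV-1a hasher into the std HashMap through BuildHasherDefault. Whether this would actually help this implementation has not been measured, so it should be treated as a hypothesis to confirm by profiling.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: fast for short keys, but NOT resistant to HashDoS,
/// so it suits internal tables rather than keys chosen by untrusted peers.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap that uses FNV-1a instead of the default hasher.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // Illustrative use: property values keyed by EPC.
    let mut props: FnvHashMap<u8, Vec<u8>> = FnvHashMap::default();
    props.insert(0x80, vec![0x30]);
    props.insert(0x82, vec![0x00, 0x00, 0x4B, 0x00]);
    println!("{:02X?}", props.get(&0x80));
}
```

Because only the type alias changes, call sites that use insert and get are unaffected, which makes the swap cheap to try and easy to revert.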

In addition, the std Mutex, which this implementation uses extensively, is hard to work with because the locked interval is tied to the lifetime of the guard through which the protected object is accessed, and to work around this some parts unavoidably fall back on copy semantics. Since the implementations in the other languages use zero copy, there is room for improvement at the code level (although significant design changes would be required).
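A minimal sketch of that trade-off, using a hypothetical Property type: a method cannot return a plain &[u8] into Mutex-protected data, because the borrow would outlive the guard created inside the method. The remaining choices are to copy the data out, which releases the lock immediately, or to lend the data to a closure, which avoids the copy but keeps the lock held while the closure runs.

```rust
use std::sync::Mutex;

/// Hypothetical property whose value is shared between threads.
struct Property {
    data: Mutex<Vec<u8>>,
}

impl Property {
    // Returning `&[u8]` here would not compile: the borrow would outlive
    // the MutexGuard created inside the method.

    /// Copy semantics: the guard is dropped when the method returns,
    /// and the caller owns an independent copy.
    fn get_copy(&self) -> Vec<u8> {
        self.data.lock().unwrap().clone()
    }

    /// Zero copy: the caller borrows the bytes in place, but the lock is
    /// held for exactly as long as `f` runs.
    fn with_data<R>(&self, f: impl FnOnce(&[u8]) -> R) -> R {
        let guard = self.data.lock().unwrap();
        f(guard.as_slice())
    }

    fn set(&self, value: &[u8]) {
        *self.data.lock().unwrap() = value.to_vec();
    }
}

fn main() {
    let prop = Property { data: Mutex::new(vec![0x30]) };
    let copy = prop.get_copy(); // unaffected by later updates
    prop.set(&[0x31]);
    let first = prop.with_data(|bytes| bytes[0]); // no copy of the buffer
    println!("copy = {:02X?}, current = {:#04X}", copy, first);
}
```

The closure style keeps zero copy but pushes the locking scope into every caller, which is one reason adopting it throughout would amount to a design change rather than a local fix.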

Furthermore, the std UdpSocket cannot create multiple sockets bound to the same port number (it offers no way to set SO_REUSEADDR before binding), so a non-standard crate is required to implement the IoT protocols this implementation targets, such as ECHONET Lite [9] and mDNS. IPv6 support has also been implemented, but it cannot be enabled at this time because of an error.
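The limitation can be observed with std alone. UdpSocket::bind creates and binds the socket in a single call, so there is no point at which SO_REUSEADDR or SO_REUSEPORT could be set, and a second bind to a port that is already in use fails with ErrorKind::AddrInUse. The sketch below uses an OS-assigned loopback port instead of ECHONET Lite's port 3610 so that it cannot collide with a running node; the function name is hypothetical. Crates such as socket2 expose these options before binding.

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;

/// Returns true if a second std UdpSocket cannot bind to the port that the
/// first one already holds.
fn second_bind_rejected() -> io::Result<bool> {
    // Let the OS choose a free loopback port for the first socket.
    let first = UdpSocket::bind("127.0.0.1:0")?;
    let port = first.local_addr()?.port();
    // std offers no way to set SO_REUSEADDR/SO_REUSEPORT before this bind.
    match UdpSocket::bind(("127.0.0.1", port)) {
        Ok(_second) => Ok(false),
        Err(e) if e.kind() == ErrorKind::AddrInUse => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    if second_bind_rejected()? {
        println!("second bind failed with AddrInUse");
    } else {
        println!("second bind succeeded on this platform");
    }
    Ok(())
}
```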

C

C had the shortest user time, but its system (sys) time was about 2.5 times that of Go and Rust. We had no concerns about the performance of C before this evaluation, but there may be issues in how standard libraries such as pthreads are used.

Go

The Go implementation was the fastest in this benchmark: 1.6 times as fast as the C implementation (160%) and about 5 times as fast as the Rust implementation (160%/33% ≈ 4.8). Writing a straightforward program appears to have produced the best results.

Python

As expected, the Python implementation performed worst in this benchmark. Its system (sys) time performance was only about half (53%) that of C, but its user time performance was less than 1/100 (<1%) of C's, which is typical of interpreted execution in a pure Python implementation.

Conclusion

If I were to evaluate successors to C from a Better C perspective, my personal ranking would be Go = (Objective-C) > Rust > C++. C++ is over-specified as a Better C, and Objective-C has barely evolved since Objective-C 2.0 (2007) and the later introduction of ARC (2011). Swift, its intended successor, also has an uncertain multi-platform future [23].

Regarding productivity, Rust generally requires a learning period before it feels productive [5]. It is both safe and fast [1], but its constraints force developers to contend with the compiler, for better and for worse. Design patterns (and experience) from other languages cannot be brought in unless they fit Rust's semantics.

In terms of productivity, it is often preferable to reuse existing development assets rather than build everything in Rust alone, as was done in this evaluation. Interoperability with C/C++ assets remains highly desired in Rust and is recognized as a major challenge [5][25][26][4][27]. FFI binding generators such as rust-bindgen [27] exist, but they do not make existing assets directly usable. This is a major barrier compared with C++, Objective-C, and Go, where C interoperability is provided at the language level and C code can be used simply by including its header files. Interoperability will therefore be a factor in decisions to adopt Rust.

On safety, Rust guarantees memory safety through static analysis and dynamic bounds checks. However, traditional C/C++ can also draw on a wealth of static analysis tools (e.g., Clang Static Analyzer) and dynamic analysis tools (e.g., Valgrind). In concurrent applications that share data, Rust must also use the same reference-counting (Arc) and mutual-exclusion (Mutex) semantics as C/C++/Go, while still being bound by the constraints of its language specification, such as move semantics and lifetimes. As a result, Rust's safety advantage is difficult to compare directly with other programming languages. Its programming flexibility is more limited than that of other languages, and, as discussed in the performance evaluation of Evaluation 2, design and implementation trade-offs that affect productivity and performance must be made.

Finally, the performance evaluations revealed issues in the Rust, C, and Go implementations. Overall, the Go implementation showed good implementation efficiency and stable performance, and Go may be the safest general-purpose choice. In any case, we plan to investigate the performance issues of each language in more detail.