Rust vs Go vs C: Database and IoT Application Performance Benchmarks
Rust is a language designed to be both safe and fast [1], and in recent years it has been expected to gain further ground as a professional language used in business [3][8]. However, the 2021 survey [3] also shows that although the rate of use at work rose significantly from 42% to 59%, the lack of actual adoption in industry was cited as the biggest concern (38%) about Rust's future.
In this article, to assess the practicality of Rust, we compare Rust implementations against implementations of the same specifications in other programming languages, such as C and Go. We prepared two target applications, a database (Redis) and an IoT protocol (ECHONET Lite), to evaluate the implementation efficiency and performance of Rust in practice.
In summary, from a Better C perspective, I rank Go (alongside Objective-C) as the best successor to C, followed by Rust and then C++. Rust offers safety and speed but has limitations in productivity, interoperability, and programming flexibility. Go, by contrast, stands out for its implementation efficiency and stable performance, making it a safe choice for general-purpose applications.
Evaluation 1 - Database Application (Redis)
This evaluation compares C, Rust, and Go implementations of the same database application, Redis [19]. The official Redis implementation [19] is written in C, while the Rust and Go implementations are unofficial subset implementations. The Rust implementation is mini-redis [21], which was released as a learning resource for the Tokio [20] library, and the Go implementation is go-redis-server, a sample implementation built on my Redis-compatible database framework go-redis [23].
Because mini-redis [21] and go-redis [23] implement only a subset of the official Redis implementation [19], implementation efficiency cannot be compared using LOC (lines of code); only performance is evaluated.
Evaluation Benchmarks
The benchmarks were run with redis-benchmark, the official benchmark tool for Redis. They do not cover the full command set but are limited to the SET/GET commands, and each run issues 10,000 requests with the standard 50 parallel clients.
The benchmarks are limited to the basic SET/GET commands because mini-redis [21] implements only a minimal command set, and we followed the benchmark parameters [22] used for mini-redis [21]. Note that the latest version of mini-redis [21] at the time of evaluation was used, and the environment is "Mac mini (2018) + macOS 12.6".
Performance Evaluation - C > Go > Rust
The results of running the SET/GET commands with redis-benchmark are shown below. The 99th percentile value (p99(ratio)), which is used as an indicator for database operation, is shown at the right end of the table along with the performance ratio relative to the C implementation.
In the evaluation benchmarks, the official C implementation redis-server [19] was the fastest, followed by go-redis in Go [23] and then mini-redis in Rust [21]. A graph of the above table is shown below.
The C redis-server [19] is an optimized implementation and was about 3 times faster than the sample implementations in Go and Rust. Both Go's go-redis [23] and Rust's mini-redis [21] were implemented as samples, and both have room for improvement. With this in mind, a brief review of the evaluation results for each programming language is presented below.
Rust
Rust's mini-redis [21] ran at about 28% (SET) and 41% (GET) of the performance of the official C redis-server [19], and at about 78% (=2.879/3.663) for SET and 88% (=2.167/2.455) for GET of the performance of the Go implementation go-redis [23].
Since mini-redis [21] was released for learning the Tokio [20] library, it is assumed that full-scale optimization has not been performed. However, some optimization appears to have been done, judging from the performance comparison with the official Redis implementation [19] reported in [22].
As a learning resource for Tokio [20], the implementation is built on the Tokio library, and its network layer is based on the Tokio TCP server (tokio::net::TcpListener). However, the key-value data management part of the database is implemented with the standard (std) library HashMap, with the std Mutex for mutual exclusion, as in the implementation in Evaluation 2. As in Evaluation 2, the performance issues may stem from the performance characteristics of std library types [18] such as HashMap [17].
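The key-value design described above can be sketched as follows. This is a minimal illustration using only std, not the actual mini-redis code: the `Store` type and its `set`/`get` methods are hypothetical names, and the real implementation differs in detail (for example, it runs on Tokio tasks rather than OS threads).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory store: one std HashMap guarded by one std Mutex.
/// Every SET and GET has to take the same lock, so all clients serialize here.
#[derive(Clone, Default)]
struct Store {
    inner: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Store {
    /// Stores a copy of `value` under `key`.
    fn set(&self, key: &str, value: &[u8]) {
        let mut map = self.inner.lock().unwrap();
        map.insert(key.to_string(), value.to_vec());
    }

    /// Returns a copy of the value so the lock is released before the caller uses it.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let map = self.inner.lock().unwrap();
        map.get(key).cloned()
    }
}

fn main() {
    let store = Store::default();

    // Simulate several clients writing concurrently through the shared store.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = store.clone();
            thread::spawn(move || {
                let key = format!("key{}", i);
                let value = format!("value{}", i);
                store.set(&key, value.as_bytes());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    println!("{:?}", store.get("key0"));
}
```

With a single lock around the whole map, throughput under 50 parallel clients depends heavily on how long each operation holds the lock, which is one plausible place to look when profiling.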
Go
Go's go-redis [23] ran at about 35% (SET) and 46% (GET) of the performance of the official C redis-server [19], and at about 127% (=3.663/2.879) for SET and 113% (=2.455/2.167) for GET of the performance of Rust's mini-redis [21].
go-redis [23] is a framework for implementing Redis-compatible databases, and as with mini-redis [21], the target of this evaluation is a sample implementation, so the implementation is simple. The speed difference from the official implementation was unexpected, because I had assumed there was little room for optimization.
go-redis [23] and the sample implementation go-redis-server are implemented using only the standard library, and the key-value data management part of the database is implemented with sync.Map. To explain the performance gap with the official implementation, it will be necessary to identify bottlenecks through profiling while also studying the official implementation.
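The percentages quoted for Rust and Go can be reproduced from the four ratios that appear in the parentheses. The sketch below assumes those ratios (2.879, 3.663, 2.167, 2.455) are p99 latency ratios relative to C, which is an inference from the quoted arithmetic rather than something the table states; the function names are hypothetical. Results are approximate because the quoted ratios are already rounded.

```rust
/// Relative speed versus C, in percent, for an implementation whose
/// p99 latency is `latency_ratio` times that of C (assumed interpretation).
fn relative_speed_percent(latency_ratio: f64) -> f64 {
    100.0 / latency_ratio
}

/// Speed of implementation A relative to implementation B, in percent,
/// given each one's p99 latency ratio against C.
fn speed_vs_percent(a_ratio: f64, b_ratio: f64) -> f64 {
    100.0 * b_ratio / a_ratio
}

fn main() {
    // Ratios taken from the figures quoted in the text.
    let (go_set, go_get) = (2.879, 2.167);
    let (rust_set, rust_get) = (3.663, 2.455);

    println!(
        "Go vs C:    SET {:.1}%  GET {:.1}%",
        relative_speed_percent(go_set),
        relative_speed_percent(go_get)
    );
    println!(
        "Rust vs C:  SET {:.1}%  GET {:.1}%",
        relative_speed_percent(rust_set),
        relative_speed_percent(rust_get)
    );
    println!(
        "Rust vs Go: SET {:.1}%  GET {:.1}%",
        speed_vs_percent(rust_set, go_set),
        speed_vs_percent(rust_get, go_get)
    );
    println!(
        "Go vs Rust: SET {:.1}%  GET {:.1}%",
        speed_vs_percent(go_set, rust_set),
        speed_vs_percent(go_get, rust_get)
    );
}
```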
Evaluation 2 - IoT Applications (ECHONET Lite)
This evaluation targets implementations of the client-server functions of ECHONET Lite [9], a communication protocol in the IoT field. The applications evaluated share the same requirement specification (the ECHONET Lite [9] specification), the same design, and the same implementer.
Evaluation Frameworks
The Rust implementation [12] of ECHONET Lite [9] is compared with the implementations in C [11], Go [13], and Python [14]. The basic ECHONET Lite [9] functions are implemented using the same design as far as possible, but there are some differences in the implemented functions between the languages. An overview of these differences is shown in the figure below.
The ECHONET Lite [9] framework evaluated here was implemented in C, Go, Python, and Rust, in that order, and the implementations basically follow the same design.
As explained in the ECHONET Lite[9] Specification, the basic Device function, the Controller function for operating devices, and the Database function for standard devices[10] defined in ECHONET Lite are implemented in all programming languages.
However, for the transport layer of the ECHONET Lite [9] specification, only the mandatory UDP communication function is implemented in all programming languages. The optional TCP communication function is implemented only in C [11] and Go [13] and is omitted in Rust [12] and Python [14].
Implementation Efficiency (≠ Effort) Evaluation - Python > Rust ≈ Go > C
The efficiency of implementation in Rust was evaluated in terms of lines of code (LOC), which was calculated using Tokei [16] and excludes comments and blank lines. The results of LOC calculation are shown in the figure below.
For Auto, only the automatically generated source code that defines the ECHONET Lite standard devices [10] was counted, excluding the C language header files. A graph of the above table is shown below.
Taking into account the range of implemented functionality, the LOC evaluation gives the order Python > Rust ≈ Go > C, from the least code to the most. Below is a brief review of each programming language.
Rust
The LOC results show that, although TCP functionality is not implemented, the implementation efficiency is equivalent to Go, with a LOC of 53% of that of C. The Rust implementation uses only the standard (std) library, such as Mutex, with the exception of the UdpSocket issue described below.
Rust is a language that generally requires a learning period before it feels productive [5]. Therefore, in Rust, ‘less LOC = less implementation effort’ does not simply hold: even code with a low LOC often takes a long time to implement because the developer struggles with the Rust compiler. Also, the extremely high LOC of the auto-generated code, which is excluded from the evaluation, is due to the standard maximum line width (max_width=100) of Rust's formatter (rustfmt).
There are also restrictions due to Rust-specific move semantics and lifetimes, which can make it impossible to carry over designs from other languages and can lead to high design costs. In the current implementation, some design changes were made because of limitations in introducing the Observer pattern with trait objects and because the locked interval is tied to the lifetime of the Mutex lock guard.
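To make the trait-object constraint concrete, here is one common way to express an Observer pattern in Rust with shared, mutable observers. It is a sketch under assumptions, not the design used in uEcho for Rust [12]: the `Observer` trait, `Subject`, and `Counter` are hypothetical names. Each observer must be wrapped in `Arc<Mutex<...>>` so that both the subject and the original owner can reach it, which is the kind of extra ceremony a C or Go design does not need.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical observer interface: notified for every received message.
trait Observer: Send {
    fn on_message(&mut self, msg: &[u8]);
}

/// Example observer that only counts the messages it has seen.
struct Counter {
    received: usize,
}

impl Observer for Counter {
    fn on_message(&mut self, _msg: &[u8]) {
        self.received += 1;
    }
}

/// A trait object shared between the subject and its original owner.
type SharedObserver = Arc<Mutex<dyn Observer>>;

/// Hypothetical subject that fans messages out to its observers.
#[derive(Default)]
struct Subject {
    observers: Vec<SharedObserver>,
}

impl Subject {
    fn add_observer(&mut self, observer: SharedObserver) {
        self.observers.push(observer);
    }

    /// Locks each observer only for the duration of its own callback.
    fn notify(&self, msg: &[u8]) {
        for observer in &self.observers {
            observer.lock().unwrap().on_message(msg);
        }
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(Counter { received: 0 }));

    let mut subject = Subject::default();
    // The concrete Arc<Mutex<Counter>> coerces to Arc<Mutex<dyn Observer>>.
    subject.add_observer(counter.clone());

    subject.notify(&[0x10, 0x81]);
    subject.notify(&[0x10, 0x81]);

    println!("received = {}", counter.lock().unwrap().received);
}
```

Note that an observer callback that tries to lock the subject, or itself, again while it is being notified would deadlock, which is one reason such designs often need restructuring when ported.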
C
The implementation in C resulted in the largest LOC. The reason is that this implementation does not use external libraries and instead includes self-contained libraries (e.g. strings, list structures, object-oriented structures) that are not needed in other languages, as well as wrappers (e.g. Mutex, Thread) for portability. These home-grown libraries are shared with other C projects, so the effort specific to this application is smaller than the LOC figure suggests.
Go
The LOC results show that the implementation efficiency is comparable to Rust, with a LOC of 63% of that of C. In Rust, there were issues with parts of the standard library, but in Go the implementation was carried out using only the standard library.
Basically, the C design is implemented in Go as is. Unlike the implementation in Rust, the C design can be carried over directly, which leaves more flexibility. In addition, the standard library is more extensive than that of C, so in Go ‘less LOC = less implementation time’ can be said to hold directly.
Python
The implementation in Python has the lowest LOC in this evaluation. Part of the reason is that Python does not require braces ({}) to delimit code blocks, so the comparison with languages that do require them (C, Rust, Go) needs to be discounted accordingly. Also, compared with Rust, the ability to inherit implementations contributes to the reduction in LOC.
Python also has the flexibility to carry over designs from C and Go in a straightforward way. Apart from the execution speed issues described below, it may be the best prototyping language.
Performance Evaluation - Go > C > Rust > Python
The execution performance of Rust was evaluated against implementations that provide the same level of ECHONET Lite [9] functionality and share the same basic design. The benchmark program is a node that implements both an ECHONET Lite [9] controller and objects.
Evaluation Benchmarks
In the benchmark program evaluated, the ECHONET Lite[9] controller is a UDP client, and the object is a UDP server. The main loop, which is the basic sequence of the benchmark, consists of 12 UDP requests (Requests) from the controller and 12 UDP responses (Responses) from the object per execution.
In terms of the ECHONET Lite [9] specification, the ECHONET Lite controller discovers itself with a UDP multicast request (ESV: 0x62), and then requests the 12 mandatory properties of the node profile object (0x0EF001) contained in the discovered controller node. The benchmark runs 10,000 iterations of this request (ESV: 0x62) and response (ESV: 0x72) exchange over UDP for the 12 property values defined in [10]. No optimization options were used when building the evaluated code in any environment. The environment is "Mac mini (2018) + macOS 12.6", and the details of the evaluation scripts are available in [15].
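For readers unfamiliar with the protocol, the request and response in this sequence are small binary frames. The sketch below builds a read request (ESV 0x62) addressed to the node profile object and checks whether a frame is the matching response (ESV 0x72), following the ECHONET Lite frame layout (EHD1, EHD2, TID, SEOJ, DEOJ, ESV, OPC, then EPC/PDC pairs). The function names, the controller object code 0x05FF01, the example property code 0x80, and the big-endian TID encoding are assumptions for illustration and are not taken from the uEcho implementations.

```rust
const EHD1: u8 = 0x10; // ECHONET Lite header 1
const EHD2: u8 = 0x81; // Format 1 (specified message format)
const ESV_GET: u8 = 0x62; // property value read request
const ESV_GET_RES: u8 = 0x72; // property value read response
const NODE_PROFILE: [u8; 3] = [0x0E, 0xF0, 0x01];
const CONTROLLER: [u8; 3] = [0x05, 0xFF, 0x01]; // assumed source object

/// Builds a read request for the given property codes (EPCs).
/// A read request carries no data, so each PDC is 0.
fn encode_get_request(tid: u16, seoj: [u8; 3], deoj: [u8; 3], epcs: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(12 + epcs.len() * 2);
    frame.push(EHD1);
    frame.push(EHD2);
    frame.extend_from_slice(&tid.to_be_bytes()); // byte order is a local choice
    frame.extend_from_slice(&seoj);
    frame.extend_from_slice(&deoj);
    frame.push(ESV_GET);
    frame.push(epcs.len() as u8); // OPC: number of properties
    for &epc in epcs {
        frame.push(epc);
        frame.push(0x00); // PDC
    }
    frame
}

/// Returns true if `frame` is a read response carrying the given transaction ID.
fn is_get_response_for(frame: &[u8], tid: u16) -> bool {
    frame.len() >= 12
        && frame[0] == EHD1
        && frame[1] == EHD2
        && u16::from_be_bytes([frame[2], frame[3]]) == tid
        && frame[10] == ESV_GET_RES
}

fn main() {
    // Request the operating status property (EPC 0x80, used here as an example).
    let request = encode_get_request(1, CONTROLLER, NODE_PROFILE, &[0x80]);
    let hex: Vec<String> = request.iter().map(|b| format!("{:02X}", b)).collect();
    println!("request: {}", hex.join(" "));

    // Fake a response by flipping the ESV, just to exercise the check.
    let mut response = request.clone();
    response[10] = ESV_GET_RES;
    println!("matches: {}", is_get_response_for(&response, 1));
}
```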
Performance evaluation results
Performance was measured with the time command, timing 10,000 iterations of the basic sequence described in the evaluation benchmarks. The evaluation results are shown below, along with an overview of the compilation conditions and implementation techniques for each programming language.
In the evaluation benchmarks, Go was the fastest implementation, followed by C, Rust, and Python. Since UDP communication is asynchronous, waiting for the response (Response) to each request (Request) is implemented using the condition-variable or channel mechanisms of each programming language. A graph of the above table is shown below.
Python was excluded from the graph because its extremely slow execution speed would distort the scale. To compare performance with C, the performance ratio of each programming language relative to C is shown in the figure below, along with a brief evaluation of the performance results for each programming language.
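The request/response rendezvous mentioned above can be sketched with std channels. This is a minimal illustration under assumptions, not the uEcho code: it uses no sockets, the receiving thread is simulated, and `Pending`, `request_and_wait`, and `dispatch_response` are hypothetical names. The requester registers a channel under its transaction ID, sends, and then blocks with a timeout until the receiving side delivers the matching response.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Waiters keyed by transaction ID, shared with the receiving thread.
type Pending = Arc<Mutex<HashMap<u16, Sender<Vec<u8>>>>>;

/// Registers a waiter for `tid`, performs the send, then blocks until the
/// matching response arrives or the timeout expires.
fn request_and_wait(
    pending: &Pending,
    tid: u16,
    send: impl FnOnce(),
    timeout: Duration,
) -> Option<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    // Register before sending so a fast response cannot be missed.
    pending.lock().unwrap().insert(tid, tx);
    send();
    let result = rx.recv_timeout(timeout).ok();
    pending.lock().unwrap().remove(&tid);
    result
}

/// Called by the receiving side when a response with `tid` arrives.
fn dispatch_response(pending: &Pending, tid: u16, payload: Vec<u8>) {
    // Clone the sender so the lock is released before sending.
    let waiter = pending.lock().unwrap().get(&tid).cloned();
    if let Some(tx) = waiter {
        let _ = tx.send(payload);
    }
}

fn main() {
    let pending: Pending = Arc::new(Mutex::new(HashMap::new()));
    let receiver_side = Arc::clone(&pending);

    let response = request_and_wait(
        &pending,
        1,
        move || {
            // Stand-in for the network: the "response" arrives 10 ms later.
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(10));
                dispatch_response(&receiver_side, 1, vec![0x72]);
            });
        },
        Duration::from_secs(1),
    );

    println!("{:?}", response);
}
```

Each round trip in this style involves at least two lock acquisitions and one channel allocation, so the choice of synchronization primitive can show up in user time when the sequence is repeated 10,000 times.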
Rust
The Rust implementation performed worse than the Go and C implementations, at only 33% of the speed of C and about 20% (=33/160) of the speed of Go. While its system (sys) time is comparable to Go and shorter than that of C, its user time is larger than that of both C and Go.
The cause needs to be investigated, but according to "The Rust Performance Book" [17] and "The Rust Language FAQ" [18], some standard (std) library types, such as HashMap, may be slow under certain circumstances.
In addition, the standard (std) library Mutex, which is used extensively in this implementation, is difficult to handle because the lock is held for exactly as long as the guard that gives access to the protected object is alive, and there are parts where data is unavoidably copied to work around this. The other language implementations are zero-copy, so there is room for improvement in the program (although significant design changes would be required).
In addition, the standard (std) library UdpSocket does not support creating multiple sockets bound to the same port number, so a non-standard crate must be used to implement IoT-related protocols such as ECHONET Lite [9] and mDNS, which this implementation targets. Also, although IPv6 functionality has been implemented, it cannot be enabled at this time due to an error.
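One well-known case is hashing: the std HashMap uses a SipHash-based hasher by default, which resists collision attacks but is relatively slow for short keys. The sketch below shows how a HashMap can be given a different hasher through `BuildHasherDefault`. The hand-written FNV-1a hasher and the `FastMap` alias are illustrative assumptions, not a fix applied in this evaluation; whether it would help here would need to be confirmed by profiling.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher (illustrative; not collision-attack resistant).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// HashMap that uses the FNV-1a hasher instead of the default one.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // Example: property code -> property data, with single-byte keys.
    let mut properties: FastMap<u8, Vec<u8>> = FastMap::default();
    properties.insert(0x80, vec![0x30]);
    properties.insert(0x82, vec![0x01, 0x0C, 0x01, 0x00]);

    println!("{:?}", properties.get(&0x80));
}
```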
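The copy described above typically looks like the following sketch. A reference into Mutex-protected data cannot outlive the lock guard, so a function that wants to return the data without keeping the lock held has to clone it. The `Properties` alias and `read_property` function are hypothetical names used for illustration and are not taken from uEcho for Rust [12].

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared property table: property code -> property data.
type Properties = Arc<Mutex<HashMap<u8, Vec<u8>>>>;

/// Returns a copy of the property data.
/// Returning `&[u8]` is not possible here: the borrow would have to outlive
/// the guard, and the guard is what keeps the lock held.
fn read_property(props: &Properties, epc: u8) -> Option<Vec<u8>> {
    let guard = props.lock().unwrap();
    let value = guard.get(&epc).cloned(); // the copy happens here
    value
    // `guard` is dropped on return, which releases the lock.
}

fn main() {
    let props: Properties = Arc::new(Mutex::new(HashMap::new()));
    props.lock().unwrap().insert(0x80, vec![0x30]);

    let before = read_property(&props, 0x80);

    // The lock is free again, so updating the table does not deadlock,
    // and the earlier copy is unaffected by the update.
    props.lock().unwrap().insert(0x80, vec![0x31]);
    let after = read_property(&props, 0x80);

    println!("{:?} -> {:?}", before, after);
}
```

The alternative is to pass a closure that works on the borrowed data while the lock is held, which avoids the copy but lengthens the locked interval; this is the kind of trade-off the text refers to.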
C
The C implementation basically has the shortest user time, but its system (sys) time is 2.5 times longer than that of Go and Rust. We had no concerns about the performance of C before this evaluation, but there may be issues in how standard libraries such as pthreads are used.
Go
The Go implementation was the fastest in this evaluation benchmark, 1.6 times (=160%) as fast as the C implementation and about 4.8 times (=160%/33%) as fast as the Rust implementation. It seems that the best results were obtained simply by writing a straightforward program.
Python
As might be expected, the Python implementation produced the lowest results in this evaluation benchmark. Its system (sys) time performance is about half (53%) that of C, but its user time performance is less than 1/100 (<1%) that of C, which is characteristic of interpreter execution since it is a pure Python implementation.
Conclusion
If I were to rank the successors to C from a Better C perspective, I would personally rank Go = (Objective-C) > Rust > C++. C++ is overspecified as a Better C, and Objective-C has largely stopped evolving since version 2.0 (2007) and the later introduction of ARC. Swift, its supposed successor, also has an uncertain multi-platform future [23].
Regarding productivity, Rust is a language that generally requires a learning period before it feels productive [5]. It is a language that is both safe and fast [1], but because of its restrictions it forces the developer to contend with the compiler, for better and for worse. Design patterns (experience) from other languages cannot be carried over unless they fit Rust's semantics.
In terms of productivity, it is often desirable to use existing development assets rather than implementations that can be completed in Rust alone, as in this evaluation. Interoperability with C/C++ assets continues to be in high demand in Rust and is recognized as a major challenge [5][25][26][4][27]. While there are FFI generators such as rust-bindgen [27], they do not provide seamless interoperability with existing assets, and this is a major barrier compared to C++, Objective-C, and Go, where interoperability with C is guaranteed at the language level and C code can be used simply by including header files. Interoperability issues will be a factor in the decision to adopt Rust.
In terms of design, in concurrent applications that share data, Rust has to bring in shared-reference (Arc) and mutual exclusion (Mutex) semantics similar to those of C/C++/Go, and there is no escape from the restrictions of the Rust language specification, such as move semantics and lifetimes. As a result, the degree of programming flexibility is limited compared to other programming languages, and design and implementation trade-offs have to be made that affect productivity and performance, as shown in the performance evaluation discussion in Evaluation 2.
As for safety, Rust ensures it through static analysis at compile time and bounds checks at run time. However, even with traditional C/C++, a wealth of static analysis tools (e.g. Clang Static Analyzer) and dynamic analysis tools (e.g. Valgrind) can be used, so a comprehensive evaluation that includes the surrounding tools will be necessary.
Finally, in the performance evaluation, issues were identified in the Rust, C, and Go implementations. In this evaluation, the Go implementation showed good implementation efficiency and stable performance, so Go may be the safest candidate as a general-purpose option. In any case, we will investigate the performance issues of each language in more detail.
- [1] The Rust Programming Language
- [2] Programming Rust, 2nd Edition (Japanese edition)
- [3] Rust Survey 2021 Results | Rust Blog
- [4] Rust Survey 2020 Results | Rust Blog
- [5] Rust Survey 2019 Results | Rust Blog
- [6] Rust Survey 2018 Results | Rust Blog
- [7] Understanding Rust developers through data: a deep dive into Rust Survey 2021 (in Japanese) | gihyo.jp
- [8] Launching the 2022 State of Rust Survey | Rust Blog
- [9] ECHONET Lite Specification | ECHONET
- [10] Machine Readable Appendix Release | ECHONET
- [11] uEcho for C
- [12] uEcho for Rust
- [13] uEcho for Go
- [14] uEcho for Python
- [15] Benchmark utility package for uEcho implementations
- [16] Tokei (時計)
- [17] The Rust Performance Book
- [18] The Rust Language FAQ
- [19] Redis
- [20] Tokio - An asynchronous Rust runtime
- [21] tokio-rs/mini-redis: Incomplete Redis client and server implementation using Tokio - for learning purposes only
- [22] Initial benchmark results · Issue #1 · tokio-rs/mini-redis · GitHub
- [23] cybergarage/go-redis: The go-redis is a database framework for implementing a Redis compatible server using Go easily.
- [23] IBM Stops Work on Swift: Q&A with Chris Bailey (in Japanese)
- [24] Rust Design Patterns
- [25] Microsoft adopts the "Rust" language, for reasons beyond safety (follow-up): valued for more than performance and safety (in Japanese) - @IT
- [26] Why Rust for safe systems programming – Microsoft Security Response Center
- [27] Google Online Security Blog: Supporting the Use of Rust in the Chromium Project