November 28, 2017

Getting a Little Further Than Hello World With Rust - Part 1: Ownership & Mutability

Rust has excellent documentation and these posts are not meant to replace the official documentation. I strongly suggest going through the links below, if you have not already done so, before/after/while reading these posts:

The Rust Programming Language (User’s Manual)
The Rustonomicon (Advanced and unsafe programming)
Cargo Guide (Project & dependencies management)

Project Setup

If you haven’t set up Rust on your machine, take a look at the post on installing Rust without root privileges.

Let us start by creating a Cargo project:

$ cargo new hello_world
     Created library `hello_world` project

$ ls -1a hello_world/
.
..
Cargo.toml
.git
.gitignore
src

Notice that cargo has also set up git for our project, so we can go ahead and make our initial commit:

$ cd hello_world/

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .gitignore
        Cargo.toml
        src/

nothing added to commit but untracked files present (use "git add" to track)

$ git add .gitignore Cargo.toml src/

$ git commit -m"Initial commit"
[master (root-commit) 661b30f] Initial commit
 3 files changed, 16 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Cargo.toml
 create mode 100644 src/lib.rs

rustfmt is a tool that formats Rust code according to community style conventions. Let us install it if you haven’t done so:

$ cargo install rustfmt
    Finished release [optimized] target(s) in 215.99 secs
  Installing /home/muhuk/.cargo/bin/rustfmt
  Installing /home/muhuk/.cargo/bin/cargo-fmt
warning: be sure to add `/home/muhuk/.cargo/bin` to your PATH to be able to run the installed binaries

Adding export PATH=~/lib/rust/bin:~/.cargo/bin:$PATH to your ~/.profile file (or similar) makes cargo, and the binaries it installs, available globally.

If you want to use emacs to edit Rust code, install rust-mode. Here I am using use-package; your configuration may differ:

;; In ~/.emacs.d/init.el

(use-package cargo
  :ensure t)

(use-package flycheck-rust
  :ensure t)

(use-package rust-mode
  :ensure t
  :init
  (add-hook 'rust-mode-hook #'flycheck-rust-setup)
  :config
  (setq rust-format-on-save t))

With the setup done, let us start exploring Rust.

Safe Rust

Rust positions itself as a systems programming language, just like C, C++ and Assembly. But it differs from these languages in its emphasis on safe access to memory. The Rust compiler does not allow you to write code that would result in undefined behavior. Here is an example code snippet that does not compile:

let k = 1 / 0;

The compiler relies on simple rules to determine what is safe and what it should reject. Division by zero is a clear case. In other cases, relaxing the safety checks might allow the programmer to write code with higher performance. Rust allows this via the unsafe keyword. In such cases it is the programmer’s responsibility to ensure that whatever is in the unsafe scope is in fact safe. Anything that is not marked unsafe, including functions with unsafe blocks within, can be considered safe. More on this below.
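The literal 1 / 0 above is rejected at compile time, but a divisor that is only known at run time cannot be checked that way. In that case integer division by zero panics instead of producing undefined behavior, and the standard library offers checked_div for handling it without a panic. Here is a minimal sketch; safe_div is a hypothetical helper name, not something from the standard library:

```rust
/// Divides `a` by `b`, returning `None` instead of panicking
/// when the division cannot be performed (for example when `b` is zero).
pub fn safe_div(a: i32, b: i32) -> Option<i32> {
    // `checked_div` is a standard method on the integer types.
    a.checked_div(b)
}
```

Calling safe_div(10, 2) yields Some(5), while safe_div(1, 0) yields None, leaving the decision about what to do next to the caller.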

Ownership

Rust has two traits called Copy and Drop. We will not get too much into the details of traits in this post; let us just say traits are markers on types. The difference between Copy and Drop is important. Copy indicates that a value can be duplicated by simply copying its bits, which is the case for types that own no heap allocations. Copy types, therefore, can be copied cheaply:

let a = 42;  // a is 42
let b = a;   // b is also 42, but they do not share data i.e.
             // they do not point to the same memory location
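Copy is not limited to built-in types. As a sketch, assuming a hypothetical Point struct whose fields are all Copy, we can derive the trait and observe the same behavior as with integers:

```rust
/// Hypothetical type used only for illustration. It can derive `Copy`
/// because all of its fields are `Copy` themselves.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Copies `original`, changes the copy and returns both values.
pub fn copy_then_modify(original: Point) -> (Point, Point) {
    let mut duplicate = original; // a copy, `original` stays usable
    duplicate.x += 1;             // only the copy changes
    (original, duplicate)
}
```

The two returned points differ in x, which shows that they do not share data.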

Any type that is not Copy can be considered Drop, even if it does not explicitly implement Drop. Destruction of a type can be customised by implementing Drop.drop() for that type. A Copy value need not be destroyed like this since it will go away once the stack is popped.

Rust does not allow .drop() to be called manually. The code below does not compile:
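To see a custom destructor in action, here is a small sketch. Tracked is a hypothetical type, and the shared log (an Rc<RefCell<Vec>>) is only there so that the order of destruction can be observed:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records its own destruction in a shared log.
pub struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    // Runs automatically when a `Tracked` value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates two nested values and returns the order in which they were dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Tracked { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Tracked { name: "inner", log: Rc::clone(&log) };
        } // `_inner` goes out of scope here and is dropped first
    } // `_outer` is dropped here
    let order = log.borrow().clone();
    order
}
```

Nothing in drop_order calls the destructor explicitly; both values are cleaned up at the closing brace of their block.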

let x = vec![1, 2, 3];
x.drop();  // ERROR: explicit destructor calls not allowed
println!("head of x is {}", x[0]);
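What we can do instead is hand the value to the free function drop from the standard library (std::mem::drop, available without an import). It takes ownership of its argument, so the value is destroyed by the time the call returns. A minimal sketch, with release_early being a hypothetical function name:

```rust
/// Frees a vector before the end of the function and returns its old length.
pub fn release_early() -> usize {
    let x = vec![1, 2, 3];
    let len = x.len();
    drop(x); // moves `x` into `drop`, which releases it right away
    // println!("head of x is {}", x[0]); // would not compile: `x` was moved
    len
}
```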

Instead of relying on the programmer to release resources, Rust compiler looks at the structure of the code and inserts cleanup instructions automatically. Good news is this feature is very reliable and it has zero runtime cost. Bad news is you need to grok a few new concepts. Good news is they are few and reasonably simple. Namely these are:

Lifetimes & Ownership
Referencing & Borrowing

The Rust Programming Language contains detailed information on these subjects, and more. But before we can move on with our examples, a brief look into these concepts is necessary.

In Rust every value has an owner. By value I mean a mutable variable or an immutable constant[1]:

// Integers are Copy

let x = 3;        // Can only be read, x = 5 would fail to compile.
let mut y = 4;    // Can be read and modified, notice the mut modifier.


// String is Drop

let z = String::from("Foo");

// Line below would fail to compile as expected:
z = String::from("Bar");

// More interestingly this would also fail:
z.push('$');      // Error message: cannot borrow mutably

// Why it fails we will explore later, for now just know that
// mut modifier applies to both left hand side and right hand
// side of assignment. In other words both to the symbol and
// the value it references.
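For contrast, here is a sketch of the version that does compile once z is declared with mut. mutable_string is a hypothetical name; it returns both states of the string so the effect is visible:

```rust
/// Modifies a `String` in place, then rebinds the variable to a new value.
pub fn mutable_string() -> (String, String) {
    let mut z = String::from("Foo");
    z.push('$');             // allowed: the value can be borrowed mutably
    let after_push = z.clone();
    z = String::from("Bar"); // allowed: the variable can be reassigned
    (after_push, z)
}
```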

Ownership is determined in the context of a lifetime. Superficially, lifetimes can be thought of as code blocks (for example function bodies):

fn hello_fib(name: &str) {
    let mut i = 1;                          // start lifetime of i    -------+
    let mut j = 1;                          // start lifetime of j    -----+ |
                                            //                             | |
    loop {                                  //                             | |
        let k = i;                          // start lifetime of k    ---+ | |
        println!("Hello {} {}", name, k);   //                           | | |
        i = j;                              //                           | | |
        j = k + j;                          //                           | | |
                                            //                           | | |
        if i >= 100 {                       //                           | | |
            break;                          //                           | | |
        }                                   //                           | | |
    }                                       // end lifetime of k      ---+ | |
}                                           // end lifetime of i & j  -----+-+

Variable j is not yet defined on the line where we define i, so its lifetime begins on the next line. The read-only value k is only valid in the scope of the loop, and its lifetime ends at the end of the loop block. Every time the loop body is entered a new k is allocated, but the same i and j are accessible within the loop block for the same function invocation. Nothing really ground breaking so far. The point is that instead of (runtime) garbage collection, the Rust compiler adds instructions to clean up once values’ lifetimes are over.

In the hello_fib function above the story of name’s lifetime is a bit more complicated. We can write the same function like this:

fn hello_fib<'a>(name: &'a str) {
    // ...
}

Here 'a denotes a lifetime. It is a lifetime parameter, like a type parameter but designates a lifetime instead of a type. Written this way it is exactly the same function as above. Just like how we omitted type annotations for i, j & k above we can let the compiler infer the lifetime of name. If there were multiple parameters, we could have specified a different lifetime for each of them.
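As a sketch of what separate lifetimes buy us, consider the hypothetical function before below (it is not part of the hello_fib example). Its result borrows only from text, so it is tied to 'a alone, and the separator is free to go away earlier:

```rust
/// Returns the part of `text` that precedes the first `separator`,
/// or all of `text` if the separator does not occur.
pub fn before<'a, 'b>(text: &'a str, separator: &'b str) -> &'a str {
    match text.find(separator) {
        Some(index) => &text[..index],
        None => text,
    }
}

/// Uses `before` with a separator that lives shorter than the result.
pub fn demo() -> String {
    let text = String::from("hello world");
    let head;
    {
        let separator = String::from(" ");
        head = before(&text, &separator);
    } // `separator` is dropped here, but `head` only borrows from `text`
    head.to_string()
}
```

Had both parameters shared a single lifetime 'a, the compiler would have had to assume the result might borrow from separator as well, and demo would be rejected.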

References

In the function above name is a (read-only) reference to a string slice (or string literal). Let us look at the difference between call-by-reference and call-by-value. Passing parameters by value moves ownership from caller to callee:

fn callee(x: String) {
    println!("x is {}", x);
}

fn caller() {
    let k = String::from("Y THO");
    callee(k);
    println!("k is {}", k);         // ERROR: value used here after move
}

Because k is moved when callee is called, println! in caller would fail to compile. At that point in the program, k is no longer a valid value. If we replace String with an integer it would compile and run just fine:

fn callee(x: u8) {
    println!("x is {}", x);
}

fn caller() {
    let k = 42;
    callee(k);
    println!("k is {}", k);
}

One works and the other does not because the u8 type is Copy but String is not. We can pass a copy of the original string like below:

fn callee(x: String) {
    println!("x is {}", x);
}

fn caller() {
    let k = String::from("Y THO");
    callee(k.clone());
    println!("k is {}", k);
}

Note that we are still moving, but instead of k, the anonymous value returned by k.clone() gets moved. Therefore we can still refer to k in caller after calling callee. Cloning a value just to read it is a waste of memory, and it is not idiomatic Rust. References solve this problem:

fn callee(x: &str) {
    println!("x is {}", x);
}

fn caller() {
    let k = String::from("Y THO");
    callee(k.as_str());
    println!("k is {}", k);
}

String.as_str returns a read-only reference to the String as a str. The differences between String and str are beyond the scope of this post. We will probably get into the details of pointer types and slices in the next post. Suffice it to say we could have used &String but &str is more efficient and idiomatic.
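Here is a small sketch of why a &str parameter is convenient: the same function accepts an explicit as_str() call, a plain &k (which the compiler converts from &String to &str automatically) and a string literal. The names shout and call_three_ways are hypothetical and only for illustration:

```rust
/// Takes a read-only string slice and returns an upper-cased copy.
pub fn shout(x: &str) -> String {
    x.to_uppercase()
}

/// Calls `shout` in three different ways without giving up ownership of `k`.
pub fn call_three_ways() -> (String, String, String) {
    let k = String::from("y tho");
    let explicit = shout(k.as_str()); // explicit conversion to &str
    let coerced = shout(&k);          // &String is coerced to &str
    let literal = shout("y tho");     // string literals are already &str
    (explicit, coerced, literal)
}
```

A function that took &String instead would not accept the literal without first allocating a String for it.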

Unsafe Rust

Safety features of Rust can be relaxed locally using the unsafe keyword. Before we look at an example I would like to mention that unsafe is, paradoxically, safe. You cannot call an unsafe function outside of an unsafe block. At the root of the call hierarchy there is always a safe function that contains an unsafe block, so the location where safety guarantees have to be ensured by the programmer is explicitly known. Above that point in the call stack Rust is safe.

Here is an example I have found on GitHub:

// https://rustbyexample.com/unsafe.html
// http://rust-lang-ja.org/rust-by-example/unsafe.html

fn main() {
    let raw_p: *const u32 = &10;

    unsafe {
        assert!(*raw_p == 10);
    }
}

Moving the assert! outside of an unsafe block would fail to compile with the following error:

error[E0133]: dereference of raw pointer requires unsafe function or block

Notice that in this example we, as the programmer, know that dereferencing raw_p will not cause any undefined behavior or memory error. Also notice that main is safe, but this safety is something that still depends on our assumptions as the programmer.
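One way to contain such assumptions is to keep the unsafe block small and wrap it in a safe function that upholds the assumption itself. A minimal sketch, where read_through_raw is a hypothetical name:

```rust
/// Reads a `u32` through a raw pointer. The function is safe to call
/// because the pointer is always derived from a valid reference.
pub fn read_through_raw(value: &u32) -> u32 {
    // Creating a raw pointer from a reference is a safe operation.
    let raw_p: *const u32 = value;

    // SAFETY: `raw_p` comes from `value`, a reference that is valid
    // for the whole duration of this call.
    unsafe { *raw_p }
}
```

Callers of read_through_raw never see the raw pointer, so the only place that needs reviewing is the two lines inside the function.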

Another usage of unsafe is for tagging entire functions as unsafe:

unsafe fn dangerously() {
    println!("I also like to live dangerously!");
}

fn main() {
    unsafe {
        dangerously(); // Compiles and runs.
    }

    dangerously();     // ERROR: call to unsafe function requires unsafe function or block.
}

As I mentioned above, dangerously can only be called in an unsafe context, within an unsafe block or an unsafe function.
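A common pattern is to pair an unsafe function with a safe wrapper that checks the precondition first. The sketch below assumes two hypothetical functions, nth_unchecked and nth; get_unchecked is the standard slice method that skips the bounds check:

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `index < values.len()`.
pub unsafe fn nth_unchecked(values: &[u32], index: usize) -> u32 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// Safe wrapper: verifies the precondition before calling the unsafe function.
pub fn nth(values: &[u32], index: usize) -> Option<u32> {
    if index < values.len() {
        // SAFETY: `index` was checked against the length just above.
        Some(unsafe { nth_unchecked(values, index) })
    } else {
        None
    }
}
```

Code that calls nth needs no unsafe block at all; the responsibility for the bounds check stays in one place.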

I think we have covered enough ground here for a blog post. Rust is an interesting language and I expect it to stay relevant in the near future.

[1]	Immutable variable is an oxymoron.

If you have any questions, suggestions or corrections feel free to drop me a line.

Posted by Atamert Ölçgen

Filed under: Programming

Tags: hello world immutable memory safety pointer rust undefined behavior

muhuk's blog

Nature, to Be Commanded, Must Be Obeyed
