muhuk's blog

Nature, to Be Commanded, Must Be Obeyed

March 06, 2020

Getting a Little Further Than Hello World With Rust - Part 4: Anatomy of a Data Type

This is the fourth part of the Getting a Little Further Than Hello World With Rust series. In this post we will take a closer look at Rust structs and some commonly used traits. If everything goes according to plan, the concepts of variable, value, reference and pointer, in the Rust context, should become clearer by the time you finish reading.

Private and Immutable by Default

I love the Rust language because of the little (and not so little) right decisions that were made in its design. For example, testing is a first-class citizen. It is built into the language, and what comes out of the box is quite sufficient unless you have some exotic testing requirements. I love the fact that you do not need a separate testing framework for Rust.

Another right decision is to make things explicit. Actually this is a set of decisions. One example: a module member is private unless you explicitly declare it public. Another: a variable is immutable unless you explicitly declare it mutable.

fn main() -> () {
    let v = data_types::Vector { x: 4.0, y: 3.0, z: 5.0 };  // compilation error
}

mod data_types {
    struct Vector {
        x: f64,
        y: f64,
        z: f64,
    }
}

The code above will not compile because Vector is not declared public. Marking the struct with pub is the first step:

fn main() -> () {
    let v = data_types::Vector { x: 4.0, y: 3.0, z: 5.0 };
}

mod data_types {
    pub struct Vector {
        x: f64,
        y: f64,
        z: f64,
    }
}

This code will still not compile. The compiler will complain that the x, y and z fields are private and therefore cannot be set in the struct literal (the line that starts with let v =). This means Vector can be named outside its module, but its fields cannot be accessed directly. There are two ways to make this code compile. One is to declare the fields public:

pub struct Vector {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

This is fine if you are looking for a lightweight data type. But allowing everyone to instantiate your struct directly may not be what you want. Perhaps some fields need to be private, or maybe you want to run some validations before constructing the instance. A custom constructor, commonly called new, can be used:

use data_types::Vector;

fn main() -> () {
    let v = Vector::new(4.0, 3.0, 5.0);
}

mod data_types {
    pub struct Vector {
        x: f64,
        y: f64,
        z: f64,
    }

    impl Vector {
        pub fn new(x: f64, y: f64, z: f64) -> Vector {
            Vector { x: x, y: y, z: z }
        }
    }
}

Since the parameter names of new are the same as the field names of Vector, we can write it more concisely:

impl Vector {
    pub fn new(x: f64, y: f64, z: f64) -> Vector {
        Vector { x, y, z }
    }
}
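The validation idea mentioned earlier could look like the minimal sketch below. The name try_new, the String error type, the components accessor and the rule that every component must be finite are all assumptions made for illustration; none of them come from the original code.

```rust
mod data_types {
    #[derive(Debug)]
    pub struct Vector {
        x: f64,
        y: f64,
        z: f64,
    }

    impl Vector {
        // Hypothetical validating constructor: rejects NaN and infinite components.
        pub fn try_new(x: f64, y: f64, z: f64) -> Result<Vector, String> {
            if x.is_finite() && y.is_finite() && z.is_finite() {
                Ok(Vector { x, y, z })
            } else {
                Err(format!("components must be finite, got ({}, {}, {})", x, y, z))
            }
        }

        // Hypothetical read-only accessor, needed because the fields are private.
        pub fn components(&self) -> (f64, f64, f64) {
            (self.x, self.y, self.z)
        }
    }
}

use data_types::Vector;

fn main() {
    match Vector::try_new(4.0, 3.0, 5.0) {
        Ok(v) => println!("components are {:?}", v.components()),
        Err(e) => println!("invalid vector: {}", e),
    }

    // f64::NAN fails the is_finite check, so this call returns Err.
    if let Err(e) = Vector::try_new(f64::NAN, 0.0, 0.0) {
        println!("invalid vector: {}", e);
    }
}
```

Because the fields stay private, code outside data_types has no way to build a Vector that skips the check.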

There is nothing special about calling the constructor new from Rust's perspective. It is just a static method, which Rust calls an associated function (its first parameter is not self, &self or &mut self). Here is another static method to illustrate the point:

impl Vector {
    pub fn new(x: f64, y: f64, z: f64) -> Vector {
        Vector { x, y, z }
    }

    pub fn up() -> Vector {
        Vector {
            x: 0.0,
            y: 0.0,
            z: 1.0
        }
    }
}

Needless to say, we would not be able to call new (or up) from outside the data_types module if we did not mark them as public.

Let us consider the case where we mark the fields as public again:

pub struct Vector {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

This code would compile and run:

fn main() -> () {
    let v = Vector { x:4.0, y:3.0, z:5.0 };
    println!("vector is ({}, {}, {})", v.x, v.y, v.z);
}

But this code would not:

fn main() -> () {
    let v = Vector { x:4.0, y:3.0, z:5.0 };
    v.x = 2.0;
    println!("vector is ({}, {}, {})", v.x, v.y, v.z);
}

It would fail with a compilation error that says cannot assign to `v.x`, as `v` is not declared as mutable. As if that were not enough, the Rust compiler also tells us which change to make and on exactly which line:

error[E0594]: cannot assign to `v.x`, as `v` is not declared as mutable
 --> src/main.rs:5:5
  |
4 |     let v = Vector { x:4.0, y:3.0, z:5.0 };
  |         - help: consider changing this to be mutable: `mut v`
5 |     v.x = 2.0;
  |     ^^^^^^^^^ cannot assign

error: aborting due to previous error

Once we change the line that starts with let v = ... to let mut v = ..., our code compiles and prints vector is (2, 3, 5) when run. Just as things are private by default, they are immutable by default. If you want mutability, you need to explicitly declare the variable as mutable.
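Here is the fixed program as a self-contained sketch. The describe helper and the frozen binding are not in the original code; they are added only to show that mutability is a property of the binding rather than of the Vector type.

```rust
pub struct Vector {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

// Hypothetical helper (not in the original post) so the formatting can be reused.
fn describe(v: &Vector) -> String {
    format!("vector is ({}, {}, {})", v.x, v.y, v.z)
}

fn main() {
    // `mut` makes the binding mutable, which allows assigning to its fields.
    let mut v = Vector { x: 4.0, y: 3.0, z: 5.0 };
    v.x = 2.0;
    println!("{}", describe(&v));

    // Moving the value into a binding declared without `mut`
    // makes it read-only again.
    let frozen = v;
    println!("{}", describe(&frozen));
    // frozen.x = 1.0; // would fail with the same E0594 error
}
```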

Instance Methods

If we have a struct named Foo, then any methods defined in an implementation block for Foo (impl Foo {) are associated with this struct. We can also implement traits for a type (struct), but let us discuss instance methods for now.

There are four kinds of methods we can define:

  • Static method.
  • Instance method that consumes (self).
  • Instance method that takes a read-only reference (&self).
  • Instance method that takes a mutable reference (&mut self).
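The four kinds listed above can be put side by side in one sketch. The into_array and scale methods are hypothetical, made up here to illustrate the consuming and mutating forms.

```rust
#[derive(Debug, PartialEq)]
pub struct Vector {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Vector {
    // No self parameter: called on the type, as Vector::up().
    pub fn up() -> Vector {
        Vector { x: 0.0, y: 0.0, z: 1.0 }
    }

    // Takes self by value: the instance is moved in and cannot be used afterwards.
    pub fn into_array(self) -> [f64; 3] {
        [self.x, self.y, self.z]
    }

    // Takes &self: reads the instance without changing it.
    pub fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }

    // Takes &mut self: may modify the instance in place.
    pub fn scale(&mut self, factor: f64) {
        self.x *= factor;
        self.y *= factor;
        self.z *= factor;
    }
}

fn main() {
    let mut v = Vector::up();
    v.scale(2.0);
    println!("length is {}", v.length());

    let components = v.into_array();
    println!("components are {:?}", components);
    // `v` was moved into into_array, so using it here would not compile:
    // println!("{}", v.length());
}
```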

We have already seen some examples of static methods (new and up above). Note that static methods are not limited to factory methods and constructors.

Instance methods that take a reference should be familiar to those who program in popular object-oriented languages. Basically the method borrows the object for the duration of the call.

impl Vector {
    pub fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }
}

The code above calculates the length of a vector. It does not consume or mutate the instance. The result is a freshly computed f64, which is Copy, so there is nothing to gain from returning a reference (and a reference to a value local to the method could not be returned anyway). Hence the return type is f64 instead of &f64. What if we wanted a method to return some type that is not Copy? For this let us look at another struct:

pub struct Word {
    pub s: String
}

impl Word {
    pub fn initial(&self) -> &str {
        &self.s[0..1]
    }
}

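In the example above the lifetime of the returned string slice (&str) is tied to the lifetime of &self. In other words, the slice cannot outlive the read-only reference passed to initial. Note that 0..1 is a byte range, so this slicing assumes a non-empty string that starts with a single-byte (ASCII) character; otherwise it panics. We can use this method as below: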

let w = Word { s: "Excelsior".to_owned() };
println!("initial is {}", w.initial());
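To make the lifetime relationship visible, the sketch below writes out the lifetime that the compiler normally fills in for initial. It also adds a hypothetical initial_checked method (not part of the original code) that returns None for an empty string and copes with a multi-byte first character.

```rust
pub struct Word {
    pub s: String,
}

impl Word {
    // Same as `fn initial(&self) -> &str`, with the elided lifetime written out.
    // Slicing by byte range panics if `s` is empty or byte 1 is not a char boundary.
    pub fn initial<'a>(&'a self) -> &'a str {
        &self.s[0..1]
    }

    // Hypothetical variant: returns None for an empty string and
    // slices exactly as many bytes as the first character occupies.
    pub fn initial_checked(&self) -> Option<&str> {
        let first = self.s.chars().next()?;
        Some(&self.s[..first.len_utf8()])
    }
}

fn main() {
    let w = Word { s: "Excelsior".to_owned() };
    let initial = w.initial();
    // `w` stays borrowed for as long as `initial` is in use.
    println!("initial is {}", initial);

    let e = Word { s: "Évora".to_owned() };
    println!("checked initial is {:?}", e.initial_checked());
}
```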

Getting back to our Vector struct, if we tried to create a similar method like below, it would not compile:

pub fn length_str(&self) -> &str {
    &format!("{}", self.length()).as_str()
}

The format! macro creates a new String. We are then trying to return a reference to this string from the method. The borrow checker does not allow this, as the string would be dropped once the method has finished executing. Luckily, what we are trying to do above is unnecessary in this case: since the method already owns the newly created String, we can just move it out of the method instead of trying to return a reference:

pub fn length_str(&self) -> String {
    format!("{}", self.length())
}

Then why did we return a reference from Word::initial? We could have returned a String from initial as well, but that would mean creating a copy of the initial letter. For small data, cloning this way may be preferred, but for large data returning a reference is probably better. In the case of Vector::length_str we cannot avoid creating a new String, so that one is an easy decision.
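The trade-off can be sketched by putting both versions next to each other. The initial_owned method is hypothetical; it is added here only to show that an owned String can outlive the Word it was copied from, while the borrowed slice cannot.

```rust
pub struct Word {
    pub s: String,
}

impl Word {
    // Borrowed: no allocation, but the result cannot outlive the Word.
    pub fn initial(&self) -> &str {
        &self.s[0..1]
    }

    // Hypothetical owned variant: copies the first byte range into a new String.
    pub fn initial_owned(&self) -> String {
        self.s[0..1].to_owned()
    }
}

fn main() {
    let owned = {
        let w = Word { s: "Excelsior".to_owned() };
        println!("borrowed initial is {}", w.initial());
        // `w` is dropped at the end of this block. Returning w.initial()
        // instead of the owned copy would not compile.
        w.initial_owned()
    };
    println!("owned initial is {}", owned);
}
```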

In Vector::length_str we are calling another method on Vector, length. This is allowed because both methods take a read-only self-reference. Rust’s borrow checker does not allow creating a mutable reference from an immutable (read-only) one, but using a mutable reference you can call methods that take a read-only reference.

Other than the limitation that there can only be one mutable reference at a time (no aliasing), methods that take a mutable reference are no different from methods that take a read-only reference. Here is an example that uses a mutable reference to call a method that takes a read-only reference:

pub fn normalize(&mut self) {
    let l = self.length();
    self.x /= l;
    self.y /= l;
    self.z /= l;
}

Please ignore the division by zero issue in the code above (for a zero-length vector the components would become NaN).
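If you would rather not ignore it, one possible guard is sketched below. The try_normalize name and its bool return value are assumptions of this sketch, not something the original code defines.

```rust
pub struct Vector {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Vector {
    pub fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }

    // Hypothetical variant of normalize: leaves a zero-length vector untouched
    // and reports whether normalization happened.
    pub fn try_normalize(&mut self) -> bool {
        // Calling a &self method through &mut self is allowed.
        let l = self.length();
        if l == 0.0 {
            return false;
        }
        self.x /= l;
        self.y /= l;
        self.z /= l;
        true
    }
}

fn main() {
    let mut v = Vector { x: 4.0, y: 3.0, z: 0.0 };
    println!("normalized: {}", v.try_normalize());
    println!("vector is ({}, {}, {})", v.x, v.y, v.z);

    let mut zero = Vector { x: 0.0, y: 0.0, z: 0.0 };
    println!("normalized: {}", zero.try_normalize());
}
```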

If you have any questions, suggestions or corrections feel free to drop me a line.