blog.zgtm.de

Robert's blog

On rust's memory management

Mainly for C/C++ programmers

Robert, 31 January 2016

In this posting, I want to show how rust's memory management works: What is a reference, what are borrows and moves, and what happens in memory?

This text is deliberately written for people who know C or C++. I assume knowledge of stack and heap and of what a pointer is.

This text won't talk about lifetimes in rust. In particular, what is written here does not, in general, hold for static variables. Maybe a follow-up posting will cover lifetimes.


Introduction

When I first heard of the rust language some months ago, I found it very interesting. But I did not really grasp what the new ideas in rust are.

Eventually, I decided it was time to give this language a try. So I found a talk on YouTube, watched it, and was really excited. When I tried to write some small programs, I of course found the rust book together with the language reference on the homepage.

However, reading the book left some questions: What does borrowing and moving really do? How do I allocate memory on the stack and on the heap? Which operations are cheap, which are expensive? (I.e. how smart are the pointers?)

So I bought the book "Rust Essentials" (Packt Publishing, 2015) online. This was a horrible mistake; afterwards I felt like I knew even less than before. (Seriously, I can't recommend this book. Among its flaws: It is not well written, leaves many things unclear, and contains a lot of plain errors.)

In the end, I decided to find these things out myself. Many things can also be found in the book or in the reference. Most of them, however, I found by persistently nagging the compiler. In the following text, I want to share my findings.

Notation
I will use the following naming convention throughout the whole text: All other variables will be declared in the corresponding sections.

Primitive types

First, let's focus on the primitive types of rust. A variable of a primitive type is created as easily as

let a : type;

where type denotes the type. This declaration creates a new variable of this type on the stack. (Ignoring special cases like static variables!)

What are the primitive types in rust?

First of all, in rust, there are the primitive data types. Those are:
bool, char, i8, i16, i32, i64, u8, u16, u32, u64, isize, usize, f32 and f64

The i-types are (signed) integers of 1, 2, 4 and 8 bytes respectively, the u-types are unsigned integers of 1, 2, 4 and 8 bytes respectively, the f-types are floating point numbers of 4 and 8 bytes respectively (aka. float and double), and finally isize and usize are signed and unsigned integers of the size equal to the size of memory addresses (native size of the machine, 8 bytes on 64-bit machines.)
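These sizes can be checked with std::mem::size_of. The sketch below only asserts the sizes listed above. The helper name signed_sizes is made up for this illustration, and the usize check assumes that a plain borrow has the native address size, which holds on the usual platforms.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of i8, i16, i32 and i64 (hypothetical helper).
fn signed_sizes() -> [usize; 4] {
    [size_of::<i8>(), size_of::<i16>(), size_of::<i32>(), size_of::<i64>()]
}

fn main() {
    // i-types and u-types: 1, 2, 4 and 8 bytes
    assert_eq!(signed_sizes(), [1, 2, 4, 8]);
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<u64>(), 8);
    // f32 and f64 correspond to float and double
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);
    // isize and usize have the size of a memory address
    assert_eq!(size_of::<usize>(), size_of::<&u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
    // bool takes one byte, char four (a Unicode scalar value)
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(size_of::<char>(), 4);
    println!("all sizes as expected");
}
```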

Alternatively, one can omit the type if it can be inferred. In most cases, rust can infer the type from the initialisation of a variable:

let a;      // Declaration
a = value;  // Initialisation
or the shorthand
let a = value;  // Declaration &
                // initialisation

We can copy the values of primitive data types around as we please:

let i = 7;  // i becomes the type i32
let j = i;  // j also becomes i32
let k = i;  // k also becomes i32

However, we can not modify the variables:

i = 8;  // i is not mutable

because in order to modify them, we need to declare them mutable using the keyword mut:

let mut m = 7;  // m has value 7
m = 8;          // m now has value 8

In this case, m still has the type i32: mut is a property of the variable (the binding), not of its type.
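A small sketch of copying and mutability. The mutable variable m is annotated explicitly as plain i32, and the compiler accepts this because mut belongs to the binding. The function name copy_and_increment is made up for this illustration.

```rust
/// Copies an immutable i32 into a mutable binding and modifies only the copy.
fn copy_and_increment() -> (i32, i32) {
    let i = 7;          // immutable binding, type i32
    let mut m: i32 = i; // mutable binding; the type is still plain i32
    m += 1;             // allowed because the binding m is declared mut
    (i, m)              // i is unaffected: primitive values are copied
}

fn main() {
    let (i, m) = copy_and_increment();
    println!("i = {}, m = {}", i, m); // prints "i = 7, m = 8"
}
```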

Borrows

The first new concept in rust's memory management is the borrow.

In order to borrow something, we need the &-operator. This is just the address-of operator you know from C/C++: It gives you the address of the variable you apply it to. Thus, it gives you a pointer to this variable. In rust, however, it is called the borrow operator, because rust pointers (borrows) have some additional properties.

What is a pointer?

Every variable in a program occupies some bytes of memory. An i32 integer in rust, e.g., occupies four bytes of memory. Thus, the variable has a memory address: the address specifying exactly where in memory the variable is located. This memory address is called a pointer. We say that the pointer points to the variable located at this memory address.

Let's say we store the value 7 at the memory address 0x001000. Then 0x001000 is a pointer and it points to the value that is stored at the address 0x001000, which in this case is 7.

Of course, we can assign this pointer to a variable, which then itself is stored in memory. By taking its memory address we get a pointer to a pointer.

In the example above, we can now take the pointer 0x001000 and store it at another memory address, say 0x001004. Then 0x001004 is a pointer and it points to the value that is stored at the address 0x001004, which in this case is 0x001000, which is another pointer pointing to the value 7.

This game can be continued to get pointers to pointers to pointers … You get the idea!
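The pointer-to-pointer game can be sketched in rust with the &-operator. The actual addresses printed depend on the machine and the run. The function name read_through_borrows is hypothetical.

```rust
/// Builds a borrow and a borrow of that borrow, and reads the value through both.
fn read_through_borrows() -> (i32, i32) {
    let i: i32 = 7;
    let b: &i32 = &i;   // b points to i (like `int32_t *b = &i;` in C)
    let bb: &&i32 = &b; // bb points to b: a pointer to a pointer
    (*b, **bb)          // dereference once, resp. twice, to get back to 7
}

fn main() {
    let i: i32 = 7;
    let b = &i;
    let bb = &b;
    // {:p} prints the address stored in a borrow
    println!("address of i: {:p}, address of b: {:p}", b, bb);
    println!("values: {:?}", read_through_borrows()); // prints "values: (7, 7)"
}
```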

The resulting pointer from a borrow can be stored as easy as pie:

let b = &i;  // b is a borrow of i 
             // (b contains a pointer to i)

Assuming i to be an i32 integer, b now is of type &i32 (pointer to i32, or, as rust people say, reference to i32).

Caution: The notation differs from C/C++, where the type of b would be written as i32*. But really, the same thing is happening here as well.

However, with this borrow b we can only read the variable i. We can neither modify i through b, nor can we make b borrow something else. So the following will not work:

let mut m = 7;
let mut n = 9;
let b = &m;
b = &n;   // b not mutable
*b = 29;  // b not a mutable borrow

The first assignment fails, of course, because we did not declare b to be mutable. The second, however, fails because we did not declare b to be a mutable borrow. (More on the dereferencing operator * in the section Dereferencing.)

If we want to modify the value of m through the borrow b, we need to make b a mutable borrow:

let b = &mut m;  // b is a mutable borrow of m

If we on the other hand want to change what we want to borrow using b, we need to make b mutable:

let mut b = &i;  // b is a borrow of i and
                 // b is mutable

Of course, we can do both: Make b a mutable borrow that is mutable:

let mut b = &mut m;  // b is a mutable borrow
                     // of m and b is mutable

I want to emphasize here that there is a crucial difference between a mutable borrow and a borrow that is mutable. It is the same difference that C/C++ programmers know between a const pointer (T* const) and a pointer to const (T const*).

A mutable borrow is a borrow through which the borrowed value can be changed. It therefore has to be a borrow of something that actually is mutable.

A borrow that is mutable, however, is a borrow that can be changed: The same name can be used to borrow something else later in the code.

(In the second case, the word borrow seems odd to me. How can the same borrow first be a borrow of one thing and later a borrow of something else? Thus, I like to think of it more as a "pointer".)

Of course a borrow can be both: A mutable borrow that is mutable.
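The four combinations side by side, as a sketch. The variable names follow the text, while the function name borrow_kinds is made up. It assumes a current compiler, where a borrow ends after its last use.

```rust
/// Shows the four combinations of (im)mutable borrows and (im)mutable bindings.
fn borrow_kinds() -> (i32, i32, i32, i32) {
    let i = 1;
    let j = 2;
    let mut m = 10;
    let mut n = 20;

    // Immutable borrow, immutable binding: read only, always borrows i.
    let a = &i;

    // Immutable borrow, mutable binding: can borrow something else later
    // (like a pointer to const, T const*).
    let mut b = &i;
    assert_eq!(*b, 1);
    b = &j;

    // Mutable borrow, immutable binding: can write, but always borrows m
    // (like a const pointer, T* const).
    let c = &mut m;
    *c += 1; // m is now 11

    // Mutable borrow, mutable binding: can write and borrow something else.
    let mut d = &mut n;
    *d += 1; // n is now 21
    d = c;   // d now borrows m (c is moved into d)
    *d += 1; // m is now 12

    (*a, *b, m, n)
}

fn main() {
    println!("{:?}", borrow_kinds()); // prints "(1, 2, 12, 21)"
}
```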

Restrictions for borrowing of mutable variables

As soon as we borrow something that is mutable, however, rust becomes alert and restricts further borrowing.

This has the following effects: We can borrow a value immutably as many times as we want, as long as we don't borrow it mutably. We can borrow a value mutably only if it is currently not borrowed elsewhere. And we can borrow a value mutably at most once at a time.
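A sketch of these rules in action. It is written for a current compiler: since Rust 2018 (non-lexical lifetimes), a borrow ends after its last use rather than at the end of its scope, so compilers from 2016 may need extra braces here. The commented-out lines are the kind of code the compiler rejects. The function name borrow_rules is hypothetical.

```rust
/// Shows which borrows of a mutable variable the compiler accepts.
fn borrow_rules() -> i32 {
    let mut m = 7;

    // Any number of immutable borrows at the same time is fine.
    let r1 = &m;
    let r2 = &m;
    let sum = *r1 + *r2; // 14
    // let w = &mut m;      // error, because r1 is still used
    // println!("{}", r1);  // on this line

    // After the last use of r1 and r2, one mutable borrow is allowed.
    let w = &mut m;
    *w += sum; // m becomes 21
    // let w2 = &mut m;     // error: a second mutable borrow ...
    // *w += 1;             // ... while w is still in use

    m
}

fn main() {
    println!("{}", borrow_rules()); // prints "21"
}
```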

Additionally:

What moving means is explained below in Moving.

Borrow of values not in memory

Values (such as 7 or &a) usually don't have a memory address. However, in rust, you can borrow them anyway. This makes statements like

&7
&&a // equivalent to &(&a)

perfectly valid expressions in rust. In C/C++, this would not be possible. This has to do with C/C++ differentiating between lvalues and rvalues. In rust, however, the value of an expression can be placed in memory if necessary.

(I am not totally sure how rust does this. My guess is, that rust copies the value into memory as soon as you want to borrow it. From looking at the pointer value, I guess it might be somewhere on or near the stack.)
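Both expressions from above in a small sketch. As far as I know, current compilers (since Rust 1.21) place constant expressions like &7 in static memory ("rvalue static promotion"), so the printed address need not be on the stack. The function name borrow_temporaries is hypothetical.

```rust
/// Borrows values that are not stored in a named variable.
fn borrow_temporaries() -> (i32, i32) {
    let a = 5;
    let r: &i32 = &7;     // a borrow of the literal 7
    let rr: &&i32 = &&a;  // same as &(&a): a borrow of the temporary &a
    (*r, **rr)
}

fn main() {
    let r = &7;
    println!("7 lives at {:p}", r); // the exact address is up to the compiler
    println!("{:?}", borrow_temporaries()); // prints "(7, 5)"
}
```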

Dereferencing

Together with the borrowing-operator, there also is of course a dereferencing-operator. As in C/C++, it is the *-operator; It takes a pointer and gives you the thing it points to.

What is dereferencing?

When we've got a pointer, we will eventually want to access the variable the pointer points to. Otherwise we couldn't do much with pointers (except maybe compare them). Thus, we need to take the pointer (which, remember, is a memory address), go to that memory address and access the data stored there. This (going from a pointer to what it points to) is called dereferencing.

With the dereferencing operator, we can access variables via a borrow:

let b = &mut a;
*b = 29;         // Changes a to 29
Automatic dereferencing

There are some situations in which dereferencing happens automatically. One of them is using the println! macro. Another one is calling functions through pointers (see Functions).

Printing the value of a borrow using the println! macro, e.g. with

println!("{}", b);

will not print the memory address that is stored in the borrow b. In fact, by default println! will never print borrows. Instead, it will dereference as many times as necessary until it arrives at something it can actually print.

This is because println! can print everything for which the Display trait is implemented, and the Display trait is implemented generically for all references to types that implement Display, by just dereferencing the reference.

If you want to print the memory address stored in a borrow, you need to use the p format specifier for pointer formatting:

println!("{:p}", b);

The same holds for the print! macro. The format specifiers are documented in the fmt section of the rust documentation.
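The difference between {} and {:p} on a borrow of a borrow of a borrow, as a sketch. format! uses the same formatting machinery as println!, and it is used here so the results can be compared. The function name format_borrows is made up.

```rust
/// Formats a multiply borrowed value with {} and a borrow with {:p}.
fn format_borrows() -> (String, bool) {
    let a = 42;
    let b = &a;
    let bbb = &&b;                  // type &&&i32
    let shown = format!("{}", bbb); // Display is forwarded through all references
    // {:p} prints the address stored in b, i.e. the address of a
    let same_addr = format!("{:p}", b) == format!("{:p}", &a);
    (shown, same_addr)
}

fn main() {
    let (shown, same_addr) = format_borrows();
    println!("{} {}", shown, same_addr); // prints "42 true"
}
```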

By the way: borrows of functions are also automatically dereferenced when they are called. However, this is another mechanism, called deref coercion. More details can be found in Functions.

Moving

Borrows can not be copied as easily as primitive data types. This is because the rust compiler enforces some restrictions on copying borrows that make sure that the borrowing restrictions are fulfilled and that borrows do not extend the lifetime of the borrowed object.

Immutable borrows can be copied without restrictions. (Effectively, copying an immutable borrow is the same as explicitly making a new borrow, which is allowed within the borrowing rules, cf. Restrictions for borrowing of mutable variables.)

However, as soon as a mutable borrow is copied, the process is no longer called copying but moving. For example:

let b = &mut m;
let c = b;     // Move b to c
*b = 3;        // b can no longer be used.

Moving is the transfer of ownership in rust. The old borrow can no longer be used. Instead, the new borrow has to be used in order to access the data. The same is true for the Box type (here independent of mutability):

let mut x = Box::new(12);
let y = x;     // Move x to y
*x = 5;        // x can no longer be used.
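A compiling counterpart to the two failing snippets above, as a sketch: the immutable borrow is copied, while the mutable borrow and the Box are moved and then used only through their new names. The lines that would fail are commented out. The function name copy_and_move is hypothetical. Note that moving the Box into a mut binding lets the new owner modify the content even though x was not declared mut.

```rust
/// Copying an immutable borrow vs. moving a mutable borrow or a Box.
fn copy_and_move() -> (i32, i32, i32) {
    let i = 1;
    let s1 = &i;
    let s2 = s1;           // immutable borrow: copied, s1 stays usable
    let shared = *s1 + *s2;

    let mut m = 2;
    let b = &mut m;
    let c = b;             // mutable borrow: moved, b is no longer usable
    *c = 3;                // write through the new owner c
    // *b = 4;             // error: use of moved value `b`

    let x = Box::new(12);
    let mut y = x;         // Box: moved, x is no longer usable
    *y = 5;                // y now owns the heap value
    // *x = 6;             // error: use of moved value `x`

    (shared, m, *y)
}

fn main() {
    println!("{:?}", copy_and_move()); // prints "(2, 3, 5)"
}
```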

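This concept of ownership and the transfer of ownership makes sure that no two parts of the program (in particular, no two threads) have concurrent write access to the same memory. If several threads need access to the same data, the atomic reference-counted smart pointer Arc can be used. To write to the shared data, it additionally has to be wrapped in a synchronization type such as Mutex.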
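A minimal sketch of shared write access from several threads, assuming the usual Arc<Mutex<T>> combination from the standard library. The function name count_in_threads is made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Lets `n_threads` threads increment one shared counter and returns the final value.
fn count_in_threads(n_threads: usize) -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..n_threads {
        let counter = Arc::clone(&counter); // one more owner, reference count + 1
        handles.push(thread::spawn(move || {
            // lock() grants exclusive write access while the guard lives
            *counter.lock().unwrap() += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // bind to a local first so the lock guard is dropped before `counter`
    let result = *counter.lock().unwrap();
    result
}

fn main() {
    println!("{}", count_in_threads(4)); // prints "4"
}
```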

Smart pointers

C++ programmers knowing smart pointers may think: "Wait! Isn't borrowing and moving similar to having a smart pointer?"

This is only partly correct.

What rust's mutable borrows have in common with smart pointers like C++'s unique_ptr is that both have move semantics: as soon as you move the pointer to a new one, you can no longer use the old one. (Immutable borrows, in contrast, are simply copied.)

But still, rust's borrows are not objects but just ordinary pointers. They do not take care of memory allocation or freeing. However, the language makes sure that borrowing and moving are restricted! This helps the programmer avoid common mistakes.

On the other hand, there are also real smart pointers in rust. Smart pointers take care of allocating memory on the heap when they are created and of freeing it when it is no longer needed.

Furthermore, there are smart pointers that can store more than one object, like the vector object Vec<T>.

Here is a small comparison of rust's pointer types and their C++ equivalents:

                    rust              C++
Pointer type:       &T                T*
Smart pointers:     Box<T>            unique_ptr<T>
                    Rc<T> / Arc<T>    shared_ptr<T>
                    Weak<T>           weak_ptr<T>
Multiple objects:   Vec<T>            vector<T>

Heap

So far, most of the variables we have seen have been placed on the stack (implicitly). However, sometimes you will want to allocate space for your data on the heap. This can be either because you want to use the data after returning from the function (and it is too big to just be returned) or because you do not know at compile time how much space you will need.

To create a variable on the heap, you just need to use one of the smart pointer types seen above. The simplest one is Box. It can be created using the new constructor of the Box type:

let x = Box::new(5);  // x is of type Box<i32>

This creates a variable on the heap and stores the pointer to that variable in x. The value can be accessed, using the dereferencing operator, just as with normal pointers.

To modify the value in a box, it has to be declared as a mutable box using

let mut x = Box::new(8);  // creates mutable Box

(Here, the keyword mut makes the Box object and the content of the Box mutable at the same time. Unfortunately, there seems to be no way to make only one of the two mutable, as is possible for borrows.) Then we can change the value as in

*x = 7;

Although this pointer behaves like a normal borrow, it is not a borrow of the type &i32, but a box of the type Box<i32>. This is a smart pointer type. That means that it behaves like a pointer: It can be dereferenced like a pointer and is moved instead of being copied. (C++ programmers might know about smart pointers, especially those of the standard library: unique_ptr and shared_ptr.)

We can obtain the address of the value on the heap by borrowing the dereferenced box object:

let b = &(*x);  // b is of type &i32
or equivalently (and two characters shorter):
let b = &*x;  // same as &(*x)
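The Box operations from this section in one sketch: creating a mutable Box, writing through it, and borrowing the value on the heap with &*x. The function name box_demo is hypothetical.

```rust
/// Creates a mutable Box, modifies its content and borrows the heap value.
fn box_demo() -> (i32, i32) {
    let mut x = Box::new(8); // Box<i32>: the 8 lives on the heap
    *x = 7;                  // modify the heap value through the Box
    let b: &i32 = &*x;       // borrow of the value on the heap
    let read = *b;
    *x += 1;                 // b is no longer used, so x may be modified again
    (read, *x)
}

fn main() {
    println!("{:?}", box_demo()); // prints "(7, 8)"
}
```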

When printing a Box as a pointer, again, magic happens:

println!("{:p}", x);

will yield the same result as

println!("{:p}", &*x);

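This is because formatting a Box with {:p} dereferences and borrows it internally: It prints the address of the value on the heap (&*x), not the address of the Box variable x itself.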
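The claim can be checked by formatting both into strings and comparing them. This is a sketch, and the function name box_pointer_formats is made up.

```rust
/// Formats a Box and a borrow of its content with {:p}.
fn box_pointer_formats() -> (String, String) {
    let x = Box::new(5);
    (format!("{:p}", x), format!("{:p}", &*x))
}

fn main() {
    let (a, b) = box_pointer_formats();
    println!("{} {}", a, b); // the same heap address, printed twice
}
```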

Box being a smart pointer makes sure that the allocated memory is freed as soon as it is no longer needed. The concept of ownership in Box objects makes sure that a Box always points to a valid, allocated place in memory.

Other kinds of smart pointers in rust are Rc<T> and Arc<T>. These are smart pointers with reference counting and with atomic reference counting, respectively. The latter is needed to let multiple threads share access to the same data. For that, it requires the data it points to to be thread safe as well.
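How the reference count of an Rc evolves, as a sketch. It also uses Weak, the counterpart of weak_ptr from the table above. The function name rc_counts is hypothetical.

```rust
use std::rc::{Rc, Weak};

/// Tracks the strong count of an Rc while a clone and a Weak are created and dropped.
fn rc_counts() -> (usize, usize, bool) {
    let a = Rc::new(5);                   // one strong reference
    let b = Rc::clone(&a);                // copies the pointer, not the value
    let w: Weak<i32> = Rc::downgrade(&a); // does not keep the value alive
    let with_b = Rc::strong_count(&a);    // 2
    drop(b);
    let after_drop = Rc::strong_count(&a); // 1
    drop(a);                              // last strong reference: value is freed
    let still_alive = w.upgrade().is_some(); // false
    (with_b, after_drop, still_alive)
}

fn main() {
    println!("{:?}", rc_counts()); // prints "(2, 1, false)"
}
```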

With these types, we can only place a single object on the heap. If we need an arbitrary number of objects on the heap, Vec is probably the first choice. It works similarly to C++'s std::vector class and keeps its data on the heap.

let mut v = Vec::new();
v.push(83);  // appends 83 to the vector 

Here, the type of v has been deduced by the rust compiler to be Vec<i32> from the push statement!

One thing that does not work in rust as easily as in C/C++ is dynamically allocating a fixed-size array on the heap. Where in C++ you could use new[], in rust you will have to use Vec<T> for that (or write your own type). In order to allocate space for n elements in a vector, the macro vec![value; n] can be used, where value denotes the initial value of the elements.
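A sketch of the vec![value; n] approach. It additionally converts the vector into a Box<[i32]> with Vec::into_boxed_slice, which gives a heap array whose length can no longer change, and that comes close to new[]. The function name heap_array is made up.

```rust
/// Allocates n zero-initialised i32 elements on the heap.
fn heap_array(n: usize) -> Box<[i32]> {
    let v: Vec<i32> = vec![0; n]; // n elements, each initialised to 0
    v.into_boxed_slice()          // fixed length from here on
}

fn main() {
    let mut a = heap_array(5);
    a[2] = 42; // elements are still mutable, only the length is fixed
    println!("{:?}", a); // prints "[0, 0, 42, 0, 0]"
}
```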

Under certain circumstances, it might be necessary to handle memory allocation and freeing yourself. For that, the implementation of the Box or the Vec type should serve as simple examples.

Strings

There are two types of string in rust, comparable to C-strings and the C++ std::string type, respectively.

String literals

The usual strings in rust are string literals. They are enclosed with double quotes.

let s = "I am a string";

The type of a string literal is &str. (Actually &'static str, cf. the rust book on lifetimes.) Unlike the type name suggests, a &str is not just a pointer to the first character of the string literal in memory. Instead, it consists of this very pointer and the string length. Thus, a &str is twice the size of a normal borrow.
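The size claim can be checked with std::mem::size_of. This sketch uses a &u8 as the "normal borrow" to compare against. The function name str_ref_sizes is hypothetical.

```rust
use std::mem::size_of;

/// Returns the sizes of a &str and of a plain borrow &u8.
fn str_ref_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<&u8>())
}

fn main() {
    let s = "I am a string";
    // a &str carries the address of the first byte and the length
    println!("{:p} {}", s.as_ptr(), s.len()); // some address, then 13
    let (fat, thin) = str_ref_sizes();
    assert_eq!(fat, 2 * thin); // twice the size of a normal borrow
}
```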

The content of a string literal is not placed on the stack. It is stored in the static data of the compiled program, which is why it lives for the 'static lifetime. Furthermore, string literals in rust are always immutable.

Note that the length of a string (in bytes) is not necessarily equal to the number of characters in it, because rust strings are UTF-8 encoded. For more information on accessing the characters of a string, I refer to the section on string indexing in the manual.
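A small sketch of the difference between bytes and characters, using a name with a non-ASCII letter. len() counts bytes, while chars() iterates over Unicode scalar values. The function name bytes_and_chars is made up.

```rust
/// Returns the length in bytes and the number of characters of a string.
fn bytes_and_chars(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "ö" takes two bytes in UTF-8, so there is one byte more than characters
    println!("{:?}", bytes_and_chars("Göttingen")); // prints "(10, 9)"
}
```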

String objects

String objects in rust are more similar to the more powerful string objects of other programming languages (e.g. C++'s std::string). Their type is, naturally, String. They are full-grown objects that store their content on the heap. Essentially, they are smart pointers similar to Vec.

Empty string objects can be constructed by the constructor String::new().

let o = String::new();
Non-empty string objects can be constructed with the constructor String::from(&str) or by converting a string literal using its to_string() method:
let p = String::from("Hello world!");
let p = "Some Text".to_string();

A string literal can be appended to a string:

let q = p + " more text";

Because this is a move, p can now no longer be accessed:

println!("{}", p);  // p has been moved

Also, it is not possible to append a string object to a string literal:

let q = "More text and " + p;  // not allowed

Of course, a String object stores its content (UTF-8 encoded) on the heap, and a &str pointing to this content can be obtained explicitly using

let s = &*p;  // equivalent to &(*p)

or implicitly, using deref coercion to convert a &String to a &str:

let s : &str = &p;
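The String operations from this section in one sketch. Since "text" + String is not allowed, the example turns the literal into a String first and appends a borrow of the other String (String + &str). The function name string_demo is hypothetical.

```rust
/// Builds a String, appends to it and borrows it as &str.
fn string_demo() -> (String, usize) {
    let p = String::from("Hello");
    let q = p + " world";                       // p is moved into q
    // let r = "More text and " + q;            // error: &str + String
    let r = "More text and ".to_string() + &q;  // works: String + &str
    let s: &str = &q;                           // deref coercion: &String -> &str
    (r, s.len())
}

fn main() {
    let (r, len) = string_demo();
    println!("{} ({} bytes in q)", r, len); // prints "More text and Hello world (11 bytes in q)"
}
```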

Arrays / Structures / Tuples / Enums

Arrays, structures, tuples and enums are all created on the stack unless they are explicitly created on the heap using one of the smart pointer types.

In general, the pointer to an array, structure or tuple will coincide with the pointer to its first element:

let ar = [5,3,7];           // &ar == &ar[0]
let t = (2,8,9);            // &t == &t.0
struct Str { s1: i32,
             s2: i32   };
let s = Str{ s1: 5, s2: 8 };  // &s == &s.s1

However, this is only guaranteed for arrays. For structures and tuples, the rust compiler is free to reorder the fields. (If you need the C layout for a structure, use #[repr(C)].)

Also, this is not the case for enum types, understandably, because rust needs to store information on the current variant of the enum.
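A sketch of comparing these addresses for an array and for a struct. The struct is marked #[repr(C)], so the first field is guaranteed to sit at offset 0. Without that attribute, the struct comparison would merely be likely, not guaranteed. The function name first_element_addresses is made up.

```rust
/// A struct with C layout: fields in declaration order, the first at offset 0.
#[repr(C)]
struct Str {
    s1: i32,
    s2: i32,
}

/// Compares the address of each aggregate with the address of its first element.
fn first_element_addresses() -> (bool, bool) {
    let ar = [5, 3, 7];
    let s = Str { s1: 5, s2: 8 };
    let array_ok = &ar as *const [i32; 3] as *const i32 == &ar[0] as *const i32;
    let struct_ok = &s as *const Str as *const i32 == &s.s1 as *const i32;
    let _ = s.s2; // read s2 once to avoid an unused-field warning
    (array_ok, struct_ok)
}

fn main() {
    println!("{:?}", first_element_addresses()); // prints "(true, true)"
}
```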

Functions

Unlike in C/C++, functions are not directly interpretable as pointers to their memory location in the program. However, we can borrow a function using the & operator, and we can convert a function into a real function pointer of type fn(), which holds this address.

Like in C/C++, borrows of functions can be called as if they were functions:

fn fun() {
    println!("Hello");
}
(&fun)();  // Prints "Hello"

This is a more general feature of rust, called deref coercion (automatic dereferencing). It has the effect that we can also call functions through pointers to pointers to functions, or through even more levels of indirection:

(&&&fun)();  // Also prints "Hello"

Additionally, since functions are so-called first-class citizens in rust, functions can also be copied.

A function can be copied into a variable, which can then be called the same way as a function pointer in C/C++:

let funptr = fun;
funptr();      // Also prints "Hello"

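Here, funptr is of the same type as the function fun itself. With an explicit annotation such as let funptr: fn() = fun;, it becomes a real function pointer of type fn(). (See also the rust book on function pointers.)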
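The different ways of calling fun, in one sketch. Here fun returns its greeting instead of printing it, so the results can be checked. The sketch also shows that the function item itself is zero-sized, while an fn pointer has the size of an address. The function name call_variants is hypothetical.

```rust
use std::mem::{size_of, size_of_val};

fn fun() -> &'static str {
    "Hello"
}

/// Calls `fun` through borrows and through a real function pointer.
fn call_variants() -> [&'static str; 3] {
    let funptr: fn() -> &'static str = fun; // coerce the function item to a fn pointer
    [(&fun)(), (&&&fun)(), funptr()]
}

fn main() {
    println!("{:?}", call_variants()); // prints "["Hello", "Hello", "Hello"]"
    // the function item is zero-sized; a fn pointer has the size of an address
    println!("{} {}", size_of_val(&fun), size_of::<fn() -> &'static str>());
}
```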

Can I abuse memory management like in C?

Yes. If you really need to, you can use raw pointers (*const T and *mut T) without restrictions, in the same way as you can in C/C++. Dereferencing them is only allowed inside unsafe blocks. For this, I am going to redirect you to the manual: Unsafety, Pointer types.
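A minimal sketch of a raw pointer. Creating it is safe, but every dereference (and the pointer arithmetic via add) has to happen inside an unsafe block, where the compiler no longer checks the borrowing rules. The function name write_through_raw is made up.

```rust
/// Writes through a raw pointer, which (unlike a borrow) needs an unsafe block.
fn write_through_raw() -> i32 {
    let mut x = 5;
    let p: *mut i32 = &mut x; // creating a raw pointer is safe ...
    unsafe {
        *p += 1;              // ... dereferencing it is not checked by the compiler
        *p.add(0) *= 7;       // pointer arithmetic as in C (offset 0: still x)
    }
    x
}

fn main() {
    println!("{}", write_through_raw()); // prints "42"
}
```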

Summary

So in summary, we have learned the following things on rust's memory management:

Additionally, there are several restrictions on what you can do with a borrow:

Fin

I want to thank EsGeh for proofreading this posting and his many helpful comments and suggestions for the text.

If you have found something that is wrong or unclear, feel free to write me an email via blogspam@zgtm.de or just write a comment below.

Updates

7 February 2016: Corrected some typos. Changed description of Arc from smart pointer with reference counting and atomic access to smart pointer with atomic reference counting


The first posting

Robert, 21 January 2016

Welcome :)

Welcome to my blog. My name is Robert. I am a physicist from Göttingen, Germany.

I enjoy programming and reading articles about programming languages.

In this blog, I want to post everything that I will find worth sharing. If you like something, feel free to make a comment and/or share it. You can always email me via blogspam@zgtm.de.

