High-Performance Key-Value Storage on iOS (Part II)

Yanbo Sha
8 min readMar 21, 2023

--

Photo by James Wheeler on Unsplash

It’s a three-part series about how to build high-performance key-value storage. This is the second part. you can check other parts: first part, final part.

Look at the remap method which is called in the initializer, we open the local file or create one if not exist. we use ftruncate to set the file size to an exact value. after that, we call the mmap method to ask the system to allocate the specific size of memory.

the return type of the mmap is UnsafeMutableRawPointer which is like byte[] in C.
UnsafeMutableRawPointer represents a block of the memory area where we can use it to access and manipulate. Not like Data type, UnsafeMutableRawPointer doesn’t have length property. we have to use extra property to record its length, so we define a capacity for it. Also, UnsafeMutableRawPointer is a pointer that we need to move forward and backward to access memory, so we need to keep another property called original to refer to the starting point all the time.

What should we do after remap? We need to sync the local file into the memory cache. so we define another method called reload.

init(_ filePath: String, capacity: Int = 10 * 1024) {
self.capacity = capacity
self.filePath = filePath

remap()
reload()
}
private func reload() {

guard var buffer = buffer, let original = original else {
return
}

/// get data len from data_len area
let total = original.withMemoryRebound(to: UInt64.self, capacity: 8) { pointer in
return pointer.pointee
}

buffer = original
buffer += 8

while original.distance(to: buffer) < total + 8 {

let keySizeData = Data(bytes: UnsafeRawPointer(buffer), count: 8)
let keySize = keySizeData.withUnsafeBytes { $0.load(as: UInt64.self) }
buffer += 8

let dataSizeData = Data(bytes: UnsafeRawPointer(buffer), count: 8)
let dataSize = dataSizeData.withUnsafeBytes { $0.load(as: UInt64.self) }
buffer += 8

let keyData = Data(bytes: UnsafeRawPointer(buffer), count: Int(keySize))
buffer += .Stride(keySize)

if let key = String(data: keyData, encoding: .utf8) {
let data = Data(bytes: UnsafeRawPointer(buffer), count: Int(dataSize))
memoryCache[key] = data
}
buffer += .Stride(dataSize)
}
}

Let’s look at the reload method. We start reading from the 0 position of the buffer and move forward until reaching the end which is recorded in the head part.

we read the exact file size from the first 8 bytes (head). and move the pointer to the starting point of the body part. then read records one by one.

It’s worth noting that we use distance API to calculate the offset from the starting point and we need to move the pointer by calling += constantly after reading the expected data. When we finished the process, the pointer stops at the position where we can append new records.

Write Record

Now, we need to complete the core method: append. We support many types for the Goose, so we use the more common type of Data to record them like memory cache. All of the other types will be transformed into the Data type before this method is called. Here we define the append like below:

func append(_ data: Data, for key: String) {
/// TODO: Implementation
}

But we often use key and data together, so I define a simple struct wrapping them together. It’s called BodyRecord:

extension Goose {
private struct BodyRecord {
let key: String
let data: Data
}
}

Then, we complete this method:

private func append(_ record: BodyRecord) {
guard let keyData = record.key.data(using: .utf8) else {
return
}

/// cache it for fast check
memoryCache[record.key] = record.data

guard var buffer = buffer else {
return
}
/// make sure the capacity is enough
ensureCapacity(BodyRecord(key: record.key, data: record.data))

/// mmap: key_size
buffer.storeBytes(of: UInt64(keyData.count), as: UInt64.self)
buffer += 8

/// mmap: value_size
buffer.storeBytes(of: UInt64(record.data.count), as: UInt64.self)
buffer += 8

/// mmap: key
keyData.withUnsafeBytes { rawBuffer in
rawBuffer.forEach { char in
buffer.storeBytes(of: char, as: UInt8.self)
buffer += 1
}
}

/// mmap: value
record.data.withUnsafeBytes { rawBuffer in
rawBuffer.forEach { char in
buffer.storeBytes(of: char, as: UInt8.self)
buffer += 1
}
}

/// mmap: update data_len
increaseTotal(UInt64(8 + 8 + keyData.count + record.data.count))
}
private func increaseTotal(_ total: UInt64) {
original?.withMemoryRebound(to: UInt64.self, capacity: 8) { pointer in
pointer.pointee += total
}
}

It’s a little longer, but the comment is clear.

The first thing is to cache it into the memory. Then, use the storeBytes API to copy the data into the buffer, don’t forget the update the head part at the end. It’s worth noting that we can’t use Data in the storeBytes directly because the Data type holds the real content in the heap and storeBytes only copy the stack memory that a variable occupied into the buffer. so we can use storeBytes for Int, Bool, and Float but not for Data, String.

The ensureCapacity API will be detailed below, here we just need to know it’s used to make sure we have enough room for the new record, it might compact or resize the memory area.

Store Method

Okay, append method is done, we can complete all of the store methods:

extension Goose {

private func storePrimitive<T>(_ object: T?, for key: String) {
var obj = object
let data = Data(bytes: &obj, count: MemoryLayout<T>.size)
append(BodyRecord(key: key, data: data))
}

/// Signed
func store(_ object: Int?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int8?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int16?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int32?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int64?, for key: String) { storePrimitive(object, for: key) }

/// Unsigned
func store(_ object: UInt?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt64?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt32?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt16?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt8?, for key: String) { storePrimitive(object, for: key) }

/// Decimal
func store(_ object: Float?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Double?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: CGFloat?, for key: String) { storePrimitive(object, for: key) }

/// Anything else (simple type)
func store<T>(_ object: T?, for key: String) { storePrimitive(object, for: key) }

/// Bool
func store(_ object: Bool?, for key: String) { storePrimitive(object, for: key) }
}

extension Goose {

func store<T>(_ object: T?, for key: String) where T : Codable {
if let object = object, let data = try? JSONEncoder().encode(object) {
append(BodyRecord(key: key, data: data))
}
}

func store(_ object: Data?, for key: String) {
if let object = object {
append(BodyRecord(key: key, data: object))
}
}

func store(_ object: String?, for key: String) {
if let object = object, let data = object.data(using: .utf8) {
append(BodyRecord(key: key, data: data))
}
}
}

Swift offers a handy syntactic sugar. we can easily add & immediately before the variable to convert it into UnsafeRawPointer.

Therefore, for all of the Primitive Types excluding data and string, we can easily build the Data with the &object and its length obtained with the MemoryLayout which is used to calculate how many bytes are used in the stack.

For the String, we can easily convert it into a Data type in the utf8 encoding format. For the Codable instance, we can use JSONEncoder to convert them into a serializable string and convert the result into the Data type.

Obtain Method

Okay, we’ve finished store methods. the obtain methods are a lot easier:

extension Goose {

private func obtainPrimitive<T>(for key: String) -> T? {
if var data = memoryCache[key] {
return data.withUnsafeMutableBytes { buffer in
return buffer.load(as: T.self)
}
}
return nil
}

/// Signed
func obtain(for key: String) -> Int? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int8? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int16? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int32? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int64? { obtainPrimitive(for: key) }

/// Unsigned
func obtain(for key: String) -> UInt? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt8? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt16? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt32? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt64? { obtainPrimitive(for: key) }

/// Decimal
func obtain(for key: String) -> Float? { obtainPrimitive(for: key) }
func obtain(for key: String) -> CGFloat? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Double? { obtainPrimitive(for: key) }

/// Bool
func obtain(for key: String) -> Bool? { obtainPrimitive(for: key) }

/// Anything else (simple type)
func obtain<T>(for key: String) -> T? { obtainPrimitive(for: key) }
}

extension Goose {

/// Codable including codable Array & codable Dictionary
func obtain<T>(for key: String) -> T? where T : Codable {
if let data = memoryCache[key] {
return try? JSONDecoder().decode(T.self, from: data)
}
return nil
}

/// String
func obtain(for key: String) -> String? {
if let data = memoryCache[key] {
return String(data: data, encoding: .utf8)
}
return nil
}

/// Data
func obtain(for key: String) -> Data? {
memoryCache[key]
}
}

We just need to obtain the value of the Data type from the memory cache and convert the Data to the specific type accordingly.

EnsureCapacity Method

The last thing is to complete the ensureCapacity method mentioned above. let me show the code first:

private func ensureCapacity(_ record: BodyRecord) {

guard !available(record) else {
return
}

/// compact at first to make room for the new data
rewrite()

if available(record) {
return
}

// grow size if there's no enough room
munmap(original, capacity) /// reset
capacity *= 2 /// resize
remap() /// remap
rewrite() /// rewrite
}
private func rewrite() {

/// reset
memset(original, 0, capacity)

/// reset buffer to the head
buffer = original

/// jump to the first location after header
buffer? += 8

/// write one by one from the memory cache
for (key, data) in memoryCache {
append(BodyRecord(key: key, data: data))
}
}
private func available(_ record: BodyRecord) -> Bool {

guard let buffer = buffer, let original = original else {
return false
}

guard let keyData = record.key.data(using: .utf8) else {
return false
}

let remaining = capacity - original.distance(to: buffer)
let increment = 8 + 8 + keyData.count + record.data.count
return remaining >= increment
}

Here available and rewrite is only used in ensureCapacity.

The available method is easy to understand. let’s look at the rewrite. as the ensureCapacity wrote, we call the rewrite if there’s not enough room for the new record. What does it do?

Rewrite Method

Let’s think back to the storage strategy. There may be many records referring to the same key and all of the previous records are actually useless. so we can reset the memory area and sync the memory cache into the mmap area to record the up-to-date record only. After that, we need to check again to make sure it has enough room for the new record. otherwise, we need to resize the memory area by doubling the size.

Homework

That’s all, you can access the completed project: Goose. Before that, I hope you can complete the homework: Delete Action. It just needs a few codes to complete it.

--

--

Yanbo Sha

iOS Programmer with 10 years of experience, interested in the latest technique, previously working at Bilibili