It’s a three-part series about how to build high-performance key-value storage. This is the second part. you can check other parts: first part, final part.
Look at the remap
method which is called in the initializer, we open the local file or create one if not exist. we use ftruncate
to set the file size to an exact value. after that, we call the mmap
method to ask the system to allocate the specific size of memory.
the return type of the mmap
is UnsafeMutableRawPointer which is like byte[] in C.
UnsafeMutableRawPointer represents a block of the memory area where we can use it to access and manipulate. Not like Data type, UnsafeMutableRawPointer doesn’t have length
property. we have to use extra property to record its length, so we define a capacity
for it. Also, UnsafeMutableRawPointer is a pointer that we need to move forward and backward to access memory, so we need to keep another property called original
to refer to the starting point all the time.
What should we do after remap
? We need to sync the local file into the memory cache. so we define another method called reload
.
init(_ filePath: String, capacity: Int = 10 * 1024) {
self.capacity = capacity
self.filePath = filePath
remap()
reload()
}
private func reload() {
guard var buffer = buffer, let original = original else {
return
}
/// get data len from data_len area
let total = original.withMemoryRebound(to: UInt64.self, capacity: 8) { pointer in
return pointer.pointee
}
buffer = original
buffer += 8
while original.distance(to: buffer) < total + 8 {
let keySizeData = Data(bytes: UnsafeRawPointer(buffer), count: 8)
let keySize = keySizeData.withUnsafeBytes { $0.load(as: UInt64.self) }
buffer += 8
let dataSizeData = Data(bytes: UnsafeRawPointer(buffer), count: 8)
let dataSize = dataSizeData.withUnsafeBytes { $0.load(as: UInt64.self) }
buffer += 8
let keyData = Data(bytes: UnsafeRawPointer(buffer), count: Int(keySize))
buffer += .Stride(keySize)
if let key = String(data: keyData, encoding: .utf8) {
let data = Data(bytes: UnsafeRawPointer(buffer), count: Int(dataSize))
memoryCache[key] = data
}
buffer += .Stride(dataSize)
}
}
Let’s look at the reload
method. We start reading from the 0 position of the buffer and move forward until reaching the end which is recorded in the head part.
we read the exact file size from the first 8 bytes (head). and move the pointer to the starting point of the body part. then read records one by one.
It’s worth noting that we use distance API to calculate the offset from the starting point and we need to move the pointer by calling +=
constantly after reading the expected data. When we finished the process, the pointer stops at the position where we can append new records.
Write Record
Now, we need to complete the core method: append
. We support many types for the Goose, so we use the more common type of Data to record them like memory cache. All of the other types will be transformed into the Data type before this method is called. Here we define the append
like below:
func append(_ data: Data, for key: String) {
/// TODO: Implementation
}
But we often use key and data together, so I define a simple struct wrapping them together. It’s called BodyRecord:
extension Goose {
private struct BodyRecord {
let key: String
let data: Data
}
}
Then, we complete this method:
private func append(_ record: BodyRecord) {
guard let keyData = record.key.data(using: .utf8) else {
return
}
/// cache it for fast check
memoryCache[record.key] = record.data
guard var buffer = buffer else {
return
}
/// make sure the capacity is enough
ensureCapacity(BodyRecord(key: record.key, data: record.data))
/// mmap: key_size
buffer.storeBytes(of: UInt64(keyData.count), as: UInt64.self)
buffer += 8
/// mmap: value_size
buffer.storeBytes(of: UInt64(record.data.count), as: UInt64.self)
buffer += 8
/// mmap: key
keyData.withUnsafeBytes { rawBuffer in
rawBuffer.forEach { char in
buffer.storeBytes(of: char, as: UInt8.self)
buffer += 1
}
}
/// mmap: value
record.data.withUnsafeBytes { rawBuffer in
rawBuffer.forEach { char in
buffer.storeBytes(of: char, as: UInt8.self)
buffer += 1
}
}
/// mmap: update data_len
increaseTotal(UInt64(8 + 8 + keyData.count + record.data.count))
}
private func increaseTotal(_ total: UInt64) {
original?.withMemoryRebound(to: UInt64.self, capacity: 8) { pointer in
pointer.pointee += total
}
}
It’s a little longer, but the comment is clear.
The first thing is to cache it into the memory. Then, use the storeBytes API to copy the data into the buffer, don’t forget the update the head part at the end. It’s worth noting that we can’t use Data in the storeBytes directly because the Data type holds the real content in the heap and storeBytes only copy the stack memory that a variable occupied into the buffer. so we can use storeBytes for Int, Bool, and Float but not for Data, String.
The ensureCapacity
API will be detailed below, here we just need to know it’s used to make sure we have enough room for the new record, it might compact or resize the memory area.
Store Method
Okay, append
method is done, we can complete all of the store methods:
extension Goose {
private func storePrimitive<T>(_ object: T?, for key: String) {
var obj = object
let data = Data(bytes: &obj, count: MemoryLayout<T>.size)
append(BodyRecord(key: key, data: data))
}
/// Signed
func store(_ object: Int?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int8?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int16?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int32?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Int64?, for key: String) { storePrimitive(object, for: key) }
/// Unsigned
func store(_ object: UInt?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt64?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt32?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt16?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: UInt8?, for key: String) { storePrimitive(object, for: key) }
/// Decimal
func store(_ object: Float?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: Double?, for key: String) { storePrimitive(object, for: key) }
func store(_ object: CGFloat?, for key: String) { storePrimitive(object, for: key) }
/// Anything else (simple type)
func store<T>(_ object: T?, for key: String) { storePrimitive(object, for: key) }
/// Bool
func store(_ object: Bool?, for key: String) { storePrimitive(object, for: key) }
}
extension Goose {
func store<T>(_ object: T?, for key: String) where T : Codable {
if let object = object, let data = try? JSONEncoder().encode(object) {
append(BodyRecord(key: key, data: data))
}
}
func store(_ object: Data?, for key: String) {
if let object = object {
append(BodyRecord(key: key, data: object))
}
}
func store(_ object: String?, for key: String) {
if let object = object, let data = object.data(using: .utf8) {
append(BodyRecord(key: key, data: data))
}
}
}
Swift offers a handy syntactic sugar. we can easily add & immediately before the variable to convert it into UnsafeRawPointer.
Therefore, for all of the Primitive Types
excluding data and string, we can easily build the Data with the &object and its length obtained with the MemoryLayout which is used to calculate how many bytes are used in the stack.
For the String
, we can easily convert it into a Data type in the utf8 encoding format. For the Codable
instance, we can use JSONEncoder to convert them into a serializable string and convert the result into the Data type.
Obtain Method
Okay, we’ve finished store methods. the obtain methods are a lot easier:
extension Goose {
private func obtainPrimitive<T>(for key: String) -> T? {
if var data = memoryCache[key] {
return data.withUnsafeMutableBytes { buffer in
return buffer.load(as: T.self)
}
}
return nil
}
/// Signed
func obtain(for key: String) -> Int? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int8? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int16? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int32? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Int64? { obtainPrimitive(for: key) }
/// Unsigned
func obtain(for key: String) -> UInt? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt8? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt16? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt32? { obtainPrimitive(for: key) }
func obtain(for key: String) -> UInt64? { obtainPrimitive(for: key) }
/// Decimal
func obtain(for key: String) -> Float? { obtainPrimitive(for: key) }
func obtain(for key: String) -> CGFloat? { obtainPrimitive(for: key) }
func obtain(for key: String) -> Double? { obtainPrimitive(for: key) }
/// Bool
func obtain(for key: String) -> Bool? { obtainPrimitive(for: key) }
/// Anything else (simple type)
func obtain<T>(for key: String) -> T? { obtainPrimitive(for: key) }
}
extension Goose {
/// Codable including codable Array & codable Dictionary
func obtain<T>(for key: String) -> T? where T : Codable {
if let data = memoryCache[key] {
return try? JSONDecoder().decode(T.self, from: data)
}
return nil
}
/// String
func obtain(for key: String) -> String? {
if let data = memoryCache[key] {
return String(data: data, encoding: .utf8)
}
return nil
}
/// Data
func obtain(for key: String) -> Data? {
memoryCache[key]
}
}
We just need to obtain the value of the Data type from the memory cache and convert the Data to the specific type accordingly.
EnsureCapacity Method
The last thing is to complete the ensureCapacity
method mentioned above. let me show the code first:
private func ensureCapacity(_ record: BodyRecord) {
guard !available(record) else {
return
}
/// compact at first to make room for the new data
rewrite()
if available(record) {
return
}
// grow size if there's no enough room
munmap(original, capacity) /// reset
capacity *= 2 /// resize
remap() /// remap
rewrite() /// rewrite
}
private func rewrite() {
/// reset
memset(original, 0, capacity)
/// reset buffer to the head
buffer = original
/// jump to the first location after header
buffer? += 8
/// write one by one from the memory cache
for (key, data) in memoryCache {
append(BodyRecord(key: key, data: data))
}
}
private func available(_ record: BodyRecord) -> Bool {
guard let buffer = buffer, let original = original else {
return false
}
guard let keyData = record.key.data(using: .utf8) else {
return false
}
let remaining = capacity - original.distance(to: buffer)
let increment = 8 + 8 + keyData.count + record.data.count
return remaining >= increment
}
Here available
and rewrite
is only used in ensureCapacity
.
The available
method is easy to understand. let’s look at the rewrite
. as the ensureCapacity
wrote, we call the rewrite if there’s not enough room for the new record. What does it do?
Rewrite Method
Let’s think back to the storage strategy. There may be many records referring to the same key and all of the previous records are actually useless. so we can reset the memory area and sync the memory cache into the mmap
area to record the up-to-date record only. After that, we need to check again to make sure it has enough room for the new record. otherwise, we need to resize the memory area by doubling the size.
Homework
That’s all, you can access the completed project: Goose. Before that, I hope you can complete the homework: Delete Action. It just needs a few codes to complete it.