Keep your Soroban contracts running lean by understanding the guest vs host execution environments.
Writing Soroban contracts is easy and fun! Writing efficient, cost-effective contracts, well, that can be a chore, and the tricks aren’t terribly obvious to the casual observer. Let’s walk through a contrived but very practical example.
Let’s say I need to build and output a sha256 hash of some bytes data. There are lots of ways we could construct such a bytes array, but not all of them are created equal. Let’s start with an example project.
git clone https://github.com/kalepail/soroban-guest-vs-host
cd soroban-guest-vs-host
Inside of src/lib.rs we have three functions: v1, v2 and v3. Let’s take a look at v1.
pub fn v1(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
    for (i, _v) in bytes.iter().enumerate() {
        bytes.set(i as u32, i as u8);
    }
    env.crypto().sha256(&bytes)
}
Not much going on here. We take an array of 5000 items initialized to u8::MAX values and then enumerate over it, resetting each value inside the bytes array to i as u8 (casting i to a u8 truncates it to the low byte, i.e. i % 256). This gives us a final bytes array of 5000 items counting up from 0 to 255 and then wrapping back around to 0, all the way to the 5000th item. We then take that bytes array, hash it, and return the final hash. Simple! Let’s run the test and see what the output and costs are. (You can read the test file in src/test.rs.)
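If you haven’t cloned the repo, the tests are roughly what you’d expect: register the contract, invoke the function, and print the metered budget. Here’s a hedged sketch of that shape (the Contract and ContractClient names are my assumption, not copied from src/test.rs):
#![cfg(test)]
extern crate std;

use soroban_sdk::Env;
use crate::{Contract, ContractClient};

#[test]
fn test_v1() {
    let env = Env::default();

    // Register the contract in the test Env and build a generated client for it.
    let contract_id = env.register_contract(None, Contract);
    let client = ContractClient::new(&env, &contract_id);

    // Invoke the function under test.
    let hash = client.v1();

    // Print the CPU/memory budget table, then the resulting hash.
    env.budget().print();
    std::println!("{:?}", hash);
}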
cargo test test_v1 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 22932352
Mem limit: 41943040; used: 38303900
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 13432280 38303900
MemCpy 5557345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 3660793 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v1" to "test_snapshots/test/test_v1.1.json".
test test::test_v1 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.03s
Almost 23M CPU instructions and over 38M bytes of memory used. Yikes!! That’s very high given Soroban’s relatively restrictive limits — we’re already brushing up against the ~42M memory cap shown above. What can we do to reduce this? 🤔
Let’s try the v2 function.
pub fn v2(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::new(&env);
    for i in 0..5000 {
        bytes.push_back(i as u8);
    }
    env.crypto().sha256(&bytes)
}
Ah okay, I see what you’re doing here: rather than create and then modify a bytes array, we build one up from empty by looping over 0..5000 and pushing each index onto the end as a u8. Clever! Let’s see how it performs.
cargo test test_v2 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 9131770
Mem limit: 41943040; used: 12903884
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 5936846 12903884
MemCpy 1997258 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 915732 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v2" to "test_snapshots/test/test_v2.1.json".
test test::test_v2 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.01s
Holy popsicle sticks! That’s WAY cheaper. CPU is down from almost 23M to just over 9M, and memory from over 38M to about 13M. I’d say that qualifies as a tremendous reduction. Alright, so let’s wrap up and just be way more careful with how we generate and fill Soroban data containers.
But wait, why is this so much more efficient? I guess I can see how creating fewer containers in memory might help, but if this were a language like JS I don’t think you’d see such a dramatic difference. What’s going on?
Well, without making it more complicated than even I understand, what you need to know about Soroban is that contract invocations execute as a small machine within a larger machine: a guest within a host. The host has all the power; it’s the environment, where all the ledger, storage, magic and blockchainery happens. The guest is simple, boring, lean and ephemeral, but its strength is that it’s really small, cheap and fast. The guest can do things by just doing Rust things, and it can do interesting and useful things by pulling from and pushing to the host environment, e.g. get this bit of data, read the current ledger number, run a crypto function. Invoking the host is awesome, but it’s expensive, so you need to be very thoughtful about when, why and how you’re making those calls.
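To make that concrete, here’s a hypothetical function of my own (it’s not in the example repo) with the guest work and the host calls labeled:
use soroban_sdk::{Bytes, Env};

pub fn example(env: Env) -> u32 {
    // Pure guest work: plain Rust running inside the Wasm VM, no host involved.
    let mut sum: u32 = 0;
    for i in 0..10u32 {
        sum += i;
    }

    // Host calls: each of these crosses the guest/host boundary and is metered.
    let sequence = env.ledger().sequence(); // read the current ledger number
    let bytes = Bytes::new(&env); // allocate a host-managed Bytes object
    let _hash = env.crypto().sha256(&bytes); // run a crypto function in the host

    sum + sequence
}
Roughly speaking, the loop at the top costs a handful of guest Wasm instructions, while each of the three lines below it costs a metered host-function dispatch plus whatever work the host does to service it.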
So how do you know if you’re invoking the host? Look for calls to the Env. env.storage? Host call. env.crypto? Host call. Bytes::new(&env)? Host call! And herein lies our problem. Go back to v1.
let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
for (i, _v) in bytes.iter().enumerate() {
    bytes.set(i as u32, i as u8);
}
This loop is not just sweetly mutating the original bytes array in place. On every iteration it’s copying the entire 5000-item array, creating a brand new 5000-item array with just a single value changed, and finally replacing the old array. Why? It’s doing this work in the host, so it has to bundle up what the guest has, ship it to the host for modification, then pull the updated result back into the guest. Do that 5000 times and you’re shuffling something on the order of 5000 × 5000 ≈ 25 million bytes around before you ever hash anything, which is in the same ballpark as the ~38M bytes of memory the budget reported. My goodness! No wonder it’s so expensive.
For v2, then, it becomes clear why it’s so much better.
let mut bytes = Bytes::new(&env);
for i in 0..5000 {
    bytes.push_back(i as u8);
}
We aren’t starting with a 5000-item array and passing it back and forth between the host and guest. We’re starting with something empty and growing its size over 5000 iterations. It still feels wasteful, though: it may be less, but 5000 host calls is still a lot. With our new knowledge of guest and host, is it possible to improve this function even further by cutting Env usage down to the bare minimum?
Behold v3
pub fn v3(env: Env) -> BytesN<32> {
    let mut bytes = [u8::MAX; 5000];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = i as u8;
    }
    env.crypto().sha256(&Bytes::from_array(&env, &bytes))
}
Let’s just drop the concept of host Bytes until the very end and utilize pure Rust arrays. We’ll create a nice big 5000-item array and iterate over it as we did in v1, but since we’re in the guest we can dereference and modify individual bytes without needing to recreate the entire array on every iteration. Let’s see how this affects the CPU and memory usage.
cargo test test_v3 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.23s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 292230
Mem limit: 41943040; used: 6400
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 7280 6400
MemCpy 2345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 671 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v3" to "test_snapshots/test/test_v3.1.json".
test test::test_v3 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.02s
Hahaha, okay, yeah, alright, sure, yep, yikes. About 292k CPU and 6.4k bytes of memory. It’s not even close, like at all. And if you look carefully, you’ll see that 281k of that CPU was the host ComputeSha256Hash call itself — meaning almost everything we’re still paying for is the one hash we actually need.
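As a closing aside, and purely a variant of my own rather than anything in the repo, you could build the same guest-side buffer in a single expression with core::array::from_fn and only touch the Env at the very end. I’d expect it to land in the same neighborhood as v3, though I haven’t metered it here.
// Hypothetical v4: build the buffer entirely in the guest, then hash once in the host.
pub fn v4(env: Env) -> BytesN<32> {
    let bytes: [u8; 5000] = core::array::from_fn(|i| i as u8);
    env.crypto().sha256(&Bytes::from_array(&env, &bytes))
}
Either way, the lesson holds: keep the busywork in the guest and save the host for the calls that actually need it.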