Keep your Soroban contracts running lean by understanding the guest vs host execution environments.
Writing Soroban contracts is easy and fun! Writing efficient, cost-effective contracts, well, that can be a chore, and how to do it isn’t terribly obvious to the casual observer. Let’s walk through a contrived but very practical example.
Let’s say I need to build and output a sha256 hash of some bytes data. There are lots of ways we could construct such a bytes array but not all are created equal. Let’s start with an example project.
git clone https://github.com/kalepail/soroban-guest-vs-host
cd soroban-guest-vs-host
Inside of src/lib.rs we have three functions: v1, v2 and v3. Let’s take a look at v1.
pub fn v1(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
    for (i, _v) in bytes.iter().enumerate() {
        bytes.set(i as u32, i as u8);
    }
    env.crypto().sha256(&bytes)
}
Not much going on here. We take an array of 5000 items initialized to u8::MAX and then enumerate over it, resetting each value inside the bytes array to i as u8. This gives us a final bytes array of 5000 items counting up from 0 to 255 and then wrapping back to 0, over and over, until the 5000th item.
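If the wrap-around from `i as u8` isn’t obvious, here’s a minimal sketch in plain Rust with no Soroban types involved. The helper name `expected_pattern` is made up for illustration and isn’t part of the example repo.

```rust
/// Builds the byte pattern the contract functions are meant to produce.
/// `expected_pattern` is a hypothetical helper, not part of the example repo.
fn expected_pattern(len: usize) -> Vec<u8> {
    // `as u8` keeps only the low 8 bits, so the values wrap after 255.
    (0..len).map(|i| i as u8).collect()
}

fn main() {
    let pattern = expected_pattern(5000);
    println!("first three: {:?}", &pattern[..3]);
    println!("around the wrap: {:?}", &pattern[254..258]);
    println!("last: {}", pattern[4999]);
}
```

All three contract functions are supposed to end up hashing exactly these bytes, which is why they should all return the same hash.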
We then take that bytes array and hash it returning the final hash. Simple! Let’s run the test and see what the output and costs are. (You can read the test file in src/test.rs)
cargo test test_v1 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 22932352
Mem limit: 41943040; used: 38303900
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 13432280 38303900
MemCpy 5557345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 3660793 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v1" to "test_snapshots/test/test_v1.1.json".
test test::test_v1 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.03s
Almost 23M CPU instructions used and over 38M bytes of memory. Yikes!! That’s very high given the relatively restrictive upper bound limits of Soroban. What can we do to reduce this? 🤔
Let’s try the v2 function.
pub fn v2(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::new(&env);
    for i in 0..5000 {
        bytes.push_back(i as u8);
    }
    env.crypto().sha256(&bytes)
}
Ah, okay, I see what you’re doing here: rather than creating and then modifying a bytes array, we’ll just build one by looping over 0..5000 and pushing each index as i as u8. Clever! Let’s see how it performs.
cargo test test_v2 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 9131770
Mem limit: 41943040; used: 12903884
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 5936846 12903884
MemCpy 1997258 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 915732 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v2" to "test_snapshots/test/test_v2.1.json".
test test::test_v2 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.01s
Holy popsicle sticks! That’s WAY cheaper. CPU is down from 23M to just over 9M, and memory from 38M to ~13M. I’d say that qualifies as a tremendous reduction. Alright, so let’s wrap up and just be way more careful with how we generate and fill Soroban data containers.
But wait, why is this so much more efficient? I guess I can see how creating fewer containers in memory might be more efficient but if this were a language like JS I don’t think you’d see such a dramatic difference. What’s going on?
Well, without making it more complicated than even I understand, what you need to understand about Soroban is that contract invocations execute as a small machine within a larger machine: a guest within a host. The host has all the power. It’s the environment; it’s where all the ledger, storage, magic and blockchainery happens. The guest is simple, boring, lean and ephemeral, but its strength is that it’s really small, cheap and fast. The guest can do things by just doing Rust things, and it can do interesting and useful things by pulling from and pushing to the host environment. For example: get this bit of data, read in the current ledger number, run a crypto function. Invoking the host is awesome but it’s expensive, so you need to be very thoughtful about when, why and how you’re making those calls.
So how do you know if you’re invoking the host? Look for calls to the Env. env.storage? Host call. env.crypto? Host call. Bytes::new(&env)? Host call! And herein lies our problem. Go back to v1.
let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
for (i, _v) in bytes.iter().enumerate() {
    bytes.set(i as u32, i as u8);
}
This loop is not just sweetly mutating the original bytes array. It’s actually copying the entire 5000-item array, creating a brand new 5000-item array with just a single value changed, and finally replacing the old array. Why? It’s doing this work in the host, so it has to bundle up what the guest has, ship it to the host for modification, then pull the update back into the guest. My goodness! No wonder it’s so expensive.
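To get a feel for how quickly copy-on-every-write adds up, here’s a toy model in plain Rust. To be loud about it: this is not Soroban’s real host, and `ToyHost`, `create`, `put` and `v1_style_bytes_copied` are all hypothetical names invented for this sketch. The only thing it assumes is what was just described: every single-byte change produces a whole new copy of the object.

```rust
/// A toy stand-in for a host that stores immutable byte objects.
/// This is NOT Soroban's real implementation; all names are hypothetical.
struct ToyHost {
    objects: Vec<Vec<u8>>,
    bytes_copied: usize,
}

impl ToyHost {
    fn new() -> Self {
        ToyHost { objects: Vec::new(), bytes_copied: 0 }
    }

    /// Stores a buffer as a new object and returns its handle.
    fn create(&mut self, data: &[u8]) -> usize {
        self.bytes_copied += data.len();
        self.objects.push(data.to_vec());
        self.objects.len() - 1
    }

    /// "Modifies" one byte by copying the whole object into a new one.
    fn put(&mut self, handle: usize, index: usize, value: u8) -> usize {
        let mut copy = self.objects[handle].clone();
        self.bytes_copied += copy.len();
        copy[index] = value;
        self.objects.push(copy);
        self.objects.len() - 1
    }
}

/// v1-style: create a full object, then overwrite it one byte at a time.
fn v1_style_bytes_copied(len: usize) -> usize {
    let mut host = ToyHost::new();
    let mut handle = host.create(&vec![u8::MAX; len]);
    for i in 0..len {
        handle = host.put(handle, i, i as u8);
    }
    host.bytes_copied
}

fn main() {
    println!("bytes copied: {}", v1_style_bytes_copied(5000));
}
```

The work grows with the square of the array length, which is the shape of the problem v1 has. The real budget numbers include other costs this sketch ignores, so don’t expect them to match exactly.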
For v2, then, it becomes clear why it’s so much better.
let mut bytes = Bytes::new(&env);
for i in 0..5000 {
    bytes.push_back(i as u8);
}
We aren’t starting with and passing around a 5000-item array between the host and guest. We’re starting with something empty and iteratively increasing its size over 5000 iterations. It still feels wasteful though. It may be less, but it’s still a lot. With our new knowledge of guest and host, is it possible to improve this function even further by cutting Env usage to the bare minimum?
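Here’s a rough back-of-the-envelope sketch of why the growing array is cheaper but still not cheap. It uses a toy cost model that is purely my assumption, not Soroban’s real metering: each call writes out the whole resulting object, one unit of work per byte. The function names are hypothetical.

```rust
/// Toy cost model, an assumption rather than Soroban's real metering:
/// each push_back produces a new object one byte longer than the last,
/// and we charge one unit of work per byte in that new object.
fn push_style_bytes_written(len: u64) -> u64 {
    let mut total = 0;
    let mut current_len = 0;
    for _ in 0..len {
        current_len += 1; // the object grows by one byte
        total += current_len; // and is written out in full
    }
    total
}

/// What the same model charges when every call rewrites a full-size object.
fn full_size_bytes_written(len: u64) -> u64 {
    len * len
}

fn main() {
    let n = 5000;
    println!("growing object:   {}", push_style_bytes_written(n));
    println!("full-size object: {}", full_size_bytes_written(n));
}
```

Under this model the growing version does roughly half the work of the full-size version: n(n+1)/2 instead of n². For 5000 items that’s about 12.5M, which is in the same neighborhood as the ~12.9M bytes v2 reported, though I wouldn’t read too much into a toy model lining up with real numbers.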
Behold v3
pub fn v3(env: Env) -> BytesN<32> {
    let mut bytes = [u8::MAX; 5000];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = i as u8;
    }
    env.crypto().sha256(&Bytes::from_array(&env, &bytes))
}
Let’s just drop the concept of host Bytes until the very end and use plain Rust arrays. We’ll create a nice big 5000-item array and iterate over it as we did in v1, but since we’re in the guest we can dereference and modify individual bytes without needing to recreate the entire array in every iteration. Let’s see how this affects the CPU and memory usage.
cargo test test_v3 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.23s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 292230
Mem limit: 41943040; used: 6400
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 7280 6400
MemCpy 2345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 671 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v3" to "test_snapshots/test/test_v3.1.json".
test test::test_v3 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.02s
Hahaha, okay, yeah, alright, sure, yep, yikes. About 292k CPU and 6.4k memory. It’s not even close, like at all. And if you look carefully you’ll see that roughly 281k of the CPU was the host ComputeSha256Hash call.
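If you want to put numbers on “not even close”, here’s a small plain-Rust sketch that does the arithmetic on the budget totals printed in the three runs above. The constants are copied straight from that output; the `ratio` helper is just something I made up for the example.

```rust
/// Budget totals copied from the three test runs above: (cpu_insns, mem_bytes).
const V1: (u64, u64) = (22_932_352, 38_303_900);
const V2: (u64, u64) = (9_131_770, 12_903_884);
const V3: (u64, u64) = (292_230, 6_400);
/// CPU charged to ComputeSha256Hash, identical in all three runs.
const SHA256_CPU: u64 = 281_382;

/// How many times larger `a` is than `b`.
fn ratio(a: u64, b: u64) -> f64 {
    a as f64 / b as f64
}

fn main() {
    println!("v1 vs v3 cpu: {:.1}x", ratio(V1.0, V3.0));
    println!("v2 vs v3 cpu: {:.1}x", ratio(V2.0, V3.0));
    println!("v1 vs v3 mem: {:.0}x", ratio(V1.1, V3.1));
    println!("v2 vs v3 mem: {:.0}x", ratio(V2.1, V3.1));
    println!("sha256 share of v3 cpu: {:.1}%", 100.0 * ratio(SHA256_CPU, V3.0));
}
```

The last line is the interesting one: in v3 nearly all of the remaining CPU cost is the hash itself, which is the one host call we actually wanted to make.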