Keep your Soroban contracts running lean by understanding the guest vs host execution environments.
Writing Soroban contracts is easy and fun! Writing efficient, cost-effective contracts, well, that can be a chore, and the tricks aren’t terribly obvious to the casual observer. Let’s walk through a contrived but very practical example.
Let’s say I need to build and output a sha256 hash of some bytes data. There are lots of ways we could construct such a bytes array, but not all of them are created equal. Let’s start with an example project.
git clone https://github.com/kalepail/soroban-guest-vs-host
cd soroban-guest-vs-host
Inside of src/lib.rs we have three functions: v1, v2 and v3. Let’s take a look at v1.
pub fn v1(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
    for (i, _v) in bytes.iter().enumerate() {
        bytes.set(i as u32, i as u8);
    }
    env.crypto().sha256(&bytes)
}
Not much going on here. We take an array of 5000 items initialized to u8::MAX values and then enumerate over it, resetting each value inside the bytes array to i as u8 (casting i to a u8 truncates it to the low byte, i.e. i % 256). This gives us a final bytes array of 5000 items counting up from 0 to 255 and then wrapping back around to 0, all the way to the 5000th item. We then take that bytes array, hash it, and return the final hash. Simple! Let’s run the test and see what the output and costs are. (You can read the test file in src/test.rs.)
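If you haven’t cloned the repo, the tests are roughly what you’d expect: register the contract, invoke the function, and print the metered budget. Here’s a hedged sketch of that shape (the Contract and ContractClient names are my assumption, not copied from src/test.rs):
#![cfg(test)]
extern crate std;

use soroban_sdk::Env;
use crate::{Contract, ContractClient};

#[test]
fn test_v1() {
    let env = Env::default();

    // Register the contract in the test Env and build a generated client for it.
    let contract_id = env.register_contract(None, Contract);
    let client = ContractClient::new(&env, &contract_id);

    // Invoke the function under test.
    let hash = client.v1();

    // Print the CPU/memory budget table, then the resulting hash.
    env.budget().print();
    std::println!("{:?}", hash);
}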
cargo test test_v1 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 22932352
Mem limit: 41943040; used: 38303900
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 13432280 38303900
MemCpy 5557345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 3660793 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v1" to "test_snapshots/test/test_v1.1.json".
test test::test_v1 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.03s
Almost 23M CPU instructions and over 38M bytes of memory used. Yikes!! That’s very high given Soroban’s relatively restrictive limits — we’re already brushing up against the ~42M memory cap shown above. What can we do to reduce this? 🤔
Let’s try the v2 function.
pub fn v2(env: Env) -> BytesN<32> {
    let mut bytes = Bytes::new(&env);
    for i in 0..5000 {
        bytes.push_back(i as u8);
    }
    env.crypto().sha256(&bytes)
}
Ah okay, I see what you’re doing here: rather than create and then modify a bytes array, we build one up from empty by looping over 0..5000 and pushing each index onto the end as a u8. Clever! Let’s see how it performs.
cargo test test_v2 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 9131770
Mem limit: 41943040; used: 12903884
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 5936846 12903884
MemCpy 1997258 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 915732 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v2" to "test_snapshots/test/test_v2.1.json".
test test::test_v2 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.01s
Holy popsicle sticks! That’s WAY cheaper. CPU is down from almost 23M to just over 9M, and memory from over 38M to about 13M. I’d say that qualifies as a tremendous reduction. Alright, so let’s wrap up and just be way more careful with how we generate and fill Soroban data containers.
But wait, why is this so much more efficient? I guess I can see how creating fewer containers in memory might help, but if this were a language like JS I don’t think you’d see such a dramatic difference. What’s going on?
Well, without making it more complicated than even I understand, what you need to know about Soroban is that contract invocations execute as a small machine within a larger machine: a guest within a host. The host has all the power; it’s the environment, where all the ledger, storage, magic and blockchainery happens. The guest is simple, boring, lean and ephemeral, but its strength is that it’s really small, cheap and fast. The guest can do things by just doing Rust things, and it can do interesting and useful things by pulling from and pushing to the host environment, e.g. get this bit of data, read the current ledger number, run a crypto function. Invoking the host is awesome, but it’s expensive, so you need to be very thoughtful about when, why and how you’re making those calls.
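To make that concrete, here’s a hypothetical function of my own (it’s not in the example repo) with the guest work and the host calls labeled:
use soroban_sdk::{Bytes, Env};

pub fn example(env: Env) -> u32 {
    // Pure guest work: plain Rust running inside the Wasm VM, no host involved.
    let mut sum: u32 = 0;
    for i in 0..10u32 {
        sum += i;
    }

    // Host calls: each of these crosses the guest/host boundary and is metered.
    let sequence = env.ledger().sequence(); // read the current ledger number
    let bytes = Bytes::new(&env); // allocate a host-managed Bytes object
    let _hash = env.crypto().sha256(&bytes); // run a crypto function in the host

    sum + sequence
}
Roughly speaking, the loop at the top costs a handful of guest Wasm instructions, while each of the three lines below it costs a metered host-function dispatch plus whatever work the host does to service it.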
So how do you know if you’re invoking the host? Look for calls to the Env. env.storage? Host call. env.crypto? Host call. Bytes::new(&env)? Host call! And herein lies our problem. Go back to v1.
let mut bytes = Bytes::from_array(&env, &[u8::MAX; 5000]);
for (i, _v) in bytes.iter().enumerate() {
    bytes.set(i as u32, i as u8);
}
This loop is not just sweetly mutating the original bytes array in place. On every iteration it’s copying the entire 5000-item array, creating a brand new 5000-item array with just a single value changed, and finally replacing the old array. Why? It’s doing this work in the host, so it has to bundle up what the guest has, ship it to the host for modification, then pull the updated result back into the guest. Do that 5000 times and you’re shuffling something on the order of 5000 × 5000 ≈ 25 million bytes around before you ever hash anything, which is in the same ballpark as the ~38M bytes of memory the budget reported. My goodness! No wonder it’s so expensive.
For v2, then, it becomes clear why it’s so much better.
let mut bytes = Bytes::new(&env);
for i in 0..5000 {
    bytes.push_back(i as u8);
}
We aren’t starting with a 5000-item array and passing it back and forth between the host and guest. We’re starting with something empty and growing its size over 5000 iterations. It still feels wasteful, though: it may be less, but 5000 host calls is still a lot. With our new knowledge of guest and host, is it possible to improve this function even further by cutting Env usage down to the bare minimum?
Behold v3
pub fn v3(env: Env) -> BytesN<32> {
    let mut bytes = [u8::MAX; 5000];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = i as u8;
    }
    env.crypto().sha256(&Bytes::from_array(&env, &bytes))
}
Let’s just drop the concept of host Bytes until the very end and utilize pure Rust arrays. We’ll create a nice big 5000-item array and iterate over it as we did in v1, but since we’re in the guest we can dereference and modify individual bytes without needing to recreate the entire array on every iteration. Let’s see how this affects the CPU and memory usage.
cargo test test_v3 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.23s
Running unittests src/lib.rs (target/debug/deps/guest_vs_host-f82c4c8e9dba2170)
running 1 test
=======================================================
Cpu limit: 100000000; used: 292230
Mem limit: 41943040; used: 6400
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 7280 6400
MemCpy 2345 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 671 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 281382 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 0 0
=======================================================
BytesN<32>(128, 38, 229, 201, 108, 241, 229, 2, 200, 222, 179, 232, 159, 139, 139, 195, 66, 245, 3, 155, 135, 25, 17, 169, 46, 177, 14, 223, 156, 101, 66, 211)
Writing test snapshot file for test "test::test_v3" to "test_snapshots/test/test_v3.1.json".
test test::test_v3 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.02s
Hahaha, okay, yeah, alright, sure, yep, yikes. About 292k CPU and 6.4k bytes of memory. It’s not even close, like at all. And if you look carefully, you’ll see that 281k of that CPU was the host ComputeSha256Hash call itself — meaning almost everything we’re still paying for is the one hash we actually need.
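As a closing aside, and purely a variant of my own rather than anything in the repo, you could build the same guest-side buffer in a single expression with core::array::from_fn and only touch the Env at the very end. I’d expect it to land in the same neighborhood as v3, though I haven’t metered it here.
// Hypothetical v4: build the buffer entirely in the guest, then hash once in the host.
pub fn v4(env: Env) -> BytesN<32> {
    let bytes: [u8; 5000] = core::array::from_fn(|i| i as u8);
    env.crypto().sha256(&Bytes::from_array(&env, &bytes))
}
Either way, the lesson holds: keep the busywork in the guest and save the host for the calls that actually need it.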