The harrowing journey of discovering how best to test Soroban contracts for cost efficiency
Your Soroban smart contracts will only be as efficient as you test for, and unfortunately it’s just not obvious how to test for the many and various limits Soroban employs. Let me save you some time by taking you on my journey to discover the most effective method for measuring your contract costs.
Part 1 covered testing via the env.budget() util, which is a great and powerful tool for quickly measuring various vm related costs. However, A) it’s quite rough when talking about real network costs and B) it’s quite limited when considering the full spectrum of costs. It really only cares about CPU and memory costs, but Soroban consists of 11 distinct transaction level limits according to its documentation. That’s a lot more than is covered in Rust via env.budget(). To cover the rest we’ll need to be running on an actual network. Before we jump in, here are a few essential links relevant to this topic.
- The project repo for today’s article
- The sorobill tool
- Soroban’s Fees and Metering doc
- Soroban’s Resource Limits and Fees doc
We’ll be spending all our time in that repo, so be sure you’ve got that cloned and open in order to follow along in proper context.
Testing Against Built WASM Binaries
As mentioned, in order to properly test the cost of our contracts we’ll need to run our invocations on an actual network. Before we jump into that, let’s explore one more Rust trick for getting our env.budget() to deliver more accurate results: testing built WASM binaries vs uncompiled Rust code. The rs-soroban-sdk is actually incredibly powerful and has a lot of sweet tricks up its sleeve, not least of which is its ability to import and run built contract binaries against a simulated vm env.
Head over to src/test.rs
and I’ll show you what I mean. You can see two tests, test_v1
and test_v2
. If you’re coming from Part 1 this should feel pretty familiar. If you run test_v1
you’ll see a familiar output:
cargo test test_v1 -- --nocapture
error: No such file or directory (os error 2)
--> src/test.rs:10:5
|
10 | / soroban_sdk::contractimport!(
11 | | file = "./target/wasm32-unknown-unknown/release/i_like_big_budgets.optimized.wasm"
12 | | );
| |_____^
Ah wait, F nvm, right. We have a new block of code in this test which is importing a built WASM binary contract. This file hasn’t been built yet and thus it doesn’t exist and Rust is yelling at us. Thank you Rust! Let’s get our contract built, optimized and try that above command again.
soroban contract build
soroban contract optimize --wasm target/wasm32-unknown-unknown/release/i_like_big_budgets.wasm
If you don’t yet have the soroban CLI you should follow this doc to get it set up on your system. Also make sure to install the CLI with the --features opt flag (at the time of writing, something like cargo install --locked soroban-cli --features opt), as we’ll be running against an optimized contract which will ensure we’re running as closely as possible to an actual production scenario.
Cool, okay, geeze. Let’s try cargo test test_v1 -- --nocapture
again. (truncated to only show Cpu and Mem usage)
Compiling i-like-big-budgets v0.1.0 (/Users/tylervanderhoeven/Desktop/Web/Soroban/soroban-i-like-big-budgets)
Finished test [unoptimized + debuginfo] target(s) in 0.40s
Running unittests src/lib.rs (target/debug/deps/i_like_big_budgets-a4ba9cdbd03ba15b)
running 1 test
=======================================================
Cpu limit: 18446744073709551615; used: 112535704
Mem limit: 18446744073709551615; used: 43294884
=======================================================
...
=======================================================
Writing test snapshot file for test "test::test_v1" to "test_snapshots/test/test_v1.1.json".
test test::test_v1 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.26s
Nice. So this test is just doing the classic tests against the unbuilt raw Rust. Now take a quick look at test_v2
.
let contract_id = env.register_contract_wasm(None, i_like_big_budgets::WASM);
let client = i_like_big_budgets::Client::new(&env, &contract_id);
This is building our contract client from the i_like_big_budgets::WASM
which is coming right from the mod i_like_big_budgets
from the top of the test file.
mod i_like_big_budgets {
soroban_sdk::contractimport!(
file = "./target/wasm32-unknown-unknown/release/i_like_big_budgets.optimized.wasm"
);
}
This pulls in that contract and loads its code and interface to use just like the actual Soroban vm running inside the Stellar blockchain would. Awesome! Let’s run the test_v2
test and observe any differences. (output truncated again so we can focus on Cpu and Mem differences from the previous test)
cargo test test_v2 -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (target/debug/deps/i_like_big_budgets-a4ba9cdbd03ba15b)
running 1 test
=======================================================
Cpu limit: 18446744073709551615; used: 120919876
Mem limit: 18446744073709551615; used: 45231943
=======================================================
...
=======================================================
Writing test snapshot file for test "test::test_v2" to "test_snapshots/test/test_v2.1.json".
test test::test_v2 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.33s
Cpu 112535704
vs 120919876
Mem 43294884
vs 45231943
Summarized: running built WASM is more expensive, often pretty significantly so. This makes perfect sense though once you look carefully at the cost breakdowns.
Let’s compare the first outputs from each test side by side.
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 0 0
MemAlloc 8873214 43294884
MemCpy 5721112 0
MemCmp 552 0
DispatchHostFunction 0 0
VisitObject 585905 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 89754738 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 0 0
VmCachedInstantiation 0 0
InvokeVmFunction 0 0
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 7600183 0
=======================================================
=======================================================
CostType cpu_insns mem_bytes
WasmInsnExec 1218796 0
MemAlloc 10780932 44711897
MemCpy 5714003 0
MemCmp 696 0
DispatchHostFunction 1302310 0
VisitObject 598105 0
ValSer 0 0
ValDeser 0 0
ComputeSha256Hash 89754738 0
ComputeEd25519PubKey 0 0
VerifyEd25519Sig 0 0
VmInstantiation 3948165 520032
VmCachedInstantiation 0 0
InvokeVmFunction 1948 14
ComputeKeccak256Hash 0 0
ComputeEcdsaSecp256k1Sig 0 0
RecoverEcdsaSecp256k1Key 0 0
Int256AddSub 0 0
Int256Mul 0 0
Int256Div 0 0
Int256Pow 0 0
Int256Shift 0 0
ChaCha20DrawBytes 7600183 0
=======================================================
Things like WasmInsnExec
, DispatchHostFunction
, VmInstantiation
and InvokeVmFunction
are pretty obviously only needed when dealing with a vm instantiating a WASM contract blob and it turns out doing that is pretty expensive.
Testing Against Live Networks
Alright, let’s get into the meat of the topic at hand, testing contracts on live networks. This will be made possible thanks to the awesome stellar/quickstart
Docker container.
There’s lots of wonderful goodies, flags and features in this container. You could sync up to any one of the futurenet
, testnet
or even the public
networks. Those are all live, public networks and will take time to get in sync with. We’re in dev mode; we’re testing limits, fees and budgets, not consensus or decentralization, so we’re going to use the blazing fast local
network. As for the flags, I’ve taken care of everything inside the ./docker-testnet.sh
file. Just bash
that bad boy and kick off your own private live local network running under the testnet
limits.
./docker-testnet.sh
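If you’re curious what’s inside, the script boils down to a stellar/quickstart run along these lines (a from-memory sketch, not the repo’s exact file; the flags to notice are --local and --limits testnet):

```sh
# Roughly the shape of docker-testnet.sh (check the repo for the real thing)
docker run --rm -it \
  -p 8000:8000 \
  stellar/quickstart \
  --local \
  --limits testnet \
  --enable-soroban-rpc
```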
After like a minute you should see a steady stream of stellar-core: Synced!
logs.
soroban rpc: up and ready
supervisor: 2024-02-29 16:02:20,010 INFO reaped unknown pid 245 (exit status 0)
stellar-core: Synced!; Publishing 1 queued checkpoints [15-15]: Waiting before starting Ready to run: resolve-snapshot
stellar-core: Synced!; Publishing 1 queued checkpoints [15-15]: Waiting: resolve-snapshot
stellar-core: Synced!
horizon: ingestion caught up
supervisor: 2024-02-29 16:02:25,060 INFO reaped unknown pid 252 (exit status 0)
stellar-core: Synced!; Publishing 1 queued checkpoints [23-23]: Waiting before starting Ready to run: resolve-snapshot
stellar-core: Synced!; Publishing 1 queued checkpoints [23-23]: Waiting: resolve-snapshot
stellar-core: Synced!
stellar-core: Synced!; Publishing 1 queued checkpoints [31-31]: Waiting before starting Ready to run: resolve-snapshot
stellar-core: Synced!; Publishing 1 queued checkpoints [31-31]: Waiting: resolve-snapshot
stellar-core: Synced!
At this point you’re ready to rock and roll. If not, GGs, see you in the next one.
With our network up and running we’re ready to build and deploy our contract to this network for use in our budget testing. For this we have deploy.ts
and to run it (and the rest of our project) I’ll be using my new favorite JS super tool, Bun.sh. BUT before we run it, I’ve learned my lesson: let’s ensure all our deps are installed.
bun install
bun run deploy.ts
cleaned target
created account
cargo rustc --manifest-path=Cargo.toml --crate-type=cdylib --target=wasm32-unknown-unknown --release
Compiling num-traits v0.2.17
Compiling escape-bytes v0.1.1
Compiling static_assertions v1.1.0
Compiling ethnum v1.5.0
Compiling stellar-xdr v20.1.0
Compiling soroban-env-common v20.2.2
Compiling soroban-env-guest v20.2.2
Compiling soroban-sdk v20.4.0
Compiling i-like-big-budgets v0.1.0 (/Users/tylervanderhoeven/Desktop/Web/Soroban/soroban-i-like-big-budgets)
Finished release [optimized] target(s) in 3.70s
Reading: target/wasm32-unknown-unknown/release/i_like_big_budgets.wasm (211077 bytes)
Writing to: target/wasm32-unknown-unknown/release/i_like_big_budgets.optimized.wasm...
Optimized: target/wasm32-unknown-unknown/release/i_like_big_budgets.optimized.wasm (9857 bytes)
built contract
deployed contract
✅
Assuming we got no errors, we can confirm the ✅ is 👍 by running a quick invocation via the CLI. You can find your CONTRACT_ID inside the .env.local file the deploy.ts script built for us.
soroban contract invoke --id <CONTRACT_ID> --network kalenet --source kalecount -- run -h
Usage: run [OPTIONS]
Options:
--_txn <Option<hex_bytes>> Example:
--_txn beefface123
--set <Option<u32>> Example:
--set 1
--mem <Option<u32>> Example:
--mem 1
--cpu <Option<u32>> Example:
--cpu 1
--get <Option<u32>> Example:
--get 1
--events <Option<u32>> Example:
--events 1
-h, --help Print help (see more with '--help')
*click tongue* Noice! Onwards!
With the local network running and our contract deployed we’re ready to move into index_1.ts
. As is tradition I’ll print out the code in its entirety and then walk through it bit by bit.
index_1.ts
import { Account, Keypair, Networks, Operation, SorobanRpc, TransactionBuilder, nativeToScVal, scValToNative, xdr } from "@stellar/stellar-sdk";
if (
!Bun.env.CONTRACT_ID
|| !Bun.env.SECRET
) throw new Error('Missing .env.local file. Run `bun run deploy.ts` to create it.')
const rpcUrl = 'http://localhost:8000/soroban/rpc'
const rpc = new SorobanRpc.Server(rpcUrl, { allowHttp: true })
const keypair = Keypair.fromSecret(Bun.env.SECRET)
const pubkey = keypair.publicKey()
const contractId = Bun.env.CONTRACT_ID
const networkPassphrase = Networks.STANDALONE
Some standard imports, a catch to ensure you’ve got the .env.local
built correctly and the necessary variables configured to hook up to our local network.
let i = 0
let args = [
[1500, 'u32', 'CPU'],
[200, 'u32', 'MEM'],
[20, 'u32', 'SET'],
[40, 'u32', 'GET'],
[1, 'u32', 'EVENTS'],
[Buffer.alloc(71_680), 'bytes', 'TXN'],
]
for (const [big, type, kind] of args) {
try {
console.log(`\n`);
console.log(`RUNNING TEST FOR ${kind}`);
console.log(`--------------------------`);
const args = [
xdr.ScVal.scvVoid(),
xdr.ScVal.scvVoid(),
xdr.ScVal.scvVoid(),
xdr.ScVal.scvVoid(),
xdr.ScVal.scvVoid(),
xdr.ScVal.scvVoid(),
]
const bigArgs = [...args]
bigArgs[i] = nativeToScVal(big, { type })
await run(bigArgs)
} catch (error) {
console.error(error)
} finally {
i++
console.log(`--------------------------`);
}
}
This block of code won’t make a lot of sense without the context of what we’re actually testing inside our contract, as constructed in the src/lib.rs file.
lib.rs run()
Essentially we’ve got a burner run function which will push the limits of six different costs, which together give us full coverage of all 11 limits. I’ll let your well educated, literate mind decipher which if statement tests which limit. Each statement is wrapped in an if around an Option value so we can target specific limits with the same function, omitting all but the single target we’re actually interested in.
With this knowledge we can approach the index_1.ts
code block in question and begin to understand all we’re doing here is standing up a basic for loop which will try to run
each limit target individually…assuming run
will do what I think it will do…which it does!
async function run(args: xdr.ScVal[]) {
const source = await rpc
.getAccount(pubkey)
.then((account) => new Account(account.accountId(), account.sequenceNumber()))
.catch(() => { throw new Error(`Issue with ${pubkey} account. Ensure you're running the \`./docker.sh\` network and have run \`bun run deploy.ts\` recently.`) })
run
is a function which will take the args
we assembled in the previous block and build a transaction which will run a run
function invocation against your deployed <CONTRACT_ID>
on your local network running in that Docker container. With an args
value that looks like this
let args = [
[1500, 'u32', 'CPU'],
[200, 'u32', 'MEM'],
[20, 'u32', 'SET'],
[40, 'u32', 'GET'],
[1, 'u32', 'EVENTS'],
[Buffer.alloc(71_680), 'bytes', 'TXN'],
]
We will run 6 iterative invocations which will each look like [1500, None, None, None, None, None], [None, 200, None, None, None, None], and so forth, allowing us to test each of the target limits from our Rust run function. I’m not saying I’m a good programmer, but I do get the job done.
The first thing we do in the TypeScript run
function is build out the source Account
which will be used as the transaction source, paying the fees and providing the sequence number for our test transactions. Our deploy.ts script created and used an account for the contract deployment and I put that in the .env.local
alongside the CONTRACT_ID
so we’ll just reuse that same account.
const simTx = new TransactionBuilder(source, {
fee: '0',
networkPassphrase
})
.addOperation(Operation.invokeContractFunction({
contract: contractId,
function: 'run',
args
}))
.setTimeout(0)
.build()
const simRes = await rpc.simulateTransaction(simTx)
With source
in hand things start getting interesting. We build a basic transaction with a single invokeContractFunction
operation taking in the CONTRACT_ID
, run
function and specified args
which will allow us to invoke the various limit scenarios. By itself at the point of .build()
this transaction is entirely invalid. It hasn’t even been signed yet. More importantly however it’s missing key Soroban related attributes like the footprint and any associated internal authentication. Our contract doesn’t have any additional auth outside the external tx auth so let’s just focus on the footprint.
A Soroban footprint is a signal to the Stellar protocol guiding it on the bounds of any given transaction. It includes all the resource ceilings as well as the specific ledger keys our contract invocation will read and write. Why this model? To allow for parallel transaction execution at some point in a future version of Soroban 🤞.
If you can signal to the blockchain what keys your invocation will touch as well as the upper bounds of resource consumption the network is able to align your transaction appropriately with other inbound requests to ensure there’s neither resource exhaustion nor conflicting race conditions over identical keys. If two (or more) inbound transactions are both under sum total ledger resource limits and do not compete over ledger keys they can both execute in parallel without the need to wait for global consensus.
To put it in simpler terms a truck turning right in Anchorage, Alaska shouldn’t have to wait for a plane taking off in Osaka, Japan even though that’s technically the way most blockchains operate today 🤯.
Okay so what does this have to do with our block of code above? Well, in order to build a valid transaction we need to build its footprint and auth. This is non-trivial in many cases as you have to essentially execute a dry-run invocation to uncover all the things your function call does in order to arrive at the appropriate resource settings and ledger key touch points. You could guess (error prone), you could max some things out (expensive), you could spend a lot of time digging deeply into your contract, the Soroban env and all the opcode associated costs (time consuming and brittle due to validator fee updates), or you could run a simulation (🎉).
Simulation is a beautiful magical tool which runs your request against a simulated network environment very nearly like the one we utilized in the Rust test against the built WASM binary. Rather than just returning a basic budget though the simulation call builds out everything we need to slap into our incomplete initial simTx
in order to submit it successfully. We’ll get to it in a minute but when you run simulation you’re running against the rpcUrl
to receive back a response in the form of something like this.
{
transactionData: "<SorobanTransactionData XDR>", // What we will put as the SorobanData of our incomplete tx
minResourceFee: "<XLM (in stroops)>", // Cost to cover this invocation
events: [ "<DiagnosticEvent XDR>", ...],
results: [
{
auth: [], // Empty in our case, will contain <SorobanAuthorizationEntry XDR> values when present
xdr: "<ScVal XDR>", // The return result of the invocation
}
],
cost: {
cpuInsns: "<number>",
memBytes: "<number>",
},
latestLedger: <number>,
}
One thing I’m skipping over here is the restorePreamble key. This is relevant for Soroban’s state archival system, which we won’t get into here but which you should definitely be aware of.

The magic, and really the ENTIRE POINT of this post, is that everything we need to understand the cost of our contracts is included in this singular simulation response. It’s secret and safe inside the transactionData, events and results.xdr XDR values, but if we can get to this point and decode those values we’ll be in god’s country when it comes to reviewing our invocation costs.
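Before moving on, here’s what that decoding can look like in practice. A minimal sketch (assuming the successful simRes shape above and the same @stellar/stellar-sdk imports from index_1.ts, so a sketch rather than the project’s actual code):

```ts
// Dig the cost signals out of a successful simulation response.
if (SorobanRpc.Api.isSimulationSuccess(simRes)) {
  const resources = simRes.transactionData.build().resources()
  const footprint = resources.footprint()

  console.log('cpu_insns:', resources.instructions())
  console.log('read_bytes:', resources.readBytes())
  console.log('write_bytes:', resources.writeBytes())
  // Read-write entries get read before they're written, so count them as reads too.
  console.log('entry_reads:', footprint.readOnly().length + footprint.readWrite().length)
  console.log('entry_writes:', footprint.readWrite().length)

  // The diagnostic events and return value round out the picture.
  for (const event of simRes.events) {
    console.log(scValToNative(event.event().body().v0().data()))
  }
  if (simRes.result) console.log(scValToNative(simRes.result.retval))
}
```

Hold that thought, because this is exactly the dissection we’ll lean on later.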
As we will see however, getting to this point can be anything but trivial 😱.
I’ll quickly run through the rest of our code and then get on with next steps, as we won’t get usable errors past this point when running index_1.ts.
if (SorobanRpc.Api.isSimulationSuccess(simRes)) {
simRes.minResourceFee = '4294967295'
const resources = simRes.transactionData.build().resources()
const tx = SorobanRpc.assembleTransaction(simTx, simRes)
.setSorobanData(simRes.transactionData
.setResourceFee(100_000_000)
.setResources(100_000_000, resources.readBytes(), resources.writeBytes())
.build()
)
.build()
tx.sign(keypair)
const sendRes = await rpc._sendTransaction(tx)
if (sendRes.status === 'PENDING') {
await Bun.sleep(5000);
const getRes = await rpc._getTransaction(sendRes.hash)
if (getRes.resultMetaXdr) {
xdr.TransactionMeta
.fromXDR(getRes.resultMetaXdr, 'base64')
.v3()
.sorobanMeta()
?.diagnosticEvents()
.forEach((event) => {
console.log(
scValToNative(event.event().body().v0().data())
)
})
} else console.log(getRes)
} else console.log(sendRes)
} else console.log(await rpc._simulateTransaction(simTx));
}
simRes.minResourceFee = '4294967295'
? Relax. .setResourceFee(100_000_000)
?? Chill. .setResources(100_000_000, resources.readBytes(), resources.writeBytes())
??? Cool your jets.
We set our tx fee to '0' initially and here we’re updating it to 4294967295, the u32 max and the absolute fee ceiling for a Stellar tx. Simulation tries its best, but there are instances where it may fail in its estimation. This is especially true in cases where we’re pushing things to the limit, so we go ahead and wildly overshoot our resource estimations in order to ensure we don’t fail on simple fee calculation issues. We want to get to the actual resource limit errors.
This logic carries through with assembleTransaction
where we smash the invalid simTx
together with the vital simRes
info in order to get a new valid tx
ready for submission to the network! Well, almost ready, we still need to sign it. From there we send and then retrieve its hash and assuming everything goes well we’ll get back some useful information for debugging any errors.
Let’s try though. Brace for impact.
bun run index_1.ts
Okay well, there it is. Before you weep, close the tab and take up that sushi rolling hobby you’ve always said you’d be so good at, let’s remind ourselves that errors are exactly what we’re after in this case. We’re trying to hit limit errors in order to understand how to adjust and improve our contracts. Our case is obviously contrived, but you will run into these errors and understanding what to do with them is critically important. The sad part is that the above errors are nearly useless as far as giving you the info for how to actually adjust your contract. Bummer.
Let’s start with our first error under RUNNING TEST FOR CPU
"host invocation failed\n\nCaused by:\n HostError: Error(Budget, ExceededLimit)\n DebugInfo not available\n ",
Thank you for the info, but what “Budget” exceeded what “ExceededLimit”? What do I need to adjust? Maybe events
will help me? (fwiw imo it should, but it doesn’t atm). I see two events.
1: AAAAAAAAAAAAAAAAAAAAAgAAAAAAAAADAAAADwAAAAdmbl9jYWxsAAAAAA0AAAAgGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAAPAAAAA3J1bgAAAAAQAAAAAQAAAAYAAAADAAAF3AAAAAEAAAABAAAAAQAAAAEAAAAB
2: AAAAAAAAAAAAAAABGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAACAAAAAAAAAAIAAAAPAAAABWVycm9yAAAAAAAAAgAAAAcAAAAFAAAADgAAAE9lc2NhbGF0aW5nIGVycm9yIHRvIFZNIHRyYXAgZnJvbSBmYWlsZWQgaG9zdCBmdW5jdGlvbiBjYWxsOiBjb21wdXRlX2hhc2hfc2hhMjU2AA==
Let’s run both of those through a soroban XDR decoder.
echo -n 'AAAAAAAAAAAAAAAAAAAAAgAAAAAAAAADAAAADwAAAAdmbl9jYWxsAAAAAA0AAAAgGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAAPAAAAA3J1bgAAAAAQAAAAAQAAAAYAAAADAAAF3AAAAAEAAAABAAAAAQAAAAEAAAAB' | soroban lab xdr dec
error: the following required arguments were not provided:
--type <TYPE>
Usage: soroban lab xdr decode --type <TYPE> [FILES]...
For more information, try '--help'.
F.
I gotchu bro.
echo -n 'AAAAAAAAAAAAAAAAAAAAAgAAAAAAAAADAAAADwAAAAdmbl9jYWxsAAAAAA0AAAAgGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAAPAAAAA3J1bgAAAAAQAAAAAQAAAAYAAAADAAAF3AAAAAEAAAABAAAAAQAAAAEAAAAB' | soroban lab xdr guess
DiagnosticEvent
echo -n 'AAAAAAAAAAAAAAAAAAAAAgAAAAAAAAADAAAADwAAAAdmbl9jYWxsAAAAAA0AAAAgGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAAPAAAAA3J1bgAAAAAQAAAAAQAAAAYAAAADAAAF3AAAAAEAAAABAAAAAQAAAAEAAAAB' | soroban lab xdr dec --type DiagnosticEvent --output json-formatted
{
"in_successful_contract_call": false,
"event": {
"ext": "v0",
"contract_id": null,
"type_": "diagnostic",
"body": {
"v0": {
"topics": [
{
"symbol": "fn_call"
},
{
"bytes": "18f6486b35ff7a323b4bf9b4428b6a6b7542d1e98065d3cbc26cb2006fe2300a"
},
{
"symbol": "run"
}
],
"data": {
"vec": [
{
"u32": 1500
},
"void",
"void",
"void",
"void",
"void"
]
}
}
}
}
}
Mkay, that’s just an event holding the values of my request, not information about its actual execution. Let’s try the No.2 DiagnosticEvent
XDR event.
echo -n 'AAAAAAAAAAAAAAABGPZIazX/ejI7S/m0Qotqa3VC0emAZdPLwmyyAG/iMAoAAAACAAAAAAAAAAIAAAAPAAAABWVycm9yAAAAAAAAAgAAAAcAAAAFAAAADgAAAE9lc2NhbGF0aW5nIGVycm9yIHRvIFZNIHRyYXAgZnJvbSBmYWlsZWQgaG9zdCBmdW5jdGlvbiBjYWxsOiBjb21wdXRlX2hhc2hfc2hhMjU2AA==' | soroban lab xdr dec --type DiagnosticEvent --output json-formatted
{
"in_successful_contract_call": false,
"event": {
"ext": "v0",
"contract_id": "18f6486b35ff7a323b4bf9b4428b6a6b7542d1e98065d3cbc26cb2006fe2300a",
"type_": "diagnostic",
"body": {
"v0": {
"topics": [
{
"symbol": "error"
},
{
"error": {
"budget": "exceeded_limit"
}
}
],
"data": {
"string": "escalating error to VM trap from failed host function call: compute_hash_sha256"
}
}
}
}
}
Alright nice, kind of. This is a diagnostic error log, but failed host function call: compute_hash_sha256 does not help me much as far as knowing what to do or how bad things actually are.
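(Quick aside: you don’t have to shell out to the CLI for these decodes. A sketch of the same thing in TypeScript, pasting in one of the full base64 event strings from above:)

```ts
import { scValToNative, xdr } from '@stellar/stellar-sdk'

// Paste one of the full base64 DiagnosticEvent strings from above here.
const raw = 'AAAAAAAAAAAAAAAB...'
const v0 = xdr.DiagnosticEvent.fromXDR(raw, 'base64').event().body().v0()

console.log(v0.topics()) // the raw ScVal topics (the "error" symbol and error code)
console.log(scValToNative(v0.data())) // "escalating error to VM trap from ..."
```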
And tbh this is the crux of why I’m writing this post. Knowing that something failed isn’t enough. I need to know both what failed and how badly it failed. This is especially true when limits are in question. This is very especially true in the case of blockchain debugging. In most of computer programming we don’t have to worry about resource limits like this. We’re not thinking about the CPU, memory or storage costs. These aren’t your typical restrictive ceilings. We normally just pay more and move on. With blockchains and smart contracts however we operate in far harsher, foreign environments. We’re coding in shoeboxes which means we’ll be hitting errors we aren’t used to handling and which traditional tooling is often ill equipped to surface. And so I’m writing this post to show you how I’m getting around these discomforts.
Error in RUNNING TEST FOR MEM
is essentially the same as RUNNING TEST FOR CPU
with the exception that the second event gives us failed host function call: bytes_push
which is expected, but also unhelpful.
The RUNNING TEST FOR SET and RUNNING TEST FOR GET simulations actually succeeded, however the submission failed with error AAAAAAX14WT////vAAAAAA== which the Stellar Laboratory tells me means txSorobanInvalid...which is about as helpful as vim is to a frontend developer.
Fun fact: if you switch the 20 SET value up to 50 you may get the elusive and befuddling TRY_AGAIN_LATER error, which I still have no idea what it actually means.
The RUNNING TEST FOR EVENTS error is actually quite helpful and gets us all the way through index_1.ts with this final output: [ "total events size exceeds network config maximum", 8268n, 8198n ]. This gives us the specific limit we’ve exceeded (“total events size”), what we actually produced (8268n) and what the limit is (8198n). This is super useful and it’s all thanks to the fact that transaction submissions on local networks send along an array of DiagnosticEvent values which we can decode to understand what happened in the case of failure. The issue here is we have to actually get to transaction submission, which as we’ve noted may be no small feat when pushing limit boundaries. So our next target for testing our contracts for cost efficiency will be to ensure simulation passes so we can get the helpful event data from transaction submission to debug our limitation woes.
Oh and also…RUNNING TEST FOR TXN
. That’s the same error as the SET
and GET
. AAAAAAX14WT////vAAAAAA==
or txSorobanInvalid
.
Next file please!
index_2.ts
Much of this file is the same as the first so I’ll only highlight the notable differences.
let args = [
[1, 1500, 'u32', 'CPU'],
[1, 200, 'u32', 'MEM'],
[20, 20, 'u32', 'SET'],
[40, 40, 'u32', 'GET'],
[0, 1, 'u32', 'EVENTS'],
[Buffer.alloc(71_680), Buffer.alloc(71_680), 'bytes', 'TXN'],
]
At the heart of this index_2.ts
file is the attempt to get simulation to pass successfully in order to then increase just the arg values and finally submit a valid, but maxed out on the limits, transaction. For us to accomplish this we need a transaction which does not just fail straight away. For that we’re going to introduce a two step process: step 1 will simulate with very low, achievable values; in step 2 we’ll update just the function arg values of the submission to the larger, limit-breaking values. It feels a little fragile, and it is, but it may allow us to achieve our goal. If the ends justify the means then step 2 is acceptable in my book.
The initial part of the run
function is nearly identical with the major caveat that we’ll first simulate with the smallArgs
argument and then we’ll re-create a new transaction with bigArgs
but the simulated sorobanData
.
const sorobanData = simRes.transactionData
.setResourceFee(100_000_000)
.setResources(100_000_000, 133_120, 66_560)
.build();
const tx = TransactionBuilder
.cloneFrom(simTx)
.clearOperations()
.addOperation(Operation.invokeContractFunction({
contract: contractId,
function: 'run',
args: bigArgs
}))
.setSorobanData(sorobanData)
.build()
This block echoes index_1.ts
and mirrors its own prior TransactionBuilder
, however rather than magically mushing together the simTx
and simRes
via assembleTransaction
we delicately snipe bits from each into a new transaction. Initially we create the sorobanData
from the simRes.transactionData
and then arbitrarily reset the ResourceFee
and Resources
to something outrageously large just to ensure we don’t get stuck there. We want to give ourselves the best chance to get to a meaningful error by getting past simulation and any nonsensical txSorobanInvalid
response. Next we create a final valid tx
by building a new Operation
with the bigArgs
rather than the initial smallArgs
that simulation ran. This is also why we need to manually reset the fee and resources numbers. The smallArgs
simulation will have estimated fees far below what we’ll actually need when running bigArgs
.
Finally we setSorobanData
on the new tx
with our modified sorobanData
from the simulation and we’ve finally got a hopefully valid transaction ready for signing and submitting. Let’s give it a go!
bun run index_2.ts
[ "operation instructions exceeds amount specified", 100035663n, 100000000n ]
[ "operation memory usage exceeds network config limit", 42131506n, 41943040n ]
errorResultXdr: "AAAAAAX14WT////vAAAAAA==",
errorResultXdr: "AAAAAAX14WT////vAAAAAA==",
[ "total events size exceeds network config maximum", 8268n, 8198n ]
errorResultXdr: "AAAAAAX14WT////vAAAAAA==",
Hmm, only marginally better tbh. If we wind up able to submit we can get helpful errors, but it’s still relatively easy to get identical errors for completely different underlying exceeded limits. It’s a pity simulation doesn’t give us more useful errors; I’m hoping one day soon it will. For now though it seems you’ll need to find a way to build a valid transaction and then get your errors from the DiagnosticEvent
events
in the TransactionMeta
. Good luck 🙂 (or could there already be a better way 🤔)
I haven’t really covered yet how to go from the resultMetaXdr
to a useful error so here’s the code that gives us the output for all the DiagnosticEvents
.
xdr.TransactionMeta
.fromXDR(getRes.resultMetaXdr, 'base64')
.v3()
.sorobanMeta()
?.diagnosticEvents()
.forEach((event) => {
console.log(
scValToNative(event.event().body().v0().data())
)
})
Basically it just looks at the transaction response, looks for a resultMetaXdr
XDR string and then breaks down into that object looking for all the diagnosticEvents
inside the sorobanMeta
. It’s a little messy admittedly, but it’s there and the @stellar/stellar-sdk
library is fully typed so it’s not too bad to trace it out. You’ll notice too for the RUNNING TEST FOR SET
we never get to this stage as the transaction itself is still invalid for some reason.
¯\_(ツ)_/¯
Please note this method is really quite brittle. Trying to use a simulated response from a transaction that’s actually quite different from the final submission will lead to a lot of frustrations. If you touch different keys or your auth is different in any way between simulation and submission the execution will fail. You can see this in practice by noticing for the GET
and SET
tests I have to use identical values between both the small and big args. This is due to simulation actually needing to write ledger keys in those calls. If simulation says we’re only writing 1 key but submission sees from the args we’re trying to write 20 it will fail. For these reasons I cannot recommend this method. It’s fascinating and knowing how it works will help you understand how Soroban transaction simulation and submission works under the hood, but in practice it’s just really unreliable.
For my final trick though, I have one more card I’d like to play to make all our troubles go away. I call it: index_3.ts.
index_3.ts
For our final piece of work to evaluate the costs of invoking smart contracts we first need to shut down our local testnet quickstart Docker container. Once we’ve killed it let’s restart it, but rather than using ./docker-testnet.sh let’s use ./docker-unlimited.sh. If you take a quick peek in there you’ll notice only one seemingly insignificant difference: --limits unlimited vs --limits testnet. Turns out this difference makes all the difference in the world.
Our index_3.ts
begins familiarly enough. Essentially identical to index_1.ts
. It’s not until the run function that we notice something surprising. We’re only running simulation! And then we’re making a single final call to some await sorobill(simRes)
function. So no submission!? That’s right!
Remember, all we really need to do to properly measure our limits is to get simulation to pass; the problem is we can’t when running in the “shoebox” mode that testnet, futurenet and mainnet run in. Thus we were forced to get all the way through to submission to get ahold of meaningful DiagnosticEvent error logs. Turns out though, observing successful simulation vs failed simulation or submission will go miles (or kilometers, I see you Europe!) farther towards something useful. Why? Simple. I need to know both what failed and how badly it failed.
Even in our current best case scenario with the index_2.ts
method, our errors are only going to tell us what failed at the moment of failure, not how badly it failed or could fail. e.g. I want to loop 1500 times, which may cost 1B CPU, but if the network is limited to 100M CPU it will fail the moment I go past 100M. I may spend hours improving a contract only to discover I’m still miles behind the target. It then becomes a vain game of whac-a-mole, trying to clear failures without ever really knowing just how far off I am until I finally get past the current error. To make matters worse I’ll only see one error at a time. I may spend a week improving CPU only to find the next error is memory and then some storage or return value and pretty soon I’m rolling sushi.
What I really need is a successful response way outside the bounds that I can then compare against a known list of live network limits. And so it becomes clear why --limits unlimited
becomes our silver bullet. Let’s run index_3.ts
.
bun run index_3.ts
(failed the first time, for reals the second)
It’s so beautiful I might cry. --limits unlimited lifts the curtain on the boundaries and just lets me go nuts and then reasonably count the costs in post. Let’s walk through each one and observe what the issues are.
RUNNING TEST FOR CPU
--------------------------
{
cpu_insns: 108084087,
mem_bytes: 3766382,
entry_reads: 2,
entry_writes: 0,
read_bytes: 10016,
write_bytes: 0,
events_and_return_bytes: 4,
min_txn_bytes: undefined,
max_entry_bytes: undefined,
max_key_bytes: 48,
}
--------------------------
cpu_insns: 108084087,
👀
According to the docs we’re limited to 100M CPU; this is >108M. So that’ll be an issue.
RUNNING TEST FOR MEM
--------------------------
{
cpu_insns: 16823577,
mem_bytes: 43363970,
entry_reads: 2,
entry_writes: 0,
read_bytes: 10016,
write_bytes: 0,
events_and_return_bytes: 4,
min_txn_bytes: undefined,
max_entry_bytes: undefined,
max_key_bytes: 48,
}
--------------------------
mem_bytes: 43363970,
🤨
Memory limit is 40MB. This is 43MB. That’ll be an issue.
RUNNING TEST FOR SET
--------------------------
{
cpu_insns: 4811546,
mem_bytes: 2385448,
entry_reads: 2,
entry_writes: 21,
read_bytes: 10016,
write_bytes: 68452,
events_and_return_bytes: 4,
min_txn_bytes: undefined,
max_entry_bytes: undefined,
max_key_bytes: 352,
}
--------------------------
RUNNING TEST FOR GET
--------------------------
{
cpu_insns: 5150343,
mem_bytes: 2596913,
entry_reads: 43,
entry_writes: 0,
read_bytes: 143508,
write_bytes: 0,
events_and_return_bytes: 4,
min_txn_bytes: undefined,
max_entry_bytes: undefined,
max_key_bytes: 352,
}
--------------------------
entry_writes: 21,
🤦♀️
entry_reads: 43,
😖
write_bytes: 68452,
🤪
read_bytes: 143508,
💔
max_key_bytes: 352,
😵💫
Docs clearly say 20 is the write count limit, 40 is the read count limit, 65 kb is the write byte limit and 130 kb is the read byte limit. We’re leaking past all of those. Also, the docs don’t currently say it, but storage keys have a max size (max_key_bytes) of 200 bytes.
RUNNING TEST FOR EVENTS
--------------------------
{
cpu_insns: 4174146,
mem_bytes: 1887435,
entry_reads: 2,
entry_writes: 0,
read_bytes: 10016,
write_bytes: 0,
events_and_return_bytes: 8272,
min_txn_bytes: undefined,
max_entry_bytes: undefined,
max_key_bytes: 48,
}
--------------------------
events_and_return_bytes: 8272,
😳
We have 8KB (8192 bytes) to work with not 8272. Rules is rules.
See how easy that was!? And fast too, as it doesn’t actually need to submit anything to the network, just simulate. The sorobill code is also really simple: we’re just dissecting the simulation and adding up the various values.
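For a feel of what that dissection looks like, here’s a rough sketch of the idea. To be clear, this is not the actual sorobill source (it’s more careful, e.g. about which events count toward the events limit), but every number falls out of the simulation roughly like this:

```ts
// A sorobill-style report, sketched from a successful simulation response.
function report(simRes: SorobanRpc.Api.SimulateTransactionSuccessResponse) {
  const resources = simRes.transactionData.build().resources()
  const footprint = resources.footprint()
  const keys = [...footprint.readOnly(), ...footprint.readWrite()]

  return {
    cpu_insns: resources.instructions(),
    mem_bytes: Number(simRes.cost.memBytes),
    entry_reads: keys.length, // read-write entries get read first too
    entry_writes: footprint.readWrite().length,
    read_bytes: resources.readBytes(),
    write_bytes: resources.writeBytes(),
    // Rough: sums every diagnostic event, not just contract events.
    events_and_return_bytes:
      simRes.events.reduce((sum, e) => sum + e.toXDR().length, 0) +
      (simRes.result ? simRes.result.retval.toXDR().length : 0),
    max_key_bytes: Math.max(...keys.map((key) => key.toXDR().length)),
  }
}
```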
“BUT WAIT!” You say. “What about RUNNING TEST FOR TXN
?” Good catch. You’ve probably actually also noticed the undefined
on both the min_txn_bytes
and max_entry_bytes
. Turns out to test for those limits you’ll actually need to submit the tx. Simulation doesn’t include the entry values, just the keys, so we’d need to look up the data from the TransactionMeta
of a successfully submitted transaction in order to get max_entry_bytes
. For the min_txn_bytes
we need to craft a fully valid tx and measure its signed envelope (e.g. tx.toEnvelope().toXDR().length). We could measure it off the pre-simulated tx but that wouldn’t be accurate as it doesn’t have the sorobanData
or any signature info. Good news however! sorobill
actually takes a simulation object as well as an optional transaction. I’ve included an index_4.ts
to illustrate this. It looks pretty similar to index_2.ts
(with the exception that we jam all the invocations into a single tx) and here’s its final output.
{
cpu_insns: 130351778,
mem_bytes: 47448018,
entry_reads: 43,
entry_writes: 21,
read_bytes: 143508,
write_bytes: 68452,
events_and_return_bytes: 8272,
min_txn_bytes: 76132,
max_entry_bytes: 66920,
max_key_bytes: 352,
}
Amazing! 😇
Can you imagine trying to piecemeal debug this unholy monstrosity of a contract invocation slowly, error by error? Literally all the limits overflow! However, via the unlimited quickstart and sorobill you can quickly, at a glance, see you’ll need to focus a bit of time primarily on CPU as it’s nearly 30M over. Memory may just come down naturally as you fix other problems. Ledger entry writes are an obvious area for improvement, and the return value isn’t going to budge, so both of those are also two areas you may want to start fixing first.
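To drive home how mechanical the triage becomes, here’s one last sketch comparing that final output against the limits quoted throughout this post (the stats object is just the report above pasted in; the limit values get voted on by validators, so check the Resource Limits doc for current numbers):

```ts
// Limits as quoted in this post; validators can and do change these.
const LIMITS: Record<string, number> = {
  cpu_insns: 100_000_000,         // 100M CPU instructions
  mem_bytes: 40 * 1024 * 1024,    // 40MB
  entry_reads: 40,
  entry_writes: 20,
  read_bytes: 130 * 1024,         // 130 kb
  write_bytes: 65 * 1024,         // 65 kb
  events_and_return_bytes: 8_192, // 8KB
}

const stats: Record<string, number> = {
  cpu_insns: 130351778,
  mem_bytes: 47448018,
  entry_reads: 43,
  entry_writes: 21,
  read_bytes: 143508,
  write_bytes: 68452,
  events_and_return_bytes: 8272,
}

for (const [key, limit] of Object.entries(LIMITS)) {
  const used = stats[key] ?? 0
  if (used > limit) console.log(`${key}: ${used} > ${limit} (over by ${used - limit})`)
}
```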
I hope it’s clear how debugging contract costs this way is astronomically superior to any of our previous methods. Actual network costs, fast, clear, cumulative, full spectrum.
Like what you’re reading? Find these posts valuable? Drop me a follow and let me know what you’d like me to write about next.