Behind Chainsight: A Look Into Snapshot Indexer Component

Through the Deep Dive into Showcase series, we explained how to build project features and, along the way, introduced what each Component can do. In this article, I would like to focus on a single Component: I will summarize its features again and then look at what happens behind the scenes.

What is Snapshot Indexer?

Snapshot Indexer is a Component that brings data for computation and analysis into the Platform from any data source, inside or outside the Internet Computer. The collection process runs periodically and accumulates the targeted data. Internet Computer's HTTPS outcalls let the Component fetch data from the Web and store it on the Platform reliably. For more information, please refer to the previous article, "Price Oracle by Snapshot Indexer HTTPS".

How to create a Snapshot Indexer

A Manifest must be written to build a Snapshot Indexer. Depending on where the data comes from, the user specifies one of the following Component Types.

Snapshot Indexer EVM: reads from EVM-compatible chains

Snapshot Indexer ICP: reads from canisters on the Internet Computer

Snapshot Indexer HTTPS: reads from the Web (Web2)

[Image: Snapshot Indexer types (chainsight_deepdive_into_showcase-img_snapshot_indexer_type.png)]

Once you have selected the type, specify the data source you want to read from in the datasource field.

# Snapshot Indexer EVM
datasource:
  location:
    id: 6b175474e89094c44da98b954eedeac495271d0f
    args:
      network_id: 1
      rpc_url: https://mainnet.infura.io/v3/...
  method:
    identifier: totalSupply():(uint256)
    interface: ERC20.json
    args: []

# Snapshot Indexer ICP
datasource:
  location:
    id: sample_snapshot_indexer_evm
  method:
    identifier: 'get_last_snapshot : () -> (record { value : text; timestamp : nat64 })'
    ...

# Snapshot Indexer HTTPS
datasource:
  url: https://api.coingecko.com/api/v3/simple/price
  headers:
    ...
  queries:
    ...

For more information on how to write the manifest, see the previous article, "Price Oracle as simple data relay".

How to use Snapshot Indexer

Once the Component starts running, the user does not need to do anything to collect data. To make use of the collected data, however, users need to know how to read it. Snapshot Indexer provides the following query interfaces, which are common to all types.

get_last_snapshot : () -> (Snapshot) query;
get_last_snapshot_value : () -> (SnapshotValue) query;
get_snapshot : (nat64) -> (Snapshot) query;
get_snapshot_value : (nat64) -> (SnapshotValue) query;
get_snapshots : () -> (vec Snapshot) query;
get_top_snapshot_values : (nat64) -> (vec SnapshotValue) query;
get_top_snapshots : (nat64) -> (vec Snapshot) query;

Behind Snapshot Indexer

How does the component get the data?

So far we have described how users create and use Snapshot Indexer. Let's take a deeper look at the Component to understand how it works internally.

What logic does Snapshot Indexer use to obtain the data in the first place? And how is the periodic execution done automatically? As previously introduced in "Behind Chainsight: How Modules are Created", the canister code of a Chainsight Component is generated by Rust macros. The macro first generates a template for the function that is executed periodically.

#[ic_cdk::update]
#[candid::candid_method(update)]
async fn index() {
  // Get data from datasource

  // Convert to data structure for Storage

  // Write to Storage
}

The macro fills this function with the logic appropriate to the Component type, and the function is then scheduled to run periodically.
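To make the three steps in the template concrete, here is a minimal sketch in plain Rust with no IC dependencies. The `DataSource` trait, the `Storage` struct, `FixedSource`, and `index_once` are names I made up for illustration; they are not part of the Chainsight SDK, and the real component writes to Stable memory rather than a `Vec`.

```rust
/// A stored data point: the collected value plus the time it was collected.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub value: String,
    pub timestamp: u64,
}

/// Hypothetical abstraction over "where the data comes from".
pub trait DataSource {
    fn fetch(&mut self) -> Result<String, String>;
}

/// In-memory stand-in for the component's storage (an assumption of this sketch).
#[derive(Default)]
pub struct Storage {
    snapshots: Vec<Snapshot>,
}

impl Storage {
    pub fn len(&self) -> usize {
        self.snapshots.len()
    }

    pub fn last(&self) -> Option<&Snapshot> {
        self.snapshots.last()
    }
}

/// One indexing run, following the three comments in the template.
pub fn index_once(
    source: &mut dyn DataSource,
    storage: &mut Storage,
    now_sec: u64,
) -> Result<(), String> {
    // 1. Get data from datasource
    let raw = source.fetch()?;

    // 2. Convert to data structure for Storage
    let snapshot = Snapshot {
        value: raw.trim().to_string(),
        timestamp: now_sec,
    };

    // 3. Write to Storage
    storage.snapshots.push(snapshot);
    Ok(())
}

/// Data source that replays canned responses, useful for trying the flow locally.
pub struct FixedSource(pub Vec<Result<String, String>>);

impl DataSource for FixedSource {
    fn fetch(&mut self) -> Result<String, String> {
        if self.0.is_empty() {
            Err("no more data".to_string())
        } else {
            self.0.remove(0)
        }
    }
}
```

In this sketch a failed fetch leaves the storage untouched, so a bad run never produces a partial Snapshot. Whether the generated code behaves the same way on failure is not something this example claims.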

As mentioned earlier, data from outside the Internet Computer is collected through HTTPS outcalls to ensure reliability. In other words, the underlying layer calls the IC API for HTTPS outcalls. However, each Snapshot Indexer type has its own pre-processing and post-processing, such as constructing the request and converting the response. Let's take a look at each of them.

In Snapshot Indexer EVM

This Component must call a contract on an EVM-compatible chain. To do this, we use ic-solidity-bindgen, which generates from an ABI a contract struct that makes contract calls through Internet Computer's API. The generated struct implements a method for each contract function, so the macro generates the following code from the struct name and the name of the function to call.

let res = ERC20::new(
    Address::from_str(&get_target_addr()).expect("Failed to parse target addr to Address"),
    &web3_ctx().expect("Failed to get web3_ctx"),
)
.total_supply(None)
.await
.expect("Failed to call contract");

ic-solidity-bindgen is an OSS library built by Chainsight. It should also be useful if you want to make EVM contract calls from your own canister code.

https://github.com/horizonx-tech/ic-solidity-bindgen
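The manifest says `totalSupply():(uint256)` and `ERC20.json`, while the generated call is `ERC20::new(...).total_supply(None)`. The sketch below shows one way such a name mapping could be done. It is my own illustration of the idea, not the code used by the Chainsight macros or by ic-solidity-bindgen, and the helper names `function_name`, `to_snake_case`, and `contract_struct_name` are hypothetical.

```rust
/// Extracts the function name from a manifest identifier such as
/// `totalSupply():(uint256)`, i.e. everything before the first `(`.
pub fn function_name(identifier: &str) -> Option<&str> {
    let name = identifier.split('(').next()?.trim();
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// Converts a camelCase Solidity name to a snake_case Rust method name.
/// Simplification: every ASCII uppercase letter starts a new word, so
/// acronyms such as `getURI` are not handled nicely.
pub fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Derives the struct name from the interface file name: `ERC20.json` -> `ERC20`.
pub fn contract_struct_name(interface: &str) -> &str {
    interface.strip_suffix(".json").unwrap_or(interface)
}
```

With the manifest values from earlier in this article, these helpers yield the struct name `ERC20` and the method name `total_supply`, which matches the shape of the generated call shown above.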

In Snapshot Indexer ICP

This Component calls other canisters on the Internet Computer. It calls functions exposed by other canisters using the lowest-level call API of the Internet Computer. The macro generates the following code from the name of the function to call and related settings. As a side note, this is the only Component type that does not use HTTPS outcalls, because it communicates entirely within the Internet Computer.

async fn call_target_method_to_target_canister(
    target: candid::Principal,
    call_args: CallCanisterArgs,
) -> SnapshotValue {
    let out: ic_cdk::api::call::CallResult<(SnapshotValue,)> =
        ic_cdk::api::call::call(target, "icrc1_total_supply", call_args).await;
    out.expect("failed to call").0
}

In Snapshot Indexer HTTPS

This Component goes outside the Internet Computer and communicates with ordinary Web servers. The user-specified endpoint and its parameters are used as macro inputs to generate the following code.

let indexer = Web2HttpsSnapshotIndexer::new(URL.to_string());
let res = indexer
    .get::<String, SnapshotValue>(HttpsSnapshotParam {
        headers: vec![("content-type".to_string(), "application/json".to_string())]
            .into_iter()
            .collect(),
        queries: HashMap::from([
            ("ids".to_string(), "dai".to_string()),
            ("vs_currencies".to_string(), "usd".to_string()),
        ]),
    })
    .await
    .expect("Failed to get by indexer");

As shown above, the code generated by the macro is simple. The only parts that change from user to user are the request parameters, such as headers and queries. To keep the macro generation simple, requests to the Web go through a generic implementation, shown here.

pub struct Web2HttpsSnapshotIndexer {
    pub url: String,
    retry_strategy: RetryStrategy,
}
pub struct HttpsSnapshotParam {
    pub queries: HashMap<String, String>,
    pub headers: HashMap<String, String>,
}

impl Web2HttpsSnapshotIndexer {
    pub fn new(url: String) -> Self {
        Self {
            url,
            retry_strategy: RetryStrategy::default(),
        }
    }

    pub async fn get<T, V>(&self, param: HttpsSnapshotParam) -> anyhow::Result<V>
    where
        V: DeserializeOwned + serde::Serialize,
    {
        use crate::web3::processors::TransformProcessor;
        let headers: Vec<HttpHeader> = param
            .headers
            .iter()
            .map(|(k, v)| HttpHeader {
                name: k.to_string(),
                value: v.to_string(),
            })
            .collect();
        let args = CanisterHttpRequestArgument {
            url: build_url(self.url.clone().as_str(), param.queries),
            method: http_request::HttpMethod::GET,
            headers,
            max_response_bytes: None,
            transform: Some(HTTPSResponseTransformProcessor::<V>::new().context()),
            body: None,
        };
        let cycles = http_request_required_cycles(&args);
        let result = retry(self.retry_strategy, || http_request(args.clone(), cycles))
            .await
            .expect("http_request failed");
        let res: V = serde_json::from_slice(&result.0.body)?;
        Ok(res)
    }
}
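The implementation above relies on a `build_url` helper that is not shown in this article. As a rough idea of what such a helper does, here is a minimal stand-in written with the standard library only. It is an assumption on my part, not the SDK's actual function: it sorts the parameters so the output is deterministic, and it does not percent-encode keys or values.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `build_url` helper used above.
/// Appends the query parameters to `base` as `key=value` pairs joined by `&`.
pub fn build_url(base: &str, queries: HashMap<String, String>) -> String {
    if queries.is_empty() {
        return base.to_string();
    }

    // HashMap iteration order is unspecified, so sort for a stable result.
    let mut pairs: Vec<(String, String)> = queries.into_iter().collect();
    pairs.sort();

    let query = pairs
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect::<Vec<_>>()
        .join("&");

    // Use `&` if the base URL already carries a query string.
    let sep = if base.contains('?') { '&' } else { '?' };
    format!("{}{}{}", base, sep, query)
}
```

With the `ids=dai` and `vs_currencies=usd` parameters from the generated code, this sketch produces `https://api.coingecko.com/api/v3/simple/price?ids=dai&vs_currencies=usd`. A stable parameter order is a design choice of the sketch; values containing reserved characters would additionally need percent-encoding.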

We have now described how data acquisition differs for each type of Snapshot Indexer. Next, let's look at how the collected data is stored. Before describing what Chainsight does, it is necessary to understand the storage available on the Internet Computer, which offers two types of memory that a canister can choose between.

Heap memory refers to a canister's regular Wasm memory. It is not persisted, and does not store data long-term. A canister's heap memory is cleared whenever a canister is stopped or upgraded.

Stable memory is a feature unique to the Internet Computer Protocol that provides a long-term, persistent data storage option separate from a canister's heap memory. When a canister is stopped or upgraded, the data stored in stable memory is not cleared or removed.

Source: https://internetcomputer.org/docs/current/developer-docs/production/storage

Since the collected data is meant to be stored permanently, Snapshot Indexer uses Stable memory. This brings a few requirements for the Component. A struct called Snapshot is declared to hold the collected data together with its execution time. Snapshot must be serializable and deserializable so that it can be stored in Stable memory and passed to and from other Components through inter-canister calls.

#[derive(
    Debug,
    Clone,
    candid::CandidType,
    candid::Deserialize,
    serde::Serialize,
    StableMemoryStorable
)]
#[stable_mem_storable_opts(max_size = 10000, is_fixed_size = false)]
pub struct Snapshot {
    pub value: SnapshotValue,
    pub timestamp: u64,
}
type SnapshotValue = (#(#response_types),*);

Ordinarily, to store a type in Stable memory you must implement serialization and deserialization for it yourself through the following traits.

// Traits defined in the ic_stable_structures crate
pub trait Storable {
    /// Converts an element into bytes.
    fn to_bytes(&self) -> Cow<[u8]>;
    /// Converts bytes into an element.
    fn from_bytes(bytes: Cow<[u8]>) -> Self;
}
pub trait BoundedStorable: Storable {
    const MAX_SIZE: u32;
    const IS_FIXED_SIZE: bool;
}

Chainsight provides this implementation as a derive macro (StableMemoryStorable), so the functionality is added without a hand-written implementation.
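To show what the derive macro saves you from writing, here is a hand-written implementation for a simplified Snapshot. The `Storable` trait below is a local copy of the signature shown above so that the example compiles on its own. The byte layout (8-byte big-endian timestamp followed by the UTF-8 value) is an assumption made for this sketch; the encoding used by StableMemoryStorable may well be different.

```rust
use std::borrow::Cow;

/// Local mirror of the `Storable` trait, so the sketch has no dependencies.
pub trait Storable {
    fn to_bytes(&self) -> Cow<[u8]>;
    fn from_bytes(bytes: Cow<[u8]>) -> Self;
}

/// Simplified Snapshot: the value is a plain String in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub value: String,
    pub timestamp: u64,
}

impl Storable for Snapshot {
    /// Layout assumed here: [timestamp: 8 bytes, big-endian][value: UTF-8 bytes].
    fn to_bytes(&self) -> Cow<[u8]> {
        let mut buf = Vec::with_capacity(8 + self.value.len());
        buf.extend_from_slice(&self.timestamp.to_be_bytes());
        buf.extend_from_slice(self.value.as_bytes());
        Cow::Owned(buf)
    }

    /// Panics if the input is shorter than 8 bytes or the value is not UTF-8.
    fn from_bytes(bytes: Cow<[u8]>) -> Self {
        let (ts, rest) = bytes.split_at(8);
        let mut ts_buf = [0u8; 8];
        ts_buf.copy_from_slice(ts);
        Snapshot {
            timestamp: u64::from_be_bytes(ts_buf),
            value: String::from_utf8(rest.to_vec()).expect("value is not valid UTF-8"),
        }
    }
}
```

Every storable type would need a pair of functions like this, plus the size bounds from BoundedStorable, which is the boilerplate a derive macro can generate from the struct definition and the `max_size` / `is_fixed_size` options.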

Once Snapshot is storable in this way, each Snapshot is appended to Stable memory together with its execution time.

pub fn add_snapshot(value: Snapshot) -> Result<(), String> {
    let res = SNAPSHOTS.with(|vec| vec.borrow_mut().push(&value));
    res.map_err(|e| format!("{:?}", e))
}

...

#[ic_cdk::update]
#[candid::candid_method(update)]
async fn index() {
    ...
    let datum = Snapshot {
        value: #response_values,
        timestamp: current_ts_sec,
    };
    let _ = add_snapshot(datum.clone());
    ...
}

Now that you understand how Snapshot Indexer acquires and stores data, all that remains is to execute this logic on a regular basis. We use Internet Computer's Canister Timers for periodic execution. Canister Timers let a canister schedule a one-time or periodic execution after a specified delay. For more information, please refer to the following links.

https://internetcomputer.org/docs/current/developer-docs/backend/periodic-tasks

https://medium.com/dfinity/internet-computer-canister-timers-21cd3201b831

This functionality can be incorporated into your own canister by using the ic-cdk-timers crate from DFINITY's cdk-rs.

https://github.com/dfinity/cdk-rs/tree/main/src/ic-cdk-timers

Snapshot Indexer incorporates ic-cdk-timers in its template, so data acquisition and storage run automatically at the interval the user specifies in the manifest.

How do you extract data from Component?

We have explained how data is acquired and stored. This section explains how the stored data can be retrieved. Earlier I described the structure of the stored data, but not how it is laid out in storage. Each Snapshot is appended to a variable-length array, that is, a vector.

thread_local! {
    ...
    static SNAPSHOTS: std::cell::RefCell<ic_stable_structures::StableVec<Snapshot, MemoryType>> =
        std::cell::RefCell::new(
            ic_stable_structures::StableVec::init(
                MEMORY_MANAGER.with(|mm| mm.borrow().get(
                    ic_stable_structures::memory_manager::MemoryId::new(0u8)
                ))
            ).unwrap()
        );
}

Because the data lives in this variable-length array, several functions are provided for retrieving it.

Let me introduce a few of them. get_snapshot retrieves the data at a specified index. get_last_snapshot retrieves the latest data. get_top_snapshots retrieves a specified number of snapshots, starting from the newest. Since a Snapshot also contains the execution time, each of these has a *_value variant (for example, get_last_snapshot_value) that returns only the value.

#[ic_cdk::query]
#[candid::candid_method(query)]
fn get_snapshots() -> Vec<Snapshot> {
    _get_snapshots()
}
pub fn _get_snapshots() -> Vec<Snapshot> {
    SNAPSHOTS.with(|mem| mem.borrow().iter().collect())
}
#[ic_cdk::query]
#[candid::candid_method(query)]
fn snapshots_len() -> u64 {
    _snapshots_len()
}
pub fn _snapshots_len() -> u64 {
    SNAPSHOTS.with(|mem| mem.borrow().len())
}
#[ic_cdk::query]
#[candid::candid_method(query)]
fn get_last_snapshot() -> Snapshot {
    _get_last_snapshot()
}
pub fn _get_last_snapshot() -> Snapshot {
    SNAPSHOTS
        .with(|mem| {
            let borrowed_mem = mem.borrow();
            let len = borrowed_mem.len();
            borrowed_mem.get(len - 1)
        })
        .unwrap()
}
#[ic_cdk::query]
#[candid::candid_method(query)]
pub fn get_top_snapshots(n: u64) -> Vec<Snapshot> {
    _get_top_snapshots(n)
}
#[ic_cdk::query]
#[candid::candid_method(query)]
fn get_snapshot(idx: u64) -> Snapshot {
    _get_snapshot(idx)
}
pub fn _get_snapshot(idx: u64) -> Snapshot {
    SNAPSHOTS.with(|mem| mem.borrow().get(idx)).unwrap()
}
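The listing calls `_get_top_snapshots` but does not show its body. The sketch below models the retrieval logic over a plain slice so it can be run anywhere. It is an illustration under my own assumptions, not the generated code: in particular, returning the newest snapshot first and returning `Option` for an empty store are choices made for this example.

```rust
/// Simplified Snapshot: the value is a plain String in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub value: String,
    pub timestamp: u64,
}

/// Returns up to `n` of the most recent snapshots, newest first
/// (the ordering is an assumption of this sketch).
pub fn get_top_snapshots(store: &[Snapshot], n: u64) -> Vec<Snapshot> {
    // Saturate instead of truncating if `n` does not fit in usize.
    let n = usize::try_from(n).unwrap_or(usize::MAX);
    store.iter().rev().take(n).cloned().collect()
}

/// Same as `get_top_snapshots`, but drops the timestamps.
pub fn get_top_snapshot_values(store: &[Snapshot], n: u64) -> Vec<String> {
    get_top_snapshots(store, n)
        .into_iter()
        .map(|s| s.value)
        .collect()
}

/// Returns the latest snapshot, or None when nothing has been stored yet,
/// instead of computing `len - 1` on an empty store.
pub fn get_last_snapshot(store: &[Snapshot]) -> Option<Snapshot> {
    store.last().cloned()
}
```

Asking for more snapshots than exist simply returns everything that is stored. The `Option` return type makes the "no data yet" case explicit, which matters right after deployment, before the first periodic run has completed.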

With these query functions, you can retrieve the latest data from a Snapshot Indexer, the five most recent snapshots, or the snapshot at any index, and use the results in calculations. That covers how Snapshot Indexer works internally. With this background, you will be able to work with this Component in a more practical way.

This time, as part of a deep dive into Components, I explained Snapshot Indexer. I hope this explanation gives you some ideas and insights on how to use Chainsight. Please look forward to the next article.

Written by Megared@Chainsight