1 - Contributing Records
How connectors contribute entity instance records to BitBroker
All the data managed by a BitBroker instance enters the system via the Contribution API. The process of contributing such data is documented in detail in this section.
In this section, we will consider the basic use case of contributing entity instance records. Later sections of this documentation will detail how you can contribute live, on-demand data and timeseries data.
Contributing data is tightly bound with the concepts of
entity types and their associated
data connectors. All contributions happen in the context of these important system elements. It is vital that you fully understand these and other
key concepts before using this API to contribute records.
A quick way to get going building your own data connectors is to adapt the
example connectors which have been built for a range of data sources.
All API calls in BitBroker require
authorization. The sample calls below contain a placeholder string where you should insert your
contributor API authorization token. This authorization token should have been provided to you by the coordinator user who created your data connector within BitBroker.
The sample calls in this section will not work as-is. Contributor API calls require the use of session IDs, which are generated on-demand. Hence, the sample calls here are merely illustrative.
Contributing Records to the Catalog
We will assume for the purposes of this section that an entity type and its associated data connector have been created and are present within the system. Further, that the connector ID and authorization token, which were obtained when the data connector was created, have been recorded and are available.
Data can now be contributed into the catalog by this data connector, but within the context of its parent entity type only. Hence, we say that a single connector contributes “entity instance records”. If one organization wants to contribute data to multiple entity types, then they must do this via multiple data connectors.
The process of contributing entity instance records into the catalog breaks down into three steps:
- Create a data contribution session
- Upsert and/or delete records into this session
- Close the session
These steps are achieved via an HTTP-based API, which we outline in detail below. Each data connector will have a private end-point on this API which is waiting for its contributions.
It is important to understand the distinction between
data and
metadata in the context of the BitBroker instance. It is an expectation that only metadata is being contributed into the catalog and that live data is kept back for on-demand requests. This distinction is
outlined in more detail in the key concepts documentation.
It is important to understand that data connectors might be in a
live or
staged state. That is, their contribution might be approved for the live catalog, or might be held back in a staging space only. This concept is
outlined in more detail in the key concepts documentation. There is a
mechanism available in the
Consumer API which allows data connectors to see how their records will look alongside other existing public records.
If your connector is marked as “non-live”, your data contribution will
not become visible to
consumers. If you want to make your connector “live”, you must ask the coordinator user who created the connector for you.
Sessions
Sessions are used by the Contribution API to manage inbound data coming from the community of data connectors. Sessions allow the connectors to contribute entity instance records in well-defined ways, which are respectful of the state management of the source data store.
BitBroker supports three types of sessions: stream, accrue and replace. Each one provides for different update and delete contexts.
You can only have one session open at a time. If you open a new session without closing a previous one, the previous one is implicitly closed with a rollback request.
The three types of session provide for different application logic in the following areas:
- Whether data is available to consumers whilst the session is still open, or only after it is closed.
- Whether the data provided within a session adds to or replaces earlier data from your connector.
Here is the detail of how each session type functions:
| Area | Stream | Accrue | Replace |
| --- | --- | --- | --- |
| Data visibility | as soon as posted | on session close | on session close |
| Data from previous session | in addition to | in addition to | replaces entirely |
Let’s explore each of these in more detail:
Stream Sessions
Stream sessions are likely to be the default mode of operation for most data connectors. Inbound entity instance records arrive in the catalog as soon as they are posted and whilst the session remains open. They are immediately available to consumers to view via the Consumer API.
New records are in addition to existing records in the catalog and removal must be explicitly requested. Closing a stream session is a moot operation, since the session type is essentially an “open pipe” into the catalog. In fact, stream sessions can be opened and left open indefinitely.
| Type | Session | Action |
| --- | --- | --- |
| stream | open | session data is already visible, in addition to previous data |
| stream | close true | no operation - session data is already visible, in addition to previous data |
| stream | close false | no operation - session data is already visible, in addition to previous data |
Accrue Sessions
Accrue sessions are useful when entity instance records should only become visible as complete sets. In this scenario, the entity instance records contributed within a session only become visible via the Consumer API when the session is closed - and hence only as a complete set.
New records are in addition to existing records in the catalog and removal must be explicitly requested. When you close an accrue session, you must specify a commit state of true or false. Closing the session with true makes the contributed records visible in the Consumer API, but closing it with false will discard all the records contributed within that session.
| Type | Session | Action |
| --- | --- | --- |
| accrue | open | session data not visible, but previous data is |
| accrue | close true | session data now becomes visible, in addition to previous data |
| accrue | close false | session data is discarded and previous data persists |
Replace Sessions
Replace sessions are useful when contributed entity instance records should completely replace the set provided in previous sessions. In this scenario, the entity instance records contributed within a session become visible via the Consumer API as a complete set when the session is closed - but all the records contributed in earlier sessions are discarded. Replace sessions are useful when you cannot maintain state about earlier contributions, and hence each contribution is a complete statement of your record set.
New records replace existing records in the catalog and removal of these “old” records is implicit. When you close a replace session, you must specify a commit state of true or false. Closing the session with true makes the contributed records visible in the Consumer API and deletes records from previous sessions. However, closing it with false will discard all the records contributed within that session and previously contributed records will remain untouched.
| Type | Session | Action |
| --- | --- | --- |
| replace | open | session data not visible, but previous data is |
| replace | close true | session data now becomes visible and replaces all previous data |
| replace | close false | session data is discarded and previous data persists |
As you can see, picking the right session type is vitally important to ensure you make the best use of the catalog. In general, you should aim to use a stream type session where you can, as this is the simplest.
If you don’t want clients to be able to see intermediate updates in the catalog, then accrue and replace may be better options. Where you don’t want to (or can’t) store any state about what you previously sent to the catalog, then replace is probably the best option.
Using Sessions
There are only three HTTP calls which your data connectors need to make in order to contribute records into the catalog.
Opening a Session
New sessions can be created by issuing an HTTP/GET to the /connector/:cid/session/open/:mode end-point.
In order to open a session, you must know the connector ID (cid). This should have been communicated to you by the coordinator user who created your data connector within BitBroker.
You will also need to select one of the three session modes: stream, accrue or replace. These should be specified in lowercase and without any spaces.
curl http://bbk-contributor:8002/v1/connector/9afcf3235500836c6fcd9e82110dbc05ffbb734b/session/open/stream \
--include \
--header "x-bbk-auth-token: your-token-goes-here"
This will result in a response as follows:
HTTP/1.1 200 OK
The body of this response will contain a session ID (sid
), which should be recorded as it will be needed for subsequent API calls. For example:
4527eff4-d9cf-41c0-9ecc-8e06b57fcf54
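If you are scripting your connector with curl, one convenient pattern is to capture the returned session ID directly into a shell variable for use in the subsequent calls. This is just a sketch - the connector ID and token are placeholders, and the quote stripping simply covers the case where the ID comes back as a JSON string:
SID=$(curl --silent \
     http://bbk-contributor:8002/v1/connector/9afcf3235500836c6fcd9e82110dbc05ffbb734b/session/open/stream \
     --header "x-bbk-auth-token: your-token-goes-here" \
     | tr -d '"')
echo $SID   # for example: 4527eff4-d9cf-41c0-9ecc-8e06b57fcf54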
Posting Records in a Session
Once you have an open session, you can post two types of actions to it in order to manipulate your catalog entries:
- upsert, to update or insert a record into the catalog
- delete, to remove an existing record from the catalog
You can only make changes to your own records within the catalog. Your data connector will have no effect on records which came from other connectors - even if you share an entity type with them.
Entity instance records can be upserted or deleted by issuing an HTTP/POST to the /connector/:cid/session/:sid/:action end-point.
In order to post record actions, you must know the connector ID (cid). This should have been communicated to you by the coordinator user who created your data connector within BitBroker. You must also know the session ID (sid), which was returned in the previous step where a session was opened.
Finally, you will also need to select one of the two valid actions: upsert or delete. These should be specified in lowercase and without any spaces.
curl http://bbk-contributor:8002/v1/connector/9afcf3235500836c6fcd9e82110dbc05ffbb734b/session/4527eff4-d9cf-41c0-9ecc-8e06b57fcf54/upsert \
--request POST \
--include \
--header "Content-Type: application/json" \
--header "x-bbk-auth-token: your-token-goes-here" \
--data-binary @- << EOF
[ ]
EOF
You can specify upsert and/or delete record actions, but these cannot be mixed into a single API call. However, you can upsert and delete as many times as you wish within an open session.
Your upsert and delete actions will be executed in the strict order in which they were sent. You can safely upsert and then delete an entity instance within a single session boundary, if you so wish.
Care should be taken to ensure that the session ID (sid) used to post updates is the ID which was returned in the last call to open a session. If you send an old, invalid or mismatched session ID, it will result in an HTTP/1.1 403 Forbidden response. This will have no impact on any existing open session.
In the example above, we upsert an empty array - this is obviously not useful. Let’s now look in detail at how records are inserted, updated and deleted using this API call.
Upserting records
When you post an upsert request, you should include an array of entity instances in JSON format within your post body. Each record can contain the following attributes:
| Attribute | Necessity | Validation Rules |
| --- | --- | --- |
| id | required | String between 1 and 64 characters long |
| name | required | String between 1 and 64 characters long |
| entity | required | An object conforming to the entity schema for this entity type |
| instance | optional | An object containing other, ancillary information |
Only expected attributes will be stored within the catalog. Any other attributes which are sent will simply be ignored.
It is important to understand the difference between the three classes of attributes which can be present within each entity instance record:
Global Attributes
These attributes are required to be present for every entity instance in the system, regardless of its entity type. This set consists of only these attributes:
| Attribute | Description |
| --- | --- |
| id | Your domain key for this entity instance |
| name | A human-readable name describing this entity instance |
Entity Attributes
These attributes are required to be present for every entity instance in the system of a given entity type. This set of attributes will have been communicated to you by the coordinator user who created your connector within BitBroker. It will be presented in the form of a JSON schema.
Instance Attributes
These attributes only exist for a given entity instance in the system. This is a free format object which can be used to store additional or ancillary information.
This simple hierarchy of three classes (global, entity and instance) is designed to give consumers maximum assurance about which data can be expected to be available to them:
- They can always expect to find the global data present
- They have firm expectations about data availability within an entity type
- They understand that instance data is ad-hoc and cannot be relied upon
If any record within the posted record set contains a validation error, then the entire set will be rejected. The call will return an HTTP/1.1 400 Bad Request response and the body of the response will contain details of every record validation error which was encountered, in the standard validation error format.
The catalog will decide whether to insert or update your record based upon the domain key which you supplied in the id
field of each posted record. If a record already exists with this key, it will be updated - otherwise it will be inserted.
Your records are scoped to be within your data connector space only. You cannot affect records delivered by another data connector, even within the same entity type and even if you have a clashing key space.
Here is the post body for an example upsert request for a set of three records:
[
{
"id": "GB",
"name": "United Kingdom",
"entity": {
"area": 242900,
"calling_code": 44,
"capital": "London",
"code": "GB",
"continent": "Europe",
"currency": {
"code": "GBP",
"name": "Sterling"
},
"population": 66040229
}
},
{
"id": "IN",
"name": "India",
"entity": {
"area": 3287263,
"calling_code": 91,
"capital": "New Delhi",
"code": "IN",
"continent": "Asia",
"currency": {
"code": "INR",
"name": "Indian Rupee"
},
"population": 1344860000
},
"instance": {
"independence": 1947
}
},
{
"id": "BR",
"name": "Brazil",
"entity": {
"area": 8547403,
"calling_code": 55,
"capital": "Brasilia",
"code": "BR",
"continent": "South America",
"currency": {
"code": "BRL",
"name": "Brazilian Real"
},
"population": 209659000
},
"instance": {}
}
]
Whenever records are upserted into the catalog, it will return a report to the caller with information about how each posted record was processed. For example, for the three records above, you might get a report such as:
{
"GB": "5ebb30afaa6ce33843b00bbff63f63b90e91028c",
"IN": "917d0311c687e5ffb28c91a9ea57cd3a306890d0",
"BR": "d5fa7d9d8e4625399da7771fc0e3e87886f2a5ac"
}
In the report, you will see a row for every record that was posted, alongside the BitBroker key which is
being used for this entity instance. This is the key which consumers will use in order to retrieve this record via the Consumer API.
There is no expectation that you need to store this consumer key, if you do not wish to do so. You should continue to simply use your own domain key for your catalog interactions.
Deleting records
When deleting records from the catalog, you need to simply post an array of your domain keys for the records to be removed. These should be the same domain keys you specified when you upserted the records. For example, to remove two of the records upserted in the previous step, the post body would need to be:
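[
    "GB",
    "BR"
]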
Whenever records are deleted from the catalog, it will return a report to the caller with information about how each posted ID was processed. For example, for the two IDs above, you might get a report such as:
{
"GB": "5ebb30afaa6ce33843b00bbff63f63b90e91028c",
"BR": "d5fa7d9d8e4625399da7771fc0e3e87886f2a5ac"
}
In the report, you will see a row for every ID that was posted, alongside the BitBroker key which was
being used for this (now removed) entity instance. This is the key which consumers will have used in order to retrieve this record via the Consumer API.
If you post a domain key to delete a record which does not exist in the catalog, this will simply be ignored.
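For completeness, a full delete call might look like the sketch below, reusing the connector and session IDs from the earlier examples:
curl http://bbk-contributor:8002/v1/connector/9afcf3235500836c6fcd9e82110dbc05ffbb734b/session/4527eff4-d9cf-41c0-9ecc-8e06b57fcf54/delete \
     --request POST \
     --include \
     --header "Content-Type: application/json" \
     --header "x-bbk-auth-token: your-token-goes-here" \
     --data-binary @- << EOF
[ "GB", "BR" ]
EOF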
Closing a Session
After entity instance records have been posted, you can close a session by issuing an HTTP/GET to the /connector/:cid/session/:sid/close/:commit end-point.
In order to close a session, you must know the connector ID (cid). This should have been communicated to you by the coordinator user who created your data connector within BitBroker. You must also know the session ID (sid), which was returned in the previous step where a session was opened.
Finally, you will also need to select one of the two valid commit states: true or false. These should be specified in lowercase and without any spaces.
curl http://bbk-contributor:8002/v1/connector/9afcf3235500836c6fcd9e82110dbc05ffbb734b/session/4527eff4-d9cf-41c0-9ecc-8e06b57fcf54/close/true \
--include \
--header "x-bbk-auth-token: your-token-goes-here"
This will result in a response as follows:
HTTP/1.1 200 OK
The exact mechanics of closing a session depend on the type of session that was specified when it was opened. This was covered in detail in the earlier section on session types.
2 - Hosting a Webhook
How to use webhooks to incorporate live and on-demand data
It is an expectation that the BitBroker catalog contains information which is useful to enable search and discovery of entity instances. Hence, it contains key metadata - but it does not normally contain actual entity data. This is pulled on-demand via a webhook hosted by the data connector who contributed the entity record.
The distinction between data and metadata is covered in more detail in the key concepts documentation. Depending on how data and metadata is balanced in a BitBroker instance, there may or may not be a requirement to host a webhook.
In this section, we will outline how to implement a webhook within a data connector.
A quick way to get going integrating a webhook into your own data connector is to adapt the
example connectors which have been built for a range of data sources.
It is permitted and valid for one webhook to service the needs of multiple data connectors. Sufficient inbound information will be provided to allow the webhook to be clear about which entity instance data is being requested.
Registering your Webhook
The first step is to register your webhook with BitBroker. This is done when the connector is created or can be done later by updating the connector. These actions are part of the Coordinator API and hence can only be performed by a coordinator user on your behalf.
Your webhook should be an HTTP server which is capable of receiving calls from the BitBroker instance. You can host this server in any manner you like, however the coordinator of your BitBroker may have their own hosting and security requirements of it.
You need to maintain your webhook so that it is always available to its connected BitBroker instance. If your webhook is down or inaccessible when BitBroker needs it, this will result in a poor experience for consumers using the Consumer API. In this scenario, they will only see partial records. Information about misbehaving data connectors will be available to coordinator users.
Required End-points
You are required to implement two end-points as part of your webhook deployment.
Whilst BitBroker advertises its own key space to its
consumers, there is no need for data connectors to take heed of these. They can continue to concern themselves with only their own domain key space. When BitBroker makes requests of your webhook, it will only ever use its own key space.
Whenever your webhook is called, it will be in the context of an
on-demand request - meaning that the call is in the direct line of response to a waiting user of the
Consumer API. Hence, you should endeavor to respond to webhook calls in a timely manner. Information about poorly performing data connectors will be available to coordinator users.
Entity End-point
The entity end-point is used by BitBroker to get a full data record for an entity instance which you previously submitted into the catalog.
The entity end-point has the following signature:
HTTP/GET /entity/:type/:id
Where:
| Attribute | Presence | Description |
| --- | --- | --- |
| type | always | The entity type ID, for this entity instance |
| id | always | Your own domain key, which you previously submitted into the catalog |
The entity type is presented here to allow for scenarios where one webhook is servicing the needs of multiple data connectors.
In response to this call, you should return a JSON object consisting of an entity and instance attribute only - all other attributes will be ignored. The object you return will be merged with the catalog record which you provided earlier. Hence, there is no need to resupply the catalog information you have already submitted in previous steps.
For example, consider this (previously submitted) catalog record:
{
"id": "GB",
"name": "United Kingdom",
"type": "country",
"entity": {
"area": 242900,
"calling_code": 44,
"capital": "London",
"code": "GB",
"continent": "Europe",
"currency": {
"code": "GBP",
"name": "Sterling"
},
"population": 66040229
},
"instance": {
"independence": 1066
}
}
If there is a call for the detail of this record made on the Consumer API, the system will callback on the entity end-point as follows:
HTTP/GET /entity/country/GB
Then the webhook should respond with any extra / live / on-demand entity and instance data:
{
"entity": {
"inflation": 4.3
},
"instance": {
"temperature": 18.8
}
}
The system will then merge this live information with the catalog record to send a combined record to the consumer.
{
"id": "GB",
"name": "United Kingdom",
"type": "country",
"entity": {
"area": 242900,
"calling_code": 44,
"capital": "London",
"code": "GB",
"continent": "Europe",
"currency": {
"code": "GBP",
"name": "Sterling"
},
"population": 66040229,
"inflation": 4.3 // this has been merged in
},
"instance": {
"independence": 1066,
"temperature": 18.8 // this has been merged in
}
}
Timeseries End-point
The timeseries end-point is used by BitBroker to get timeseries information associated with an entity instance previously submitted into the catalog.
Not all entity types will have timeseries associated with them. When they do, this callback is vital, since no timeseries data points are held within the catalog itself. Only the existence of timeseries, and key metadata about them, is stored.
The timeseries end-point has the following signature:
HTTP/GET /timeseries/:type/:id/:tsid?start=:start&end=:end&limit=:limit
Where:
| Attribute | Presence | Description |
| --- | --- | --- |
| type | always | The entity type ID, for this entity instance |
| id | always | Your own domain key, which you previously submitted into the catalog |
| tsid | always | The ID of the timeseries associated with this entity instance |
| start | sometimes | The earliest timeseries data point being requested. When present, an ISO 8601 formatted date |
| end | sometimes | The latest timeseries data point being requested. When present, an ISO 8601 formatted date |
| limit | always | The maximum number of timeseries points to return. An integer greater than zero |
Further information about the possible URL parameters supplied with this callback is as follows:
| Attribute | Information |
| --- | --- |
| start | Should be treated as inclusive of the range being requested. When not supplied, assume a start from the latest timeseries point |
| end | Should be treated as exclusive of the range being requested. When present, this will always be after the start. Never present without start also being present. When not supplied, defer to the limit count |
| limit | Takes precedence over the start and end range. The end may not be reached, if limit is breached first |
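For example, a callback requesting the latest ten points of a hypothetical population timeseries on the GB record from earlier might look like this (the timeseries ID here is purely illustrative):
HTTP/GET /timeseries/country/GB/population?limit=10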
The webhook should then respond with timeseries data points as follows:
[
{
"from": 1910,
"to": 1911,
"value": 5231
},
{
"from": 1911,
"to": 1912,
"value": 6253
},
// other timeseries points here
]
Where:
| Attribute | Necessity | Description |
| --- | --- | --- |
| from | required | An ISO 8601 formatted date |
| to | optional | When present, an ISO 8601 formatted date |
| value | required | A valid JSON data type or object |
You should return your timeseries points with the latest first. The first item of a returned array should always represent the latest data point.
Specifying both from and to is rare - in most cases, only a from will be present. You can place any data type which makes sense for your timeseries in the value attribute, but this should be consistent across all the timeseries points you return.
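To make the shape of a webhook concrete, here is a minimal sketch of the two required end-points using Python and Flask. It is illustrative only - the route shapes follow the signatures above, but the framework choice, the placeholder data and the lookups into your own domain store are assumptions rather than part of BitBroker itself:
# a minimal, illustrative webhook sketch - framework choice and data lookups are assumptions
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/entity/<entity_type>/<entity_id>")
def entity(entity_type, entity_id):
    # look up live / on-demand data for this record in your own domain store, then
    # return only the "entity" and "instance" attributes to be merged by BitBroker
    return jsonify({
        "entity": { "inflation": 4.3 },       # placeholder values
        "instance": { "temperature": 18.8 }   # placeholder values
    })

@app.route("/timeseries/<entity_type>/<entity_id>/<tsid>")
def timeseries(entity_type, entity_id, tsid):
    start = request.args.get("start")             # inclusive, may be absent
    end = request.args.get("end")                 # exclusive, may be absent
    limit = request.args.get("limit", type=int)   # always present, integer greater than zero
    # fetch up to 'limit' points from your own store, latest first
    points = [{ "from": 1911, "value": 6253 }, { "from": 1910, "value": 5231 }]
    return jsonify(points[:limit])

if __name__ == "__main__":
    app.run(port=8080)   # the host and port are up to you and your coordinator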