SISCI-API specification


Preface

Clusters of commodity processors interconnected by a fast shared network are an attractive and economical way to build large multiprocessor systems, clusters and embedded control systems.

The Dolphin Express SCI and Dolphin Express DX interconnect technologies provide a non-coherent distributed shared memory architecture.

ESPRIT Project 23174 (Software Infrastructure for Shared-Memory Cluster Interconnects, "SISCI") set itself the goal of defining a common Application Programming Interface ("API") to serve as a basis for porting major applications to heterogeneous multi-vendor shared memory platforms.

The SISCI software and underlying drivers simplify the process of building shared memory based applications. The built-in resource management enables multiple concurrent SISCI programs to coexist and operate independently of each other.

The functional specification of the API presented in this document is defined in ANSI C.

Introduction

This introductory chapter defines some terms that are used throughout the document and briefly describes the cluster architecture that represents the main objective of this functional specification.

Dolphin's implementation of the SISCI API comes with an extensive set of example and demo programs that can be used as a basis for a new program or to understand the details of SISCI programming. Newcomers to the SISCI API are recommended to study these programs carefully before making important architecture decisions. The demo and example programs are found in the source in DIS/src/SISCI/cmd/example.

The SISCI API is available in user space. Dolphin offers the same functionality in kernel space, defined by the GENIF kernel interface. The definition of the GENIF interface can be found in the source file DIS/src/IRM_DSX/drv/src/genif.h. The SISCI driver itself, found in DIS/src/SISCI/src, is a good example of how to use the GENIF kernel interface.

Dolphin offers extensive support and assistance for migrating your application to the SISCI API. Please contact your sales representative for more information or email sisci-support@dolphinics.com.

Basic Concepts

Processor

Host

DX adapter

SCI adapter

Fabric

Local segment

Remote segment

Connected segment

Mapped segment

Reflective Memory

IO device

Peer-to-peer communication

Cluster Architecture

A cluster is, at its core, a collection of interconnected hosts. A host may be a single processor or an SMP containing several CPUs. Adapters connect a host to a fabric. A host can be connected to several fabrics, usually by installing several adapters in it. This allows the construction of complex topologies (e.g. a two-dimensional mesh) and may also be used to add redundancy and/or improve the bandwidth.

Adapters often contain subunits that implement specific SISCI API functions such as transparent remote memory access, DMA, message mailboxes or interrupts. Most subunits have CSR registers that may be accessed locally via the host adapter interface or remotely over the SCI or DX fabric.

SISCI API Data types

This Application Programming Interface covers different aspects of shared memory technology and how a user can access it. The API items can therefore be grouped into categories. The specification of functions and data types, presented in Chapter 3, is preceded here by a short introduction to these categories. For easier consultation, a list of the relevant API items is provided for each category.

Data formats

Parameters and return values of the API functions, other than the data types introduced in the following sections, are expressed in the machine-native data types. The only assumption is that ints are at least 32 bits and shorts are at least 16 bits. If, in the future, a specific size or endianness is needed, the Shared-Data Formats shall be used.
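The width assumption above can be checked when an application is built. The following is a minimal sketch in standard C; the helper names (fmt_int_width_bits, fmt_short_width_bits, fmt_check_native_widths) are hypothetical and are not part of the SISCI API.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time guard: refuse to build where the minimum widths do not hold. */
#if UINT_MAX < 0xFFFFFFFFUL
#error "int must be at least 32 bits wide"
#endif
#if USHRT_MAX < 0xFFFFU
#error "short must be at least 16 bits wide"
#endif

/* Storage width of int in bits (hypothetical helper). */
unsigned int fmt_int_width_bits(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}

/* Storage width of short in bits (hypothetical helper). */
unsigned int fmt_short_width_bits(void)
{
    return (unsigned int)(sizeof(short) * CHAR_BIT);
}

/* Run-time counterpart of the preprocessor checks above. */
void fmt_check_native_widths(void)
{
    assert(fmt_int_width_bits() >= 32u);
    assert(fmt_short_width_bits() >= 16u);
}
```

The preprocessor checks fail the build early, while the run-time helper can be called once at start-up if a defensive check is preferred.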

Descriptors

Working with remote shared memories, DMA transfers and remote interrupts, the major communication features that this API offers, requires the use of logical entities such as devices, memory segments and DMA queues. Each of these entities is characterized by a set of properties that should be managed as a single object in order to avoid inconsistencies. To hide the details of the internal representation and management of such properties from an API user, a number of descriptors have been defined and made opaque: their contents can be referenced with a handle and can be modified only through the functions provided by the API.

The descriptors and their meaning are the following:

sci_desc
It represents a SISCI virtual device, that is, a communication channel with the driver. Many virtual devices can be opened by the same application. It is initialized by calling the function SCIOpen.

sci_local_segment
It represents a local memory segment and it is initialized when the segment is allocated by calling the function SCICreateSegment.

sci_remote_segment
It represents a segment residing on a remote node. It is initialized by calling either the function SCIConnectSegment or the function SCIConnectSCISpace.

sci_map
It represents a memory segment mapped in the process address space. It is initialized by calling either the function SCIMapRemoteSegment or the function SCIMapLocalSegment.

sci_sequence
It represents a sequence of operations involving communication with remote nodes. It is used to check if errors have occurred during a data transfer. The descriptor is initialized when the sequence is created by calling the function SCICreateMapSequence.

sci_dma_queue
It represents a chain of specifications of data transfers to be performed using the DMA engine available on the adapter. The descriptor is initialized when the chain is created by calling the function SCICreateDMAQueue.

sci_local_interrupt
It represents an instance of an interrupt that an application has made available to remote nodes. It is initialized when the interrupt is created by calling the function SCICreateInterrupt.

sci_remote_interrupt
It represents an interrupt that can be triggered on a remote node. It is initialized by calling the function SCIConnectInterrupt.

sci_block_transfer
It represents an asynchronous transfer of a block of data. It is initialized when the function SCITransferBlockAsync is invoked.

Each of the above descriptors is an opaque data type and can be referenced only via a handle. The name of the handle type is given by the name of the descriptor type with a trailing _t.
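The opaque-descriptor convention can be illustrated with a small, self-contained sketch. Everything below is hypothetical (hnd_desc, hnd_desc_t, hnd_open, hnd_open_count, hnd_close); it only mirrors the naming rule "descriptor name plus _t" and is not the layout used by the SISCI library.

```c
#include <assert.h>
#include <stdlib.h>

/* Public view: clients see only the struct tag and the handle type. */
struct hnd_desc;                        /* hypothetical descriptor      */
typedef struct hnd_desc *hnd_desc_t;    /* handle: descriptor name + _t */

/* Private view: in a real library this definition would live in a
 * source file that clients never include. */
struct hnd_desc {
    unsigned int open_count;
};

/* Allocate and initialize a descriptor; returns NULL on failure. */
hnd_desc_t hnd_open(void)
{
    hnd_desc_t d = calloc(1, sizeof *d);
    if (d != NULL) {
        d->open_count = 1u;
    }
    return d;
}

/* The only way for a client to inspect the descriptor contents. */
unsigned int hnd_open_count(hnd_desc_t d)
{
    assert(d != NULL);
    return d->open_count;
}

/* Release the descriptor. */
void hnd_close(hnd_desc_t d)
{
    free(d);
}
```

Because clients hold only a pointer to an incomplete type, they cannot read or modify the fields directly and must go through the accessor functions.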

No automatic cleanup of the resources represented by the above descriptors is performed; rather, it should be provided by the API client. Resources cannot be released (and the corresponding descriptors deallocated) until all the dependent resources have been released. The dependencies between resource classes can be derived from the function specifications.
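The release rule can be pictured with a toy model in which a parent resource counts its dependents and refuses to be released while any remain. This is only a sketch under assumptions: the types and functions (dep_device, dep_segment, dep_device_close and so on) are hypothetical, and the actual SISCI functions report such conditions through their own error codes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEP_OK = 0, DEP_BUSY } dep_status_t;

/* Hypothetical parent resource that tracks its dependents. */
struct dep_device {
    unsigned int live_segments;
    int is_open;
};

/* Hypothetical dependent resource. */
struct dep_segment {
    struct dep_device *owner;
    int allocated;
};

void dep_device_open(struct dep_device *dev)
{
    dev->live_segments = 0u;
    dev->is_open = 1;
}

/* Creating a segment makes it a dependent of the device. */
void dep_segment_create(struct dep_segment *seg, struct dep_device *dev)
{
    assert(dev != NULL && dev->is_open);
    seg->owner = dev;
    seg->allocated = 1;
    dev->live_segments++;
}

/* Removing a segment drops the dependency. */
void dep_segment_remove(struct dep_segment *seg)
{
    if (seg->allocated) {
        seg->owner->live_segments--;
        seg->allocated = 0;
    }
}

/* The device can be closed only after all its segments are removed. */
dep_status_t dep_device_close(struct dep_device *dev)
{
    if (dev->live_segments != 0u) {
        return DEP_BUSY;
    }
    dev->is_open = 0;
    return DEP_OK;
}
```

In this model, closing the device before removing the segment yields DEP_BUSY; releasing in reverse order of creation succeeds.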

Flags

Nearly all the functions included in this API accept a flags parameter as input. It is used to make a function invocation behave slightly differently from its default semantics (e.g. choosing between a blocking and a non-blocking version of an operation).

In Chapter 3 each function specification is followed by a list of accepted flags. Only the flags that change the default behaviour are defined. Several flags can be ORed together to specify a combined effect. The flags parameter, represented as an unsigned int, must therefore be treated as a bitmask.

Most of the functions do not accept any flag. The parameter is nonetheless kept in the specification, because it could become useful for future extensions, and the implementation shall check that it is 0.

A flag value starts with the prefix SCI_FLAG_.
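The bitmask handling described in this section can be sketched as follows. The flag names, their values and both functions are hypothetical stand-ins chosen for illustration; real flags carry the SCI_FLAG_ prefix and are defined by the SISCI headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; each flag occupies its own bit. */
#define FLG_NONBLOCKING   0x1u
#define FLG_USE_CALLBACK  0x2u

typedef enum { FLG_ERR_OK = 0, FLG_ERR_ILLEGAL_FLAG } flg_error_t;

/* A function that accepts no flags: the parameter must be 0. */
void flg_no_flags_call(unsigned int flags, flg_error_t *error)
{
    assert(error != NULL);
    *error = (flags == 0u) ? FLG_ERR_OK : FLG_ERR_ILLEGAL_FLAG;
}

/* A function that accepts two flags, alone or ORed together.
 * Any bit outside the accepted set is rejected. */
void flg_transfer(unsigned int flags, flg_error_t *error)
{
    const unsigned int accepted = FLG_NONBLOCKING | FLG_USE_CALLBACK;

    assert(error != NULL);
    *error = ((flags & ~accepted) != 0u) ? FLG_ERR_ILLEGAL_FLAG : FLG_ERR_OK;
}
```

A caller would pass, for example, FLG_NONBLOCKING | FLG_USE_CALLBACK to request both effects, or 0 for the default behaviour.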

Errors

Most of the API functions return an error code as an output parameter to indicate if the execution succeeded or failed. The error codes are collected in an enumeration type called sci_error_t. Each value starts with the prefix SCI_ERR_. The code denoting success is SCI_ERR_OK and an application should check that each function call returns this value.
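The calling convention, with the error code returned through an output parameter and checked after every call, might look like the sketch below. The enumeration, the function err_reserve and the resource pool it manages are hypothetical; they only imitate the style of sci_error_t and its SCI_ERR_ values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error enumeration in the style of sci_error_t. */
typedef enum {
    ERRX_OK = 0,
    ERRX_ILLEGAL_PARAMETER,
    ERRX_NOSPC
} errx_t;

/* Printable name for diagnostics. */
const char *errx_name(errx_t e)
{
    switch (e) {
    case ERRX_OK:                return "ERRX_OK";
    case ERRX_ILLEGAL_PARAMETER: return "ERRX_ILLEGAL_PARAMETER";
    case ERRX_NOSPC:             return "ERRX_NOSPC";
    }
    return "ERRX_UNKNOWN";
}

/* Hypothetical API-style function: it returns nothing and reports the
 * outcome through *error. It takes `wanted` units from a pool. */
void errx_reserve(unsigned int *pool_free, unsigned int wanted, errx_t *error)
{
    assert(error != NULL);
    if (pool_free == NULL || wanted == 0u) {
        *error = ERRX_ILLEGAL_PARAMETER;
        return;
    }
    if (wanted > *pool_free) {
        *error = ERRX_NOSPC;     /* pool left untouched on failure */
        return;
    }
    *pool_free -= wanted;
    *error = ERRX_OK;
}
```

After each call the application compares the output value with the success code and, on failure, can log errx_name(err) before cleaning up.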

In Chapter 3 each function specification is followed by a list of possible errors that are typical for that function. There are, however, common or very generic errors that are not repeated every time, unless they have a particular meaning for that function:

SCI_ERR_NOT_IMPLEMENTED

SCI_ERR_ILLEGAL_FLAG

SCI_ERR_FLAG_NOT_IMPLEMENTED

SCI_ERR_ILLEGAL_PARAMETER

SCI_ERR_NOSPC

SCI_ERR_API_NOSPC

SCI_ERR_HW_NOSPC

SCI_ERR_SYSTEM

Each function requiring a local adapter number can generate the following errors:

SCI_ERR_ILLEGAL_ADAPTERNO

SCI_ERR_NO_SUCH_ADAPTERNO

SCI_ERR_NO_SUCH_NODEID

SCI_ERR_ILLEGAL_NODEID

Other data types

Besides the data types specified in the previous sections others are used:

General functions

To use the network correctly, an application is required to execute some operations, such as opening or closing a communication channel with the SISCI driver. To use the network effectively, an application may also need some information about the local node or a remote node.

Shared Memory

The SCI and DX interfaces implement a remote shared memory approach to data transfers between processors: an application can map into its own address space a memory segment actually residing on another node; read and write operations from or to this memory segment are then automatically and transparently converted by the hardware into remote operations. This API provides full support for creating and exporting local memory segments, for connecting to and mapping remote memory segments, and for checking whether errors have occurred during a data transfer.
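Checking a transfer for errors usually leads to a write, check and retry loop. The sketch below simulates that pattern on ordinary local memory. It assumes that a failed check can be retried, and all names (seqx_link, seqx_check, seqx_write_checked) are hypothetical; they are not the SISCI sequence functions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SEQX_OK = 0, SEQX_RETRIABLE } seqx_status_t;

/* Simulated link: the next `failures_left` checks report an error. */
struct seqx_link {
    unsigned int failures_left;
};

/* Stand-in for "did an error occur since the last check?". */
seqx_status_t seqx_check(struct seqx_link *link)
{
    assert(link != NULL);
    if (link->failures_left > 0u) {
        link->failures_left--;
        return SEQX_RETRIABLE;
    }
    return SEQX_OK;
}

/* Write one word and repeat the write until the check passes.
 * Returns the number of attempts used, or -1 if max_tries is exhausted. */
int seqx_write_checked(struct seqx_link *link, volatile unsigned int *dst,
                       unsigned int value, unsigned int max_tries)
{
    unsigned int i;

    for (i = 0u; i < max_tries; i++) {
        *dst = value;                      /* the (simulated) remote store */
        if (seqx_check(link) == SEQX_OK) {
            return (int)(i + 1u);
        }
    }
    return -1;
}
```

With a mapped remote segment, dst would point into the mapped area; here it is a plain local variable so that the sketch runs anywhere.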

The functions included in this category actually concern three different aspects:

Memory management:

Connection management:

Shared memory operations:

Memory and connection management functions affect the state of a local segment, whose state diagram is shown in the figure below.

state_diagram_loc_seg.png

State diagram for a local segment

The state of a remote segment, shown in the figure below, depends on what happens on the network or on the node where the segment physically resides. The transitions between the remote segment states are marked with callback reasons (sci_segment_cb_reason_t).

state_diagram_rem_seg.png

State diagram for a remote segment. The transitions are marked with callback reasons. SCIDisconnectSegment can be called from each state to exit the state diagram.

Direct Memory Access (DMA)

The drawback of the shared memory approach to data transfers is that the CPU is busy reading or writing data from or to remote memory (programmed I/O, PIO). An alternative is to use the DMA engine available on the adapter. The application (i.e. the CPU) specifies a queue of data transfers and passes it to the DMA engine. The CPU is then free either to wait for the completion of the transfer or to do something else. In the latter case it is possible to specify a callback function that is invoked when the transfer has finished. DMA has a higher startup cost than PIO and is normally recommended only for larger transfers.
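The queue and callback model can be illustrated with a small simulation. All names below (dmax_queue, dmax_enqueue, dmax_post, dmax_mark_done) are hypothetical, and the stand-in engine copies the data synchronously with memcpy before calling the callback; a real DMA engine would perform the transfers asynchronously while the CPU continues.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DMAX_MAX_ENTRIES 4

/* Completion callback: receives a user argument and a status (0 = ok). */
typedef void (*dmax_callback_t)(void *arg, int status);

struct dmax_entry {
    void *dst;
    const void *src;
    size_t len;
};

/* A queue of transfer specifications. */
struct dmax_queue {
    struct dmax_entry entries[DMAX_MAX_ENTRIES];
    size_t count;
};

/* Append one transfer; returns 0 on success, -1 if the queue is full. */
int dmax_enqueue(struct dmax_queue *q, void *dst, const void *src, size_t len)
{
    assert(q != NULL);
    if (q->count == DMAX_MAX_ENTRIES) {
        return -1;
    }
    q->entries[q->count].dst = dst;
    q->entries[q->count].src = src;
    q->entries[q->count].len = len;
    q->count++;
    return 0;
}

/* Stand-in for the engine: run all queued transfers, then notify. */
void dmax_post(struct dmax_queue *q, dmax_callback_t cb, void *arg)
{
    size_t i;

    assert(q != NULL);
    for (i = 0; i < q->count; i++) {
        memcpy(q->entries[i].dst, q->entries[i].src, q->entries[i].len);
    }
    q->count = 0;
    if (cb != NULL) {
        cb(arg, 0);
    }
}

/* Example callback: set the int pointed to by arg when the status is ok. */
void dmax_mark_done(void *arg, int status)
{
    if (status == 0) {
        *(int *)arg = 1;
    }
}
```

An application would enqueue its transfers, post the queue with dmax_mark_done and a pointer to a completion flag, and carry on with other work until the flag is set.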

Interrupts

Triggering an interrupt on a remote node should be considered a fast way to notify an application running remotely that something has happened. An interrupt is identified by a unique number and this is practically the only information an application gets when it is interrupted, either synchronously or asynchronously.

Device to Device transfers

SISCI supports setting up general IO devices to communicate directly, device to device (peer-to-peer communication). The alternative model, which uses main memory as an intermediate buffer, has significant overhead. The communicating devices can be placed in the same host or in different hosts interconnected by the shared memory fabric.

Please contact Dolphin for more information.

Privileged Operations

One of the guidelines of the specification of this API has been to encapsulate most of the details of the underlying hardware and of the low-level software. While this approach helps to avoid inconsistent use of the available resources, it can reduce the flexibility in using the technology, and this might be unacceptable for certain applications. It can therefore be useful to have some functions that access low-level features of the technology. The compromise is to make these low-level functions available only to some "expert" users and not to general users. For the moment the API provides the possibility to connect to a remote address space window using the direct IO address and to access local and remote CSR spaces.

Generated on Wed Oct 20 10:11:59 2010 for SISCI-API by  doxygen 1.6.3