The following SCAPE Azure v.1.0 components are the building blocks of the implemented architecture:
- Authentication is responsible for all user authentication, including user profiles, and for service authentication.
Service authentication ensures that external services can communicate securely with the internal services running in SAZ.
- The SCAPE Azure Execution Layer is responsible for running and managing all operations and logging within SAZ.
- The Content Representation Layer is a metadata layer that describes data, reports, logs, and workflows. It maps the stored data and metadata in SQL Azure.
- The Tools and Resources Layer represents the Action services and tools used for characterization, conversion, comparison, and reporting.
- The data store provides virtually unlimited storage in BLOB storage.
|Figure 1. Architecture components of SCAPE Azure v.1.0.|
SCAPE Azure v.1.0 is implemented so that data is stored in the Azure BLOB storage, Tables, and SQL Azure. Conversion functions leverage SharePoint 2010 Word Automation Services. The service communication is facilitated through WCF Services. It is also possible to communicate directly with the BLOB storage via REST.
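As a rough illustration of the direct REST access mentioned above, the following sketch builds the URL of a blob and downloads it with a plain HTTP GET. The account, container, and blob names are hypothetical, and the sketch assumes that the blob is either publicly readable or that a shared access signature (SAS) query string is supplied. It is not the client code used in SCAPE Azure.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen


def blob_url(account: str, container: str, blob: str,
             sas: Optional[str] = None) -> str:
    """Build a blob URL: https://<account>.blob.core.windows.net/<container>/<blob>."""
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    url = f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"
    # A SAS token, if given, is appended as the query string.
    return f"{url}?{sas.lstrip('?')}" if sas else url


def download_blob(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the blob content with a plain HTTP GET."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `blob_url("scapedemo", "documents", "in/report 1.doc")` yields `https://scapedemo.blob.core.windows.net/documents/in/report%201.doc`, which can then be passed to `download_blob`.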
Figure 2 shows the details of the implemented architecture. Data is placed in the blob storage. Conversion and comparison functions are implemented as worker roles. SharePoint is placed in a VM environment and the Word Automation Services are leveraged to convert document formats. Reporting services are under development. They will aggregate processing information, ranging from system performance related to ingest, conversion, and comparison, to qualitative data about the quality of the conversion, based on different techniques.
|Figure 2. System architecture of SCAPE Azure v.1.0.|
Legend for Figure 2, describing the scenarios supported by the SCAPE Azure architecture:
[A] The client has direct access to BLOB storage (virtually unlimited storage) via a REST API. This improves responsiveness, since requests do not have to pass through access services hosted on the server.
[B] The SQL Azure database stores user profiles and profile management information. Alternative database solutions (e.g. MySQL) can be employed either locally or within the cloud.
[C] A worker role (processing node) encapsulates a number of discrete actions such as data processing functions, diagnostics, analysis, QA methods, etc.
- The actions of a worker role can be exposed externally and internally
- Scalability is attained by replicating Worker Roles. One can instantiate any number of processing nodes for conversion, analysis, comparison, QA or other operations on the data
- External endpoints can make use of the Azure load balancer. For internal endpoints, the most applicable solution can be employed by the Domain Model
[D] Temporary local storage within the worker roles can be employed when performing analysis or conversion. This eliminates the need for continuous communication with the BLOB storage system and improves performance and reliability.
[E] Worker roles run within a 64-bit environment by default. Using a WCF endpoint as a proxy, it is possible to run legacy 32-bit applications within worker roles, so that legacy software can be employed and scaled as necessary.
[F] VM roles can be hosted with the same scaling and redundancy capabilities as other types of roles within Azure:
- SharePoint Word Automation Services (WAS) have been enabled within the SCAPE portal
- Worker roles make calls to the SharePoint service hosted on the VM. The exposed SharePoint service, with access to WAS, is only available via an internal endpoint, although it could be made external if necessary
- The SharePoint service initiates WAS by retrieving the document from the BLOB storage area and, upon transformation, transfers the converted document back to BLOB storage
[G] A second VM (not shown in Figure 2) hosts OmniPage, which performs the OCR before analysis is performed and the results are transferred to BLOB storage
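The pattern described in items [C] and [D] can be sketched in a few lines: identical workers take jobs from a shared queue, copy each input into temporary local storage once, process the local copy, and write the result back once. This is only an illustrative sketch under stated assumptions. An in-memory dictionary stands in for BLOB storage, threads stand in for replicated worker roles, and all names (`worker`, `run_pool`, the `.out` suffix) are hypothetical rather than taken from SCAPE Azure.

```python
import queue
import tempfile
import threading
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for BLOB storage: blob name -> content.
BlobStore = Dict[str, bytes]


def worker(jobs: "queue.Queue", store: BlobStore,
           action: Callable[[Path], bytes], lock: threading.Lock) -> None:
    """One processing node: handle blob names until a None sentinel arrives."""
    while True:
        name = jobs.get()
        if name is None:  # sentinel: no more work for this worker
            return
        with tempfile.TemporaryDirectory() as scratch:
            local = Path(scratch) / "input"
            with lock:
                local.write_bytes(store[name])  # single read from the store
            result = action(local)  # all processing uses the local copy
        with lock:
            store[name + ".out"] = result  # single write back to the store


def run_pool(store: BlobStore, names: Iterable[str],
             action: Callable[[Path], bytes], workers: int = 4) -> None:
    """Scale out by starting `workers` identical processing nodes."""
    jobs: queue.Queue = queue.Queue()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, store, action, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for name in names:
        jobs.put(name)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
```

A call such as `run_pool(store, ["a.doc", "b.doc"], convert, workers=2)` would process both inputs with two workers, where `convert` is any function that takes the local file path and returns the converted bytes. Raising `workers` corresponds to instantiating more processing nodes.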
Figure 2 glossary:
- SQL Azure – cloud-based, scale-out version of MS SQL Server
- Web Role – used for the front end (Silverlight client) and the overall logic of the system
- Worker Role – used for the execution of Action services and tools (comparable to computation nodes in other systems)
- Word Automation Services – SharePoint services for batch document conversion
- SharePoint WCF Service – collection of SharePoint Services accessible via Windows Communication Foundation (WCF)
- Virtual Machine Role – virtual machine within Worker Role
This study was performed on an Azure “medium” virtual machine, with two CPU cores and 3.5 GB of memory. Microsoft doesn’t specify the CPU it uses for virtual machines, but the system properties in Windows reported an “AMD Opteron 4171 HE 2.09 GHz 3.50 GB.”
The virtual machines are running Windows Server 2008 R2 SP1.