Introduction
Azure Service Fabric (ASF) was chosen as the microservice platform in the Azure cloud ecosystem. ASF was introduced midway through the project, so new functionality used Service Fabric while the existing solution relied on other cloud computing resources. The application domain boundaries were still taking shape as the work progressed. In the end, not all microservice principles were incorporated into the solution. The guiding implementation principle was code modularity. In addition, the solution combines events with the actor model.
Project technology stack
- Front end: React
- Back end: Azure cloud
  - App Services
  - Azure Functions
  - Service Bus
  - Cosmos DB
  - Azure Service Fabric
  - Storage tables, blobs, and file shares
- C# and .NET Core for reliable service development
- MS Azure Service Fabric SDK
- Visual Studio 2019 with .NET 5.0 Runtime and .NET cross-platform development
Azure Service Fabric as a microservice
Azure Service Fabric is a microservice platform and orchestrator with its own SDK. It can run containers, executables, and native services. It supports both Java and .NET and has built-in support for the actor model. ASF supports stateful services and fast recovery within 15 seconds. Put simply, any computing instance is an executable program with a library that supports communication within the ASF application. One of the key advantages of ASF is the flexibility to increase the processing speed of operations through a configuration change alone.
Microsoft uses ASF internally in multiple cloud services, including:
- SQL Database
- Cosmos DB
- IoT Hub
- Event Hub
- Azure DevOps
- Event Grid
The ASF application consists of services which can wrap:
- Web API
- Actors
- Any executables
- Containers
A service can be further divided into partitions. Increasing the number of partitions of a service increases the processing parallelism of the application in ASF. Each partition of a stateful service can have 1, 3 (quorum), or more replicas; a stateless service has instances instead. When a partition has more than one replica, exactly one is active (the primary) and the remaining replicas are passive (secondaries).
Finally, all applications in a Service Fabric cluster run on physical or virtual machines called nodes. A single cluster can host multiple ASF applications. ASF redistributes partitions dynamically as the number of nodes changes due to scaling or detected failures.
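The relationship between keys, partitions, and nodes described above can be illustrated with a small toy model. This is only a sketch in Python, not Service Fabric's actual placement logic (which is handled by the cluster and is far more sophisticated); the function names `partition_for` and `place_partitions`, the hashing scheme, and the round-robin placement are all assumptions made for illustration.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def place_partitions(partition_count: int, nodes: list) -> dict:
    """Spread partition indexes over the available nodes round-robin (toy placement)."""
    placement = {node: [] for node in nodes}
    for partition in range(partition_count):
        placement[nodes[partition % len(nodes)]].append(partition)
    return placement


if __name__ == "__main__":
    # Six partitions on three nodes, then the same partitions after one node is lost.
    print(place_partitions(6, ["node-0", "node-1", "node-2"]))
    print(place_partitions(6, ["node-0", "node-1"]))
    # The same key always lands on the same partition.
    print(partition_for("order-42", 6))
```

With three nodes, each node holds two of the six partitions; when one node disappears, the same six partitions are spread over the remaining two. More partitions mean more units that can be processed in parallel and moved independently.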
Kubernetes as an alternative to Azure Service Fabric
Another option for a microservices system is Kubernetes. The table below summarizes some differences between Kubernetes and Service Fabric:
| Azure Service Fabric | Kubernetes |
| --- | --- |
| Supports stateless and stateful services out of the box | Stateless, with stateful support through third-party tools |
| Allows a mix of executables, containers, and native services within a single application | Containers |
| Built-in support for the actor model | No built-in support |
| Failure detection within 30 seconds in the worst case; failure recovery takes 2 seconds on average | Dead node detection in 5 minutes |
| Native services can control scaling, configuration, and state programmatically | Available through third parties |
In general, an ASF-based application benefits from:
- reliable state
- easier management of the services and nodes through the application manifest
- flexibility to run computation in the form of executables, native services, or containers
- easier communication between services through remoting
- quicker response during recovery of a service.
Project
In the ASF implementation project, Web API is a stateless service while other actors and services are stateful. There are no executables or containers involved since all the logic is written natively by leveraging the ASF API.
The Web API scales dynamically to any number of available nodes. Other reliable services and actors have a fixed instance count of 10. No service has a replica set configured.
Project programming model
Actor model
An actor is an entity that:
- executes code containing business logic
- has exclusive access to the storage
- processes incoming messages when ready
- can send messages to other actors
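The four properties above can be shown in a minimal, framework-free sketch. It is written in Python for brevity and is not the Service Fabric Reliable Actors API; the class `CounterActor`, its message shapes, and the `tell`/`join`/`snapshot` helpers are hypothetical names chosen for the example.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one message handled at a time."""

    def __init__(self) -> None:
        self._mailbox = queue.Queue()
        self._state = {"count": 0}  # only this actor's worker thread mutates it
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tell(self, message: dict) -> None:
        """Enqueue a message; callers never touch the actor's state directly."""
        self._mailbox.put(message)

    def _run(self) -> None:
        # Messages are taken from the mailbox strictly one after another.
        while True:
            message = self._mailbox.get()
            try:
                self._handle(message)
            finally:
                self._mailbox.task_done()

    def _handle(self, message: dict) -> None:
        if message["type"] == "increment":
            self._state["count"] += message.get("by", 1)
        elif message["type"] == "report":
            # Sending to another actor is just another message in its mailbox.
            message["reply_to"].tell({"type": "increment", "by": self._state["count"]})

    def join(self) -> None:
        """Block until every queued message has been handled (demo helper)."""
        self._mailbox.join()

    def snapshot(self) -> int:
        """Read the counter after join(); for demonstration only."""
        return self._state["count"]
```

A caller would create two actors, send the first a few `increment` messages followed by a `report` that names the second actor in `reply_to`, and then call `join()` on both before reading `snapshot()`. Because each actor drains its own mailbox on a single thread, no locking around the state is needed.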
ASF actor model framework
ASF uses reliable services as the basis for the actor model. ASF provides a framework that enables actor model development through:
- Reliable service
- Reliable state
- Service remoting
Project
The project implementation uses reliable services and remoting. State is maintained via a DB repository. Each actor has a one-to-one relationship with a Cosmos DB container.
Dispatcher
In the implementation, the event dispatcher is an ASF reliable service that listens for serialized events on the message broker, Azure Service Bus. It holds a collection of event handlers. Each type of event handler listens for messages, called events, on a specific Service Bus topic. When a message arrives, the event handler activates the actor relevant to the event and invokes the actor's operation via ASF service remoting. Load balancing of the message stream is achieved first at the dispatcher and then at each actor.
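The dispatcher pattern described here, a registry of handlers keyed by topic where each handler activates an actor and calls it, can be sketched as follows. This is an in-memory Python illustration only: it does not use Azure Service Bus or ASF remoting, and `OrderActor`, `Dispatcher`, `build_dispatcher`, the `order-created` topic, and the `orderId` field are hypothetical names, not taken from the project.

```python
import json
from typing import Callable, Dict, List


class OrderActor:
    """Hypothetical actor; in ASF the call to it would go through service remoting."""

    def __init__(self, actor_id: str) -> None:
        self.actor_id = actor_id
        self.received: List[dict] = []

    def on_order_created(self, payload: dict) -> None:
        self.received.append(payload)


class Dispatcher:
    """Routes each incoming message to the handler registered for its topic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}
        self.actors: Dict[str, OrderActor] = {}

    def register(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic] = handler

    def activate(self, actor_id: str) -> OrderActor:
        # Create the actor on first use, then reuse it for later events.
        if actor_id not in self.actors:
            self.actors[actor_id] = OrderActor(actor_id)
        return self.actors[actor_id]

    def dispatch(self, topic: str, body: str) -> None:
        handler = self._handlers.get(topic)
        if handler is None:
            raise KeyError(f"no handler registered for topic {topic!r}")
        handler(json.loads(body))


def build_dispatcher() -> Dispatcher:
    """Wire one handler: 'order-created' events go to the actor named by orderId."""
    dispatcher = Dispatcher()
    dispatcher.register(
        "order-created",
        lambda payload: dispatcher.activate(payload["orderId"]).on_order_created(payload),
    )
    return dispatcher
```

Dispatching two `order-created` messages with the same `orderId` reaches the same actor, while a different `orderId` activates a second one. Routing by event type happens in the dispatcher, and routing by identity happens at actor activation, which mirrors the two balancing levels mentioned above.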
Authentication and authorization
Service Fabric functionality is exposed via the Web API. The clients of the Service Fabric Web API are other Web APIs. Consequently, authentication uses the OAuth 2.0 client credentials flow, with the issued JWT sent as a bearer token. The server-side implementation relies on the Microsoft.Identity.Web library to configure Web API authentication with:
- AD instance name
- Domain
- Tenant ID
- Client ID
The client-side connection sends the JWT in the Authorization header. The client obtains the token from the authority with these settings:
- Authority
- Secret
- Client ID
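As an illustration of how the three client settings are used, the sketch below builds (but does not send) a client credentials token request and the header for a later API call. It assumes the Microsoft identity platform v2.0 token endpoint under the authority URL and a `/.default` scope; the helper names `build_token_request` and `bearer_header` and all values are placeholders, and the project's actual client code is not shown in this article.

```python
import urllib.parse
import urllib.request


def build_token_request(authority: str, client_id: str, secret: str, scope: str) -> urllib.request.Request:
    """Build the POST request for an OAuth 2.0 client credentials grant (not sent here)."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": secret,
        "scope": scope,  # assumption: e.g. "api://<app-id>/.default"
    }).encode("ascii")
    return urllib.request.Request(
        url=f"{authority.rstrip('/')}/oauth2/v2.0/token",  # assumed v2.0 endpoint path
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(access_token: str) -> dict:
    """Header the client attaches to each Web API call once it has a token."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    request = build_token_request(
        authority="https://login.microsoftonline.com/<tenant-id>",  # placeholder
        client_id="<client-id>",                                    # placeholder
        secret="<client-secret>",                                   # placeholder
        scope="api://<app-id>/.default",                            # placeholder
    )
    print(request.get_method(), request.full_url)
```

In a real client the request would be sent to the authority and the `access_token` field of the JSON response passed to `bearer_header`; an authentication library would typically handle this along with token caching and renewal.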
Visual Studio solution overview
The solution consists of:
- Standard Service Fabric startup project containing the application manifest, application parameters, and publish profiles
- Service Fabric API project along with controllers which are the entry point to the solution
- Various actor projects responsible for operations on a specific actor
- Event dispatcher project responsible for serializing and deserializing events
Data flow
- Controller
  - Validation
  - Actor activation
  - Serialization of event and sending to the service bus topic
- Dispatcher
  - Handler registrations
  - Deserialization of event
  - Handling events based on event type
  - Actor activation based on event type
- Actors
  - Repository
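The validation, serialization, and deserialization steps in this flow can be sketched with a typed event and a small envelope that carries the event type alongside the data. This is a Python illustration under assumptions: the `OrderCreated` event, its fields, the validation rules, and the JSON envelope shape are hypothetical and are not the project's actual contracts.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderCreated:
    """Hypothetical event raised by a controller."""
    order_id: str
    amount: float


# Registry used on the receiving side to turn a type name back into a class.
EVENT_TYPES = {"OrderCreated": OrderCreated}


def validate(event: OrderCreated) -> None:
    """Controller-side validation before anything is published."""
    if not event.order_id:
        raise ValueError("order_id is required")
    if event.amount <= 0:
        raise ValueError("amount must be positive")


def serialize(event) -> str:
    """Wrap the event in an envelope that records its type name."""
    return json.dumps({"type": type(event).__name__, "data": asdict(event)})


def deserialize(message: str):
    """Dispatcher-side: rebuild the typed event from the envelope."""
    envelope = json.loads(message)
    event_class = EVENT_TYPES[envelope["type"]]
    return event_class(**envelope["data"])
```

The type name in the envelope is what allows the receiving side to pick a handler, and then an actor, based on event type, as listed in the dispatcher steps.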
Development and debugging
Visual Studio has a template for creating Service Fabric solutions and projects. Adding a new actor project to a solution also updates the Service Fabric project. Since each newly added actor or reliable service project is an executable program that runs in Service Fabric, it has its own type registrations, including but not limited to:
- Reliable services
- Actors
- Instrumentation
The entire Service Fabric solution is composed of projects that each run as a separate executable process, either a reliable service or an actor. Consequently, debugging a flow from the API through actors and reliable services down to the underlying repository call requires setting a breakpoint in each process along the way. In other words, the debugger cannot step into a remote method call on a service or actor.
All ASF projects were connected to dev instances of Azure App Insights to help troubleshoot issues in the development process.
Once the “run in debug” button is hit in Visual Studio, code deployment kicks off to the Service Fabric local cluster.
Troubleshooting production
All troubleshooting starts and ends in Azure App Insights. All of the solution's APIs, services, and actors share the same App Insights token, so all events and trace information are aggregated at a single point. Any issue correlation therefore starts with a search in App Insights.
Lessons learned
Benefits
- Scaling out
- Communication between service instances
- Service discovery
- Telemetry
- Provision and upgrade
- Local development
- Downtimes
Challenges
- Development and debugging of ASF are limited to Visual Studio.
- Service Fabric Explorer is sophisticated, so developers new to the technology faced an initial barrier to entry.
- Debugging is cumbersome because you cannot step from one reliable service into the next line of code in a different reliable service.
- Conducting business flow tests on the project was challenging.
  - Data was complicated and had numerous validations.
  - The business process is a waterfall: testing the final step of the process requires completing all previous steps.
- Working with a local cluster with 3 partitions multiplies the number of executables by 3, so a solution consisting of 11 reliable services becomes 33 processes.
- Configuring the Service Fabric solution was not trivial, since multiple endpoints call each other.
Conclusion
The development team and the customer had an overall positive experience implementing the solution with Azure Service Fabric. The framework provided enough built-in features to let the team focus on business functionality while still delivering a highly scalable solution. That is the beauty of the framework: developers focus on what they do best, while increasing performance is abstracted away from daily development concerns.