Understanding Argo Workflows: Practical Guide [2024]

What Is Argo Workflows? 

Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is part of the Argo project, a widely used family of Kubernetes-native tools (including the Argo CD GitOps platform) that has achieved Graduated status in the Cloud Native Computing Foundation (CNCF). Argo Workflows lets users define workflows in YAML, enabling tasks to run in a defined order or in parallel.

Argo Workflows is well suited for handling complex job orchestration, offering scalability and flexibility for various use cases, including data processing, machine learning pipelines, and CI/CD automation. By leveraging Kubernetes, it ensures effective management of resources and seamless integration with cloud-native environments.
You can get Argo Workflows from the official GitHub repo.

Argo Workflows Concepts 

Here are some of the central concepts in Argo Workflows. 

Workflows

A workflow is a series of tasks that are executed according to a specified order, which can be either sequential or parallel. Each task represents an individual step in the workflow process and can perform a variety of functions, such as running a container, executing a script, or manipulating resources. 

Workflows are defined using a sequence of steps or a Directed Acyclic Graph (DAG) structure, which ensures that tasks are executed in the correct order based on their dependencies. This allows users to create intricate workflows with conditional logic, loops, and branching paths. 
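
To make this concrete, here is a minimal sketch of a DAG-based workflow (the names data-pipeline, extract, transform-a, transform-b, and load are made up for illustration): the two transform tasks run in parallel once extract completes, and load waits for both.

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: data-pipeline-       # illustrative name
    spec:
      entrypoint: main
      templates:
        - name: main
          dag:
            tasks:
              - name: extract            # runs first
                template: echo
                arguments:
                  parameters: [{name: msg, value: extract}]
              - name: transform-a        # starts when extract finishes
                dependencies: [extract]
                template: echo
                arguments:
                  parameters: [{name: msg, value: transform-a}]
              - name: transform-b        # runs in parallel with transform-a
                dependencies: [extract]
                template: echo
                arguments:
                  parameters: [{name: msg, value: transform-b}]
              - name: load               # waits for both transforms
                dependencies: [transform-a, transform-b]
                template: echo
                arguments:
                  parameters: [{name: msg, value: load}]
        - name: echo
          inputs:
            parameters:
              - name: msg
          container:
            image: alpine:3.19
            command: [echo, "{{inputs.parameters.msg}}"]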

Templates

Templates in Argo Workflows are reusable, parameterized components that define the details of each task within a workflow. A template specifies key elements such as the container image to use, the commands to execute, and any required inputs or outputs. By encapsulating these details, templates provide a modular approach to workflow design, promoting reuse and consistency across different workflows. 

There are several types of templates available, including container templates for running Docker containers, script templates for executing scripts, and resource templates for creating and managing Kubernetes resources. The flexibility and reusability of templates make it easier to manage and scale workflows, as common patterns and tasks can be defined once and reused.
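
As a brief, hedged sketch (template names and images chosen for illustration), the fragment below shows a container template and a script template side by side; a resource template for Kubernetes objects follows the same pattern and is shown later in this article.

    templates:
      # Container template: runs a command in a container image
      - name: print-date
        container:
          image: alpine:3.19
          command: [date]

      # Script template: runs an inline script and captures its stdout
      - name: random-number
        script:
          image: python:3.12-alpine
          command: [python]
          source: |
            import random
            print(random.randint(1, 100))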

Argo Workflows UI

The Argo Workflows UI is a web-based interface that provides users with tools for managing and monitoring their workflows. It offers real-time visualization of workflows, displaying the status of each task and the overall progress. Users can use the UI to start new workflows, view logs, resubmit failed tasks, and track resource utilization. 

The UI also supports advanced debugging features, allowing users to drill down into individual tasks to diagnose and resolve issues quickly. This visual representation and interactive capability make it much easier to understand the execution flow, identify bottlenecks, and ensure that workflows are running as expected.

How Does Argo Workflows Work? 

Argo Workflows uses Kubernetes to manage and orchestrate the execution of containerized tasks. Users define their workflows using YAML files, specifying each step in a sequence or parallel configuration. These workflows are then submitted to the Argo Workflows controller, a Kubernetes Custom Resource Definition (CRD) controller that interprets the workflow definition and manages its execution.

Here’s an overview of the process:

  1. Workflow definition: Users create a YAML file that defines the workflow, specifying tasks, dependencies, and execution order. This definition includes the structure of the workflow as a sequence of steps or Directed Acyclic Graph (DAG).
  2. Submission to controller: The workflow YAML is submitted to the Argo Workflows controller within a Kubernetes cluster. The controller reads the workflow specification and creates corresponding Kubernetes resources to manage the execution of each task.
  3. Task execution: Each task in the workflow is executed as a Kubernetes pod. The controller handles the scheduling and resource allocation for these pods, ensuring they run according to the defined workflow order and dependencies.
  4. Monitoring and management: Throughout the execution, the controller monitors the status of each task, managing retries for failed tasks based on user-defined policies. Users can interact with the Argo Workflows UI or CLI to monitor progress, view logs, and manage running workflows.
  5. Completion and cleanup: Upon successful completion of all tasks, the controller cleans up the resources used, ensuring efficient resource utilization within the Kubernetes cluster.
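
As a minimal illustration of what the controller receives, the manifest below defines a single-step workflow; once created in the cluster, the controller schedules the step as a pod, tracks its status, and cleans up when it finishes (names and image are placeholders).

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow                       # the CRD watched by the workflow controller
    metadata:
      generateName: hello-world-         # the controller appends a random suffix
      namespace: argo
    spec:
      entrypoint: say-hello              # the first template to execute
      templates:
        - name: say-hello
          container:                     # this step runs as a Kubernetes pod
            image: alpine:3.19
            command: [echo, "hello from Argo Workflows"]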

Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. He’s a big believer in dev empowerment and moving fast, and has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps”, and an avid public speaker who loves talking about topics such as cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better utilize Argo Workflows:

Leverage workflow templates for DRY principles:

Utilize Argo’s ability to create reusable workflow templates. By defining common task sequences as templates, you can avoid repetition, promote consistency, and ease maintenance across multiple workflows.
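
For example, a shared WorkflowTemplate (names below are illustrative) can be defined once and referenced from any workflow with templateRef, roughly as follows:

    apiVersion: argoproj.io/v1alpha1
    kind: WorkflowTemplate
    metadata:
      name: common-steps                 # hypothetical shared template
    spec:
      templates:
        - name: notify
          container:
            image: alpine:3.19
            command: [echo, "pipeline finished"]
    ---
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: uses-common-
    spec:
      entrypoint: main
      templates:
        - name: main
          steps:
            - - name: notify
                templateRef:             # reuse the shared template instead of redefining it
                  name: common-steps
                  template: notify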

Use parameterization for flexibility:

Design your workflows to accept parameters. This makes your workflows more versatile and adaptable to different datasets, environments, or conditions, reducing the need for multiple hardcoded workflows.
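
A hedged sketch of a parameterized workflow (the env parameter is made up): a default value is declared in the spec and can be overridden at submission time, for example with the CLI’s -p flag.

    spec:
      entrypoint: main
      arguments:
        parameters:
          - name: env
            value: dev                   # default, can be overridden per run
      templates:
        - name: main
          container:
            image: alpine:3.19
            command: [echo, "deploying to {{workflow.parameters.env}}"]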

Adopt artifact repositories:

Use artifact repositories like Minio or S3-compatible storage for managing workflow outputs. This ensures persistent storage and easy access to workflow results, enhancing data sharing and collaboration.
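
As an illustrative sketch (bucket, key, and secret names are placeholders), a step can declare an output artifact stored in S3-compatible storage; in practice the repository is often configured once, cluster-wide, so individual steps only need a path and a key.

    templates:
      - name: produce-report
        container:
          image: alpine:3.19
          command: [sh, -c, "echo 'result data' > /tmp/report.txt"]
        outputs:
          artifacts:
            - name: report
              path: /tmp/report.txt              # file produced inside the container
              s3:
                endpoint: s3.amazonaws.com
                bucket: my-artifact-bucket       # placeholder bucket
                key: reports/report.txt
                accessKeySecret:
                  name: my-s3-credentials        # placeholder Kubernetes secret
                  key: accessKey
                secretKeySecret:
                  name: my-s3-credentials
                  key: secretKey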

Employ conditional logic and loops:

Make use of Argo’s conditional logic and looping features to handle dynamic and iterative tasks within workflows. This is particularly useful for complex data processing or ML model training pipelines.
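
For instance, a hedged fragment combining a withItems loop with a when condition (step names, the shard list, and the env parameter are illustrative; the referenced process and publish templates are omitted for brevity):

    templates:
      - name: main
        steps:
          - - name: process-shard
              template: process
              arguments:
                parameters: [{name: shard, value: "{{item}}"}]
              withItems: ["a", "b", "c"]                        # fan out over a static list
          - - name: publish
              template: publish
              when: "{{workflow.parameters.env}} == prod"       # run only when the condition holds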

Implement retry strategies:

Define retry strategies for tasks to handle transient failures gracefully. This includes setting retry limits, backoff intervals, and handling different types of failure scenarios.
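
A hedged example of a per-template retry policy with exponential backoff (the failing command is deliberate, to make the retry behavior visible):

    templates:
      - name: flaky-task
        retryStrategy:
          limit: "3"                     # retry up to 3 times
          retryPolicy: OnFailure         # retry when the main container fails
          backoff:
            duration: "10s"              # initial delay between retries
            factor: "2"                  # double the delay each attempt
            maxDuration: "5m"            # stop retrying after 5 minutes
        container:
          image: alpine:3.19
          command: [sh, -c, "exit 1"]    # always fails, so the retries are visible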

Argo Workflow vs Airflow

Argo Workflows and Apache Airflow are both popular tools for workflow orchestration, but they cater to different environments and use cases:

  • Architecture: Argo Workflows is built specifically for Kubernetes and uses Kubernetes CRDs to define and manage workflows. It leverages the native Kubernetes environment, making it suitable for cloud-native applications. Apache Airflow is not tied to Kubernetes but can be deployed on various platforms. Airflow uses its own scheduler and executor to manage tasks, and it relies on a central database to track the state of workflows.
  • Deployment: Argo Workflows operates directly within a Kubernetes cluster, requiring only Kubernetes for deployment. It integrates seamlessly with other Kubernetes tools, making setup and maintenance easier for teams already using Kubernetes. Airflow can also run on Kubernetes, using the KubernetesExecutor or KubernetesPodOperator to launch tasks as pods. However, while Airflow is Kubernetes-friendly, it is not Kubernetes-native, which can make it harder to manage compared to Argo Workflows.
  • Scalability and flexibility: Argo Workflows is highly scalable due to its deep integration with Kubernetes. It can scale workflows horizontally by running tasks as individual pods, enabling efficient resource management and large-scale parallelism. Airflow can also scale but typically requires more manual tuning of the scheduler and worker nodes. Argo’s Kubernetes-native approach provides a more straightforward path to scaling workflows across a cluster, whereas Airflow’s scalability depends on its distributed architecture and task queue configurations.
  • Use cases: Argo Workflows excels in cloud-native environments, especially where containerized applications are deployed. It is well-suited for tasks like continuous integration/continuous deployment (CI/CD) pipelines, data processing, and machine learning workflows. Airflow is more commonly used for traditional data engineering tasks, such as ETL (Extract, Transform, Load) jobs, and scheduling tasks that require complex dependencies across different systems.  

Use Cases for Argo Workflows

Argo Workflows is suitable for the following applications.

Infrastructure Automation

Argo Workflows can be used to automate various infrastructure management tasks, such as provisioning resources, deploying applications, and managing updates. By defining workflows that encapsulate these processes, DevOps teams can ensure that infrastructure changes are executed in a controlled and repeatable manner. 

For example, a workflow can be created to automate the deployment of a multi-tier application, including setting up the necessary Kubernetes resources, configuring networking, and deploying application components. This automation reduces manual effort and minimizes the risk of errors in infrastructure management.
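
As a hedged illustration of this pattern, a resource template lets a workflow step create (or patch, or delete) Kubernetes objects directly; the ConfigMap below is a made-up example.

    templates:
      - name: create-app-config
        resource:
          action: create                 # kubectl-style action: create, apply, patch, delete...
          manifest: |
            apiVersion: v1
            kind: ConfigMap
            metadata:
              generateName: app-config-
            data:
              FEATURE_FLAG: "enabled"    # placeholder configuration value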

Batch Processing and Data Ingestion

The tool allows users to define workflows that automate the ingestion of large datasets, processing them in parallel to speed up data transformation and analysis. By orchestrating tasks such as data extraction, transformation, and loading (ETL), Argo Workflows ensures that large-scale data processing pipelines are efficient and reliable.

The ability to scale tasks horizontally across a Kubernetes cluster makes it particularly well-suited for handling large volumes of data.
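
A hedged sketch of this fan-out pattern (step names and the chunk format are illustrative): a first step emits a JSON list of work items, a second step processes each item as its own pod, and a parallelism limit caps how many run at once.

    spec:
      entrypoint: ingest
      parallelism: 10                          # at most 10 chunk pods at a time
      templates:
        - name: ingest
          steps:
            - - name: list-chunks
                template: list-chunks
            - - name: process-chunk
                template: process-chunk
                arguments:
                  parameters: [{name: chunk, value: "{{item}}"}]
                withParam: "{{steps.list-chunks.outputs.result}}"   # fan out over the JSON list
        - name: list-chunks
          script:
            image: python:3.12-alpine
            command: [python]
            source: |
              import json
              print(json.dumps(["chunk-%d" % i for i in range(100)]))
        - name: process-chunk
          inputs:
            parameters:
              - name: chunk
          container:
            image: alpine:3.19
            command: [echo, "processing {{inputs.parameters.chunk}}"]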

Machine Learning Model Training

Training machine learning models often involves complex pipelines that include data preprocessing, model training, evaluation, and deployment. Argo Workflows enables the automation of these pipelines, helping data scientists define and manage their workflows easily. 

With support for parallel task execution, it can handle hyperparameter tuning and model training across multiple configurations simultaneously. This not only accelerates the model development process but also ensures reproducibility and consistency across different runs.
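
For instance, a hedged sketch of a hyperparameter sweep (the trainer image and parameters are placeholders): each combination is trained in its own pod, in parallel.

    templates:
      - name: sweep
        steps:
          - - name: train
              template: train-model
              arguments:
                parameters:
                  - {name: lr, value: "{{item.lr}}"}
                  - {name: epochs, value: "{{item.epochs}}"}
              withItems:                                 # one parallel task per configuration
                - {lr: "0.01", epochs: "10"}
                - {lr: "0.001", epochs: "20"}
      - name: train-model
        inputs:
          parameters:
            - name: lr
            - name: epochs
        container:
          image: my-registry/trainer:latest              # placeholder training image
          command: [python, train.py]
          args: ["--lr", "{{inputs.parameters.lr}}", "--epochs", "{{inputs.parameters.epochs}}"]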

Tutorial: Getting Started with Argo Workflows 

This tutorial is adapted from the official Argo documentation.

Prerequisites

Before you begin, ensure you have a Kubernetes cluster and kubectl configured to access it. For testing purposes, you can use a local Kubernetes cluster with one of the following tools:

  • minikube
  • kind
  • k3s or k3d
  • Docker Desktop

These tools allow you to set up a Kubernetes cluster locally on your development machine, providing a suitable environment to install and test Argo Workflows.

Install Argo Workflows

To install Argo Workflows, follow these steps:

1. Define the version you want to install by setting the ARGO_WORKFLOWS_VERSION environment variable. Specify the desired version number. For example:

    ARGO_WORKFLOWS_VERSION="v3.5.8"

2. Apply the quick-start manifest to set up Argo Workflows in your Kubernetes cluster:

    kubectl create namespace argo
    kubectl apply -n argo -f "https://github.com/argoproj/argo-workflows/releases/download/${ARGO_WORKFLOWS_VERSION}/quick-start-minimal.yaml"

This command creates a new namespace called argo and deploys Argo Workflows using a minimal configuration.

Install the CLI

To install the Argo Workflows CLI, follow these steps:

1. Visit the Argo Workflows GitHub Releases page to download the latest version of the CLI compatible with your operating system.
2. Extract the downloaded file and move the binary to a directory included in your system’s PATH. For example, on Linux or macOS, you can do this with the following commands:

    tar -zxvf argo-linux-amd64.tar.gz
    sudo mv ./argo /usr/local/bin/

3. Verify the installation by running argo version. This command should display the version of the Argo Workflows CLI, confirming that it has been successfully installed.

Using the CLI, you can easily interact with Argo Workflows: submit workflow specifications, list current workflows, retrieve details of specific workflows, and view logs. The CLI provides syntax checking and user-friendly output, and simplifies interaction compared to using raw kubectl commands.

Submit an Example Workflow

You can submit workflows to Argo in different ways, such as through the CLI or the UI.

To submit via the CLI:

1. Use the following command to submit a “Hello World” example workflow. The --watch flag monitors the workflow execution and reports its status:

    argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/main/examples/hello-world.yaml

2. To list all submitted workflows, run argo list -n argo. This command shows all workflows with unique names starting with hello-world- followed by random characters.
3. To review the details of the latest workflow run, use argo get -n argo @latest.
4. To observe the logs of the latest workflow run, use argo logs -n argo @latest.

To submit via the UI:

1. Forward the server’s port to access the Argo Workflows UI:

    kubectl -n argo port-forward service/argo-server 2746:2746

2. Open your browser and navigate to https://localhost:2746. Note that the URL uses https and not http. You may encounter a TLS error due to the self-signed certificate, which you will need to manually approve.
3. In the UI, select + Submit New Workflow and then select Edit using full workflow options. An example workflow should already be present in the text field. Click + Create to initiate the workflow.

Simplifying Kubernetes Management with Komodor

Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Especially when working in a hybrid environment, Komodor reduces complexity by providing a unified view of all your services and clusters.

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.