
anyscale-ray.png We dynamically set up and scale AI apps running locally and within AWS, Azure, GCP, and other clouds, using free open-source software (Ray.io, Anyscale, PyTorch)

Overview

This article describes how to run a large fleet of MCP servers serving many AI agents in production.

The contribution of this article is a CI/CD pipeline that automates setup using Enterprise Anyscale (Ray.io) locally (on a Mac Mini) and within AWS, Azure, GCP, DigitalOcean, Hetzner, and other clouds.

Why Ray / Anyscale?

Ray enables developers to run Python code at scale on Kubernetes clusters by abstracting away orchestration across individual machines.

Workloads to use Ray for:

Ray is a high-performance distributed execution framework that targets large-scale machine learning and reinforcement learning applications. Ray’s MLOps ecosystem includes features for:

Anyscale.com is the company behind Ray; it offers a paid edition built on top of the open-source project to provide:

Enterprise governance and security:

Observability & Monitoring: https://docs.anyscale.com/monitoring/tracing/ (OTel integration)

Our project is to scale MCP servers:

Dive in

  1. Sign up at ray.io
  2. Take the Intro to Ray course by Max (author of the O’Reilly book “Ray: A Distributed Computing Framework for Python”).
    • Project code is at https://github.com/ray-project/ray and https://github.com/ray-project/ray-docker
  3. Anyscale provides a free tutorial and free certification (just 10 questions)

  4. Set up our own Ray.io instance locally on Wilson’s Mac Mini.
  5. Install MCP servers with agents within our local Ray.io instance.
    • Anthropic Reference MCP
    • Azure MCP https://github.com/Azure/azure-mcp?tab=readme-ov-file
  6. Set up in a cloud: DigitalOcean or Hetzner.

  7. What useful app? __ written in Python, packaged in a Docker image
    • https://res.cloudinary.com/dcajqrroq/image/upload/v1716481274/odoo-docker-officialapps-240522_tkt77p.png
    • Calendaring AI - what offerings are available?
    • For the Provost’s office to scour market trends - YouTube video about coding university camps

Marketing:

  1. Blog for Anyscale as Partners/resellers
  2. Blog for MCP
  3. Article - LinkedIn, Dev.to, Medium
  4. YouTube video
  5. Udemy
  6. Market to colleges

Customers

Customer case studies:

Ray.io by Anyscale hosted platform

Ray was first developed in 2016 by UC Berkeley’s RISELab (the successor to the AMPLab that created Apache Spark and Mesos). Their code is now open-sourced (with 36.8k stars) at:

Anyscale leaders

https://www.glassdoor.com/Reviews/Anyscale-CA-Reviews-E3377996.htm rates Anyscale 4.8/5 stars across 100 employee reviews. The employer pays 99% of health insurance premiums but does not match 401(k) contributions.

100% approve of CEO Keerti Melkote, who joined in July 2024 after serving as CTO and then CEO of Aruba Networks through its purchase by HPE.

Ion Stoica, a co-creator of Ray, co-founded Anyscale in 2019. He is a Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, and was previously CEO and co-founder of Databricks (founded 2013). VIDEO

https://www.linkedin.com/company/joinanyscale/posts/?feedView=all

https://www.builtinsf.com/company/anyscale at 415-267-9902, SOMA neighborhood: 55 Hawthorne St 9th Floor, San Francisco, CA 94105 Map (across the street from 3-Michelin star Asian restaurant BenuSF.com)


yaml file

VIDEO “Getting Started with Ray Clusters” by Saturn Cloud Mar 2, 2023 shows setup within AWS.

KubeRay

The KubeRay project (https://ray-project.github.io/kuberay/) is used to deploy (and manage) Ray clusters on Kubernetes. KubeRay packages key Ray components into “pods.” From https://bit.ly/ray-arch:

ray-kuberay-1713x628.jpg

A central component of KubeRay is the “KubeRay Operator,” responsible for starting and maintaining the lifetime of the other Ray pods – the head node pod, worker node pods, and the autoscaler pod (responsible for increasing or decreasing the size of the cluster). In particular, for online serving scenarios (increasingly popular now), the KubeRay operator is responsible for keeping the Ray head node pod highly available.
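As a sketch of what the operator manages, a minimal RayCluster manifest might look like the following. Field names follow the KubeRay CRD; the cluster name, image tag, and replica counts are illustrative assumptions – verify against your installed KubeRay version before applying:

```yaml
# Minimal RayCluster sketch (illustrative values, not a production config):
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: demo-cluster
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: workers
      replicas: 2          # autoscaler adjusts between min and max
      minReplicas: 1
      maxReplicas: 5
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
```

Applying this with kubectl causes the KubeRay operator to create the head and worker pods described above.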

RayService for Observability

In KubeRay, creating a RayService will first create a RayCluster and then create Ray Serve applications once the RayCluster is ready.

RayService is a Custom Resource Definition (CRD) designed for Ray Serve.

https://docs.ray.io/en/latest/cluster/kubernetes/troubleshooting/rayservice-troubleshooting.html#observability

Metrics - Logs - Traces - Dashboard - Triggers - Alerts

Components

anyscale-arch-7680x4320.png

Compute Observability

ray-map-798x1058.png

Metrics are reported during training by explicitly calling ray.train.report() from the training function, NOT collected automatically after every epoch.
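The pattern can be illustrated with a pure-Python stand-in (report here is a hypothetical stand-in for ray.train.report, not the Ray API):

```python
# Pure-Python stand-in illustrating the reporting pattern used with
# Ray Train: the training function itself calls report() whenever it
# wants to emit metrics (typically once per epoch) -- nothing is
# collected automatically.
reported = []

def report(metrics):  # hypothetical stand-in for ray.train.report
    reported.append(metrics)

def train_loop(epochs=3):
    loss = 1.0
    for epoch in range(epochs):
        loss *= 0.5  # pretend the model improves each epoch
        report({"epoch": epoch, "loss": loss})

train_loop()
print(reported)
# [{'epoch': 0, 'loss': 0.5}, {'epoch': 1, 'loss': 0.25}, {'epoch': 2, 'loss': 0.125}]
```

The training function decides when and what to report; the framework only receives what it is given.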

The take_batch method in Ray’s Dataset API retrieves a specified number of rows from a distributed dataset as a single local batch.
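Conceptually, this behaves like taking the first batch_size rows of the dataset and materializing them locally. A plain-Python sketch of those semantics (not the Ray implementation):

```python
def take_batch(rows, batch_size):
    """Plain-Python sketch of Dataset.take_batch semantics:
    return up to batch_size rows as one local batch."""
    return rows[:batch_size]

dataset = [{"id": i} for i in range(10)]  # stand-in for a distributed dataset
print(take_batch(dataset, 3))
# [{'id': 0}, {'id': 1}, {'id': 2}]
```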

Ray Core

Ray Core (see its Quickstart) is a general-purpose framework to scale out Python apps with distributed parallelism. Ray Core provides asynchronous/concurrent program execution at cluster scale, spanning multiple machines and heterogeneous computing devices, while abstracting those details away from developers.

The basic application concepts a developer should understand in order to develop and use Ray programs:

Ray’s AIR (AI Runtime)

Built on top of Ray Core, Ray’s AI libraries (see the AI libraries Quickstart) target workload optimization. In order of usage during the development lifecycle, they scale these ML workloads:

https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Quickstart_with_Ray_AIR_Colab.ipynb

The ScalingConfig utility is used to configure the number of training workers in a Trainer or Tuner.

The ResNet neural network model is used in the PyTorch implementation of the MNIST Classifier.

Sample Code Programming Ray

Ray is a distributed execution framework to scale Python applications.

pip install 'ray[default]'

! pip install -U ray==2.3.0 xgboost_ray==0.1.18

So Ray is “invasive”: once Ray is used, you’re all in.

Ray is coded as a wrapper around app functions, with its core implemented in C++ and exposed through Python:
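Conceptually, the @ray.remote decorator wraps a plain function with scheduling machinery that returns futures instead of values. A pure-Python stand-in using a thread pool instead of a cluster (an illustration of the wrapping idea, not Ray’s implementation):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor()

def remote(fn):
    """Stand-in for @ray.remote: adds a .remote() launcher to fn
    that schedules the call and returns a future immediately."""
    class Wrapper:
        def remote(self, *args, **kwargs):
            return _pool.submit(fn, *args, **kwargs)
    return Wrapper()

@remote
def square(x):
    return x * x

# Launch calls without blocking, then gather results:
futures = [square.remote(i) for i in range(4)]
print([f.result() for f in futures])  # [0, 1, 4, 9]
```

Ray follows the same shape, except the wrapped call is serialized and dispatched to a worker process anywhere in the cluster rather than a local thread.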

Ray Cluster

In the diagram from this VIDEO:

ray-remote-2202x1114.png

Observability would involve these metrics displayed over time in dashboards:

Python Code Example

Here’s a Basic Ray Task Example in Python, using Ray.io for distributed computing, incorporating key concepts from the documentation:

import time  # and other standard libraries
import ray

# Initialize the Ray runtime, capped at 4 CPUs
# (with no arguments, ray.init() would use all available cores):
ray.init(num_cpus=4)

# Convert the function into a distributed task; each call reserves
# 2 CPUs, so at most 2 tasks run concurrently on this 4-CPU runtime:
@ray.remote(num_cpus=2, num_gpus=0)
def process_data(item):
    time.sleep(item / 10)  # Simulate work
    return f"Processed {item}"

# Launch parallel tasks and block until all results are ready:
results = ray.get([process_data.remote(i) for i in range(5)])

print(results)
# Output: ['Processed 0', 'Processed 1', 'Processed 2', 'Processed 3', 'Processed 4']

Actor Pattern Example:

@ray.remote
class DataTracker:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def get_count(self):
        return self.count

Usage:

tracker = DataTracker.remote()
ray.get([tracker.increment.remote() for _ in range(10)])
print(ray.get(tracker.get_count.remote()))  # Output: 10

Advanced Pattern (Data Sharing)

# Store large data in shared memory:
data_ref = ray.put(list(range(1000)))

@ray.remote
def process_chunk(data, start, end):
    # Top-level ObjectRef arguments are automatically dereferenced,
    # so `data` arrives here as the shared list itself:
    return sum(data[start:end])

# Process chunks in parallel:
results = ray.get([
    process_chunk.remote(data_ref, i*100, (i+1)*100)
    for i in range(10)
])
print(sum(results))  # prints 499500, the sum of 0..999

For cluster deployment, use ray.init(address='auto') to connect to an existing cluster. These examples demonstrate task parallelism, stateful actors, and data sharing - fundamental patterns for distributed computing with Ray.

https://courses.anyscale.com/courses/take/intro-to-ray/lessons/60941259-introduction-to-ray-serve

VIDEO: Create Docker file for Python program
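As a sketch of that step, a minimal Dockerfile for packaging a Python program might look like this (the file names main.py and requirements.txt are illustrative assumptions):

```dockerfile
# Start from a slim official Python base image:
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer caches across code changes:
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and set the entrypoint:
COPY . .
CMD ["python", "main.py"]
```

Build with `docker build -t myapp .` and run with `docker run myapp`.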

Debugging and monitoring

Quickstart

Ray Cloud Quickstart enables use of GPUs managed as clusters of containers orchestrated by Kubernetes.

“Ray meets the needs of ML/AI applications—without requiring the skills and DevOps effort typically required for distributed computing.”

Social

Anyscale People

Robert Nishihara

On an October 5, 2020 video, Dean Wampler (deanwampler.com), then Head of DevRel at Anyscale, said “The biggest user of Ray is Ant Financial in China. It’s like the Stripe/PayPal of China, running thousands of nodes.” He is now IBM’s chief technical representative to the AI Alliance (@the-aialliance). Author of “Programming Scala, Third Edition” (2021); talks at polyglotprogramming.com/talks.

Jules Damji, formerly at Databricks, is Lead Developer Advocate at Anyscale and gave these talks:

Competitors:

Simulator

odoo-docker-officialapps-240522.png

Dataset Preprocessors

The Ray Data library provides preprocessors, scalers, and encoders:

A custom preprocessor to output transformed datasets:

import ray
from ray.data.preprocessors import MinMaxScaler
ds = ray.data.range(10)
preprocessor = MinMaxScaler(["id"])
ds_transformed = preprocessor.fit_transform(ds)
print(ds_transformed.take())
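For reference, min-max scaling maps each value x to (x - min) / (max - min), so the column lands in [0, 1]. A plain-Python sketch of the transform the preprocessor fits (not the Ray implementation):

```python
def min_max_scale(values):
    """Map each value into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

fit_transform above computes the same min and max per column across the distributed dataset, then applies this formula to every row.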

Videos

Jonathan Dinu has several videos (until 2021) on the University of Jonathan channel:

Books

Agents

https://youtube.com/shorts/svm_uGBeIm0?si=MfN9c9qaJT74utZs

https://www.youtube.com/watch?v=IfjGP9jIaQ0 Ray AIR Robert Demo 2022