
AI/ML

Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.

Latest Premium Content

Trend Report: Generative AI
Refcard #394: AI Automation Essentials
Refcard #158: Machine Learning Patterns and Anti-Patterns

DZone's Featured AI/ML Resources

Shipping Responsible AI Without Slowing Down


By Sarvenaz Ranjbar
In software engineering, launch day rarely fails because a unit test was missing; in machine learning (ML), that's not the case. Inputs far from training data, adversarial prompts, proxies that drift away from human goals, or an upstream artifact that isn't what it claims to be can all sink a release. The question is not "can every failure be prevented?" but "can failures be bounded, detected quickly, and recovered from predictably?"

Two research threads shape this approach. The first maps where ML goes wrong in production: robustness gaps, weak runtime monitoring, misalignment with real human objectives, and systemic issues across the stack (supply chain, access, blast radius). The second focuses on how teams make decisions that stand up to scrutiny: a deliberative loop that's open, informed, multi-vocal, and responsive. Put together, the operating model feels like standard software engineering — just opinionated for ML.

ML Safety Contract

ML safety work can be organized into four clauses. When these are wired into the process, systems become more trustworthy, responsible, and accountable.

Robustness: Distribution shift, tail inputs, and obvious misuse should be tested — not just benchmark deltas. "Once-a-year" scenarios should be first-class in evaluation.
Monitoring: Detection should be treated as a product feature. Systems should recognize when they are out of depth and degrade gracefully without heroics.
Alignment: The human objective should be stated in plain language, the proxies being optimized should be acknowledged, and guardrails should be set for behavior that must never occur.
Systemic safety: Pipelines should be reproducible, artifacts signed, access tight, and rollbacks as easy as deploys.

The goal is to connect these clauses to machinery already trusted — CI/CD, SRE practices, and product reviews — so no parallel process is created that people route around.

The Loop: From Idea to Incident and Back

A lightweight safety review should run monthly or on any significant capability change. It acts as a decision log with real inputs. Pre-reads explain what's changing and why, show evaluation dashboards, and call out potential impact. Product, ML, SRE, security, and support bring different failure modes to the table. Disagreement is documented briefly. Outcomes are actionable: thresholds to set, tests to add, rollouts to stage, owners to assign. Decisions are published because traceability is part of the safety surface.

On sprint cadence, that review pairs with two touchpoints: a CI gate that blocks on safety regressions like any other SLO, and a post-incident loop that upgrades evaluations with whatever just failed. The loop doesn't slow shipping; it prevents shipping the same mistake twice.

What Lands in the Repo

Three small artifacts make the contract real and reviewable:

Human objective. A one-sentence statement at the top of the model card, followed by the proxies being optimized and how those proxies can fail when over-optimized. This paragraph aligns engineers, PMs, and reviewers.
Deliberation note. deliberation-note.md should live next to the model. In plain language, it states the change, alternatives considered, who might be affected (including non-users), and what feedback changed the plan. It is short and versioned with code.
Policy-as-code SLOs. Gates and alerts should be deterministic. Example SLO policy:

YAML
# safety-slos.yaml
slos:
  - name: ood_recall_7d
    objective: "Detect distribution shift before harm."
    target: ">= 0.90"
    window: "7d"
    action_on_breach: "degrade_to_safe_mode"
  - name: decision_ece_p95_24h
    objective: "Keep calibration error low on high-impact endpoints."
    target: "<= 0.05"
    window: "24h"
    action_on_breach: "route_to_human_review"
  - name: never_event_violations
    objective: "Zero violations of policy-defined 'never events'."
    target: "== 0"
    window: "rolling"
    action_on_breach: "kill_switch"

The Pipeline Already Trusted

The release path stays familiar:

Evaluate. A robustness pack should be run: tail scenarios, simple adversarial sweeps suited to the domain, and checks for hidden functionality (e.g., backdoors in weights or data). Red-team prompts or misuse cases should reflect the product surface.
Gate. Two things are required: a green SLO diff and the deliberation note. Artifacts should be signed, and the build reproducible.
Deploy. Canary by tenant or traffic slice with clean isolation. A "safe baseline" should be kept warm so rollback is lossless.
Observe. OOD and drift signals, calibration telemetry for decision endpoints, and privacy-aware abuse logs should stream into the same on-call rotation as other SEVs.
Respond. The playbook should be baked: repeated OOD triggers auto-degrade; any "never event" trips the kill switch. Post-incident, the failure should be converted into a test and added to the robustness pack.

A Concrete Rollout Story

Consider a claims-triage classifier.

Before Launch

The human objective is defined ("route risky claims to expert review without delaying legitimate claims"), proxies are listed (AUROC, latency, review rate), and never events are codified ("never auto-deny when uncertainty exceeds threshold Z"). A small robustness pack is assembled: rare claim types, obvious prompt/payload abuse for text components, and a basic backdoor scan on third-party artifacts. The Safety Review trims scope: expert-only for two jurisdictions at first, with per-tenant throttles.

Launch Week

A canary goes to 10% of traffic for two enterprise tenants. OOD detectors track feature drift; calibration metrics automatically drive the review threshold. A few hours in, OOD triggers for unfamiliar supplier codes. The system degrades to human-review-only for that segment; SREs confirm rather than scramble.

After the Incident

Those supplier codes are added to the eval pack, and a brittle feature transform that amplified drift is relaxed. The deliberation note is updated with the change and rationale. The next rollout is wider, safer, and documented.

What Changes for the Team

Observability and predictability around ML behavior increase. Three shifts matter most: (1) objectives and constraints become explicit and reviewable, (2) failures are promoted to code (tests/SLOs) instead of tribal memory, and (3) the kill switch is practiced like disaster recovery. Engineering ships with more confidence; stakeholders can see decisions and the reasoning.

Closing

Most teams already run CI/CD, SRE, and security reviews. Making ML "safe enough to ship" means threading robustness, monitoring, alignment, and systemic safety through those same muscles, backed by a decision loop that can be explained later. It isn't slower; it's saner.
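As a footnote to the policy-as-code clause above, here is a minimal sketch of how a CI gate might act on such a policy once safety-slos.yaml has been parsed. The record shape, the operator handling, and the choice to fail the build on any non-empty action list are illustrative assumptions rather than a prescribed implementation:

Java
import java.util.List;
import java.util.Map;

public class SafetyGate {

    // Mirrors one entry from safety-slos.yaml (hypothetical structure after parsing).
    record Slo(String name, String target, String actionOnBreach) {}

    // Returns the actions to trigger for every breached SLO; an empty list means the gate is green.
    static List<String> evaluate(List<Slo> slos, Map<String, Double> currentMetrics) {
        return slos.stream()
                .filter(slo -> isBreached(slo, currentMetrics.get(slo.name())))
                .map(Slo::actionOnBreach)
                .toList();
    }

    // Supports the comparison operators used in the example policy (">=", "<=", "==").
    static boolean isBreached(Slo slo, Double value) {
        if (value == null) return true; // missing telemetry is itself treated as a breach
        String[] parts = slo.target().split("\\s+");
        double threshold = Double.parseDouble(parts[1]);
        return switch (parts[0]) {
            case ">=" -> value < threshold;
            case "<=" -> value > threshold;
            case "==" -> value != threshold;
            default -> true;
        };
    }

    public static void main(String[] args) {
        List<Slo> slos = List.of(
                new Slo("ood_recall_7d", ">= 0.90", "degrade_to_safe_mode"),
                new Slo("decision_ece_p95_24h", "<= 0.05", "route_to_human_review"),
                new Slo("never_event_violations", "== 0", "kill_switch"));
        Map<String, Double> metrics = Map.of(
                "ood_recall_7d", 0.93,
                "decision_ece_p95_24h", 0.07,
                "never_event_violations", 0.0);
        // Prints [route_to_human_review]; a CI gate would fail the build on any non-empty result.
        System.out.println(evaluate(slos, metrics));
    }
}

A real gate would parse the YAML with a YAML library and pull current metric values from the monitoring stack; the point is that breach handling stays deterministic and versioned with the code.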
Running AI/ML on Kubernetes: From Prototype to Production — Use MLflow, KServe, and vLLM on Kubernetes to Ship Models With Confidence


By Boris Zaikin
Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations.

After training a machine learning model, the inference phase must be fast, reliable, and cost efficient in production. Serving inference at scale, however, brings difficult problems: GPU/resource management, latency and batching, model/version rollout, observability, and orchestration of ancillary services (preprocessors, feature stores, and vector databases).

Running artificial intelligence and machine learning (AI/ML) on Kubernetes gives us a scalable, portable platform for training and serving models. Kubernetes schedules GPUs and other resources so that we can pack workloads efficiently and autoscale to match traffic for both batch jobs and real-time inference. It also coordinates multi-component stacks — like model servers, preprocessors, vector DBs, and feature stores — so that complex pipelines and low-latency endpoints run reliably. Containerization enforces reproducible environments and makes CI/CD for models practical. Built-in capabilities like rolling updates, traffic splitting, and metrics/tracing help us run safe production rollouts and meet SLOs for real-time endpoints. For teams that want fewer operations, managed endpoints exist, but Kubernetes is the go-to option when control, portability, advanced orchestration, and real-time serving matter.

Let's look into a typical ML inferencing setup using KServe on Kubernetes below:

Figure 1. ML inference setup with KServe on Kubernetes

Clients (e.g., data scientists, apps, batch jobs) send requests through ingress to a KServe InferenceService. Inside, an optional Transformer pre-processes inputs, the Predictor (required) loads the model and serves predictions, and an optional Explainer returns insights. Model artifacts are pulled from model storage (as seen in the diagram) and served via the chosen runtime (e.g., TensorFlow, PyTorch, scikit-learn, ONNX, Triton). Everything runs on Knative/Kubernetes with autoscaling and routing, using the CPU/GPU compute layer from providers such as NVIDIA/AMD/Intel on AWS, Azure, Google Cloud, or on-prem.

Part 1: MLflow and KServe With Kubernetes

Let's dive into the practical implementation of an AI/ML scenario. We will use a combination of MLflow to orchestrate ML processes, scikit-learn to train ML models, and KServe to serve our model in Kubernetes clusters.

Introduction to MLflow

MLflow is an open-source ML framework, and we use it to bring order to the chaos that happens when models move from experiments to production. It helps us track runs (parameters, metrics, and files), save the exact environment and code that produced a result, and manage model versions so that we know which one is ready for production. In plain terms, MLflow fixes three common problems:

Lost experiment data
Missing environment or code needed to reproduce results
Confusion about which model is "the" production model

Its main pieces — Tracking, Projects, Models, and the Model Registry — map directly to those needs. We can also use MLflow to package and serve models (locally as a Docker image or via a registry), which makes it easy to hand off models to a serving platform like Kubernetes.

Using MLflow and KServe on Kubernetes

MLflow offers a straightforward way to serve models via a FastAPI-based inference server, and mlflow models build-docker lets you containerize that server for Kubernetes deployment. However, this approach can be unsuitable for production at scale; FastAPI is lightweight and not built for extreme concurrency or complex autoscaling patterns, and manual management of numerous inference replicas creates significant operational overhead. KServe (formerly KFServing) delivers a production-grade, Kubernetes-native inference platform with high-performance, scalable, and framework-agnostic serving abstractions for popular ML libraries such as TensorFlow, XGBoost, scikit-learn, and PyTorch.

We've created a short step-by-step guide on how to train an ML model with MLflow and scikit-learn, and how to deploy to Kubernetes using KServe. This guide walks you through a complete MLflow workflow to train a linear-regression model with MLflow tracking and perform hyperparameter tuning to determine the best model:

Prerequisites – Install Docker, kubectl, and a local cluster (Kind or Minikube) or use a cloud Kubernetes cluster. See the Kind/Minikube quickstarts.
Install MLflow + MLServer support – Install MLflow with the MLServer extras (pip install mlflow[mlserver]) and review the MLServer examples for MLflow.
Train and log a model – Train and save the model with mlflow.log_model() (or mlflow.sklearn.autolog()), following the MLflow tutorial.
Smoke-test locally – Serve with MLflow/MLServer to validate invocations before Kubernetes: mlflow models serve -m models:/<name> -p 1234 --enable-mlserver. See the MLflow models/MLServer examples.
Package or publish – Option A: Build a Docker image with mlflow models build-docker -m runs:/<run_id>/model -n <your/image> --enable-mlserver, then push it to a registry. Option B: Push artifacts to remote storage (S3/GCS) and use the storageUri in KServe. Documents and examples can be found here.
Deploy to KServe – Create a namespace and apply an InferenceService pointing to your image or storageUri. See KServe's InferenceService quickstart and repo examples. Below is an example (Docker image method + Kubernetes) InferenceService snippet:

YAML
apiVersion: "serving.kserve.io/v1beta1"
kind: InferenceService
metadata:
  name: mlflow-wine-classifier
  namespace: mlflow-kserve-test
spec:
  predictor:
    containers:
      - name: mlflow-wine-classifier
        image: "<your_docker_user>/mlflow-wine-classifier"
        ports:
          - containerPort: 8080
            protocol: TCP
        env:
          - name: PROTOCOL
            value: "v2"

Verify and productionize – Check Pods (kubectl get pods -n <ns>), call the endpoint, then add autoscaling, metrics, canary rollouts, and explainability as needed (KServe supports these features).

The official MLflow documentation also has a good step-by-step guide that covers how to package the model artifacts and dependency environment as an MLflow model, validate local serving with mlserver using mlflow models serve, and deploy the packaged model to a Kubernetes cluster with KServe.

Part 2: Managed AutoML: Azure ML to AKS

For this example, we selected Azure. However, Azure is just one of many tool providers that can work in this scenario. Azure Machine Learning is a managed platform for the full ML lifecycle — experiment tracking, model registry, training, deployment, and MLOps — that helps teams productionize models quickly. Defining a reliable ML process can be difficult, and Automated ML (AutoML) can simplify that work by automating algorithm selection, feature engineering, and hyperparameter tuning. For low-latency, real-time inference at scale, you can run containers on Kubernetes, the de facto orchestration layer for production workloads.
We pick Azure Kubernetes Service (AKS) when we need custom runtimes, strict performance tuning (GPU clusters, custom drivers), integration with existing Kubernetes infrastructure (service mesh, VNETs), or advanced autoscaling rules. If we prefer a managed, low-ops path and don't need deep cluster control, Azure ML's managed online endpoints are usually faster to adopt.

We run AutoML in Azure ML to find the best model, register it, and publish it as a low-latency real-time endpoint on AKS so that we keep full control over runtime, scaling, and networking:

Prerequisites – Acquire an Azure subscription, an Azure ML workspace, the Azure CLI/ML CLI or SDK, and an AKS cluster (create one or attach an existing cluster).
Run AutoML and pick the winner – Submit an AutoML job (classification/regression/forecast) from the Azure ML studio or SDK and register the top model in the Model Registry.
Prepare scoring + environment – Add a minimal score.py (load model, handle request) and an environment spec (Conda/requirements); you can reuse examples from the azureml-examples repo.
Attach AKS and deploy – Attach your AKS compute to the workspace (or create AKS), then deploy the registered model as an online/real-time endpoint using the Azure ML CLI or Python SDK.
Test and monitor – Call the endpoint, add logging/metrics and autoscaling rules, and use rolling/canary swaps for safe updates.

As an example of how AutoML works, I will provide a typical AI/ML pipeline below:

Figure 2. Example AI/ML pipeline

This ML pipeline contains steps to select, clean up, and transform data from datasets; to split data for training, selecting the ML algorithm, and testing the model; and finally, to score and evaluate the model. All those steps can be automated with AutoML, including several options to deploy models to the AKS/Kubernetes real-time API endpoint.

Part 3: Serving LLMs on Kubernetes

Let's have a look into the combination of LLMs and Kubernetes. We run LLMs on Kubernetes to get reliable, scalable, and reproducible inference: Kubernetes gives us GPU scheduling, autoscaling, and the orchestration primitives to manage large models, batching, and multi-instance serving. By combining optimized runtimes, request batching, and observability (metrics, logging, and health checks), we can deliver low-latency APIs while keeping costs and operational risks under control.

To do so, we can use the open-source framework vLLM, which is used when we need high-throughput, memory-efficient LLM inference. On Kubernetes, we run vLLM inside containers and couple it with a serving control plane (like KServe) so that we get autoscaling, routing, canary rollouts, and the standard InferenceService CRD without re-implementing ops logic. This combination gives us both the low-level performance of vLLM and the operational features of a Kubernetes-native inference platform.
Let's see how we can deploy an LLM to Kubernetes using vLLM and KServe:

Prepare cluster and KServe – Provision a Kubernetes cluster (AKS/GKE/EKS or on-prem) and install KServe, following the quickstart.
Get vLLM – Clone the vLLM repo or follow the docs to install vLLM and test vllm serve locally to confirm that your model loads and the API works.
Create a vLLM ServingRuntime/container – Build a container image or use the vLLM ServingRuntime configuration that KServe supports (the runtime wraps vllm serve with the correct arguments and environment variables).
Deploy an InferenceService – Apply a KServe InferenceService that references the vLLM serving runtime (or your image) and model storage (S3/HF cache). KServe will create pods, handle routing, and expose the endpoint.
Validate and tune – Hit the endpoint (through ingress/port-forward), measure latency/throughput, and tune vLLM batching/token-cache settings and KServe autoscaling to balance latency and GPU utilization.

Last but not least, we can run vLLM, KServe, and BentoML together to get high-performance LLM inference and production-grade ops. Here is a short breakdown:

vLLM – the high-throughput, GPU-efficient inference engine (token generation, KV-cache, and batching) — the runtime that actually executes the LLM
BentoML – the developer packaging layer that wraps model loading, custom pre-/post-processing, and a stable REST/gRPC API, then builds a reproducible Docker image or artifact
KServe – the Kubernetes control plane that deploys the container (Bento image or a vLLM serving image) and handles autoscaling, routing/ingress, canaries, health checks, and lifecycle management

How do they fit together? We package our model and request logic with BentoML (image or Bento bundle), which runs the vLLM server for inference. KServe then runs that container on Kubernetes as an InferenceService (or ServingRuntime), giving us autoscaling, traffic controls, and observability.

Pros and Cons of Kubernetes Inference Frameworks for ML

We already had a look at the KServe library. However, there are other powerful alternatives. Let's look at the table below:

Table 1. KServe alternative tools and libraries

Seldon Core
Overview: Kubernetes-native ML serving and orchestration framework offering CRDs for deployments, routing, and advanced traffic control
Pros: Kubernetes-first (CRDs, Istio/Envoy integrations); rich routing (canary, A/B); built-in telemetry and explainer integrations; supports multiple runtimes
Cons: Steeper learning curve; more operational surface to manage; heavier cluster footprint

BentoML (with Yatai)
Overview: Python-centric model packaging and serving; Yatai/Helm lets you run Bento services on Kubernetes as deployments/CRDs
Pros: Excellent developer ergonomics and reproducible images; fast local dev loop; simple CI/CD image artifacts
Cons: Less cluster-native controls out of the box (needs Yatai/Helm); autoscaling and advanced Kubernetes ops require extra setup

NVIDIA Triton Inference Server
Overview: High-performance GPU-optimized inference engine supporting TensorRT, TensorFlow, PyTorch, ONNX, and custom back ends
Pros: Exceptional GPU throughput and mixed-framework support; batch and model ensemble optimizations; production-grade performance tuning
Cons: More involved setup (model repository layout and back-end configuration); primarily optimized for GPU serving; Kubernetes-native routing, canaries, and autoscaling still rely on a control plane such as KServe

Conclusion

Our goal is to run reliable, low-latency AI/ML in production while keeping control of cost, performance, and repeatability. Kubernetes gives us the orchestration primitives we need — GPU scheduling, autoscaling, traffic control, and multi-service coordination — so that models and their supporting services can run predictably at scale. Paired with optimized runtimes, serving layers, and inference engines, we get both high inference performance and production-grade operational controls. The result is portable, reproducible deployments with built-in observability, safe rollout patterns, and better resource efficiency.

Start small, validate with a single model and clear SLOs, pick the serving stack that matches your performance and ops needs, then iterate. Kubernetes lets you grow from prototype to resilient, scalable serving.

This is an excerpt from DZone's 2025 Trend Report, Kubernetes in the Enterprise: Optimizing the Scale, Speed, and Intelligence of Cloud Operations. Read the Free Report
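As an addendum to Part 1's "Verify and productionize" step, here is a minimal Java smoke test against a KServe-style V2 inference endpoint. It assumes the mlflow-wine-classifier InferenceService above has been port-forwarded to localhost:8080, and the input name, shape, datatype, and data values are placeholders that must match your model's actual signature:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InferenceSmokeTest {
    public static void main(String[] args) throws Exception {
        // V2/Open Inference Protocol payload: adjust the input name, shape, datatype, and data to your model.
        String body = """
            {
              "inputs": [
                {"name": "input-0", "shape": [1, 4], "datatype": "FP32",
                 "data": [7.4, 0.7, 0.0, 1.9]}
              ]
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v2/models/mlflow-wine-classifier/infer"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 200 status with a JSON "outputs" array indicates the predictor is serving correctly.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}

A 200 response containing an outputs array confirms the predictor is reachable end to end; from there, the same request can be reused in load tests when tuning autoscaling.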
LLMs at the Edge: Decentralized Power and Control
By Bhanuprakash Madupati
AI Readiness: Why Cloud Infrastructure Will Decide Who Wins the Next Wave
By Aharon Twizer
AI Infrastructure for Agents and LLMs: Options, Tools, and Optimization
By Vidyasagar (Sarath Chandra) Machupalli FBCS
Why the Principle of Least Privilege Is Critical for Non-Human Identities

Attackers only really care about two aspects of a leaked secret: does it still work, and what privileges does it grant once they are in? One of the takeaways from GitGuardian's 2025 State of Secrets Sprawl Report was that the majority of GitLab and GitHub API keys leaked in public had been granted full read and write access to the associated repositories. Once an attacker controls access to a repository, they can do all sorts of nasty business.

Both platforms allow for fine-grained access controls, enabling developers to tightly restrict what every token can and can't do. The question is then, why are teams not following the principle of least privilege for their projects? And what can be done to better secure the enterprise against overpermissioned NHIs?

What Is the Principle of Least Privilege?

When you join an organization, your IAM team works to ensure you have the right amount of permissions. As a new employee, the team most likely doesn't want to give you immediate access to all production environments or unfettered access to sensitive data. You should be given just the minimum amount of access you need in order to do your work. We generally refer to this as the principle of least privilege. Wikipedia defines this as: "The practice of limiting access rights for users, accounts, and computing processes to only those resources absolutely required to perform their intended function."

The principle of least privilege is designed to reduce risk from both internal and external threats. These threats come mainly in the form of identity-based attacks, where an adversarial actor assumes the identity of someone, or something, to abuse the privileges they possess. Limiting access rights means limiting the damage that can be done, containing the blast radius.

The concept originated in traditional IT environments focused on human access. In the modern enterprise, humans are far from the majority when it comes to accessing services and resources. Non-human identities (NHIs) now outnumber humans by at least 100-to-1 in most organizations, many of which are only now beginning to recognize the urgent need to better manage these identities and their access at scale. At the same time, teams are finding classic approaches designed around human access, like privileged access management (PAM) and identity governance and administration (IGA), are a poor fit for NHIs. Understanding how to secure machine identities requires a different mindset and different tooling.

Machines Are Not Humans

A human user might attempt to access a system, and when they discover they can't, they request additional permissions. This is most often done through an access ticket or approval workflow that has organizational checks and balances. It includes oversight from managers and security teams. Machine identities cannot participate in that type of loop. If a machine identity is underprivileged, it will simply fail in production. And if it is overprivileged, it will keep working, even if it poses a serious security risk.

Developers often default to broad access grants for one reason: breaking the application or CI/CD pipeline carries a significantly higher risk from their perspective. This is often the unfortunate view from the business as well; the need for uptime comes first, and they only worry about widely scoped permissions during or after a breach. If developers are unsure which permissions a machine identity requires, they would rather grant too much than risk breaking a deployment pipeline or production system. This mindset, while understandable, is dangerous.

The Realities of Large-Scale Systems and Access

Even with good intentions, developers may not know what permissions a machine will need until it runs in a real-world context. A service might need to write to a logging bucket, or a Lambda function might require access to several different APIs depending on runtime conditions. The result is often trial-and-error. To avoid failure, engineers grant elevated permissions early, with plans to restrict them later. Unfortunately, that tightening of access rarely happens. Overpermissive machine identities can give attackers lateral movement across environments or full access to critical systems. And these overprivileged identities are rarely reviewed after they are created.

Humans Get Audited; NHIs Usually Don't

Human access is subject to regular audit. Enterprises check group memberships, login patterns, and behavior anomalies. Periodic reviews ensure that unneeded access is locked down and the principle of least privilege is adhered to. Machine identity access, on the other hand, is often invisible. Machine identities are created during infrastructure provisioning or CI/CD setup, and then left alone. They are rarely reviewed unless something goes wrong. Until then, NHIs are largely forgotten about, in many cases entirely, their long-lived, overpermissioned access keys unaccounted for in any coherent system.

The underlying problem of managing permissions is one of visibility. This lack of visibility creates an opportunity for attackers. A leaked API key or cloud token belonging to a machine identity can provide persistent, unnoticed access for months or even years. But even if every NHI and its secrets are accounted for, the sheer number of NHIs makes the traditional auditing approach untenable.

Scale Makes the Problem Worse

Each microservice, deployment tool, infrastructure component, and cloud workload needs access. Multiply this across multiple pipelines, applications, and teams, and it is easy to see how enterprises have gotten to the 100-to-1 NHI-to-human ratio in just a few short years.

Agentic AI Is Intensifying the Problem

The rise of agentic AI systems, which operate on behalf of users or orchestrate tasks across systems, has dramatically increased the risk landscape. These agents often inherit the full set of permissions from the human user or other machine accounts they operate through. This delegation model is usually very wide and poorly scoped. When an AI agent acts across systems, it may do so using permissions that were never intended for it. The result is a hidden layer of overprivileged identities that are difficult to monitor and control.

This scale overwhelms traditional IAM processes. Manual reviews and spreadsheets cannot keep up. Static policies are often outdated or misapplied, and audit logs become nearly impossible to interpret when thousands of service accounts behave in similar ways. To solve this problem in a scalable and sustainable way, organizations need an approach that enforces permissions without slowing down developers. The IAM team should not be burdened with constant manual oversight. Policies must be enforceable through automation, ensuring consistency across environments. Finally, access must be auditable, with clear and contextual visibility into what each identity, human or machine, is able to do, and why.

You Need an Inventory and Visibility First

The first step toward enforcing least privilege for machine identities is to understand what identities exist and what they can do. This means having a full inventory of every machine identity, the secrets they use, and the actions they are permitted to take across systems. Without this visibility, organizations cannot implement controls or reduce exposure. Unfortunately, today, many security teams struggle to answer the basic question: "What can this service account access?"

An Optimally Permissive Future

The principle of least privilege is more important than ever, but applying it to machine identities requires a different strategy. Unlike humans, machines cannot ask for more access, cannot be easily audited with traditional tooling, and operate at a scale that does not fit with human-centric IAM platforms. Security teams need tools that provide full inventory, permission insights, and automation.

By Dwayne McDaniel
Integrating AI Into Test Automation Frameworks With the ChatGPT API

When I first tried to implement AI in a test automation framework, I expected it to be helpful only for a few basic use cases. A few experiments later, I noticed several areas where the ChatGPT API actually saved me time and gave the test automation framework more power: producing realistic test data, analyzing logs in white-box tests, and handling flaky tests in CI/CD.

Getting Started With the ChatGPT API

The ChatGPT API is a programming interface by OpenAI that operates on top of the HTTP(S) protocol. It allows sending requests and retrieving outputs from a pre-selected model as raw text, JSON, XML, or any other format you prefer to work with. The API documentation is clear enough to get started, with examples of request/response bodies that made the first call straightforward. In my case, I just generated an API key in the OpenAI developer platform and plugged it into the framework properties to authenticate requests.

Building a Client for Integration With the API

I built the integration in both Java and Python, and the pattern is the same: send a POST with JSON and read the response, so it can be applied in almost any programming language. Since I prefer to use Java in automation, here is an example of what a client might look like:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

import com.fasterxml.jackson.databind.ObjectMapper;

public class OpenAIClient {

    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(20)).build();
    private final String apiKey;

    public OpenAIClient(String apiKey) {
        this.apiKey = apiKey;
    }

    public String chat(String userPrompt) throws Exception {
        String body = """
            {
              "model": "gpt-5-mini",
              "messages": [
                {"role":"system","content":"You are a helpful assistant for test automation..."},
                {"role":"user","content": %s}
              ]
            }
            """.formatted(json(userPrompt));

        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .timeout(Duration.ofSeconds(60))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> res = http.send(req, HttpResponse.BodyHandlers.ofString());
        if (res.statusCode() >= 300) throw new RuntimeException(res.body());
        // Return only the assistant's message content, not the full API envelope,
        // so callers can parse the model output directly.
        return new ObjectMapper().readTree(res.body())
                .path("choices").path(0).path("message").path("content").asText();
    }

    // Minimal JSON string escaping for the user prompt.
    private static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t") + "\"";
    }
}

As you have probably already noticed, one of the parameters in the request body is the GPT model. Models differ in speed, cost, and capabilities: some are faster, while others are slower; some are expensive, while others are cheap; and some support multimodality, while others do not. Therefore, before integrating with the ChatGPT API, I recommend that you determine which model is best suited for performing your tasks and set limits for it. On the OpenAI website, you can find a page where you can select several models and compare them to make a better choice.

It is also good to know that the custom client implementation can be extended to support server-sent streaming events, to show results as they're generated, and the Realtime API for multimodal purposes. This is what you can use for processing logs and errors in real time and identifying anomalies on the fly.

Integration Architecture

In my experience, integration with the ChatGPT API only makes sense in testing when applied to the correct problems. In my practice, I found three real-world scenarios I mentioned earlier, and now let's take a closer look at them.

Use Case 1: Test Data Generation

The first use case I tried was test data generation for automation tests.
Instead of relying on hardcoded values, ChatGPT can provide strong and realistic data sets, ranging from user profiles with household information to unique data used in exact sciences. In my experience, this variety of data helped uncover issues that fixed or hardcoded data would never catch, especially around boundary values and rare edge cases.

The diagram below illustrates how this integration with the ChatGPT API for generating test data works. At the initial stage, the TestNG Runner launches the suite, and before running the test, it goes to the ChatGPT API and requests test data for the automation tests. This test data will later be processed at the data provider level, and the automated tests will be run against the newly generated data and expected assertions.

Java
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;

class TestUser {
    public String firstName, lastName, email, phone;
    public Address address;
}

class Address {
    public String street, city, state, zip;
}

public List<TestUser> generateUsers(OpenAIClient client, int count) throws Exception {
    String prompt = """
        You generate test users as STRICT JSON only.
        Schema:
        {"users":[{"firstName":"","lastName":"","email":"","phone":"",
                   "address":{"street":"","city":"","state":"","zip":""}}]}
        Count = %d. Output JSON only, no prose.
        """.formatted(count);

    String content = client.chat(prompt);
    JsonNode root = new ObjectMapper().readTree(content);
    ArrayNode arr = (ArrayNode) root.path("users");

    List<TestUser> out = new ArrayList<>();
    ObjectMapper m = new ObjectMapper();
    arr.forEach(n -> out.add(m.convertValue(n, TestUser.class)));
    return out;
}

This solved the problem of repetitive test data and helped to detect errors and anomalies earlier. The main challenge was prompt reliability: if the prompt wasn't strict enough, the model would add extra text that broke the JSON parser. In my case, versioning the prompts was the best way to keep improvements under control.

Use Case 2: Log Analysis

In some recent open-source projects I came across, automated tests also validated system behavior by analyzing logs. In most of these tests, there is an expectation that a specific message should appear in the application console or in Datadog or Loggly, for example, after calling one of the REST endpoints. Such tests are needed when the team conducts white-box testing.

But what if we take it a step further and try to send logs to ChatGPT, asking it to check the sequence of messages and identify potential anomalies that may be critical for the service? Such an integration might look like this: when an automated test pulls service logs (e.g., via the Datadog API), it groups them and sends a sanitized slice to the ChatGPT API for analysis. The ChatGPT API has to return a structured verdict with a confidence score. If anomalies are flagged, the test fails and displays the reasons from the response; otherwise, it passes. This should keep assertions focused while catching unexpected patterns you didn't explicitly code for.
The Java code for this use case might look like this:

Java
import java.util.List;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Redaction middleware (keep it simple and fast)
public final class LogSanitizer {
    private LogSanitizer() {}

    public static String sanitize(String log) {
        if (log == null) return "";
        log = log.replaceAll("(?i)(api[_-]?key\\s*[:=]\\s*)([a-z0-9-_]{8,})", "$1[REDACTED]");
        log = log.replaceAll("([A-Za-z0-9-_]{20,}\\.[A-Za-z0-9-_]+\\.[A-Za-z0-9-_]+)", "[REDACTED_JWT]");
        log = log.replaceAll("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+", "[REDACTED_EMAIL]");
        return log;
    }
}

// Ask for a structured verdict
record Verdict(String verdict, double confidence, List<String> reasons) {}

public Verdict analyzeLogs(OpenAIClient client, String rawLogs) throws Exception {
    String safeLogs = LogSanitizer.sanitize(rawLogs);
    String prompt = """
        You are a log-analysis assistant.
        Given logs, detect anomalies (errors, timeouts, stack traces, inconsistent sequences).
        Respond ONLY as JSON with this exact schema:
        {"verdict":"PASS|FAIL","confidence":0.0-1.0,"reasons":["...","..."]}
        Logs (UTC):
        ----------------
        %s
        ----------------
        """.formatted(safeLogs);

    // Chat with the model and parse the JSON content field
    String content = client.chat(prompt);
    ObjectMapper mapper = new ObjectMapper();
    JsonNode jNode = mapper.readTree(content);

    String verdict = jNode.path("verdict").asText("PASS");
    double confidence = jNode.path("confidence").asDouble(0.0);
    List<String> reasons = mapper.convertValue(
            jNode.path("reasons").isMissingNode() ? List.of() : jNode.path("reasons"),
            new TypeReference<List<String>>() {}
    );
    return new Verdict(verdict, confidence, reasons);
}

Before implementing such an integration, it is important to remember that logs often contain sensitive information, which may include API keys, JWT tokens, or user email addresses. Sending raw logs to a cloud API is therefore a security risk, and data sanitization must be performed. That is why, in my example, I added a simple LogSanitizer middleware to sanitize sensitive data before sending the logs to the ChatGPT API.

It is also important to understand that this approach does not replace traditional assertions but complements them. You can use it instead of dozens of complex checks, allowing the model to detect abnormal behavior. The most important thing is to treat the ChatGPT API verdict as a recommendation and leave the final decision to the automated framework itself, based on specified threshold values. For example, consider a test a failure only if the confidence is higher than 0.8.

Use Case 3: Test Stabilization

One of the most common problems in test automation is the occurrence of flaky tests. Tests can fail for various reasons, including changes to the API contract or interface. However, the worst scenario is when tests fail due to an unstable testing environment. Typically, for such unstable tests, teams enable retries, and the test is run multiple times until it passes or, conversely, fails after three unsuccessful attempts in a row. But what if we give artificial intelligence the opportunity to decide whether a test needs to be restarted or whether it can be immediately marked as failed, or vice versa?

Here's how this idea can be applied in a testing framework: when a test fails, the first step is to gather as much context as possible, including the stack trace, service logs, environment configuration, and, if applicable, a code diff. All this data should be sent to the ChatGPT API for analysis to obtain a verdict, which is then passed to the AiPolicy. It is essential not to let ChatGPT make decisions independently. If the confidence level is high enough, AiPolicy can quarantine the test to prevent the pipeline from being blocked, and when the confidence level is below a specific value, the test can be retried or immediately marked as failed, as sketched below.
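Here is a minimal sketch of what such a policy class might look like, mirroring the Verdict record from Use Case 2. The class name AiPolicy, the Action values, the reason keywords, and the thresholds are illustrative assumptions, not part of an existing framework:

Java
import java.util.List;

public class AiPolicy {

    // Same shape as the Verdict record from Use Case 2, duplicated here so the sketch is self-contained.
    record Verdict(String verdict, double confidence, List<String> reasons) {}

    enum Action { QUARANTINE, RETRY, FAIL }

    private final double quarantineConfidence; // e.g., 0.8
    private final double retryConfidence;      // e.g., 0.5

    public AiPolicy(double quarantineConfidence, double retryConfidence) {
        this.quarantineConfidence = quarantineConfidence;
        this.retryConfidence = retryConfidence;
    }

    // The framework, not the model, makes the final call based on configured thresholds.
    public Action decide(Verdict verdict) {
        boolean looksEnvironmental = verdict.reasons().stream()
                .map(String::toLowerCase)
                .anyMatch(r -> r.contains("timeout") || r.contains("connection") || r.contains("environment"));

        if (looksEnvironmental && verdict.confidence() >= quarantineConfidence) {
            return Action.QUARANTINE; // keep the pipeline green and track the test separately
        }
        if (verdict.confidence() < retryConfidence) {
            return Action.RETRY;      // low confidence: fall back to a plain retry
        }
        return Action.FAIL;           // confident, non-environmental failure: surface it immediately
    }
}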
I believe it is always necessary to leave the decision logic to the automation framework to maintain control over the test results, while still using AI-based integration. The main goal of this idea is to save time on analyzing unstable tests and reduce their number. Reports, after the data has been processed by ChatGPT, become more informative and provide clearer insights into the root causes of failures.

Conclusion

I believe that integrating the ChatGPT API into a test automation framework can be an effective way to extend its capabilities, but there are compromises to this integration that need to be carefully weighed. One of the most important factors is cost. For example, in a set of 1,000 automated tests, of which about 20 fail per run, sending logs, stack traces, and environment metadata to the API can consume over half a million input tokens per run. Adding test data generation to this quickly increases token consumption. In my opinion, the key point is that the cost is directly proportional to the amount of data: the more you send, the more you pay.

Another major issue I noticed is security and privacy concerns. Logs and test data often contain sensitive information such as API keys, JWT tokens, or users' data, and sending raw data to the cloud is rarely acceptable in production. In practice, this means either using open-source LLMs like LLaMA deployed locally or providing a redaction/anonymization layer between your framework and the API so that sensitive fields are removed or replaced before anything leaves your testing environment.

Model selection also plays a role. I've found that in many cases the best strategy is to combine models: using smaller models for routine tasks, and larger ones only where higher accuracy really matters.

With these considerations in mind, the ChatGPT API can bring real value to testing. It helps generate realistic test data, analyze logs more intelligently, and makes it easier to manage flaky tests. The integration also makes reporting more informative, adding context and analytics that testers would otherwise have to research manually. As I have observed in practice, utilizing AI effectively requires controlling costs, protecting sensitive data, and keeping the decision-making logic within the automation framework so that AI decisions can be properly regulated. It reminds me of the early days of automation, when teams were beginning to weigh the benefits against the limitations to determine where the real value lay.

By Serhii Romanov
Your SDLC Has an Evil Twin — and AI Built It

You think you know your SDLC like the back of your carpal-tunnel-riddled hand: You've got your gates, your reviews, your carefully orchestrated dance of code commits and deployment pipelines. But here's a plot twist straight out of your auntie's favorite daytime soap: there's an evil twin lurking in your organization (cue the dramatic organ music). It looks identical to your SDLC — same commits, same repos, the same shiny outputs flowing into production. But this fake-goatee-wearing doppelgänger plays by its own rules, ignoring your security governance and standards.

Welcome to the shadow SDLC — the one your team built with AI when you weren't looking: It generates code, dependencies, configs, and even tests at machine speed, but without any of your governance, review processes, or security guardrails.

Checkmarx's August Future of Application Security report, based on a survey of 1,500 CISOs, AppSec managers, and developers worldwide, just pulled back the curtain on this digital twin drama:

34% of developers say more than 60% of their code is now AI-generated.
Only 18% of organizations have policies governing AI use in development.
26% of developers admit AI tools are being used without permission.

It's not just about insecure code sneaking into production, but rather about losing ownership of the very processes you've worked to streamline. Your "evil twin" SDLC comes with:

Unknown provenance → You can't always trace where AI-generated code or dependencies came from.
Inconsistent reliability → AI may generate tests or configs that look fine but fail in production.
Invisible vulnerabilities → Flaws that never hit a backlog because they bypass reviews entirely.

This isn't a story about AI being "bad," but about AI moving faster than your controls — and the risk that your SDLC's evil twin becomes the one in charge. The rest of this article is about how to prevent that. Specifically:

How the shadow SDLC forms (and why it's more than just code)
The unique risks it introduces to security, reliability, and governance
What you can do today to take back ownership — without slowing down your team

How the Evil Twin SDLC Emerges

The evil twin isn't malicious by design — it's a byproduct of AI's infiltration into nearly every stage of development:

Code creation – AI writes large portions of your codebase at scale.
Dependencies – AI pulls in open-source packages without vetting versions or provenance.
Testing – AI generates unit tests or approves changes that may lack rigor.
Configs and infra – AI auto-generates Kubernetes YAMLs, Dockerfiles, and Terraform templates.
Remediation – AI suggests fixes that may patch symptoms while leaving root causes.

The result is a pipeline that resembles your own — but lacks the data integrity, reliability, and governance you've spent years building.

Sure, It's a Problem. But Is It Really That Bad?

You love the velocity that AI provides, but this parallel SDLC compounds risk by its very nature. Unlike human-created debt, AI can replicate insecure patterns across dozens of repos in hours. And the stats from the FOA report speak for themselves:

81% of orgs knowingly ship vulnerable code — often to meet deadlines.
33% of developers admit they "hope vulnerabilities won't be discovered" before release.
98% of organizations experienced at least one breach from vulnerable code in the past year — up from 91% in 2024 and 78% in 2023.
The share of orgs reporting 4+ breaches jumped from 16% in 2024 to 27% in 2025.

That surge isn't random. It correlates with the explosive rise of AI use in development. As more teams hand over larger portions of code creation to AI without governance, the result is clear: risk is scaling at machine speed, too.

Taking Back Control From the Evil Twin

You can't stop AI from reshaping your SDLC. But you can stop it from running rogue. Here's how:

1. Establish Robust Governance for AI in Development

Whitelist approved AI tools with built-in scanning and keep a lightweight approval workflow so devs don't default to shadow AI.
Enforce provenance standards like SLSA or SBOMs for AI-generated code.
Audit usage and tag AI contributions — use CodeQL to detect AI-generated code patterns and require devs to mark AI commits for transparency. This builds reliability and integrity into the audit trail.

2. Strengthen Supply Chain Oversight

AI assistants are now pulling in OSS dependencies you didn't choose — sometimes outdated, sometimes insecure, sometimes flat-out malicious. While your team already uses hygiene tools like Dependabot or Renovate, they're only table stakes that don't provide governance. They won't tell you if AI just pulled in a transitive package with a critical vulnerability, or if your dependency chain is riddled with license risks. That's why modern SCA is essential in the AI era. It goes beyond auto-bumping versions to:

Generate SBOMs for visibility into everything AI adds to your repos.
Analyze transitive dependencies several layers deep.
Provide exploitable-path analysis so you prioritize what's actually risky.

Auto-updaters are hygiene. SCA is resilience.

3. Measure and Manage Debt Velocity

Track debt velocity — measure how fast vulnerabilities are introduced and fixed across repos.
Set sprint-based SLAs — if issues linger, AI will replicate them across projects before you've logged the ticket.
Flag AI-generated commits for extra review to stop insecure patterns from multiplying.
Adopt agentic AI AppSec assistants — the FOA report highlights that traditional remediation cycles can't keep pace with machine-speed risk, making autonomous prevention and real-time remediation a necessity, not a luxury.

4. Foster a Culture of Reliable AI Use

Train on AI risks like data poisoning and prompt injection.
Make secure AI adoption part of the "definition of done."
Align incentives with delivery, not just speed.
Create a reliable feedback loop — encourage devs to challenge governance rules that hurt productivity. Collaboration beats resistance.

5. Build Resilience for Legacy Systems

Legacy apps are where your evil twin SDLC hides best. With years of accumulated debt and brittle architectures, AI-generated code can slip in undetected. These systems were built when cyber threats were far less sophisticated, lacking modern security features like multi-factor authentication, advanced encryption, and proper access controls. When AI is bolted onto these antiquated platforms, it doesn't just inherit the existing vulnerabilities; it can rapidly propagate insecure patterns across interconnected systems that were never designed to handle AI-generated code. The result is a cascade effect where a single compromised AI interaction can spread through poorly secured legacy infrastructure faster than your security team can detect it. Here's what's often missed:

Manual before automatic: Running full automation on legacy repos without a baseline can drown teams in false positives and noise. Start with manual SBOMs on the most critical apps to establish trust and accuracy, then scale automation.
Triage by risk, not by age: Not every legacy system deserves equal attention. Prioritize repos with heavy AI use, repeated vulnerability patterns, or high business impact.
Hybrid skills are mandatory: Devs need to learn how to validate AI-generated changes in legacy contexts, because AI doesn't "understand" old frameworks. A dependency bump that looks harmless in 2025 might silently break a 2012-era API.

Conclusion: Bring the 'Evil Twin' Back into the Family

The "evil twin" of your SDLC isn't going away. It's already here, writing code, pulling dependencies, and shaping workflows. The question is whether you'll treat it as an uncontrolled shadow pipeline — or bring it under the same governance and accountability as your human-led one. Because in today's environment, you don't just own the SDLC you designed. You also own the one AI is building — whether you control it or not.

Interested to learn more about SDLC challenges in 2025 and beyond? More stats and insights are available in the Future of AppSec report mentioned above.

By Eran Kinsbruner
Exploring Text-to-Cypher: Integrating Ollama, MCP, and Spring AI

When text-to-query approaches (specifically, text2cypher) first entered the scene, I was a bit uncertain how they were useful, especially when existing models were hit-or-miss on result accuracy. It would be hard to justify the benefits over a human expert in the domain and query language. However, as technologies have evolved over the last couple of years, I've started to see how a text-to-query approach adds flexibility to rigid applications that could previously only answer a set of pre-defined questions with limited parameters.

Options further expanded when the Model Context Protocol (MCP) emerged, provisioning reusable methods for connecting to various technologies and services in a consistent manner. This blog post will explore how to build an application that supports text-to-cypher through the Neo4j MCP Cypher server, the Ollama local large language model, and Spring AI.

Neo4j MCP Cypher Server

Neo4j offers several MCP servers anyone can use to connect and execute various functionality with Neo4j. The full list is available in a developer guide, but the one that especially interested me first was the Cypher server. This one provides three tools your application (or an LLM) can utilize: retrieve the database schema (essentially, the graph data model), run read queries in Neo4j, or run write queries in Neo4j. Connecting to the server and using an LLM means you can send natural-language questions as input, and these tools can generate the appropriate Cypher query, execute it in Neo4j, and return the results. There are a few steps to get this set up, so let's start there.

Project Setup and MCP Config

First, you will need a Spring AI application, or you can follow along with my example code repository. If you create your own, ensure you have a dependency on a large language model. Today, I am using Ollama, which is a local model that runs on my machine and does not send data to a public vendor. You will also need the Spring AI MCP client dependency that will turn the app into a client that can connect to an MCP server.

XML
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>

A Little Note About Ollama

I have worked mostly with OpenAI's models for applications and Anthropic's models for code. However, I hadn't done more than dabble with Ollama (see my basic chat app repo). Several developers have asked me about Ollama, so with this application, I really wanted to explore it a bit more. As with most LLM vendors now, there are several model families offered, each with different specializations (even if it's for a general purpose). Ollama does this, as well, so I started with a quick Google search (is that old-school or what?) on which Ollama models were the best. What came back were some nice lists for various tasks, so then it came down to testing them out. There is also the variable of constructing a good prompt, so testing may require a few trial-and-error iterations.

I started with Mistral (7 billion parameters), which is the default for Spring AI, and was met with mediocre results. Next, I tried Gemma, but Gemma does not support tool access, so that model would not work to integrate MCP and use the server's tools. Finally, I plugged in qwen3 (30 billion parameters), and the results felt solid. The only downside with this model is that it recently incorporated a "thinking" mode, which includes the model's chain-of-thought (or the logic processing it does behind the scenes to answer the question). And, currently, Spring AI does not offer a config to disable the "thinking" mode; however, Ollama does provide a way to disable it via an argument that I haven't yet found a way to configure in Spring either. I'll keep playing with that!

For each model I tested, I had to pull (download) it using this command:

Shell
ollama pull <modelName>
# example: ollama pull qwen3:30b

Once installed, I could run my Spring app and test it. Now to configure MCP!

MCP Configuration

The Neo4j Cypher MCP server's GitHub repository provides a README that explains how to connect using a few different methods (Docker, Claude Desktop, etc.), but since Spring MCP supports the Claude Desktop config format, I went with that. Create a file in the src/main/resources folder called mcp-servers.json and use the JSON below. Note: if you have a different Neo4j database you want to connect to, update the env portion with your credentials. The database provided here is a public database you are welcome to access, as well!

JSON
{
  "mcpServers": {
    "goodreads-neo4j": {
      "command": "uvx",
      "args": [
        "mcp-neo4j-cypher",
        "--transport", "stdio"
      ],
      "env": {
        "NEO4J_URI": "neo4j+s://demo.neo4jlabs.com",
        "NEO4J_USERNAME": "goodreads",
        "NEO4J_PASSWORD": "goodreads",
        "NEO4J_DATABASE": "goodreads"
      }
    }
  }
}

Next, we need to add a few configuration properties to the application.properties file.

Properties files
spring.ai.ollama.chat.model=qwen3:30b
spring.ai.mcp.client.stdio.enabled=true
spring.ai.mcp.client.stdio.servers-configuration=classpath:mcp-servers.json
logging.level.org.springframework.ai.mcp=DEBUG

The first property specifies the model (if not using the default, Mistral). Then the remaining properties point to the MCP server JSON config file, allow standard IO access, and set the logging level to DEBUG for MCP. To test this much, I built a quick endpoint in the controller class to fetch the list of tools available from the MCP server.

Testing MCP Connection

Create a controller class in src/main/java... right next to the main class file (I called mine AiController.java) and add two annotations to it.

Java
@RestController
@RequestMapping("/")
public class AiController {
    //code we will write next
}

Within the class definition, inject a ChatClient that will connect to the Ollama LLM and a SyncMcpToolCallbackProvider that will allow access to the MCP server's tools. Both need to be added to the constructor, as well.

Java
@RestController
@RequestMapping("/")
public class AiController {

    private final ChatClient chatClient;
    private final SyncMcpToolCallbackProvider mcpProvider;

    public AiController(ChatClient.Builder builder, SyncMcpToolCallbackProvider provider) {
        this.chatClient = builder
                .defaultToolCallbacks(provider.getToolCallbacks())
                .build();
        this.mcpProvider = provider;
    }

    //code we will write next
}

Finally, we can add an endpoint that calls the MCP provider and lists the tools.

Java
@RestController
@RequestMapping("/")
public class AiController {

    //injections
    //constructor

    @GetMapping("/debug/tools")
    public String debugTools() {
        var callbacks = mcpProvider.getToolCallbacks();
        StringBuilder sb = new StringBuilder("Available MCP Tools:\n");
        for (var callback : callbacks) {
            sb.append("- ").append(callback.getToolDefinition().name()).append("\n");
        }
        return sb.toString();
    }
}

Test the application by running it (using ./mvnw spring-boot:run or in an IDE) and hitting the endpoint, as shown below.
Shell % http ":8080/debug/tools" #Console output: Available MCP Tools: - spring_ai_mcp_client_goodreads_neo4j_get_neo4j_schema - spring_ai_mcp_client_goodreads_neo4j_read_neo4j_cypher - spring_ai_mcp_client_goodreads_neo4j_write_neo4j_cypher Since the connection to the Neo4j Cypher MCP server is working, the next piece is to write the logic for text-to-cypher! Building Text-to-Cypher Following the test method, we can add a new method to create the text-to-cypher functionality. Java //annotations public class AiController { //injections //constructor //debugTools() method @GetMapping("/text2cypher") public String text2cypher(@RequestParam String question) { //code we will write next } } We defined the /text2cypher endpoint that will take a String question as input and return a String answer as output. The code within the method will require a couple of things: a prompt to direct the LLM's actions, steps, and requested output, and a call to the LLM. Here is the method I constructed, but feel free to adjust or explore alternatives in the cypherPrompt: Java @GetMapping("/text2cypher") public String text2cypher(@RequestParam String question) { String cypherPrompt = """ Question: %s Follow these steps to answer the question: 1. Call the get_neo4j_schema tool to find nodes and relationships 2. Generate a Cypher query to answer the question 3. Execute the Cypher query using the read_neo4j_cypher tool 4. Return the Cypher query you executed 5. Return the results of the query """.formatted(question); return chatClient.prompt() .user(cypherPrompt) .call() .content(); } The prompt outlines the steps we want the LLM to follow for each text-to-cypher request and the kinds of output we want it to provide. For instance, we want the results of the query it runs against the database, but we also want the Cypher query itself so that we can cross-check the information. The return statement calls the chatClient we injected earlier, adds a user message containing the prompt, calls the LLM, and returns the content of the chat response. Running the Application Next, we can run the application and test the text-to-cypher endpoint. Here are a few example calls (the repository's README offers a few more): Shell http ":8080/text2cypher?question=What entities are in the database?" http ":8080/text2cypher?question=What books did Emily Dickinson write?" http ":8080/text2cypher?question=Which books have the most reviews?" Note: Results will be prefaced by a <think></think> block that includes all of the chain-of-thought logic the LLM processes to provide the answer. Wrapping Up! In this blog post, we stepped through how I built an application with Spring AI, MCP, and Ollama. First, we had to set up the project by including the MCP client dependency, pulling an Ollama model, and configuring the application to connect to the server. We tested the connection with a /debug/tools method in the controller class. Then, we defined an endpoint where the application connected to the Neo4j MCP Cypher server and executed text-to-cypher with the locally running Ollama model by calling the MCP tools available. Finally, we tested the application with a few different questions to check its logic. This application allows us to ask natural-language questions that the LLM can convert to a Cypher query, run against the database, and return the query it ran along with the results. Happy coding!
Resources
Code repository (today's code): Spring AI MCP demo
Code repository: Neo4j MCP Cypher server
Documentation: Spring AI MCP client
Documentation: Spring AI - Ollama chat model

By Jennifer Reif
Blueprint for Agentic AI: Azure AI Foundry, AutoGen, and Beyond

In 2025, AI isn’t just about individual models doing one thing at a time, but it’s about intelligent agents working together like a well-coordinated team. Picture this: a group of AI systems, each with its own specialty, teaming up to solve complex problems in real time. Sounds futuristic? It’s already happening — thanks to multi-agent systems. Two tools that are making this possible in a big way are Azure AI Foundry and AutoGen. In this post, we’ll dive into how you can bring these two powerful platforms together — using Azure’s scalable infrastructure and AutoGen’s agent collaboration capabilities — to create smarter, more connected, and more efficient AI workflows. The Challenge: From Isolated Models to Intelligent Teams Traditional AI development often involves training and deploying individual models for specific tasks. While effective, this approach can lead to: Siloed intelligence: Models operate independently and do not share knowledge.Manual orchestration: Developers must connect models manually, consuming time and increasing complexity.Limited autonomy: Systems struggle to adapt to new, unforeseen situations. Multi-agent systems introduce a powerful new paradigm where distinct AI agents, which have specialized roles, communicate and collaborate toward a shared goal. This shift unlocks new levels of autonomy, adaptability, and problem-solving potential. Meet the Architects: Azure AI Foundry and AutoGen Before we dive into the "how," let's understand our key players: 1. Azure AI Foundry: Your Enterprise AI Blueprint Azure AI Foundry is Microsoft's new platform designed to help organizations build, deploy, and manage custom AI models at scale. Think of it as your enterprise-grade foundation for AI. It provides: Scalable infrastructure: Compute, storage, and networking tailored for AI workloads.Robust MLOps: Tools for model training, versioning, deployment, and monitoring.Security and compliance: Enterprise-level features to meet stringent requirements.Model catalog: A centralized repository for managing and discovering models, including foundation models. Azure AI Foundry offers the stable, secure, and performant environment needed to host sophisticated AI solutions. 2. AutoGen: Empowering Conversational AI Agents AutoGen, developed by Microsoft Research, is a framework that simplifies the orchestration, optimization, and automation of LLM-powered multi-agent conversations. It allows you to: Define agents: Create agents with specific roles (e.g., "Software Engineer," "Data Analyst," "Product Manager").Enable communication: Agents can send messages, execute code, and perform actions in a conversational flow.Automate workflows: Design complex tasks that agents can collectively solve, reducing human intervention.Integrate tools: Agents can leverage external tools and APIs, expanding their capabilities. AutoGen brings collaborative intelligence to your AI solutions. 
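As a small illustration of the "integrate tools" capability, the sketch below registers a plain Python function as a tool that one agent proposes and another executes. It assumes the AutoGen 0.2-style register_function API and placeholder model credentials, so treat it as a sketch rather than a drop-in snippet.
Python
import autogen
from autogen import AssistantAgent, UserProxyAgent

# Placeholder LLM configuration -- replace with your own deployment name and key
config_list = [{"model": "your-model-deployment", "api_key": "your-api-key"}]

def get_order_status(order_id: str) -> str:
    """Toy lookup the agent can call as a tool (stands in for a real API)."""
    return f"Order {order_id} is out for delivery."

assistant = AssistantAgent(
    name="Support_Agent",
    system_message="You answer order questions. Use tools when you need live data.",
    llm_config={"config_list": config_list},
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    code_execution_config=False,  # this agent only executes registered tools
)

# The assistant proposes the tool call; the user proxy executes it
autogen.register_function(
    get_order_status,
    caller=assistant,
    executor=user_proxy,
    name="get_order_status",
    description="Look up the delivery status of an order by ID.",
)

user_proxy.initiate_chat(assistant, message="Where is order 42?")
The full workflow below relies on code execution instead, but the same registration pattern applies whenever agents need controlled access to databases or external APIs.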
The Synergy: Azure AI Foundry + AutoGen for Smarter Workflows By combining Azure AI Foundry and AutoGen, you get the best of both worlds: Scalable and secure agent deployment: Deploy your AutoGen-powered multi-agent systems on Azure AI Foundry's robust infrastructure, ensuring high availability and enterprise-grade security.Centralized model management: Leverage Azure AI Foundry's model catalog to manage the LLMs that power your AutoGen agents.Streamlined MLOps for agents: Apply MLOps practices to your agent development, from versioning agent configurations to monitoring their performance in production.Accelerated development: Focus on designing intelligent agent interactions, knowing that the underlying infrastructure is handled by Azure AI Foundry. Building Your First Collaborative AI Workflow: A Simple Example Let's walk through a conceptual example: an AI team designed to analyze a dataset and generate a summary report. Scenario: We want an AI workflow that can: Read a CSV file.Perform basic data analysis (e.g., descriptive statistics, identify trends).Generate a concise, insightful summary. This is a perfect task for collaborative agents! Workflow Overview Code Snippet (Conceptual) First, ensure you have the necessary libraries installed: pip install autogen openai azure-ai-ml. (Note: Replace your-api-key and your-endpoint with your actual Azure OpenAI Service credentials for the LLMs that power your agents.) Python # Assuming you've configured Azure AI Foundry with Azure OpenAI Service # for your LLM endpoints. # This setup would typically be handled via environment variables or a configuration file. import autogen from autogen import UserProxyAgent, AssistantAgent import os # --- Configuration for AutoGen with Azure OpenAI Service --- # These values would come from your Azure AI Foundry deployment or environment variables config_list = [ { "model": "your-gpt4-deployment-name", # e.g., "gpt-4" or "gpt-4-32k" "api_key": os.environ.get("AZURE_OPENAI_API_KEY"), "base_url": os.environ.get("AZURE_OPENAI_ENDPOINT"), "api_type": "azure", "api_version": "2024-02-15-preview", # Check latest supported version }, # You can add more models/endpoints here for different agents if needed ] # --- 1. Define the Agents --- # User Proxy Agent: Acts as the human user, can execute code (if enabled) # and receives messages from other agents. user_proxy = UserProxyAgent( name="Admin", system_message="A human administrator who initiates tasks and reviews reports. Can execute Python code.", llm_config={"config_list": config_list}, # This agent can also use LLM for conversation code_execution_config={ "work_dir": "coding", # Directory for code execution "use_docker": False # Set to True for sandboxed execution (recommended for production) }, human_input_mode="ALWAYS", # Always ask for human input for critical steps is_termination_msg=lambda x: "TERMINATE" in x.get("content", "").upper(), ) # Data Analyst Agent: Specializes in data interpretation and analysis. data_analyst = AssistantAgent( name="Data_Analyst", system_message="You are a meticulous data analyst. Your task is to analyze datasets, extract key insights, and present findings clearly. You can ask the Coder for help with programming tasks.", llm_config={"config_list": config_list}, ) # Python Coder Agent: Specializes in writing and executing Python code. python_coder = AssistantAgent( name="Python_Coder", system_message="You are a skilled Python programmer. You write, execute, and debug Python code to assist with data manipulation and analysis tasks. 
Provide clean and executable code.", llm_config={"config_list": config_list}, ) # Report Writer Agent: Specializes in summarizing information and generating reports. report_writer = AssistantAgent( name="Report_Writer", system_message="You are a concise and professional report writer. Your goal is to synthesize information from the data analyst into a clear, summary report for the Admin.", llm_config={"config_list": config_list}, ) # --- 2. Initiate the Multi-Agent Conversation --- # Example task: Analyze a simulated sales data CSV # In a real scenario, this CSV would be pre-loaded or retrieved from a data source. initial_task = """ Analyze the following hypothetical sales data CSV (assume it's available as 'sales_data.csv'): 'date,product,region,sales\n2023-01-01,A,East,100\n2023-01-02,B,West,150\n2023-01-03,A,East,120\n2023-01-04,C,North,200\n2023-01-05,B,West,130\n2023-01-06,A,South,90' Perform the following: 1. Load the data into a pandas DataFrame. 2. Calculate total sales per product and per region. 3. Identify the best-selling product and region. 4. Summarize your findings in a clear, concise report, suitable for a business stakeholder. """ # Create a dummy CSV for the coder agent to work with with open("coding/sales_data.csv", "w") as f: f.write("date,product,region,sales\n2023-01-01,A,East,100\n2023-01-02,B,West,150\n2023-01-03,A,East,120\n2023-01-04,C,North,200\n2023-01-05,B,West,130\n2023-01-06,A,South,90") # --- 3. Orchestrate the Group Chat --- groupchat = autogen.GroupChat( agents=[user_proxy, data_analyst, python_coder, report_writer], messages=[], max_round=15, # Limit rounds to prevent infinite loops speaker_selection_method="auto" # AutoGen decides who speaks next ) manager = autogen.GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list}) print("Starting agent conversation...") user_proxy.initiate_chat( manager, message=initial_task, ) print("\nAgent conversation finished.") # The final report will be in the conversation history of the user_proxy agent. # You would then extract it from `user_proxy.chat_messages` # for further processing or storage in Azure AI Foundry. Deployment on Azure AI Foundry (Conceptual Flow) Once your AutoGen workflow is refined, you'd typically: Containerize your agents: Package your AutoGen agents and their dependencies into a Docker image.Define a model in Azure AI Foundry: Register your LLM endpoint (Azure OpenAI Service) as a model in Azure AI Foundry's model catalog.Create an endpoint/deployment: Deploy your containerized AutoGen application as an online endpoint (e.g., Azure Kubernetes Service or Azure Container Instances) within Azure AI Foundry. This exposes an API that you can call to trigger your multi-agent workflow.Monitor and manage: Use Azure AI Foundry's MLOps capabilities to monitor the performance of your deployed agents, track costs, and update agent configurations or underlying LLMs as needed. Here is the workflow: Benefits of this Integrated Approach Accelerated problem-solving: Agents quickly collaborate to solve complex tasks.Reduced human effort: Automate multi-step processes that previously required manual orchestration.Enhanced adaptability: Agents can be designed to learn and adjust their strategies based on outcomes.Scalability and reliability: Leverage Azure's enterprise-grade infrastructure for your AI solutions.Improved governance: Centralized management of models and deployments within Azure AI Foundry. Conclusion The future of AI is collaborative. 
Bringing together the MLOps capabilities of Azure AI Foundry with the intelligent multi-agent orchestration of AutoGen lets you unlock powerful, autonomous AI workflows that drive efficiency and innovation.

By Anand Singh
LLMs for Debugging Code

Large language models (LLMs) are transforming software development lifecycles, with their utility in code understanding, code generation, debugging, and many more. This article provides insights into how LLMs can be utilized to debug codebases, detailing their core capabilities, the methodologies used for training, and how the applications might evolve further in the future. Despite the issues with LLMs like hallucinations, the integration of LLMs into development environments through sophisticated, agentic debugging frameworks proves to improve developers’ efficiency. Introduction The Evolving Role of LLMs in Coding LLMs have already proven their capabilities beyond their initial applications in natural language processing to achieve remarkable performance in diverse code-related tasks, including code generation and translation. They power AI coding assistants like GitHub Copilot and Cursor, and have demonstrated near-human-level performance on standard benchmarks such as HumanEval and MBPP. LLMs can generate comprehensive code snippets from textual descriptions, complete functions, and provide real-time syntax suggestions, thereby streamlining the initial stages of code creation. However, there is a clear expansion use-case into more complex, iterative processes required throughout the software development lifecycle. The Criticality of Code Debugging Debugging is a time-consuming yet fundamental part of software development, involving error identification, localization, and repair. These errors range from simple syntax mistakes to complex logical flaws. Traditional debugging methods are often challenging, especially for junior programmers who may struggle with opaque compiler messages and complex codebases. The efficiency of the debugging process directly impacts development timelines and software quality, highlighting the need for more advanced and intuitive tools. Core Capabilities of LLMs Code Understanding and Analysis Beyond the extensive pre-training on vast code corpora to understand natural language, LLMs are trained specifically with large coding databases to recognize common programming patterns and infer the intended meaning of code segments. This foundational capability allows them to analyze code for both syntax errors and logical inconsistencies. Bug Localization and Identification A primary application of LLMs in debugging is their capacity to assist in identifying and localizing bugs. Recent advancements in LLM-based debugging have moved beyond just line-level bug identification. More recent approaches can predict bug locations with finer granularity beyond the line level, down to the token level. We can employ various techniques for both bug identification and the bug-fixing process. This is achieved by leveraging encoder LLMs such as CodeT5, which allow for a more precise pinpointing of the problematic code segments. Code Fixing LLMs can provide suggestions on how to fix buggy code. More recently, LLM-agents are also able to propose the code changes directly. They may also employ an iterative process of improving and repairing source code. There's also growing interest in self-repair techniques, where the LLM runs the code it generated, observes the results, and then makes adjustments based on what went wrong. This loop helps improve the reliability and quality of the final code. This self-correcting mechanism mimics aspects of human debugging, where developers test, observe failures, and then modify their code. 
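As a rough illustration of such a self-repair loop, the following sketch runs a generated candidate against a small test, feeds any failure output back to the model, and retries a bounded number of times. The ask_llm helper is a placeholder for whichever chat-completion client is in use, not a specific vendor API.
Python
import subprocess
import tempfile

MAX_ATTEMPTS = 3

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client; returns candidate Python code."""
    raise NotImplementedError

def run_candidate(code: str, test_snippet: str) -> tuple[bool, str]:
    """Execute the candidate code plus a test snippet; return (passed, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_snippet)
        path = f.name
    # Assumes a "python" executable is on the PATH; a sketch, not a sandbox
    proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stdout + proc.stderr

def self_repair(task: str, test_snippet: str) -> str:
    prompt = f"Write a Python function for this task:\n{task}"
    for _ in range(MAX_ATTEMPTS):
        code = ask_llm(prompt)
        passed, output = run_candidate(code, test_snippet)
        if passed:
            return code  # tests pass; accept the candidate
        # Feed the observed failure back to the model and try again
        prompt = (
            f"The previous attempt failed.\nTask: {task}\n"
            f"Code:\n{code}\nTest output:\n{output}\n"
            "Fix the code so the test passes."
        )
    raise RuntimeError("No passing candidate after retries")
In practice, the test snippet would come from the bug report or from the test-generation step described later in this article.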
For example, how a developer might prompt an LLM for a bug fix: Python # User Prompt: # "The following Python function should calculate the factorial of a number, # but it crashes with a RecursionError for inputs greater than 0. # Can you identify the bug and fix it?" def factorial(n): if n == 0: return 1 else: return n * factorial(n) # Bug is here - infinite recursion! # --- LLM's Suggested Fix --- def factorial(n): if n == 0: return 1 else: return n * factorial(n - 1) RAG on Code Base and Q&A Forums LLMs can perform efficient retrieval-augmented generation (RAG) over the internal code base and Q&A forums (internal and external) and provide a relevant and concise summary, which can be very helpful in the debugging journey. This can mean understanding a build or runtime error in a local change, answering questions about design and access patterns, or quickly producing an overview with relevant pointers. LLMs can then combine this with their understanding of the code base in order to produce hints for further debugging and possible fixes. This points to a future where advances in debugging won't just rely on better code-oriented models, but also on creative approaches that connect natural language understanding with code reasoning, enabling LLMs to interpret and solve coding issues in a more conceptual, human-like manner. Test Case Generation for Debugging LLMs can provide support for efficient debugging mechanisms through robust test case generation. They can create unit test cases with diverse test inputs, which is fundamental for effectively detecting bugs. Several AI coding tools exemplify this capability, allowing developers to generate well-structured test cases simply by providing natural language prompts. Python # User Prompt: # "Generate unit tests (using unittest) for the following 'Calculator.add' method. # Include a test for positive numbers, negative numbers, and zero." class Calculator: def add(self, a, b): return a + b # LLM-Generated Test Case: import unittest class TestCalculator(unittest.TestCase): def test_add_positive_numbers(self): calculator = Calculator() self.assertEqual(5, calculator.add(2, 3), "Adding two positive numbers") def test_add_negative_numbers(self): calculator = Calculator() self.assertEqual(-5, calculator.add(-2, -3), "Adding two negative numbers") def test_add_with_zero(self): calculator = Calculator() self.assertEqual(2, calculator.add(2, 0), "Adding a number to zero") if __name__ == '__main__': unittest.main() Approaches to Building Debugging Capabilities Data Refinement and Supervised Fine-Tuning Performing domain-specific training on high-quality debugging datasets in the target languages is an important step in enabling an LLM to debug effectively and perform well. Supervised fine-tuning (SFT) on public and internal code bases helps the model learn design, build, and testing patterns. Research indicates that larger LLMs, particularly those exceeding 70 billion parameters, show markedly stronger debugging abilities and superior bug-fixing accuracy compared to smaller models (e.g., 7B-scale models). Natural Language as an Intermediate Representation (NL-DEBUGGING) The NL-DEBUGGING framework represents a significant advancement by introducing natural language as an intermediate representation for code debugging. This approach involves translating code into natural language to facilitate deeper semantic reasoning and guide the debugging process.
This enables LLMs to come up with diverse strategies and solutions for debugging. Popular natural-language representations include sketches, pseudocodes, and key thought points. Advanced Prompt Engineering Strategies The design of prompts is a critical factor in effectively adapting LLMs for bug-fixing tasks. Providing comprehensive context, such as the original source code, significantly improves the quality and relevance of error explanations generated by LLMs. Various prompt engineering strategies can be employed to optimize performance, including one-shot prompting, assigning specific roles to the LLM (e.g., instructing it to "Act like a very senior Python developer"), and decomposing complex tasks into smaller, manageable sub-tasks. It may also be effective to do negative prompting, which explicitly states what the desired output should not contain. Multi-LLM and Agentic Debugging Flow To overcome the inherent limitations of single LLMs and move beyond the simplistic "prompt in, code out" model that often falls short for complex debugging scenarios, researchers are developing multi-LLM and agentic debugging frameworks. Different LLMs have distinct roles, such as a "Code Learner" and a "Code Teacher," which integrate compiler feedback to improve the accuracy of bug identification and the fix. For example, using Claude for code retrieval, and GPT-4 for deep analysis. In addition, iterative refinement may be employed where LLMs are designed to correct or debug their own outputs. Agentic debugging flow Limitations and Challenges Shallow Code Understanding and Semantic Sensitivity One of the key limitations of today’s large language models in debugging is that they often lack a deep understanding of how code actually works. Their comprehension relies heavily on lexical and syntactic features, rather than a semantic understanding of program logic. Studies show that LLMs can lose the ability to debug the same bug in a substantial percentage of faulty programs (e.g., 78%) when small non-semantic changes (removal of dead-code, comments/variable naming update, etc.) are applied. LLMs may also struggle to discard irrelevant information, treating dead code as if it actively contributes to the program's semantics, which can lead to misdiagnosis during fault localization. Performance on Complex and Logical Errors While LLMs demonstrate promise, their overall debugging performance still falls short of human capabilities. Analysis shows that certain categories of errors still remain significantly challenging for LLMs - specifically, logical errors and bugs involving multiple intertwined issues are considerably more difficult for LLMs to understand/debug compared to simpler syntax and reference errors. Context Window Constraints and Scalability Issues Modern software repositories are often extensive, spanning thousands to millions of tokens. Effective debugging in such environments requires LLMs to process and understand entire codebases comprehensively. LLMs struggle to maintain reliable performance at extreme context sizes, even though recent advancements have made it possible to pass large contexts. Performance has been observed to degrade as context lengths increase, limiting their ability to fully grasp and debug large, multi-file projects in their entirety. The Problem of Hallucinations and Inconsistent Output A critical vulnerability in LLMs is their propensity to generate "hallucinations" — plausible-sounding but factually incorrect or entirely fabricated content. 
This often means developers need to double-check and sometimes spend extra time debugging the code or fixes suggested by the AI. Hallucinations can stem from various sources, including poorly written prompts, unclear contextual information provided to the model, or the use of outdated model versions. Test Coverage Issues While they can produce executable and varied test cases, they often struggle with the more strategic and logical aspect of testing: identifying which specific statements, branches, or execution paths need to be covered. This limitation is very relevant for debugging, as effective debugging often relies on precisely crafted test cases that isolate and expose specific problematic code paths. The "Debugging Decay" Phenomenon Research shows that AI debugging effectiveness follows an exponential decay pattern. After a few iterations, an LLM's ability to find bugs drops significantly (by 60-80%), making continuous, unguided iterations computationally expensive and inefficient. This suggests that human intervention is necessary to reset and guide the debugging process rather than relying on prolonged, independent AI iterations. Conclusion LLMs are set to revolutionize code debugging by enhancing efficiency and developer productivity. Their ability to understand code, localize bugs, suggest repairs, and generate test cases marks a significant advancement over traditional methods. The future lies in a collaborative model where AI assists human developers, augmenting their skills rather than replacing them. Through continuous learning, strategic integration, and a focus on human-AI partnership, LLMs will become indispensable tools in the software development lifecycle, transforming debugging into a more proactive and intelligent process.

By Surya Teja Appini
FOSDEM 2025 Recap: Open Source Contributors Unite to Collaborate and Help Advance Apache Software Projects

FOSDEM 2025 has come to a close, and it certainly was not without a lot of content and participation from Apache Software Foundation (ASF) members, committers, and contributors! We asked ASF participants to provide summaries and observations from this year’s premier free software event, to share a small part of the work that ASF community members do for open-source software development. This blog provides a brief overview of their talks, including links to the video recordings. Apache NuttX RTOS Talk: "SBOM Journey for an Open Source Project - Apache NuttX RTOS" (video) Speaker: Alin Jerpelea, Sony Overview: This was my second time presenting at FOSDEM, and I absolutely love the community and vibe during the open source week. The talk was about presenting the community journey of license migration to Apache NuttX RTOS, SPDX header implementation, and the road bumps encountered when we started looking at how to implement SBOM automatic generation for a project, which is using the C language. Apache Arrow Talks: "Apache Arrow: The Great Library Unifier" (video), "ODBC Takes an Arrow to the Knee" (video) Speaker: Matt Topol, Voltron Data, Apache Arrow (PMC member), Apache Iceberg (committer) Overview: This was my first FOSDEM, both attending and speaking. The event is simply amazing in sheer scope and size. Being free to attend, it attracts an enormous crowd such that even the smaller DevRooms or more niche topics are still likely to have a good amount of attendees. As of yet, it’s definitely my favorite open source event for finding like-minded people and making great connections over a beer (or several). This event gave me the opportunity to not only catch-up with friends that I’ve met at other conferences (like the ASF’s own Community Over Code) but also to reunite in person with several other Apache Arrow maintainers. Given the size of FOSDEM, there are also a significant number of "Fringe" events organized by different companies and organizations. This gives you the opportunity to expand your circle of connections and knowledge of projects in whatever your preferred space of development is, and have amazing conversations. The volume of talks and topics covered can be overwhelming at first, but ultimately, it contributes to the friendly and casual nature of the best conferences. If you’re able to attend, I highly recommend adding FOSDEM to your list of must-go events. In the meantime, if you do any work in the data analytics space or are building AI/ML tools and utilities, check out the recordings of my talks (and the multitude of others) that have great information for everyone! Talks: "Apache Arrow Tensor Arrays: An Approach for Storing Tensor Data" (video), "What Can PyArrow Do for You - Array Interchange, Storage, Compute and Transport" (video) Speakers: Rok Mihevc and Alenka Frim, Apache Arrow Overview: Our first talk was a short 10-minute pitch, where we mainly explained the development of canonical tensor data type support in Arrow C++ and PyArrow. The aim of the talk was to present memory layout specification of the two tensor extension types — fixed and variable shape — together with a Python example of creating a fixed shape tensor array and its interoperability with NumPy. In the PyArrow talk, we gave an overview of some of PyArrow's capabilities, demonstrating data interchange, storage, manipulation, and transport using a single Python library. 
We had to start with a short introduction of Apache Arrow and its Python implementation as the attendees were a mixed group of users and non-users. After that, we gave a general overview together with code examples for each of the four PyArrow functionalities that we decided to present: array interchange, storage (Parquet, ORC, etc.), compute module, and Flight RPC. The response and questions we got showed us that we picked a good theme and need to do more presentations like this. Forked Communities Talk: "Forked Communities: Project Re-licensing and Community Impact" (video) Speakers: Brian Proffitt, Red Hat, ASF (with Dr. Dawn M Foster, Stephen Walli, and Ben Ford) Overview: The increase of open source businesses changing their licensing from open-source software licenses to something more restrictive is becoming a real concern in the open source ecosystem. Companies such as MongoDB, Elasticsearch, HashiCorp’s Terraform/Vault, and Redis have shifted their projects to non-open-source licenses (sometimes referred to as “source available”), despite there not being consistent evidence that this actually generates improved financial outcomes for those companies. In some cases, this relicensing has resulted in a hard fork of the original project. Both the relicensing and the resulting fork create turmoil for the users of that project and the community of contributors. On Saturday at FOSDEM, I joined Dr. Dawn Foster from Project CHAOSS, Stephen Walli of Microsoft, and Ben Ford from Overlook InfraTech to discuss: The dynamics around relicensing that results in such hard forksExamples of forks along with the impact on the communitiesThoughts about what this means for the future of free and open-source software. The panel’s conclusion? For the most part, the forked projects tend to do better in terms of contributions and adoption growth, and there is real data coming in to support that. Technology is all well and good, but without truly open communities and contributor practices, even the coolest technologies might see less adoption when they are closed off. Apache Mahout Talk: "Introducing Qumat, Apache Mahout’s New Quantum Compute Layer" (video) Speakers: Trevor Grant, The AI Alliance (IBM) and Andrew Musselman, Speedchain Overview: Following Apache Mahout's core values of interoperability and providing tools for matrix arithmetic at scale, we have added a new layer (qumat) alongside our existing distributed matrix math framework (Samsara), that allows quantum researchers and developers to write code once and run it on any available back end. As with distributed compute systems like Spark and Flink, moving from one platform to another typically requires a complete code rewrite. This is prohibitive in most cases, but Samsara provides machine learning researchers and developers one unified interface to write code once and port instantly to another platform if it is deemed necessary. Similarly for quantum computing, multiple vendors (e.g., IBM, GCP, AWS) have their own libraries for accessing their cloud quantum compute services, such as qiskit, cirq, and braket. To give the same flexibility in the quantum area, qumat corrals all these libraries under one interface, allowing users to focus on building circuits and writing algorithms rather than adapting to one particular library. Developer Relations Talk: "Building Bridges: Exploring the Future of Developer Relations BOF" Speaker: Nadia Jiang, Ant Group Overview: Thanks to Willem Jiang for recommending this conference to me last year. 
This was my second time attending FOSDEM, which also marked its 25th anniversary. This year, I organized a Birds of a Feather (BoF) session on Developer Relations. I was deeply impressed by the lively discussions and the connections made during the event. Participants ranged from DevRel professionals at major companies like Google to entrepreneurs at start-ups. Two key topics sparked the most heated discussions: How open-source projects can adapt their documentation, forums, and support in the face of AI advancementsHow to balance the responsibilities of DevRel (ensuring developer satisfaction) with customer support (ensuring client satisfaction) The diverse experiences and insights shared were truly inspiring. Apache Airflow Talk: "Airflow Beach Cleaning - Securing Supply Chain" (video) Speakers: Jarek Potiuk, Apache Airflow (with Munawar Hafiz and Michael Winser) Overview: This was my third FOSDEM, and I absolutely love the event. It’s hands-down the best place where you can not only share your stories but also meet people that “think” open source, have a beer (or two) with them, and make connections that make your open source journey more enjoyable and successful. In my first event, I was an attendee, I had a talk at the second, and for the third one, I co-organized “Low-Level AI Engineering and Hacking” with Roman. This is how you can fall in love with the event and people around it. This year, I had the opportunity to catch up with my friends, who could learn from me about what I am doing, and meet completely new people. AI is NOT my thing, yet organizing an AI dev room was an opportunity to meet some of the greatest minds in the space and help to make their event experience better as I was also “stage hosting” them and making them comfortable — that was the highlight of this event. But more importantly, I took part in quite a few “Fringe” events around FOSDEM. Having all the people around make it really easy to organize a lot of events around open source, such as: Open source metrics events (CHAOSScon)An open source policy forum, where we discussed the future of open source in a more regulated worldA meeting with EU Commission — and other foundation people — in Brussels, the heart of EU policymaking, where they learned from us how they can design and apply their policies to better suit the open source crowd. This year's FOSDEM set me up for the whole next year — with a number of threads started and people I met. Apache Logging Services Talk: "What if Log4Shell Were to Happen Today?" (video) Speaker: Piotr Karwasz, Freelancer, Apache Logging Services PMC Overview: While the Apache Logging Services PMC handled the Log4Shell crisis pretty well, there was certainly room for improvement. My talk casts light on some of the behind-the-scenes problems we identified and were able to fix last year, thanks to a Sovereign Tech Agency grant. To handle a vulnerability quickly, developers should be able to concentrate on its cause without being distracted by policy and technical problems. It is therefore important to keep an OSS project ready at all times for a new release and have multiple PMC members ready to replace the release manager in case of unavailability. I provided a small cheat sheet on how to handle vulnerabilities in a regulated environment such as the ASF. Another part of the talk is consecrated to two risky behaviors in the OSS world: featuritis and slow reaction to vulnerability announcements. 
New tools such as SBOMs and bots can help with the latter problem, but having a sane policy on which features to accept requires a lot of OSS community experience. Apache Iceberg Talk: "Take the Polar Plunge: A Fearless Introduction to Apache Iceberg" (video) Speaker: Danica Fine, Snowflake Overview: I was thrilled to be attending and speaking at my first-ever FOSDEM as a developer advocate with Snowflake to share Apache Iceberg with everyone! After my and Russell Spitzer's session on the v3 Table Spec in the Data Analytics room, the organizers asked me to step in to fill a session from a speaker who couldn't make it; given the success of our first Iceberg talk, I suggested that I share an introduction to Iceberg, a "polar plunge" of sorts. Despite the last-minute schedule change, nearly 100 folks attended. Throughout the session, I covered basics, like the motivation behind a data lakehouse, Iceberg architecture, and an overview of compute engines and catalogs. From there, I introduced everyone to queries and how they interact with the layer of metadata that makes up Iceberg. To complete the polar plunge, I brought everyone up to speed with topics like copy-on-write (CoW) vs. merge-on-read (MoR), the small files problem, and compaction. At the end of the session, Russell and I enjoyed addressing follow-up questions in the "Hallway Track" and beyond. The environment at FOSDEM was electric, to say the least! It was incredible to see how excited people were for each of the sessions, and I was especially pleased to see so much interest in Apache Iceberg. Talk: "What the Spec?!: New Features in Apache Iceberg Table Format V3" (video) Speakers: Danica Fine and Russell Spitzer, Snowflake We work at Snowflake and were excited to attend our first ever FOSDEM. We joined the Data Analytics room to give everyone an overview of all the exciting new developments coming in V3 of the Apache Iceberg Table Format. Folks got to hear about how the Iceberg Spec is being expanded to handle a slew of features like Row Lineage, Variant Types, and Delete Vectors. As usual, the hallway track afterward was full of exciting new ideas for features and future work, and I hope we can be back again in the future! Learn More To learn more about FOSDEM, and to see the full list of recorded sessions, visit https://fosdem.org/2025/. To contribute to an Apache Software Foundation project, visit https://community.apache.org/.

By Brian Proffitt
Beyond Retrieval: How Knowledge Graphs Supercharge RAG

Retrieval-augmented generation (RAG) enhances the factual accuracy and contextual relevance of large language models (LLMs) by connecting them to external data sources. RAG systems use semantic similarity to identify text relevant to a user query. However, they often fail to explain how the query and retrieved pieces of information are related, which limits their reasoning capability. Graph RAG addresses this limitation by leveraging knowledge graphs, which represent entities (nodes) and their relationships (edges) in a structured, machine-readable format. This framework enables AI systems to link related facts and draw coherent, explainable conclusions, moving closer to human-like reasoning (Hogan et al., 2021). In this article and the accompanying tutorial, we explore how Graph RAG compares to traditional RAG by answering questions drawn from A Study in Scarlet, the first novel in the Sherlock Holmes series, demonstrating how structured knowledge supports more accurate, nuanced, and explainable insights. Constructing Knowledge Graphs: From Expert Ontologies to Large-Scale Extraction The first step in implementing graph RAG is building a knowledge graph. Traditionally, domain experts manually define ontologies and map entities and relationships, producing high-quality structures. However, this approach does not scale well for large volumes of text. To handle bigger datasets, LLMs and NLP techniques automatically extract entities and relationships from unstructured content. In the tutorial accompanying this article, we demonstrate this process by breaking it down into three practical stages: 1. Entity and Relationship Extraction The first step in building the graph is to extract entities and relationships from raw text. In our tutorial, this is done using Cohere’s command-a model, guided by a structured JSON schema and a carefully engineered prompt. This approach enables the LLM to systematically identify and extract the relevant components. For example, the sentence: “I was dispatched, accordingly, in the troopship ‘Orontes,’ and landed a month later on Portsmouth jetty.” can be converted into the following graph triple: (Watson) – [travelled_on] → (Orontes) Instead of processing the entire novel in one pass, we split it into chunks of about 4,000 characters. Each chunk is processed with two pieces of global context: Current entity list: all entities discovered so far.Current relation types: the set of relations already in use. This approach prevents the model from creating duplicate nodes and keeps relation labels consistent across the book. An incremental merging step then assigns stable IDs to new entities and reuses them whenever known entities reappear. For instance, once the relation type meets is established, later chunks will consistently reuse it rather than inventing variants such as encounters or introduced_to. This keeps the graph coherent as it grows across chunks. 2. Entity Resolution Even with a global context, the same entity often appears in different forms; a character, location, or organization may be mentioned in multiple ways. For example, “Stamford” might also appear as “young Stamford”, while “Dr Watson” is later shortened to “Watson.” To handle this, we apply a dedicated entity resolution step after extraction. 
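Before walking through the full process, here is a minimal sketch of the fuzzy-matching idea behind this step. It uses Python's standard difflib rather than the tutorial's actual implementation, and the similarity threshold and alias-containment check are illustrative assumptions, not values from the tutorial.
Python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Simple character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve_entities(mentions: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each mention to a canonical entity, merging near-duplicates."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for mention in mentions:
        # Also treat containment as a match so "Dr Watson" and "Watson" merge
        match = next(
            (c for c in canonical
             if similarity(mention, c) >= threshold
             or mention.lower() in c.lower() or c.lower() in mention.lower()),
            None,
        )
        if match is None:
            canonical.append(mention)
            mapping[mention] = mention
        else:
            mapping[mention] = match
    return mapping

print(resolve_entities(["Dr Watson", "Watson", "Stamford", "young Stamford"]))
# {'Dr Watson': 'Dr Watson', 'Watson': 'Dr Watson',
#  'Stamford': 'Stamford', 'young Stamford': 'Stamford'}
In a real pipeline, this lexical check would typically be combined with entity-type constraints (only merging a Person with a Person, for example) and an embedding- or LLM-based pass for ambiguous cases.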
This process identifies and merges duplicate nodes based on fuzzy string similarity (e.g., Watson ≈ Dr Watson), expands and unifies alias lists so that each canonical entity maintains a complete record of its variants, and normalizes relationships so that all edges consistently point to the resolved nodes. This ensures that queries such as "Who introduced Watson to Holmes?" always return the correct canonical entity, regardless of which alias appears in the text. 3. Graph Population and Visualization With resolved entities and cleaned relationships, the data is stored in a graph structure. Nodes can be enriched with further attributes such as type (Person, Location, Object), aliases, and optionally traits like profession or role. Edges are explicitly typed (e.g., meets, travels_to, lives_at), enabling structured queries and traversal. Together, these stages transform unstructured narrative text into a queryable, explainable Knowledge Graph, laying the foundation for Graph RAG to deliver richer insights than traditional retrieval methods. Once the Knowledge Graph is constructed, it is stored in a graph database designed to handle interconnected data efficiently. Databases like Neo4j allow nodes and relationships to be queried and traversed intuitively using declarative languages such as Cypher. Graph RAG vs. Traditional RAG The main advantage of Graph RAG over traditional RAG lies in its ability to leverage structured relationships between entities. Traditional RAG retrieves semantically similar text fragments but struggles to combine information from multiple sources, making it difficult to answer questions that require multi-step reasoning. In the tutorial, for example, we can ask: "Who helped Watson after the battle of Maiwand, and where did this occur?" Graph RAG answers this by traversing the subgraph: Dr Watson → [HELPED_BY] → Murray → [LOCATED_AT] → Peshawar. Traditional RAG would only retrieve sentences mentioning Watson and Murray without connecting the location or providing a reasoning chain. This shows that graph RAG produces richer, more accurate, and explainable answers by combining entity connections, attributes, and relationships captured in the knowledge graph. A key benefit of graph RAG is transparency. Knowledge graphs provide explicit reasoning chains rather than opaque, black-box outputs. Each node and edge is traceable to its source, and attributes such as timestamps, provenance, and confidence scores can be included. This level of explainability is particularly important in high-stakes domains like healthcare or finance, but it also enhances educational value in literary analysis, allowing students to follow narrative reasoning in a structured, visual way. Enhancing Retrieval With Graph Embeddings Graph data can be enriched through graph embeddings, which transform nodes and their relationships into a vector space. This representation captures both semantic meaning and structural context, making it possible to identify similarities beyond surface-level text. Embedding algorithms such as FastRP and Node2Vec enable the retrieval of relationally similar nodes, even when their textual descriptions differ. By integrating LLMs with knowledge graphs, graph RAG transforms retrieval from a simple text lookup into a structured reasoning engine. It enables AI systems to link facts, answer complex questions, and provide explainable, verifiable insights.
Graph RAG represents a step toward AI systems that are not only larger and more capable but also smarter, more transparent, and more trustworthy, capable of structured reasoning over rich information sources. References Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Gayo, J.E.L., Kirrane, S., Neumaier, S., Polleres, A., Navigli, R., Ngomo, A.C.N., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., and Zimmermann, A. (2021) ‘Knowledge Graphs’, ACM Computing Surveys, 54(4), pp. 1–37.Larson, J. and Truitt, A. (2024) ‘Beyond RAG: Graph-based retrieval for multi-step reasoning’, Microsoft Research Blog. Available at: https://www.microsoft.com/en-us/research/blog/beyond-rag-graph-based-retrieval-for-multi-step-reasoning/ (Accessed: 20 July 2025).Neo4j (2024) ‘Graph data science documentation’, Neo4j. Available at: https://neo4j.com/docs/graph-data-science/current/ (Accessed: 20 July 2025).

By Salman Khan
VS Code Agent Mode: An Architect's Perspective for the .NET Ecosystem

GitHub Copilot agent mode had several enhancements in VS Code as part of its July 2025 release, further bolstering its capabilities. The supported LLMs are getting better iteratively; however, both personal experience and academic research remain divided on future capabilities and gaps. I've had my own learnings exploring agent mode for the last few months, ever since it was released, and had the best possible outcomes with Claude Sonnet Models. After 18 years of building enterprise systems — ranging from integrating siloed COTS to making clouds talk, architecting IoT telemetry data ingestions and eCommerce platforms — I've seen plenty of "revolutionary" tools come and go. I've watched us transition from monoliths to microservices, from on-premises to cloud, from waterfall to agile. I've learned Java 1.4, .NET 9, and multiple flavors of JavaScript. Each transition revealed fundamental flaws in how we think about software construction. The integration of generative AI into software engineering is dominated by pattern matching and reasoning by analogy to past solutions. This approach is philosophically and practically flawed. There's active academic research that surfaces this problem, primarily the "Architectures of Error" framework that systematically differentiates the failure modes of human and AI-generated code. At the moment, I'm neither convinced by Copilot's capability nor have I found reasons to hate it. My focus in this article is more on the human-side errors that Agent Mode helps us recognize. Why This Isn't Just Another AI Tool Copilot's Agent Mode isn't just influencing how we build software — it's revealing why our current approaches are fundamentally flawed. The uncomfortable reality: Much of our architectural complexity exists because we've never had effective ways to encode and enforce design intent. We write architectural decision records that few read. We create coding standards that get violated under pressure. We design patterns that work beautifully when implemented correctly but fail catastrophically when they're not. Agent Mode surfaces this gap between architectural intent and implementation reality in ways we haven't experienced before. The Constraint Problem We've Been Avoiding Here's something I've learned from working on dozens of enterprise projects: Most architectural failures aren't technical failures — they're communication failures. We design a beautiful hexagonal architecture, document it thoroughly, and then watch as business pressure, tight deadlines, and human misunderstanding gradually erode it. By year three, what we have bears little resemblance to what we designed. C# // What we designed public class CustomerService : IDomainService<Customer> { // Clean separation, proper dependencies } // What we often end up with after several iterations public class CustomerService { // Direct database calls mixed with business logic // Scattered validation, unclear responsibilities // Works, but violates every architectural principle } Agent Mode forces us to confront this differently. AI can't read between the lines or make intuitive leaps. If our architectural constraints aren't explicit enough for an AI to follow, they probably aren't explicit enough for humans either. The Evolution from Documentation to Constraints In my experience, the most successful architectural approaches have moved progressively toward making correct usage easy and incorrect usage difficult. Early in my career, I relied heavily on documentation and code reviews. 
Later, I discovered the power of types, interfaces, and frameworks that guide developers toward correct implementations. Now, I'm exploring how to encode architectural knowledge directly into development tooling (and Copilot). C# / Evolution 1: Documentation-based (fragile) // "Please ensure all controllers inherit from BaseApiController" // Evolution 2: Framework-based (better) public abstract class BaseApiController : ControllerBase { // Common functionality, but still optional } // Evolution 3: Constraint-based (AI-compatible) public interface IApiEndpoint<TRequest, TResponse> where TRequest : IValidated where TResponse : IResult { // Impossible to create endpoints that bypass validation } The key insight: Each evolution makes architectural intent more explicit and mechanical. Agent Mode simply pushes us further along this path. We can work around most AI problems like the "AI 90/10 problem" arising from hallucinated APIs, non-existent libraries, context-window myopia, systematic pattern propagation, and model drift. LLM responses are probabilistic by nature, but they can be made deterministic by specifying constraints. Practical Implications Working with Agent Mode on real projects has revealed several practical patterns: 1. Requirement Specification Vague prompts produce (architecturally) inconsistent results. This isn't a limitation — it's feedback about the clarity of our thinking at any role, especially around SDLC, including the architect. We struggled with the same problems with the advent of the outsourcing era, too. SaaS inherits this problem through its extensibility and flexibility. Markdown [BAD] Inviting infinite possibilities: "Create a service for managing customers relationship" [GOOD] More effective: "Create a CustomerService implementing IDomainService<Customer> with validation using FluentValidation and error handling via Result<T> pattern" 2. The Composability Test If AI struggles to combine your architectural patterns correctly, human developers probably do too. They excel at pattern matching but fail at: Systematicity: Applying rules consistently across contextsProductivity: Scaling to larger, more complex compositionsSubstitutivity: Swapping components while maintaining correctnessLocalism: Understanding global vs. local scope implications This also helps to identify the architectural complexity. 3. The Constraint Discovery Process Working with AI has helped me identify implicit assumptions in existing architectures that weren't previously explicit. These discoveries often lead to better human-to-human communication as well. The Skills That Remain Valuable Based on my experience so far, certain architectural skills have become more important now: Domain understanding: AI can generate technically correct code, but understanding business context and constraints remains fundamentally human.Pattern recognition: Identifying when existing patterns apply and when new ones are needed becomes crucial for defining AI constraints.System thinking: Understanding emergent behaviors and system-level properties remains beyond current AI capabilities.Trade-off analysis: Evaluating architectural decisions based on business context, team capabilities, and long-term maintainability. What's Actually Changing The shift isn't as dramatic as "AI replacing architects or developers." 
It's more subtle: From implementation to intent: Less time writing boilerplate, more time clarifying what we actually want the system to do.From review to prevention: Instead of catching architectural violations in code review, we encode constraints that prevent them upfront.From documentation to automation: Architectural knowledge becomes executable rather than just descriptive. These changes feel significant to me, but they're evolutionary rather than revolutionary. Challenges I'm Still Working Through The learning curve: Developing fluency with constraint-driven development requires rethinking established habits.Team adoption: Not everyone is comfortable with AI-assisted development yet, and that's understandable.Tool maturity: Current AI tools are impressive but still have limitations around context understanding and complex reasoning.Validation strategies: Traditional testing approaches may not catch all AI-generated issues, so we're developing new validation patterns. A Measured Prediction Based on what I'm seeing, I expect a gradual shift over the next 3–5 years toward: More explicit architectural constraints in codebasesIncreased automation of pattern enforcementEnhanced focus on domain modeling and business rule specificationEvolution of code review practices to emphasize architectural composition over implementation details This won't happen overnight, and it won't replace fundamental architectural thinking. But it will change how we express and enforce architectural decisions. What I'm Experimenting With Currently, I'm exploring: 1. Machine-readable architecture definitions that can guide both AI and human developers. JSON { "architecture": { "layers": ["Api", "Application", "Domain", "Infrastructure"], "dependencies": { "Api": ["Application"], "Application": ["Domain"], "Infrastructure": ["Domain"] }, "patterns": { "cqrs": { "commands": "Application/Commands", "queries": "Application/Queries", "handlers": "required" } } } } 2. Architectural testing frameworks that validate system composition automatically. C# [Test] public void Architecture_Should_Enforce_Layer_Dependencies() { var result = Types.InCurrentDomain() .That().ResideInNamespace("Api") .ShouldNot().HaveDependencyOn("Infrastructure") .GetResult(); Assert.That(result.IsSuccessful, result.FailingTypes); } [Test] public void AI_Generated_Services_Should_Follow_Naming_Conventions() { var services = Types.InCurrentDomain() .That().AreClasses() .And().ImplementInterface(typeof(IDomainService)) .Should().HaveNameEndingWith("Service") .GetResult(); Assert.That(services.IsSuccessful); } 3. Constraint libraries that make common patterns easy to apply correctly, starting with domain primitives. C# ```csharp // Instead of generic controllers, define domain-specific primitives public abstract class DomainApiController<TEntity, TDto> : ControllerBase where TEntity : class, IEntity where TDto : class, IDto { // Constrained template that AI can safely compose } // Service registration primitive public static class ServiceCollectionExtensions { public static IServiceCollection AddDomainService<TService, TImplementation>( this IServiceCollection services) where TService : class where TImplementation : class, TService { // Validated, standard registration pattern return services.AddScoped<TService, TImplementation>(); } } 4. Documentation approaches that work well with AI-assisted development. An example is documenting architecture in the Arc42 template in Markdown, diagrams in Mermaid embedded in Markdown. 
Early results are promising, but there's still much to learn and explore.

Looking Forward

After 18 years in this field, I've learned to be both optimistic about new possibilities and realistic about the pace of change. VS Code Agent Mode represents an interesting step forward in human-AI collaboration for software development. It's not a silver bullet, but it is a useful tool that can help us build better systems, if we approach it thoughtfully.

The architectures that thrive in an AI-assisted world won't necessarily be the most sophisticated ones. They'll be the ones that most clearly encode human insight in ways that both AI and human developers can understand and extend. That's a worthy goal, regardless of the tools we use to achieve it.

Final Thoughts

The most valuable architectural skill has always been clarity of thought about complex systems. AI tools like Agent Mode don't change this fundamental requirement; they just give us new ways to express and validate that clarity.

As we navigate this transition, the architects who succeed will be those who remain focused on the essential questions: What are we trying to build? Why does it matter? How can we make success more likely than failure? The tools continue to evolve, but these questions remain constant.

I'm curious about your experiences with AI-assisted development. What patterns are you seeing? What challenges are you facing? The best insights come from comparing experiences across different contexts and domains.

By Shashi Kumar
The AI FOMO Paradox

TL;DR: AI FOMO — A Paradox

AI FOMO comes from seeing everyone's polished AI achievements while you see all your own experiments, failures, and confusion. The constant drumbeat of AI breakthroughs triggers legitimate anxiety for Scrum Masters, Product Owners, Business Analysts, and Product Managers: "Am I falling behind? Will my role be diminished?"

But here's the truth: you are not late. Most teams are still in their early stages, and progress is uneven. There are no "AI experts" in agile yet — only pioneers and experimenters treating AI as a drafting partner that accelerates exploration while they keep judgment, ethics, and accountability.

Disclaimer: I used a Deep Research report by Gemini 2.5 Pro to research sources for this article.

The Reality Behind the AI Success Stories

Signals are distorted: leaders declare themselves AI-first while data hygiene lags, and shadow AI usage inflates progress without creating stable practices. Generative AI has officially entered what Gartner calls the "Trough of Disillusionment" in 2024–2025. MIT Sloan's research reveals that only 5% of business AI initiatives generate meaningful value (note: the MIT Sloan report needs to be handled with care due to its design). Companies spend an average of $1.9 million on generative AI initiatives, yet less than 30% of AI leaders report CEO satisfaction. Meanwhile, individual workers report saving 2.2–2.5 hours weekly — quiet, durable gains beneath the noise generated by the AI hype.

The "AI shame" phenomenon proves the dysfunction: 62% of Gen Z workers hide their AI usage, 55% pretend to understand tools they don't, and only a small fraction receive adequate guidance. This isn't progress; it's organizational theater.

Good-Enough Agile Is Ending

AI is not replacing Agile. It's replacing the parts that never created differentiated value. "Good-Enough Agile," where teams go through Scrum events without understanding the principles, is being exposed. Ritualized status work, generic Product Backlog clerking, and meeting transcription are all becoming cheaper, better, and more plentiful.

Research confirms AI as a "cybernetic teammate" that amplifies genuine agile principles. The Agile Manifesto's first value, "Individuals and interactions over processes and tools," becomes clearer: AI is the tool; your judgment remains irreplaceable.

The AI for Agile anti-patterns revealing shallow practice include:

Tool tourism: Constant switching that hides a weak positioning
Hero prompts: One person becomes the AI bottleneck instead of distributing knowledge
Vanity dashboards: Counting prompts instead of tracking outcome-linked metrics
Automation overreach: Brittle auto-actions that save seconds but cost days

These patterns expose teams practicing cargo cult Agile. Career insecurity triggers documented fears of exclusion, but the real threat isn't being excluded from AI knowledge. It's being exposed as having practiced shallow Agile all along. (Throwing AI at a failed approach to "Agile" won't remedy the underlying issue.)

The Blunt Litmus Test

If you can turn messy inputs into falsifiable hypotheses, define the smallest decisive test, and defend an ethical error budget, AI gives you lift. If you cannot, AI will do your visible tasks faster while exposing the absence of value and your diminished utility from the organization's point of view. Your expertise moves upstream to framing questions and downstream to evaluating evidence. AI handles low-leverage generation; you decide what matters, what's safe, and what ships.
Practical Leverage Points

There are plenty of beneficial approaches to employing AI for Agile, for example:

Product Teams: Convert qualitative inputs into competing hypotheses. AI processes customer transcripts in minutes, but you determine which insights align with the product goal. Then validate or falsify hypotheses with AI-built prototypes faster than ever before.

Scrum Masters: Auto-compile WIP ages, handoffs, flow interruptions, and PR latency to move Retrospectives from opinions to evidence. AI surfaces patterns; you guide systemic improvements. Seriously, talking to management becomes significantly easier once you transition from "we feel that…" to "we have data on…"

Developers: Generate option sketches, then design discriminating experiments. PepsiCo ran thousands of virtual trials; Wayfair evolved its tool through rapid feedback — AI accelerating empirical discovery.

Stanford and World Bank research shows a 60% time reduction on cognitive tasks. But time saved means nothing without judgment about which tasks matter. Building useless things more efficiently won't prove your value as an agile practitioner to the organization when a serious voice questions your effectiveness.

Conclusion: From Anxiety to Outcome Literacy

The path forward isn't frantically learning every tool. Start with one recurring problem. Form a hypothesis. Run a small experiment. Inspect the results. Adapt. This is AI for Agile applied to your own development.

Your value to the organization shifts from execution to strategic orchestration. Your experience building self-managing teams becomes more valuable as AI exposes the difference between genuine practice and cargo cult Agile. Durable wins come from workflow redesign and sharper questions, not model tricks. If you can frame decisions crisply, choose discriminating tests, and hold ethical lines, you're ahead where it counts.

AI FOMO recedes when you trade comparison for learning velocity. Choose an outcome that matters, add one AI-assisted step that reduces uncertainty, measure honestly, and keep what proves its worth. AI won't replace Agile; it will replace Good-Enough Agile, and outcome-literate practitioners will enjoy a massive compound advantage. It helps if you know what you are doing, and for what purpose.

Food for Thought on AI FOMO

How might recognizing AI as exposing "Good-Enough Agile," rather than threatening genuine practice, change your approach to both AI adoption and agile coaching in organizations that have been going through the motions?
Given that AI makes shallow practice obvious by automating ritual work, what specific anti-patterns in your organization would become immediately visible, and how would you address the human dynamics of that exposure?
If the differentiator is "boring excellence" (clean operational data, evaluation harnesses, and reproducible workflows) rather than AI tricks, what foundational practices need strengthening in your context before AI can actually accelerate value delivery?

Sources

Gartner AI Hype Cycle Report
AI FOMO, Shadow AI, and Other Business Problems
AI Hype Cycle – Gartner Charts the Rise of Agents, HPCwire
Impact of Generative AI on Work Productivity, Federal Reserve Bank of St. Louis
AI Shame Grips the Present Generation, Times of India
Generative AI & Agile: A Strategic Career Decision, Scrum.org
Fear of Missing Out at Work, Frontiers in Organizational Psychology (link downloads as an EPUB)
Artificial Intelligence in Agile, Sprightbulb
AI & Agile Product Teams, Scrum.org
Productivity Gains from Using AI, Visual Capitalist
Human + AI: Rethinking the Roles and Skills of Knowledge Workers, AI Accelerator Institute

By Stefan Wolpers

Top AI/ML Experts


Tuhin Chattopadhyay

CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM

Dr. Tuhin Chattopadhyay is a celebrated technology thought leader among both the academic and corporate fraternity. A recipient of numerous prestigious awards, Tuhin has been hailed as one of India's Top 10 Data Scientists by Analytics India Magazine. Besides driving his consultancy organization, Tuhin AI Advisory, Dr. Tuhin also serves as Professor of Practice at JAGSoM, Bengaluru. His professional accomplishments can be explored at https://www.tuhin.ai/, his art portfolio at https://tuhin.art/, his joie de vivre at https://tuhinism.com/, and his adventures with MySon at https://dogfather.rocks/.

Frederic Jacquet

Technology Evangelist,
AI[4]Human-Nexus

My goal is to deepen my research and analysis to track technological developments and understand their real impacts on businesses and individuals. I focus on untangling exaggerated perceptions and irrational fears from genuine technological advances. My approach is critical: I aim to move beyond myths and hype to identify the concrete, realistic progress we can expect from new technologies.

Suri (thammuio)

Data & AI Services and Portfolio

Seasoned Data & AI Technologist and Innovator with deep expertise in Big Data, Data Analytics, Cloud, Machine Learning, and Generative AI. He is passionate about building modern data ecosystems that drive intelligent analytics and business transformation. As a Forbes Technology Council and Entrepreneur Leadership Network member, Suri contributes thought leadership on technology strategy, AI innovation, and digital transformation. A founder of multiple startups and a lifelong learner, he combines enterprise experience with entrepreneurial agility to deliver impactful, future-ready data solutions.

Pratik Prakash

Principal Solution Architect,
Capital One

Pratik, an experienced solution architect and passionate open-source advocate, combines hands-on engineering expertise with extensive experience in multi-cloud and data science. Leading transformative initiatives across current and previous roles, he specializes in large-scale multi-cloud technology modernization. Pratik's leadership is highlighted by his proficiency in developing scalable serverless application ecosystems, implementing event-driven architecture, deploying AI/ML and NLP models, and crafting hybrid mobile apps. Notably, his strategic focus on an API-first approach drives digital transformation while embracing SaaS adoption to reshape technological landscapes.

The Latest AI/ML Topics

Building a Resilient Observability Stack in 2025: Practical Steps to Reduce Tool Sprawl With OpenTelemetry, Unified Platforms, and AI-Ready Monitoring
Learn how to cut observability tool sprawl, adopt OpenTelemetry, and build a vendor-neutral, AI-ready observability stack for reliability at scale in 2025.
November 3, 2025
by Marija Naumovska
· 153 Views
What Is Agent Observability? Key Lessons Learned
A foundational guide to the key concepts of agent observability including best practices, metrics, and common challenges.
November 3, 2025
by Lior Gavish
· 166 Views
Navigating the Cyber Frontier: AI and ML's Role in Shaping Tomorrow's Threat Defense
AI and ML are transforming cybersecurity with adaptive defenses, predictive analysis, and automation, shaping a smarter, more resilient digital future.
November 3, 2025
by Geethamanikanta Jakka
· 164 Views · 1 Like
Human-AI Readiness
AI is reshaping work through human-AI collaboration. Learn how organizations can build readiness, maturity, and thrive in the hybrid intelligence era.
November 3, 2025
by Rick Freedman
· 202 Views
Series (3/4): Toward a Shared Language Between Humans and Machines — Quantum Language and the Limits of Simulation
Can quantum computing encode meaning into qubits? This part analyzes how quantum language processing might offer a radically new path.
October 31, 2025
by Frederic Jacquet
· 880 Views · 1 Like
How Modern Developers Use AI-Assisted Coding to Validate Products Faster
Learn how developers use AI-assisted coding to validate products 55% faster through automated testing, rapid prototyping, and streamlined code reviews.
October 31, 2025
by Henry Ameseder
· 868 Views · 1 Like
An Open-Source ChatGPT App Generator
Create fully functional ChatGPT apps using AI — no coding needed. Build and deploy interactive UI widgets directly inside ChatGPT in just minutes.
October 31, 2025
by Thomas Hansen
· 883 Views · 2 Likes
From Model to Microservice: A Practical Guide to Deploying ML Models as APIs
Stop letting your machine learning models stagnate in Jupyter notebooks. To deliver real value, you need to integrate them into applications.
October 30, 2025
by Naga Sai Mrunal Vuppala
· 1,008 Views · 1 Like
Keyword vs Semantic Search With AI
Learn in this article how to build keyword-based and semantic search in MariaDB using Python, LangChain, FastAPI, and AI embeddings.
October 30, 2025
by Alejandro Duarte
· 1,234 Views · 3 Likes
Improving Developer Productivity With End-to-End GenAI Enablement
Build an end-to-end developer enablement hub locally with GenAI, automating everything from requirements to deployment to boost productivity and streamline workflows.
October 30, 2025
by Nabin Debnath
· 1,161 Views · 1 Like
Building a New Testing Mindset for AI-Powered Web Apps
Learn about AI-powered testing strategies to ensure your intelligent systems are reliable, ethical, and aligned with business goals.
October 30, 2025
by Kailash Pathak
· 1,039 Views
From Autocomplete to Co-Creation: How AI Changes Developing/Debugging Workflows in Engineering
AI is not a replacement for engineers but a powerful partner that accelerates coding, debugging, and learning; human judgment still drives the real innovation.
October 29, 2025
by Aishwarya Murali
· 1,370 Views · 1 Like
Emerging Patterns in Large-Scale Event-Driven AI Systems
Build scalable AI event pipelines with Kafka to enable continuous learning systems that perceive, reason, and act in real time.
October 29, 2025
by Devdas Gupta
· 1,263 Views · 2 Likes
End of Static Knowledge Bases? How MCP Enables Live RAG
Learn why static RAG systems fail over time and how the Model Context Protocol (MCP) enables live, hybrid AI architectures that stay accurate and up to date.
October 29, 2025
by Janani Annur Thiruvengadam
· 1,144 Views · 2 Likes
Series (2/4): Toward a Shared Language Between Humans and Machines — From Multimodality to World Models: Teaching Machines to Experience
Explores technologies attempting to bridge the gap through perception: multimodal systems, digital twins, and research efforts to create World Models.
October 29, 2025
by Frederic Jacquet
· 1,132 Views
A Developer's Practical Guide to Support Vector Machines (SVM) in Python
Learn how to build, tune, and evaluate high-performance SVM models in Python using Scikit-learn with best practices for scaling, pipelines, and ROC-AUC.
October 29, 2025
by Soumya Banerjee
· 1,231 Views
HSTS Beyond the Basics: Securing AI Infrastructure and Modern Attack Vectors
HTTP Strict Transport Security (HSTS) is a web security policy mechanism that helps protect websites against protocol downgrade attacks and cookie hijacking.
October 29, 2025
by Vidyasagar (Sarath Chandra) Machupalli FBCS
· 1,036 Views · 2 Likes
Building Cloud Ecosystems With Autonomous AI Agents: The Future of Scalable Data Solutions
Autonomous AI agents transform cloud ecosystems by automating data workflows, enhancing scalability, governance, and real-time analytics for resilient operations.
October 28, 2025
by Aravind Nuthalapati
· 1,765 Views
Production-Ready Multi-Agent Systems: From Theory to Enterprise Deployment
Learn in this article why single AI agents fail and how multi-agent systems used by Uber, LinkedIn, and Klarna achieve 3x faster performance and 40% lower costs.
October 28, 2025
by Praveen Chinnusamy
· 7,907 Views · 2 Likes
Amazon Bedrock Guardrails for GenAI Applications
Guardrails let you create responsible AI policies for generative AI apps. Set limits on topics and content, filter to block or mask unwanted outputs, then apply, test, and version guardrails across models for safety and consistency.
October 28, 2025
by Prabhakar Mishra
· 1,488 Views
