NeoTec                                                    L. Dunbar, Ed.
Internet-Draft                                                 Futurewei
Updates: 8342 (if approved)                                       C. Xie
Intended status: Standards Track                                  Q. Sun
Expires: 17 April 2025                                     China Telecom
                                                         14 October 2024


     Cross-Domain Cloud and Network Resource Management Data Model
              draft-dxs-neotec-crossdomain-net-mgnt-dm-00

Abstract

   This document proposes extensions to existing YANG models, as well as
   new YANG models, to enable the management of cross-domain cloud and
   network resources.  The intent is to provide dynamic resource
   allocation mechanisms that allow services to scale efficiently across
   multiple cloud environments and edge computing platforms.  By
   defining unified YANG models for both network and cloud domains, this
   draft addresses challenges in orchestrating and managing resources in
   a hybrid environment while maintaining interoperability and dynamic
   scaling.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 17 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.



Dunbar, et al.            Expires 17 April 2025                 [Page 1]

Internet-Draft         Cloud Resource Abstraction           October 2024


   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Problem Statement
   4.  Data Models Overview
   5.  Cross-Domain Resource Orchestrator
   6.  Dynamic Resource Allocation for Federated Learning
   7.  Dynamic Network Reconfiguration
   8.  Edge Computing Node
   9.  Security Considerations
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   Cloud and edge computing environments are increasingly interconnected
   with network infrastructure, and modern services require dynamic,
   cross-domain orchestration to scale efficiently.  Services placed in
   Cloud Data Centers (DC) are changing dynamically, often undergoing
   high-frequency modifications based on evolving service requirements.
   As a result, the network connecting these services must dynamically
   adapt and reconfigure itself in real time to accommodate the service
   changes.

   A set of network-related problems that enterprises face when
   interconnecting their branch offices with dynamic workloads in third-
   party data centers (Cloud DCs) is described in [Net2Cloud], which
   outlines various issues, including the challenges of ensuring
   reliable, scalable, and efficient network connectivity between
   enterprise sites and cloud-hosted services.  While mitigation
   practices have been described in [Net2Cloud], they fall short of
   addressing the dynamic and rapidly changing nature of services placed
   in Cloud DCs.  More advanced solutions are needed so that the network
   serves these dynamic services effectively and adjusts in real time to
   changes in service workloads, resource allocations, bandwidth
   requirements, and latency constraints driven by cloud-hosted
   services.

   This draft extends existing YANG models or introduces new ones to
   enable the management of both cloud and network resources in a
   unified, cross-domain manner.  The goal is to optimize dynamic
   resource allocation, allowing services to scale seamlessly across
   public clouds, private clouds, and edge computing nodes while
   ensuring consistency, interoperability, and real-time adaptability of
   the network to the dynamically changing services placed in Cloud DCs.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Problem Statement

   Current management models face several limitations:

   - Siloed Resource Management: Most current models treat network and
   cloud resources as separate entities, making cross-domain management
   inefficient.

   - Lack of Dynamic Scaling Support: Many models lack the mechanisms
   needed to dynamically allocate and reallocate resources across
   domains based on real-time service demands.

   - Inconsistent Interfaces and Data Models: Inconsistent data models
   across cloud and network platforms hinder seamless integration.

   - Limited Support for Edge Environments: Traditional models focus on
   cloud and core network infrastructure, often overlooking edge
   computing platforms where latency-sensitive workloads run.

   This draft proposes a solution by extending YANG models to facilitate
   cross-domain resource management and efficient scaling.

4.  Data Models Overview

   Several existing IETF YANG models, such as ietf-routing [RFC8349],
   ietf-network-instance [RFC8529], and ietf-l3vpn-svc [RFC8299], offer
   foundational models for network resource management.  However, these
   models need to be extended with cloud-specific and edge-specific
   attributes.

   The primary design objectives for the extended or new YANG models
   include:

   - Cross-Domain Resource Orchestrator: Provides the high-level
   orchestration and policies for managing resources across domains,
   invoking network reconfiguration actions as needed.

   - Dynamic Resource Allocation: Handles the overall allocation of
   resources (compute, storage, network).  For 5G networks and beyond,
   Dynamic Resource Allocation can be used to allocate network
   resources based on the needs of a federated learning process.

   - Dynamic Network Reconfiguration: Models the network's dynamic
   adaptation to cloud services, focusing on real-time network
   reconfiguration based on cloud workload needs.  Extends support for
   multi-cloud VPNs, multi-segment SD-WAN [MULTI-SEG-SDWAN], and service
   overlays.

   - Edge Node Resource: Edge nodes are computing resources placed at
   the edge of the network, closer to the end user or data source, to
   reduce latency and improve performance for time-sensitive or
   high-bandwidth applications.  Edge nodes can be located in Telecom
   Providers' Edge Data Centers, such as Edge DCs for 5G or Regional
   Micro DCs.  The models are extended to manage compute and storage
   resources on edge platforms.

   How they work together:

   - High-Level Orchestration (Cross-Domain Resource Orchestrator): The
   orchestrator manages the overall allocation of cloud and network
   resources based on policies and telemetry.

   - Resource Requests (Resource Allocation): When the orchestrator
   detects a need for resource changes (e.g., increased compute or
   bandwidth), it triggers resource requests.  Network resource
   allocation will adapt based on these requests.

   - Real-Time Adjustments (Dynamic Network Reconfiguration): As
   resource demands change (due to dynamic cloud services), the network
   reconfigures in real time.  This includes adjusting bandwidth,
   latency, or other parameters to ensure that the network supports the
   new service requirements effectively.

   - Edge Node Integration (Edge Node Resource): The network
   reconfiguration model can dynamically adjust the network to ensure
   optimal connectivity between edge nodes and cloud services, allowing
   latency-sensitive or bandwidth-intensive applications to operate
   efficiently.

   Together, these models provide a comprehensive framework for
   orchestrating, allocating, and dynamically adjusting network and
   compute resources across cloud, edge, and network domains.  The
   Dynamic Network Reconfiguration model enhances this by ensuring that
   the network component reacts in real time to the dynamic nature of
   cloud services.
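
   The interaction described above can be illustrated with a minimal
   Python sketch.  The Orchestrator class, its 80% thresholds, and the
   "compute" and "bandwidth" labels are hypothetical and are not defined
   by the models; the sketch only shows the order in which operations of
   the different models (allocate-resources, allocate-edge-resources,
   reconfigure-network) might be invoked.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical orchestrator showing how the models could interact."""
    cpu_threshold: float = 80.0        # percent; illustrative value
    bandwidth_threshold: float = 80.0  # percent; illustrative value
    calls: list = field(default_factory=list)  # operations invoked, in order

    def on_telemetry(self, service_id, domain_type, cpu_util, bw_util):
        # High-level orchestration: compare telemetry with policy thresholds.
        if cpu_util > self.cpu_threshold:
            # Resource request for compute; edge domains use the edge model.
            op = ("allocate-edge-resources" if domain_type == "edge"
                  else "allocate-resources")
            self.calls.append((op, service_id, "compute"))
        if bw_util > self.bandwidth_threshold:
            # Resource request for bandwidth ...
            self.calls.append(("allocate-resources", service_id, "bandwidth"))
            # ... followed by a real-time network adjustment.
            self.calls.append(("reconfigure-network", service_id, "bandwidth"))
        return self.calls


orch = Orchestrator()
print(orch.on_telemetry("svc-1", "cloud", cpu_util=50.0, bw_util=90.0))
```

   In this sketch a bandwidth shortage always leads to a network
   reconfiguration, while a compute shortage in an edge domain is routed
   to the edge model instead of the generic allocation operation.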

5.  Cross-Domain Resource Orchestrator

   Here is an exemplary structure of a YANG model for a Cross-Domain
   Resource Orchestrator.  This model enables the orchestration of cloud
   and network resources, allowing efficient dynamic resource allocation
   and scaling across multiple cloud and network domains.

   module: cross-domain-orchestrator
      +--rw orchestrator
         +--rw policies
         |  +--rw policy* [policy-id]
         |     +--rw policy-id               string
         |     +--rw policy-name             string
         |     +--rw policy-type             enumeration
         |     +--rw status                  enumeration
         |     +--rw conditions
         |        +--rw cpu-utilization-threshold    uint8
         |        +--rw memory-utilization-threshold uint8
         |        +--rw latency-threshold            uint32
         |        +--rw bandwidth-threshold          uint32
         +--rw telemetry
         |  +--rw domain* [domain-id]
         |     +--rw domain-id              string
         |     +--rw domain-type            enumeration
         |     +--rw resources
         |     |  +--rw cpu                decimal64
         |     |  +--rw memory             uint64
         |     |  +--rw storage            uint64
         |     |  +--rw bandwidth          uint64
         |     +--rw utilization
         |        +--rw cpu-utilization        decimal64
         |        +--rw memory-utilization     decimal64
         |        +--rw storage-utilization    decimal64
         |        +--rw bandwidth-utilization  decimal64
         |        +--rw latency                uint32
         +---x allocate-resources
            +---w input
            |  +---w service-id           string
            |  +---w resource-type        enumeration
            |  +---w amount               decimal64
            |  +---w domain-id            string
            +--ro output
               +--ro allocation-status    enumeration
               +--ro allocated-amount     decimal64


   Explanation of the structure:

   - orchestrator:

   The top-level container for managing the orchestration of resources
   across cloud and network domains.

   - policies:

   Defines the set of policies that govern resource allocation.

   Each policy has a policy-id, a policy-name, a policy-type (the
   purpose of the policy, e.g., scaling or SLA compliance), a status,
   and a set of conditions (the CPU, memory, latency, and bandwidth
   thresholds that trigger the policy).
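
   As an illustration, the following Python sketch shows one way an
   implementation might evaluate a policy's conditions against a
   domain's telemetry.  This document does not specify whether a single
   breached threshold or all thresholds trigger a policy; the sketch
   assumes that any breached threshold does.  The enumeration values
   ("scaling", "active") and the units (milliseconds for latency, Mbps
   for used bandwidth) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conditions:
    # Mirrors the "conditions" container; None means the leaf is unset.
    cpu_utilization_threshold: Optional[int] = None     # percent (uint8)
    memory_utilization_threshold: Optional[int] = None  # percent (uint8)
    latency_threshold: Optional[int] = None             # assumed ms
    bandwidth_threshold: Optional[int] = None           # assumed Mbps used


@dataclass
class Policy:
    policy_id: str
    policy_name: str
    policy_type: str   # hypothetical enum value, e.g. "scaling"
    status: str        # hypothetical enum value, e.g. "active"
    conditions: Conditions


def breached_conditions(policy, cpu_util, mem_util, latency, used_bandwidth):
    """Return the names of breached thresholds (any breach triggers)."""
    if policy.status != "active":
        return []
    c = policy.conditions
    checks = [
        ("cpu-utilization-threshold", c.cpu_utilization_threshold, cpu_util),
        ("memory-utilization-threshold", c.memory_utilization_threshold,
         mem_util),
        ("latency-threshold", c.latency_threshold, latency),
        ("bandwidth-threshold", c.bandwidth_threshold, used_bandwidth),
    ]
    return [name for name, limit, value in checks
            if limit is not None and value > limit]


policy = Policy("p1", "scale-out", "scaling", "active",
                Conditions(cpu_utilization_threshold=80, latency_threshold=20))
print(breached_conditions(policy, cpu_util=85.5, mem_util=40.0,
                          latency=12, used_bandwidth=300))
# ['cpu-utilization-threshold']
```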

   - telemetry:

   Collects real-time telemetry data from different domains (e.g.,
   cloud, edge, network).

   Each domain contains information about its resources (CPU, memory,
   storage, bandwidth) and their utilization metrics (percentage of
   usage, current latency).

   - Action: allocate-resources (as an RPC or YANG action):

   This defines the action that a service or orchestrator can invoke to
   request dynamic allocation of resources in real time.
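
   If allocate-resources is modeled as a YANG 1.1 action under the
   orchestrator container, as its placement in the tree suggests, a
   client could invoke it over RESTCONF (RFC 8040) with a POST on the
   action's data resource path.  The Python sketch below builds such a
   request; the host name, the "/restconf" root, the "bandwidth"
   enumeration value, and the interpretation of amount as Mbps are
   assumptions.  If allocate-resources is instead defined as a top-level
   RPC, the URL would use the "/restconf/operations/" resource rather
   than "/restconf/data/".

```python
import json

MODULE = "cross-domain-orchestrator"  # module name from the tree diagram


def build_allocate_request(service_id, resource_type, amount, domain_id,
                           host="orchestrator.example.com"):
    """Return (url, headers, body) for a RESTCONF allocate-resources call."""
    # RFC 8040: an action is invoked with POST on the data resource path of
    # its parent node, followed by the action name.
    url = (f"https://{host}/restconf/data/"
           f"{MODULE}:orchestrator/allocate-resources")
    headers = {
        "Content-Type": "application/yang-data+json",
        "Accept": "application/yang-data+json",
    }
    body = {
        f"{MODULE}:input": {
            "service-id": service_id,
            "resource-type": resource_type,  # hypothetical enum value
            # RFC 7951 encodes decimal64 values as JSON strings.
            "amount": str(amount),
            "domain-id": domain_id,
        }
    }
    return url, headers, json.dumps(body)


url, headers, body = build_allocate_request("svc-1", "bandwidth", 50.0,
                                            "dc-east")
print(url)
print(body)
```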

   Example for using the Action:

   - A cloud-hosted service detects a spike in user traffic and requests
   an additional 50 Mbps of network bandwidth.  The service submits an
   allocate-resources request.

   - The orchestration system processes the request based on the current
   telemetry data (bandwidth utilization, network latency) and any
   active policies (scaling, SLA compliance, etc.).  It checks whether
   the additional bandwidth is available in the requested domain.

   - If the resources are available, the system returns success in the
   allocation-status output.  If not, it returns failure.

   - If successful, the allocated-amount output shows how much bandwidth
   (e.g., 50 Mbps) was allocated to the service.
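
   The example steps above can be sketched in Python as follows.  The
   DomainTelemetry class, the "success" and "failure" values of
   allocation-status, and the treatment of bandwidth as Mbps are
   assumptions.  The sketch handles only bandwidth requests and derives
   the available bandwidth from the domain's bandwidth capacity and
   bandwidth-utilization leaves.

```python
from dataclasses import dataclass


@dataclass
class DomainTelemetry:
    domain_id: str
    bandwidth: float              # capacity; assumed Mbps
    bandwidth_utilization: float  # percent of capacity in use


def allocate_resources(domains, service_id, resource_type, amount, domain_id):
    """Sketch of allocate-resources; returns the output leaves as a dict."""
    # service_id would be used for accounting and SLA checks; unused here.
    if resource_type != "bandwidth":
        raise NotImplementedError("this sketch only handles bandwidth")
    failure = {"allocation-status": "failure", "allocated-amount": 0.0}
    domain = domains.get(domain_id)
    if domain is None:
        return failure
    used = domain.bandwidth * domain.bandwidth_utilization / 100
    if amount > domain.bandwidth - used:
        return failure
    # Update utilization so later requests see the reduced headroom.
    domain.bandwidth_utilization = 100 * (used + amount) / domain.bandwidth
    return {"allocation-status": "success", "allocated-amount": amount}


domains = {"dc-east": DomainTelemetry("dc-east", 1000.0, 90.0)}
print(allocate_resources(domains, "svc-1", "bandwidth", 50.0, "dc-east"))
# {'allocation-status': 'success', 'allocated-amount': 50.0}
```

   With 1000 Mbps of capacity at 90% utilization, 100 Mbps remain, so the
   50 Mbps request succeeds and utilization rises to 95%; a subsequent
   60 Mbps request would then fail.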

6.  Dynamic Resource Allocation for Federated Learning

   The resource needs for federated learning fluctuate depending on the
   phase of the training process, model complexity, and number of
   devices involved.  Dynamic Resource Allocation for Federated Learning
   is a specific use case of the Cross-Domain Resource Orchestrator; its
   YANG tree structure is shown below.

   module: dynamic-resource-allocation-federated-learning
      +--rw dynamic-allocation
         +--rw federated-learning
         |  +--rw training-job* [job-id]
         |     +--rw job-id                        string
         |     +--rw model-type                    string
         |     +--rw device-type                   enumeration
         |     +--rw required-cpu                  decimal64
         |     +--rw required-memory               uint64
         |     +--rw required-storage              uint64
         |     +--rw required-bandwidth            uint64
         |     +--rw latency-tolerance             uint32
         +--rw policies
         |  +--rw policy* [policy-id]
         |     +--rw policy-id                     string
         |     +--rw policy-name                   string
         |     +--rw policy-type                   enumeration
         |     +--rw conditions
         |        +--rw cpu-utilization-threshold        uint8
         |        +--rw memory-utilization-threshold     uint8
         |        +--rw bandwidth-utilization-threshold  uint8
         |        +--rw latency-threshold                uint32
         +--rw telemetry
         |  +--rw domain* [domain-id]
         |     +--rw domain-id                     string
         |     +--rw domain-type                   enumeration
         |     +--rw resources
         |     |  +--rw cpu                        decimal64
         |     |  +--rw memory                     uint64
         |     |  +--rw storage                    uint64
         |     |  +--rw bandwidth                  uint64
         |     +--rw utilization
         |        +--rw cpu-utilization            decimal64
         |        +--rw memory-utilization         decimal64
         |        +--rw storage-utilization        decimal64
         |        +--rw bandwidth-utilization      decimal64
         |        +--rw latency                    uint32
         +---x allocate-resources
            +---w input
            |  +---w job-id                        string
            |  +---w resource-type                 enumeration
            |  +---w amount                        decimal64
            |  +---w domain-id                     string
            +--ro output
               +--ro allocation-status             enumeration
               +--ro allocated-amount              decimal64
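
   The model does not define how a training job is mapped to a domain.
   As a hypothetical illustration, the Python sketch below selects,
   among the domains whose free resources satisfy a training job's
   required-cpu, required-memory, and required-bandwidth and whose
   latency is within the job's latency-tolerance, the one with the
   lowest latency.  The units, the lowest-latency rule, and the omission
   of storage are assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingJob:
    job_id: str
    required_cpu: float       # assumed cores
    required_memory: int      # assumed MB
    required_bandwidth: int   # assumed Mbps
    latency_tolerance: int    # assumed ms


@dataclass
class Domain:
    domain_id: str
    cpu: float                # capacities from "resources"
    memory: int
    bandwidth: int
    cpu_utilization: float    # percentages from "utilization"
    memory_utilization: float
    bandwidth_utilization: float
    latency: int              # assumed ms


def free(capacity, utilization_pct):
    """Unused capacity given a utilization percentage."""
    return capacity - capacity * utilization_pct / 100


def select_domain(job: TrainingJob, domains: List[Domain]) -> Optional[str]:
    """Hypothetical placement: the lowest-latency domain that fits the job."""
    feasible = [
        d for d in domains
        if free(d.cpu, d.cpu_utilization) >= job.required_cpu
        and free(d.memory, d.memory_utilization) >= job.required_memory
        and free(d.bandwidth, d.bandwidth_utilization) >= job.required_bandwidth
        and d.latency <= job.latency_tolerance
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda d: d.latency).domain_id
```

   Returning None signals that no domain can host the job as requested;
   an orchestrator could then invoke allocate-resources to add capacity
   before retrying.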

7.  Dynamic Network Reconfiguration

   This section describes a YANG structure for Dynamic Network
   Reconfiguration, which supports the scenario where services placed in
   Cloud Data Centers (DCs) undergo frequent changes, requiring the
   network to dynamically adapt and reconfigure itself in real time.
   This structure enables the dynamic adjustment of network parameters
   (such as bandwidth, latency, QoS, and paths) based on evolving
   service requirements.


   module: dynamic-network-reconfiguration
      +--rw network-reconfiguration
         +--rw telemetry
         |  +--rw bandwidth-utilization         decimal64
         |  +--rw latency                       uint32
         |  +--rw packet-loss-rate              decimal64
         |  +--rw jitter                        decimal64
         |  +--rw qos-level                     string
         +--rw policies
         |  +--rw policy* [policy-id]
         |     +--rw policy-id                  string
         |     +--rw policy-name                string
         |     +--rw policy-type                enumeration
         |     +--rw conditions
         |        +--rw bandwidth-utilization-threshold    uint8
         |        +--rw latency-threshold                  uint32
         |        +--rw packet-loss-threshold              decimal64
         |        +--rw qos-threshold                      string
         +---x reconfigure-network
            +---w input
            |  +---w service-id                 string
            |  +---w target-latency             uint32
            |  +---w target-bandwidth           uint64
            |  +---w target-qos                 string
            |  +---w target-packet-loss         decimal64
            |  +---w target-jitter              decimal64
            +--ro output
               +--ro reconfiguration-status     enumeration
               +--ro achieved-latency           uint32
               +--ro achieved-bandwidth         uint64
               +--ro achieved-qos               string
               +--ro achieved-packet-loss       decimal64
               +--ro achieved-jitter            decimal64



   Explanation of the structure:

   The telemetry container collects real-time data about the current
   state of the network, which is used to determine whether network
   reconfiguration is needed to accommodate changes in cloud services.

   Policies govern how and when the network should be dynamically
   reconfigured.  Each policy has specific conditions that, when met,
   trigger network reconfiguration.

   The reconfigure-network action (or RPC) is the primary mechanism for
   dynamically reconfiguring the network in real time.  When triggered,
   it adjusts the network settings to meet the new requirements of
   services running in cloud data centers.

   How it works together:

   The system continuously monitors network conditions (bandwidth usage,
   latency, packet loss, jitter) using telemetry data.  As services in
   cloud data centers evolve, this data helps determine whether the
   network is performing within acceptable limits.

   When telemetry data indicates that certain thresholds are being
   breached (e.g., high latency or packet loss), policies are triggered.
   For example, if bandwidth usage exceeds 80%, the system may allocate
   more bandwidth to ensure the services continue to operate smoothly.
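
   The 80% example above might be turned into a concrete
   target-bandwidth for the reconfigure-network action, as in the Python
   sketch below.  The rule of sizing the link so that current traffic
   would run at 60% utilization, and the use of Mbps, are hypothetical
   choices; the model itself only defines the threshold and the
   target-bandwidth leaf.

```python
import math
from typing import Optional


def target_bandwidth(capacity_mbps: float, utilization_pct: float,
                     threshold_pct: float = 80.0,
                     goal_pct: float = 60.0) -> Optional[int]:
    """Return a new target-bandwidth in Mbps, or None if no change is needed.

    threshold_pct plays the role of bandwidth-utilization-threshold;
    goal_pct is a hypothetical post-reconfiguration utilization target.
    """
    if utilization_pct <= threshold_pct:
        return None
    used = capacity_mbps * utilization_pct / 100
    # Round up because target-bandwidth is an integer (uint64) leaf.
    return math.ceil(used * 100 / goal_pct)


print(target_bandwidth(1000, 90))  # 1500
print(target_bandwidth(1000, 70))  # None
```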

   The reconfigure-network action is invoked in real time to adjust the
   network parameters, including bandwidth, latency, packet loss,
   jitter, and QoS, to accommodate changes in cloud services.  This
   action ensures that the network can keep up with the frequent
   modifications to services hosted in the cloud.
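
   The reconfigure-network output reports achieved values alongside a
   reconfiguration-status.  How the status is derived is not specified;
   the Python sketch below assumes hypothetical "success", "partial",
   and "failure" values and compares each achieved value with its target
   (lower is better for latency, packet loss, and jitter; higher is
   better for bandwidth; QoS must match exactly).

```python
def reconfiguration_status(target: dict, achieved: dict):
    """Return (status, unmet) using hypothetical status values."""
    met = {
        # Lower is better for latency, packet loss, and jitter.
        "latency": achieved["achieved-latency"] <= target["target-latency"],
        "packet-loss": (achieved["achieved-packet-loss"]
                        <= target["target-packet-loss"]),
        "jitter": achieved["achieved-jitter"] <= target["target-jitter"],
        # Higher is better for bandwidth; QoS must match exactly.
        "bandwidth": (achieved["achieved-bandwidth"]
                      >= target["target-bandwidth"]),
        "qos": achieved["achieved-qos"] == target["target-qos"],
    }
    unmet = [name for name, ok in met.items() if not ok]
    if not unmet:
        return "success", unmet
    return ("failure" if len(unmet) == len(met) else "partial"), unmet


target = {"target-latency": 20, "target-bandwidth": 1500,
          "target-qos": "gold", "target-packet-loss": 0.1,
          "target-jitter": 5.0}
achieved = {"achieved-latency": 18, "achieved-bandwidth": 1200,
            "achieved-qos": "gold", "achieved-packet-loss": 0.05,
            "achieved-jitter": 4.0}
print(reconfiguration_status(target, achieved))  # ('partial', ['bandwidth'])
```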

8.  Edge Computing Node

   Below is the YANG tree structure designed to enable resource
   allocation close to the end-user or device, specifically optimized
   for latency-sensitive workloads.  It includes support for Mobile Edge
   Computing (MEC) and integration with 5G edge computing.  The
   structure allows for dynamic allocation of compute, storage, and
   network resources, with real-time adjustments based on the needs of
   low-latency applications like IoT, AR/VR, and real-time analytics.

   module: mec-5g-resource-allocation
      +--rw edge-resource-allocation
         +--rw telemetry
         |  +--rw latency                       uint32
         |  +--rw bandwidth-utilization         decimal64
         |  +--rw edge-cpu-utilization          decimal64
         |  +--rw edge-memory-utilization       decimal64
         |  +--rw edge-storage-utilization      decimal64
         +--rw policies
         |  +--rw policy* [policy-id]
         |     +--rw policy-id                  string
         |     +--rw policy-name                string
         |     +--rw policy-type                enumeration
         |     +--rw conditions
         |        +--rw latency-threshold                  uint32
         |        +--rw bandwidth-utilization-threshold    uint8
         |        +--rw edge-cpu-utilization-threshold     uint8
         |        +--rw edge-memory-utilization-threshold  uint8
         +--rw resource-allocation
         |  +--rw workload* [workload-id]
         |     +--rw workload-id                string
         |     +--rw workload-type              enumeration
         |     +--rw required-latency           uint32
         |     +--rw required-bandwidth         uint64
         |     +--rw required-edge-cpu          decimal64
         |     +--rw required-edge-memory       uint64
         |     +--rw required-edge-storage      uint64
         +---x allocate-edge-resources
            +---w input
            |  +---w workload-id               string
            |  +---w target-latency            uint32
            |  +---w target-bandwidth          uint64
            |  +---w target-edge-cpu           decimal64
            |  +---w target-edge-memory        uint64
            |  +---w target-edge-storage       uint64
            +--ro output
               +--ro allocation-status         enumeration
               +--ro achieved-latency          uint32
               +--ro achieved-bandwidth        uint64
               +--ro allocated-edge-cpu        decimal64
               +--ro allocated-edge-memory     uint64
               +--ro allocated-edge-storage    uint64
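
   This section does not describe the behavior of
   allocate-edge-resources.  One possible best-effort behavior is
   sketched below in Python: the request fails if the edge node cannot
   meet target-latency, since adding capacity does not reduce latency to
   the node; otherwise each resource is granted up to the smaller of the
   requested and free amounts.  The EdgeNode class, the units, and the
   "success", "partial", and "failure" status values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    """Hypothetical view of one edge node's free capacity."""
    latency: int            # assumed ms between end user and node
    free_bandwidth: int     # assumed Mbps
    free_cpu: float         # assumed cores
    free_memory: int        # assumed MB
    free_storage: int       # assumed GB


# (input leaf, output leaf, EdgeNode attribute) for each capacity resource.
RESOURCES = [
    ("target-bandwidth", "achieved-bandwidth", "free_bandwidth"),
    ("target-edge-cpu", "allocated-edge-cpu", "free_cpu"),
    ("target-edge-memory", "allocated-edge-memory", "free_memory"),
    ("target-edge-storage", "allocated-edge-storage", "free_storage"),
]


def allocate_edge_resources(node: EdgeNode, request: dict) -> dict:
    """Best-effort sketch of allocate-edge-resources."""
    if node.latency > request["target-latency"]:
        # More capacity cannot reduce latency to this node; a real
        # orchestrator might try a different edge node instead.
        return {"allocation-status": "failure"}
    output = {"achieved-latency": node.latency}
    complete = True
    for in_leaf, out_leaf, attr in RESOURCES:
        granted = min(request[in_leaf], getattr(node, attr))
        setattr(node, attr, getattr(node, attr) - granted)  # reserve it
        output[out_leaf] = granted
        complete = complete and granted >= request[in_leaf]
    output["allocation-status"] = "success" if complete else "partial"
    return output
```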

9.  Security Considerations

   Authentication and Authorization: The orchestrator must authenticate
   requests using secure credentials (e.g., OAuth tokens, X.509
   certificates).

   Data Encryption: All data exchanged between domains, especially
   telemetry and resource allocation requests, must be encrypted using
   protocols such as TLS.

   Access Control: Role-Based Access Control (RBAC) must be implemented
   to ensure that only authorized users can request or allocate
   resources.
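
   A minimal Python sketch of a deny-by-default RBAC check for the
   operations defined in this document follows.  The role names and the
   mapping of roles to operations are hypothetical.  In a NETCONF or
   RESTCONF deployment, the NETCONF Access Control Model (RFC 8341) is a
   standardized alternative for controlling access to these YANG data
   nodes and operations.

```python
# Hypothetical role-to-operation mapping; names are illustrative only.
ROLE_PERMISSIONS = {
    "viewer": {"read-telemetry"},
    "service": {"read-telemetry", "allocate-resources"},
    "network-operator": {"read-telemetry", "allocate-resources",
                         "reconfigure-network", "allocate-edge-resources"},
}


def is_authorized(user_roles, operation):
    """Deny by default: allow only if some role grants the operation."""
    return any(operation in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)


print(is_authorized(["service"], "reconfigure-network"))  # False
```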

10.  IANA Considerations

   TBD

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8299]  Wu, Q., Ed., Litkowski, S., Tomotaki, L., and K. Ogaki,
              "YANG Data Model for L3VPN Service Delivery", RFC 8299,
              DOI 10.17487/RFC8299, January 2018,
              <https://www.rfc-editor.org/info/rfc8299>.

   [RFC8349]  Lhotka, L., Lindem, A., and Y. Qu, "A YANG Data Model for
              Routing Management (NMDA Version)", RFC 8349,
              DOI 10.17487/RFC8349, March 2018,
              <https://www.rfc-editor.org/info/rfc8349>.

   [RFC8529]  Berger, L., Hopps, C., Lindem, A., Bogdanovic, D., and X.
              Liu, "YANG Data Model for Network Instances", RFC 8529,
              DOI 10.17487/RFC8529, March 2019,
              <https://www.rfc-editor.org/info/rfc8529>.

11.2.  Informative References

   [Net2Cloud]
              Dunbar, L., et al., "Net2Cloud", Work in Progress,
              Internet-Draft, draft-ietf-rtgwg-net2cloud-problem-
              statement, <https://datatracker.ietf.org/doc/draft-ietf-
              rtgwg-net2cloud-problem-statement/>.

Acknowledgements

   The authors would like to thank the following for discussions and
   for providing input to this document: xxx.

Contributors

Authors' Addresses

   Linda Dunbar (editor)
   Futurewei
   United States of America
   Email: ldunbar@futurewei.com


   ChongFeng Xie
   China Telecom
   China
   Email: chongfeng.xie@foxmail.com


   Qiang Sun
   China Telecom
   China
   Email: sunqiong@chinatelecom.com
