Internet-Draft Problems Statement for High Performance December 2024
Xiong, et al. Expires 8 June 2025 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-xiong-hpwan-problem-statement-00
Published:
Intended Status:
Informational
Expires:
Authors:
Q. Xiong
ZTE Corporation
K. Yao
China Mobile
C. Huang
China Telecom
Z. Han
China Unicom
J. Zhao
CAICT

Problem Statement for High Performance Wide Area Networks

Abstract

High Performance Wide Area Networks (HP-WANs) serve scientific research, academia, education, and other data-intensive applications that transmit large volumes of data over WANs. Such networks need to support large-scale data processing and provide efficient transmission services. This document outlines the problems for HP-WANs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 June 2025.

1. Introduction

As described in [I-D.kcrh-hpwan-state-of-art], data is fundamental for research, academia, education, industrial and other data-intensive applications, such as High Performance Computing (HPC) for scientific research, cloud storage and backup of industrial internet data, and distributed training of Artificial Intelligence (AI). These applications may generate huge volumes of data by using advanced instruments and high-end computing devices. The network needs to complete large-scale data transfers within a required completion time and provide stable and efficient transmission services over non-dedicated Wide Area Networks (WANs). These WANs connect research institutions, universities, and data centers across large geographical areas, and they usually carry massive data over long-distance links. For example, data shared between research institutes may travel over hundreds or thousands of kilometers. Moreover, some applications demand periodic or on-demand data migration with variable transmission frequency, which requires timely data transmission. Because large data transfers co-exist with other services over WANs, they demand high performance in the form of effective high throughput, fairness among multiple services, and high network utilization.

More recently, massive data transmission and long-distance connections over complicated WANs have become key factors affecting the performance of existing technologies. For example, high-volume data transmitted over WANs depends on transport protocols and technologies such as Transmission Control Protocol (TCP), QUIC, and Remote Direct Memory Access (RDMA). Traditional congestion control mechanisms, which are typically implemented at the hosts (sender and receiver) to control or prevent congestion, cannot achieve the required performance. A host may adjust its sending rate based on feedback from the network when packet loss or congestion occurs. However, the long feedback loop degrades performance, and rate adjustment is inefficient without fine-grained awareness of network capability. The network, for its part, forwards packets reactively, which leads to low bandwidth utilization because of bottleneck links and instantaneous congestion. The network could instead enhance its capability to regulate traffic so that incast congestion is avoided preemptively, and it could actively collaborate with the host to adjust the rate efficiently and rapidly when congestion occurs. Negotiation between the host and the network is required to assist the network operator's traffic management, bandwidth allocation, and utilization optimization, and to help the host adjust its rate based on acknowledgement of network resource scheduling. Therefore, hosts with sophisticated congestion control, combined with more active network coordination, should be considered to improve overall HP-WAN transmission performance.

A High Performance Wide Area Network (HP-WAN) is designed specifically to meet the high-speed, low-latency, and high-capacity needs of applications with massive data sets. These needs translate into high performance requirements such as effective high throughput, fairness among multiple services, and high bandwidth utilization. This document outlines the problems for HP-WANs.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology

This document uses the following terminology.

High Performance Wide Area Networks (HP-WANs): wide area networks designed specifically to meet the high-speed, low-latency, and high-capacity needs of research, academia, education, industrial and other data-intensive applications. The primary goal of an HP-WAN is to complete massive data transmission within a required completion time, which imposes high performance requirements such as effective high throughput, fairness among multiple services, and high bandwidth utilization.

This document also uses the following abbreviations:

DC:
Data Center
DCI:
Data Center Interconnection
HPC:
High Performance Computing
WAN:
Wide Area Network
MAN:
Metropolitan Area Network
PFC:
Priority Flow Control
ECN:
Explicit Congestion Notification
ECMP:
Equal-Cost Multipath
RTT:
Round-Trip Time
TCP:
Transmission Control Protocol
RDMA:
Remote Direct Memory Access
QUIC:
A UDP-based multiplexed and secure transport (QUIC is a name, not an acronym)

3. High-performance Goals for HP-WANs

The services provided in HP-WANs mainly focus on the timely transmission of massive data, while multiple services may co-exist over long-distance networks.

From the application perspective, an HP-WAN flow requires effective high-throughput data transmission to meet its completion time. Moreover, it is crucial to maximize bandwidth utilization while ensuring fairness among multiple services. Effective high throughput, fairness among multiple services, and high bandwidth utilization are therefore the high-performance goals for HP-WANs.
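As a rough illustration of the completion-time goal, the average goodput that a flow needs follows directly from its data volume and deadline. The sketch below assumes a hypothetical helper name and illustrative values (1 TB in one hour); neither comes from this document.

```python
def required_throughput_gbps(volume_bytes: float, completion_time_s: float) -> float:
    """Average goodput in Gbit/s needed to move volume_bytes within completion_time_s."""
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return volume_bytes * 8 / completion_time_s / 1e9


# Illustrative values only: 1 TB (10**12 bytes) to be delivered within one hour.
rate = required_throughput_gbps(1e12, 3600)
print(f"required average goodput: {rate:.2f} Gbit/s")
```

This is a lower bound on the sustained rate: any period spent below it, for example during ramp-up or after a rate reduction, has to be recovered later at a higher rate.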

4. Problem Statements

It is challenging to provide effective high-performance transmission in HP-WAN scenarios with massive concurrent services, long-distance delays, and packet loss. Long-distance networks have more uncertainties, such as long Round-Trip Time (RTT), routing changes, network congestion, packet loss, and link quality fluctuations, all of which may reduce throughput. The services are massive and concurrent, with multiple types and different traffic models. Elephant flows, for example, have short intervals, high speed, and large data volumes; they may occupy a large amount of network resources and lead to unfairness among flows, low network utilization, and poor cost-effectiveness.

Existing network technologies have various problems and cannot meet these performance requirements. The following subsections outline the problems for HP-WANs.

4.1. Long-distance Delay and Slow Feedback

Several classes of congestion control algorithms are in use, such as loss-based algorithms (e.g., Reno and CUBIC, which rely on packet loss as the congestion signal) and congestion-based algorithms (e.g., BBR, which relies on measurement of congestion). Long-distance transmission delays and large RTTs delay the feedback of network state, so the sender cannot adjust its transmission rate in a timely manner. This makes it challenging for congestion control in WANs to keep the total amount of data entering the network at an acceptable level. Feedback should be independent of the transmission distance and as timely as possible.
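To make the dependence of feedback delay on distance concrete, the following sketch computes the propagation-only RTT for several path lengths. It assumes a signal speed in fiber of about 200,000 km/s and ignores queuing, processing, and any detour of the fiber route, so the results are lower bounds; the function name is hypothetical.

```python
# Assumption: light in optical fiber travels at roughly two thirds of its
# speed in vacuum, i.e. about 200,000 km/s.
FIBER_KM_PER_S = 200_000.0


def propagation_rtt_ms(distance_km: float) -> float:
    """Lower bound on RTT (ms) from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


for distance in (1, 100, 1000, 3000):
    print(f"{distance:>5} km -> at least {propagation_rtt_ms(distance):.2f} ms RTT")
```

A sender that reacts only to end-to-end signals cannot learn about congestion faster than this bound, whatever algorithm it runs.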

For example, Explicit Congestion Notification (ECN) [RFC3168] can be used with Reno and CUBIC to achieve end-to-end congestion notification at the IP and transport layers. When congestion occurs, the network may signal it by ECN marking or by dropping packets, and the receiver passes this information back to the sender in transport-layer acknowledgements so that the sender can adjust its transmission rate. Over long distances the notification is delayed and the feedback is slow, which results in untimely adjustment and buffer overflow and therefore degrades network performance. Especially for incast congestion, where multiple sources send to the same destination, the network needs to send fast feedback based on the offered load.
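The effect of slow feedback on buffers can be estimated with simple arithmetic: until the senders react, traffic above the bottleneck rate accumulates in the queue. The sketch below is a simplified fluid model with hypothetical names and illustrative numbers; it assumes all senders keep a constant rate for one full feedback delay.

```python
def backlog_before_feedback_bytes(senders: int, rate_gbps: float,
                                  bottleneck_gbps: float,
                                  feedback_delay_ms: float) -> float:
    """Bytes queued at the bottleneck before any sender can react."""
    # Only the traffic in excess of the bottleneck capacity builds a queue.
    excess_gbps = max(0.0, senders * rate_gbps - bottleneck_gbps)
    return excess_gbps * 1e9 / 8 * feedback_delay_ms / 1e3


# Illustrative incast: 4 senders at 10 Gbit/s each into a 10 Gbit/s link.
for delay_ms in (0.1, 10.0):
    backlog = backlog_before_feedback_bytes(4, 10.0, 10.0, delay_ms)
    print(f"feedback delay {delay_ms} ms -> backlog {backlog / 1e6:.3f} MB")
```

Under these assumptions the same incast pattern needs a hundred times more buffer at 10 ms of feedback delay than at 0.1 ms, which is why feedback that originates closer to the point of congestion is attractive.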

BBR actively measures the bottleneck bandwidth (BtlBw) and the round-trip propagation time (RTprop), uses its model to calculate the bandwidth-delay product (BDP), and then adjusts the transmission rate to maximize throughput and minimize latency. However, BBR relies on real-time measurement of these parameters, which may vary greatly and are fed back slowly in long-distance networks, thereby reducing the control precision of BBR.
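The BDP estimate can be sketched as the product of a maximum filter over delivery-rate samples and a minimum filter over RTT samples. The class below is a deliberately simplified illustration, not the BBR implementation: the class name, the single sample-count window, and the sample values are assumptions, and BBR's state machine and pacing logic are omitted.

```python
from collections import deque


class BdpEstimator:
    """Toy estimator: BDP = max(recent delivery rates) * min(recent RTTs)."""

    def __init__(self, window: int = 10):
        # Assumption: one shared window counted in samples, for brevity.
        self.rates_bps = deque(maxlen=window)
        self.rtts_s = deque(maxlen=window)

    def on_ack(self, delivery_rate_bps: float, rtt_s: float) -> None:
        """Record one measurement taken when an acknowledgement arrives."""
        self.rates_bps.append(delivery_rate_bps)
        self.rtts_s.append(rtt_s)

    def bdp_bytes(self) -> float:
        if not self.rates_bps:
            raise ValueError("no samples yet")
        btl_bw = max(self.rates_bps)   # estimate of bottleneck bandwidth
        rt_prop = min(self.rtts_s)     # estimate of propagation RTT
        return btl_bw * rt_prop / 8


est = BdpEstimator()
for rate, rtt in ((8e9, 0.050), (10e9, 0.042), (9e9, 0.040)):
    est.on_ack(rate, rtt)
print(f"estimated BDP: {est.bdp_bytes() / 1e6:.1f} MB")
```

Every sample reaches the sender at least one RTT after the condition it describes, so on a long path the estimate lags changes in the available bandwidth.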

Moreover, other congestion control algorithms such as Data Center Quantized Congestion Notification (DCQCN) and High Precision Congestion Control (HPCC++), which target data center networks, do not tolerate the slow feedback loop over WANs.

4.2. Coarse-grained Exploitation of Network Capacities

Existing congestion control mechanisms focus on rate adjustment: they control the sending rate of data flows at the source, thereby avoiding or reducing network congestion. It is challenging for the host to adjust its sending rate efficiently without awareness of network capacity. For example, as per [RFC9438], when CUBIC detects congestion through packet loss or the classic ECN mechanism, it reduces the congestion window by its multiplicative decrease factor, which gives the sending rate a sawtooth pattern. L4S, as per [RFC9330], uses more frequent ECN marking to provide low latency and scalable throughput, to reduce the convergence time, and to eliminate the sawtooth effect. However, ECN-based congestion feedback and frequent rate adjustment still cause significant changes in throughput, which affects bandwidth utilization and transmission efficiency. The host still lacks accurate network information, which is critical when there is a significant gap between the sending rate and the available network capacity, especially when transmitting high-volume data over WANs.
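The cost of the sawtooth on a long, fast path can be illustrated with the window growth function from [RFC9438]. The sketch below uses the constants C = 0.4 and beta_cubic = 0.7 from that RFC and evaluates only the cubic function; the Reno-friendly region and other details are omitted, and the path parameters (10 Gbit/s, 40 ms RTT, 1500-byte segments) are illustrative assumptions.

```python
C = 0.4            # CUBIC constant C from RFC 9438
BETA_CUBIC = 0.7   # multiplicative decrease factor from RFC 9438


def recovery_time_s(w_max_segments: float) -> float:
    """K: seconds for the cubic function to return to w_max after a reduction."""
    cwnd_epoch = BETA_CUBIC * w_max_segments
    return ((w_max_segments - cwnd_epoch) / C) ** (1.0 / 3.0)


def w_cubic(t_s: float, w_max_segments: float) -> float:
    """Window (segments) t_s seconds after the reduction: C*(t-K)^3 + W_max."""
    k = recovery_time_s(w_max_segments)
    return C * (t_s - k) ** 3 + w_max_segments


# Hypothetical path: window at congestion equals the BDP in segments.
w_max = 10e9 * 0.040 / 8 / 1500
k = recovery_time_s(w_max)
print(f"W_max = {w_max:.0f} segments, recovery time K = {k:.1f} s")
print(f"window right after the reduction: {w_cubic(0.0, w_max):.0f} segments")
```

Under these assumptions a single congestion event leaves the flow below its previous window for tens of seconds, which is the kind of utilization loss that more accurate knowledge of the available capacity is meant to avoid.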

Moreover, the sending rate of the host is inconsistent with the transmission capability of the network, which prevents accurate adjustment of the sending rate. For example, when determining the starting rate of a data transmission, slow start leads to an overall throughput bottleneck with insufficient bandwidth utilization and fails to fully use the network capacity. A fast start, however, cannot adapt to the buffer capacity of network devices, especially when multiple flows share the same link, causing network congestion, packet loss, and transmission delay. For HP-WANs, fine-grained network-aware negotiation of the sending rate needs to comprehensively consider factors such as predictable network bandwidth, latency, and packet loss rate, while balancing bandwidth utilization and congestion avoidance in WANs.
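The ramp-up cost of slow start can be approximated by counting how many round trips a window that doubles every RTT needs to reach the BDP. The sketch below assumes an initial window of 10 segments, no loss during ramp-up, and an illustrative path of 10 Gbit/s with 40 ms RTT and 1500-byte segments; the function name is hypothetical.

```python
import math


def slow_start_rounds(bdp_segments: int, initial_window: int = 10) -> int:
    """Round trips needed for a doubling window to reach bdp_segments."""
    cwnd, rounds = initial_window, 0
    while cwnd < bdp_segments:
        cwnd *= 2   # idealized slow start: the window doubles every RTT
        rounds += 1
    return rounds


rtt_s = 0.040
bdp_segments = math.ceil(10e9 * rtt_s / 8 / 1500)
rounds = slow_start_rounds(bdp_segments)
print(f"BDP = {bdp_segments} segments, {rounds} RTTs = {rounds * rtt_s * 1000:.0f} ms of ramp-up")
```

The opposite choice, starting at the full BDP, would place the whole window into the network within one RTT, which is where the buffer capacity of the devices along the path becomes the limit.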

4.3. Instantaneous Traffic

From the network perspective, the network can only forward high-volume data reactively; it does not schedule predictable traffic and network resources to anticipate congestion. Without awareness of instantaneous traffic, which occupies a large amount of network resources, the network allocates resources unevenly, resulting in low bandwidth utilization.

For example, HP-WAN applications transmit large amounts of data; the data volume of a single flow may range from 10G to 1TB. Such massive transfers with large bursts may cause instantaneous congestion, packet loss, and queuing delay within network devices in WANs. More aggregation occurs at the edge of WANs, and bursts may accumulate as flows traverse, join, and separate over hops. Congestion control for such bursty traffic is challenging and can become unmanageable.

Moreover, the goodput bottleneck, together with constraints on transmission completion time and duration, makes traffic scheduling challenging. Applications may run multiple concurrent services that co-exist with existing dynamic flows. Because these services have various types and different traffic requirements, traffic needs to be scheduled onto multiple paths and fine-grained network resources to achieve high utilization and guaranteed QoS.
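As a purely hypothetical illustration of scheduling flows onto multiple paths, the sketch below places each flow, largest first, on the path with the most residual capacity. This greedy heuristic, the function name, and the numbers are assumptions for illustration; this document does not specify a scheduling mechanism.

```python
from typing import Dict, Optional


def place_flows(demands_gbps: Dict[str, float],
                capacity_gbps: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Greedy placement: largest flow first, onto the least loaded path."""
    residual = dict(capacity_gbps)
    placement: Dict[str, Optional[str]] = {}
    ordered = sorted(demands_gbps.items(), key=lambda kv: kv[1], reverse=True)
    for flow, demand in ordered:
        path = max(residual, key=residual.get)  # path with most free capacity
        if residual[path] >= demand:
            residual[path] -= demand
            placement[flow] = path
        else:
            placement[flow] = None  # no single path can carry this flow now
    return placement


# Illustrative demands and path capacities in Gbit/s.
print(place_flows({"a": 60, "b": 50, "c": 40}, {"p1": 100, "p2": 100}))
```

Even this small example depends on the scheduler knowing the demand of each flow in advance, which is the kind of predictable traffic information that purely reactive forwarding does not have.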

4.4. Incast Congestion upon Bottleneck Links

Incast congestion caused by limited bandwidth on bottleneck links is challenging in long-distance, multi-hop networks. It is difficult to control packet loss, queuing latency, and jitter, all of which decrease throughput. Incast traffic is a primary cause of congestion because senders transmit greedily. The network may regulate such traffic to avoid congestion preemptively. It may proactively avoid path-level congestion by actively reserving and allocating network bandwidth through a scheduler so that the offered traffic matches the bottleneck link bandwidth as closely as possible, thus fully utilizing bandwidth and preventing packet loss.
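One way a scheduler could match the aggregate rate of incast senders to a bottleneck link is max-min fair allocation. The sketch below is an assumption made for illustration, with hypothetical names and values; this document does not prescribe an allocation policy.

```python
from typing import Dict


def max_min_share(demands: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Max-min fair split of capacity among senders with the given demands."""
    alloc = {name: 0.0 for name in demands}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining and cap > 1e-12:
        share = cap / len(remaining)
        satisfied = {n: d for n, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining sender wants more than an equal share.
            for name in remaining:
                alloc[name] = share
            return alloc
        for name, demand in satisfied.items():
            alloc[name] = demand   # small demands are granted in full
            cap -= demand
            del remaining[name]
    return alloc


# Illustrative incast: three senders (Gbit/s) behind a 10 Gbit/s bottleneck.
print(max_min_share({"s1": 2, "s2": 8, "s3": 10}, 10))
```

In this example the allocations sum to exactly the bottleneck capacity, so the link is fully used and, in this idealized model, no queue builds up.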

Moreover, effective flow control can reduce congestion in the network and thereby reduce packet loss caused by buffer overflow. Flow control here refers to a method that ensures data is transmitted efficiently and reliably by controlling the rate of data transmission, so that a fast sender does not overwhelm a slow receiver and packets are not lost under congestion. However, it is challenging to ensure fairness among multiple services over different distances, because network resources are allocated unequally among flows with different RTTs. For example, some flows may occupy more bandwidth due to larger window sizes, smaller RTTs, or larger packets.
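The RTT and packet-size bias can be illustrated with the well-known approximation by Mathis et al. for the steady-state throughput of a Reno-style loss-based flow, rate = (MSS / RTT) * sqrt(3/2) / sqrt(p). Applying this model here is an assumption; other algorithms behave differently, and the values below are illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of a Reno-style flow (bit/s)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


# Two flows with the same loss rate and segment size but different RTTs.
near = mathis_throughput_bps(1500, 0.010, 1e-4)
far = mathis_throughput_bps(1500, 0.100, 1e-4)
print(f"10 ms flow: {near / 1e6:.1f} Mbit/s, 100 ms flow: {far / 1e6:.1f} Mbit/s")
print(f"ratio: {near / far:.1f}")
```

Under this model, throughput is inversely proportional to RTT and proportional to segment size, so a flow with one tenth of the RTT obtains about ten times the bandwidth at the same loss rate.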

5. Security Considerations

This document covers several representative applications and network scenarios that are expected to make use of HP-WAN technologies. The potential use cases do not themselves raise new security concerns, but they may have security considerations from both the use-specific perspective and the technology-specific perspective.

6. IANA Considerations

This document makes no requests for IANA action.

7. Acknowledgements

The authors would like to acknowledge Guangping Huang, Yao Liu and Zheng Zhang for their thorough review and very helpful comments.

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168]
Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, <https://www.rfc-editor.org/info/rfc3168>.
[RFC7424]
Krishnan, R., Yong, L., Ghanwani, A., So, N., and B. Khasnabish, "Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424, <https://www.rfc-editor.org/info/rfc7424>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8664]
Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W., and J. Hardwick, "Path Computation Element Communication Protocol (PCEP) Extensions for Segment Routing", RFC 8664, DOI 10.17487/RFC8664, <https://www.rfc-editor.org/info/rfc8664>.
[RFC9232]
Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, <https://www.rfc-editor.org/info/rfc9232>.
[RFC9330]
Briscoe, B., Ed., De Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, and Scalable Throughput (L4S) Internet Service: Architecture", RFC 9330, DOI 10.17487/RFC9330, <https://www.rfc-editor.org/info/rfc9330>.
[RFC9438]
Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., "CUBIC for Fast and Long-Distance Networks", RFC 9438, DOI 10.17487/RFC9438, <https://www.rfc-editor.org/info/rfc9438>.

Authors' Addresses

Quan Xiong
ZTE Corporation
China
Kehan Yao
China Mobile
China
Cancan Huang
China Telecom
China
Zhengxin Han
China Unicom
China
Junfeng Zhao
CAICT
Beijing
China