SIGCOMM 2022 abstracts of posters - Sabrou-mal サブロウ丸

26 (Online): Deep or Statistical: An Empirical Study of Traffic Predictions on Multiple Time Scales
49 (On-site): Network-Accelerated Cluster Scheduler
63 (On-site): Mind the Cost of Telemetry Data Analysis
65 (Online): PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
89 (On-site): Linnet: Limit Order Books Within Switches
97 (On-site): Accelerating Kubernetes with In-network Caching
99 (On-site): CiraaS: cloud computing with programmable logic
107 (On-site): P4Pir: In-Network Analysis for Smart IoT Gateways
118 (On-site): Enabling IoT Self-Localization Using Ambient 5G mmWave Signals
159 (On-site): Robust Heuristics: Attacks and Defenses on Job Size Estimation for WSJF Systems

26 (Online): Deep or Statistical: An Empirical Study of Traffic Predictions on Multiple Time Scales

Yu Qiao, Chengxiang Li, Shuzheng Hao, Jun Wu, and Liang Zhang (Huawei Technologies Ltd.)

Traffic prediction aims to forecast future traffic levels from past observations. It is a crucial technique in network communications, with a wide range of applications such as network congestion control and bandwidth allocation. Traffic prediction methods, which extract temporal patterns and inter-dependencies from sequential data, fall broadly into three categories. Statistical models express the series as a linear combination of sequence values and noise terms; machine learning models treat time series analysis as a regression problem; deep learning models act as black-box algorithms for extracting temporal patterns. In [5], Yi et al. studied traffic forecasting on multiple time scales using statistical models. Newly developed tools such as LSTNet [3] motivate us to compare the performance of different models on multiple time scales.
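
As a rough illustration of the statistical category on multiple time scales, the sketch below fits a first-order autoregressive model, x[t] = c + phi * x[t-1] + noise, by least squares and applies it to the same series at two aggregation levels. The choice of AR(1), the function names, and the block-averaging used to coarsen the time scale are assumptions made for this example; the abstract does not say which models or aggregation the authors used.

```python
from statistics import mean


def aggregate(series, scale):
    """Coarsen the time scale by averaging consecutive blocks of `scale` samples."""
    blocks = len(series) // scale
    return [mean(series[i * scale:(i + 1) * scale]) for i in range(blocks)]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two samples")
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = mean(prev), mean(curr)
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        # Constant history: fall back to predicting the mean.
        return mean_curr, 0.0
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


if __name__ == "__main__":
    traffic = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical per-interval traffic volumes
    for scale in (1, 2):
        coarse = aggregate(traffic, scale)
        print(scale, forecast_next(coarse))
```

Comparing forecast error across values of `scale` is one simple way to see how a model behaves as the prediction horizon grows coarser.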

49 (On-site): Network-Accelerated Cluster Scheduler

Radostin Stoyanov, Wesley Armour, and Noa Zilberman (University of Oxford)

Efficient use of computing clusters is crucial in large-scale data centers: even small gains in utilization can save millions of dollars. However, as the number of microsecond-scale tasks increases, using a CPU to schedule tasks becomes inefficient. Cluster scheduling running within the network can solve this problem, and brings additional benefits in scalability, performance and power efficiency. However, the resource constraints of programmable network devices make network-accelerated cluster scheduling hard. In this paper we propose P4-K8s-Scheduler, a network-accelerated cluster scheduler for Kubernetes implemented on a programmable network device. Preliminary results show that by scheduling Pods in the network at line-rate, P4-K8s-Scheduler can reduce the scheduling overheads by an order of magnitude compared to state-of-the-art Kubernetes schedulers.

63 (On-site): Mind the Cost of Telemetry Data Analysis

Alessandra Fais (Università di Pisa), Gianni Antichi (Queen Mary University of London), Stefano Giordano, Giuseppe Lettieri, and Gregorio Procissi (Università di Pisa)

Data Stream Processing engines are emerging as a promising solution for efficiently processing a continuous stream of telemetry information. In this poster, we compare four of them: Storm, Flink, Spark, and WindFlow. The aim is to shed some light on which streaming engine is best suited to network traffic analysis.

65 (Online): PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication

Qiang Su (City University of Hong Kong), Chuanwen Wang (The Chinese University of Hong Kong), Zhixiong Niu, Ran Shu, Peng

Containers are widely adopted because of their deployment and performance advantages over virtual machines. For many containerized data-intensive applications, however, bulk data transfers may pose performance issues. In particular, communication across colocated containers on the same host incurs large overheads in memory copy and the kernel’s TCP stack. Existing solutions such as shared-memory networking and RDMA have their own limitations, including insufficient memory isolation and limited scalability. This paper presents PipeDevice, a new system for low-overhead intra-host container communication. PipeDevice follows a hardware-software co-design approach: it offloads data forwarding entirely onto hardware, which accesses application data in hugepages on the host, thereby eliminating CPU overhead from memory copy and TCP processing. PipeDevice preserves memory isolation and scales well with the number of connections, making it deployable in public clouds. Isolation is achieved by allocating dedicated memory to each connection from hugepages. To achieve high scalability, PipeDevice stores the connection states entirely in host DRAM and manages them in software. Evaluation with a prototype implementation on a commodity FPGA shows that, when delivering 80 Gbps across containers, PipeDevice saves 63.2% CPU compared to the kernel TCP stack and 40.5% compared to FreeFlow. PipeDevice provides salient benefits to applications. For example, we port baidu-allreduce to PipeDevice and obtain ∼2.2× gains in allreduce throughput.

89 (On-site): Linnet: Limit Order Books Within Switches

Xinpeng Hong, Changgang Zheng, Stefan Zohren, and Noa Zilberman (University of Oxford)

Financial trading today often relies on machine learning. However, many trading applications require very short response times, which traditional machine learning frameworks cannot always support. We present Linnet, which provides financial market prediction within programmable switches. Linnet builds limit order books from high-frequency market data feeds within the switch and uses them for machine-learning-based market prediction. Linnet demonstrates the potential to predict future stock price movements with high accuracy and low latency, increasing financial gains.
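
To make the notion of a limit order book concrete, here is a minimal host-side sketch in Python that maintains aggregated bid and ask levels from a stream of updates and derives two simple quantities from the top of the book. The update format `(side, price, qty_delta)`, the class and method names, and the imbalance feature are assumptions for illustration only; Linnet itself runs inside a programmable switch, and the abstract does not describe its data structures or features.

```python
from collections import defaultdict


class LimitOrderBook:
    """Price-level order book: maps each price to the total resting quantity."""

    def __init__(self):
        self.bids = defaultdict(int)  # buy side: price -> quantity
        self.asks = defaultdict(int)  # sell side: price -> quantity

    def apply(self, side, price, qty_delta):
        """Apply one feed update; a positive delta adds quantity, a negative one removes it."""
        if side not in ("bid", "ask"):
            raise ValueError(f"unknown side: {side!r}")
        book = self.bids if side == "bid" else self.asks
        book[price] += qty_delta
        if book[price] <= 0:
            del book[price]  # level is empty, drop it

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def mid_price(self):
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        return (bid + ask) / 2

    def top_imbalance(self):
        """(bid_qty - ask_qty) / (bid_qty + ask_qty) at the best levels, in [-1, 1]."""
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        bid_qty, ask_qty = self.bids[bid], self.asks[ask]
        return (bid_qty - ask_qty) / (bid_qty + ask_qty)


if __name__ == "__main__":
    book = LimitOrderBook()
    # Hypothetical feed: prices in integer ticks.
    for update in [("bid", 100, 5), ("bid", 101, 2), ("ask", 103, 4), ("ask", 102, 6)]:
        book.apply(*update)
    print(book.best_bid(), book.best_ask(), book.mid_price(), book.top_imbalance())
```

Quantities such as the mid price and top-of-book imbalance are the kind of inputs a prediction model could consume; which features Linnet actually uses is not stated in the abstract.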

97 (On-site): Accelerating Kubernetes with In-network Caching

Stefanos Sagkriotis and Dimitrios Pezaros (University of Glasgow)

We present a new Kubernetes architecture that leverages in-network caching to accelerate one of Kubernetes' core components, its key-value store. We also identify performance limitations of previous in-network caching platforms and propose a new platform that demonstrates better throughput and scalability by utilising a different replication method.

99 (On-site): CiraaS: cloud computing with programmable logic

Kenji Tanaka, Yuki Arikawa, Tsuyoshi Ito (NTT Corporation), Yuki Matsuda, Keisuke Kamahori, Shinya Kaji (Fixstars Corporation), Takeshi Sakamoto (NTT Corporation), and Kenji Tanaka (NTT Device Technology Labs)

Cloud computing reduces provider and user costs by multiplexing workloads. Its advantages include high utilization through temporal and spatial sharing of computing resources and a subscription model that charges only for the resources and time used [1]. As cloud computing has evolved to maximize these advantages, microservices [11], function-as-a-service (FaaS) [10], and other more granular and short-lived cloud services have emerged. Today's FaaS offerings are oriented toward fast provisioning, fine-grained billing times, tight memory constraints, stateless processing, and real-time processing [15]. However, because context switching in CPUs has become a bottleneck [8], these processing demands are no longer satisfactorily met. Therefore, a shift is expected toward a more efficient system architecture for cloud computing [10].

107 (On-site): P4Pir: In-Network Analysis for Smart IoT Gateways

Mingyuan Zang (Technical University of Denmark), Changgang Zheng, Radostin Stoyanov (University of Oxford), Lars Dittmann (Technical University of Denmark), and Noa Zilberman (University of Oxford)

IoT gateways are vital to the scalability and security of IoT networks. As more devices connect to the network, traditional hard-coded gateways fail to flexibly process diverse IoT traffic from highly dynamic devices. This calls for a more advanced analysis solution. In this work, we present P4Pir, an in-network traffic analysis solution for IoT gateways. It utilizes programmable data planes for in-band traffic learning with self-driven machine learning model updates. Preliminary results show that P4Pir can accurately detect emerging attacks based on retraining and updating the machine learning model.

118 (On-site): Enabling IoT Self-Localization Using Ambient 5G mmWave Signals

Junfeng Guan, Suraj Jog, Sohrab Madani (University of Illinois Urbana-Champaign), Ruochen Lu (University of Texas at Austin), Songbin Gong, Deepak Vasisht (University of Illinois Urbana-Champaign), and Haitham Hassanieh (EPFL)

The small cell size, wide bandwidth, and MIMO antenna arrays in 5G mmWave networks provide great opportunities for IoT localization. However, low-power and low-cost IoT devices are incapable of leveraging these benefits. We present mm-ISLA: a system that enables IoT nodes to localize themselves using ambient 5G mmWave signals without any coordination with the base stations. mm-ISLA leverages MEMS Spike-Train filters to access the wideband 5G signals and estimates the Angle of Departure from the base station MIMO antenna arrays to accurately localize the IoT nodes.

159 (On-site): Robust Heuristics: Attacks and Defenses on Job Size Estimation for WSJF Systems

Erica Chiang, Nirav Atre, Hugo Sadok, Justine Sherry, Weina Wang (Carnegie Mellon University)

Packet scheduling algorithms control the order in which a system serves network packets, which can have a significant impact on system performance. Many systems rely on Shortest Job First (SJF), an important packet scheduling algorithm with many desirable properties. Classic results [3] show that SJF provably minimizes average job completion time, and recent work [1] shows that a variant of SJF also protects systems against algorithmic complexity attacks (ACAs), a particularly dangerous class of Denial-of-Service (DoS) attacks [4]. In an ACA, an adversary exploits the worst-case behavior of an algorithm to induce a large amount of work in the target system, causing a significant drop in goodput despite using only a small amount of attack bandwidth. SurgeProtector [1] demonstrated that using Weighted SJF (WSJF), which schedules packets by the ratio of job size to packet size, significantly mitigates the impact of ACAs on any networked system.
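
The WSJF ordering described above can be sketched with a priority queue keyed on the ratio of job size to packet size, serving the smallest ratio first. The class name, the packet identifiers, and the units in the example are hypothetical, and the job size is assumed to be an estimate supplied by the caller (how that estimate is obtained, and how it can be attacked, is the subject of the poster and is not modelled here).

```python
import heapq
from itertools import count


class WSJFQueue:
    """Serve packets in increasing order of (estimated job size / packet size)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: equal ratios are served in arrival order

    def push(self, packet_id, job_size, packet_size):
        if packet_size <= 0:
            raise ValueError("packet_size must be positive")
        ratio = job_size / packet_size
        heapq.heappush(self._heap, (ratio, next(self._seq), packet_id))

    def pop(self):
        """Remove and return the packet with the smallest ratio."""
        _, _, packet_id = heapq.heappop(self._heap)
        return packet_id

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    q = WSJFQueue()
    q.push("a", job_size=10, packet_size=100)    # ratio 0.1
    q.push("b", job_size=500, packet_size=64)    # small packet, expensive job
    q.push("c", job_size=20, packet_size=1500)   # large packet, cheap job
    while q:
        print(q.pop())
```

A packet that is small on the wire but expensive to process, like `b` here, gets a large ratio and is served last, which is why this ordering limits how much work an attacker can induce per byte of attack bandwidth.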
