这是一篇翻译文章, 原文是Different models of concurrency(by Ted Kaminski). 文章主要向我们介绍了多线程和事件循环这两种并发模型, 以及基于事件循环后的绿色进程和await模式, 可以帮助我们理解现代并发模式的各种实现.

The synchronous threaded model

We traditionally are exposed to a lot of synchronous APIs from our operating systems. These APIs are the root of all evil here. If we were not terribly interested in I/O, and we were really focused on compute (i.e. parallelism more than concurrency), then the threaded model is fine. But once we start to depend on those synchronous APIs, we start to suffer problems.

一直以来操作系统提供给我们的很多API都是同步的, 而这些同步API对于并发就是万恶之源. 对于更侧重于计算而不是I/O(并行比并发更重要)的场景, 多线程模式是适用的. 但如果需要用到同步的API, 问题就开始开始出现了.

The most interesting case for the synchronous model is mixing compute and I/O. Once given a workload, the only tunable parameter we have is thread count. In the figure above, I try to sketch a waste curve, showing the costs of thread count choices. There is no ideal choice here. We can try to tune for minimal waste, but the synchronous threaded model simply imposes waste, no matter what.

But this model has further warts. The heavier the I/O workload, the more it starts to break down. Part of the trouble is that submitting more synchronous I/O requests requires more threads. And more threads means more memory use. The classic “C10K” problem is rooted in this conundrum: if an additional concurrent connection requires another thread (or even process) and each takes 10MB, then 10K concurrent connections starts at 100GB of memory overhead!

多线程模式下如果计算和I/O两方面都要侧重到, 事情就开始变得有趣了. 对于一定的工作负载, 能用来调节多线程模式的参数只有线程数. 上图中展示了不同线程数下的情况, 如果线程数过少, CPU就会闲置, I/O请求得不到有效处理; 如果线程数过多, 线程竞争就会加剧, 任务切换带来的开销会变大. 无论如何调节线程的数量, 多线程模式总要有所权衡, 也总有不如人意的地方.

此外多线程模式还有更严重的一个问题, 那就是I/O的负载越大, 系统就越容易崩溃. 因为I/O请求越多, 需要的线程就越多, 因而需要的内存也就越多, 这也是C10K问题的根源. 如果一个并发连接需要一个线程, 而每个线程需要用10MB的内存, 那10K的连接就需要占用100GB的内存.

The asynchronous event loop model

The immediate, dramatic benefit of embracing event loops is that this inherent trade-off goes away. We can actually hit the “ideal” point in that optimization curve above, and we don’t even have to work at it; there’s no tuning necessary.

This is possible because the event loop will aggressively do as much compute work as possible, dispatching as many I/O requests as it gets to. Multiple concurrent paused requests generally take less memory than whole threads, so this architecture is generally able to scale to an absurd degree. (The event loop model has obsoleted C10K as a real challenge, and some have proposed C10M as the replacement challenge. Three orders of magnitude difference.)

相比于多线程模型, 事件循环模式的显著优势是没有那种取舍权衡了, 而且不需要调节就可以达到理想状态.

之所以能够做到这一点, 是因为事件循环总在尽可能地做事(不阻塞), 一接到I/O请求就分发出去. 因为暂停中的并发请求占用的内存比起线程要少, 所以事件循环的并发规模比起多线程模式, 可以提高几个数量级. 事件循环本是用来解决C10K问题的, 但在某些情况下, 其规模能提高1000倍, 达到C10M.

The tuning that happens with this model isn’t so much to achieve higher performance, but to limit memory usage, if that proves to be the limiting factor. It is better to serve requests less aggressively than to crash from lack of memory. Better still to install more RAM in your servers, though.

One remarkable part of this new concurrency model is that it frequently manifested itself as a single-threaded model. Applications like Nginx and Node.js are largely single-threaded (at least in terms of compute). The task of fully utilizing the machine’s CPU cores was simply delegated to running multiple independent application processes, one per core. This is made easier with Linux kernel features that allowed multiple processes to listen on the same port, easily balancing incoming request workload between the application instances.

针对事件循环模式的调节不是为了提高运行的性能, 而是为了限制内存的使用量, 比起因内存不足而导致系统崩溃, 少处理一些请求也许更好, 当然如果能加内存就更好.

事件循环有个显著的特点, 就是通常是单线程的模式, 比如Nginx和Node.js都有一个巨大的线程(用来计算), 再加上多个独立的工作进程来充分利用CPU, 一核一个. Linux内核允许多个进程监听同一个端口, 利用这一点很容易就能做到负载均衡.

Asynchronous programs still have to deal with synchronous OS APIs, though. If we dig into libuv, what we find is that while it’s built around a single-threaded event loop, there’s also a background thread pool for being the sacrificial lambs calling synchronous APIs. These include file system APIs and DNS resolving (of all the things, the usual API for this is totally synchronous!)

异步程序依然要和同步的系统API打交道, 如果我们去深入了解libuv的话, 会发现libuv是用了一个单线程的事件循环, 加上用来执行同步API的后台线程池. 文件系统相关的, DNS解析等等这些都是同步的操作.

Can we abstract away from asynchony?

One of the classic alternative options to embracing event loops is to hide that event loop behind an abstraction layer. The idea is for the programming language runtime to take care of pumping the event loop, and instead let us continue to believe we’re executing in the synchronous single-threaded model, augmented by “green threads.” (My preferred name. These are Erlang process, or Go routines.)

Different approaches to green threads nevertheless all function in generally the same way: the programmer believes they’re writing synchronous sequential code using cheap “threads,” but secretly the language runtime is not actually blocking when a thread blocks. Much like await, the thread being executed gets put on hold, and the runtime switches to a different thread, selected from the event loop. In many ways, you can imagine “what if every function was async and await was just implicit?”

我们还可以增加一个抽象层, 然后把事件循环隐藏在抽象层后. 思路是让编程语言提供一个运行时, 然后由运行时不断地处理事件循环, 开发者则依然可以以同步串行的方式编写代码, 这也就是绿色线程(Erlan里的process, Go里的routines).

尽管绿色线程的有不同的实现方式, 但他们直接有相同之处, 即开发者觉得他们用的是轻量级线程, 写的代码是同步串行的, 但实际上运行时不会阻塞在本来应该阻塞的地方. 和await的方式一样, "阻塞"的任务被挂起, 然后运行时从事件循环里换一个任务执行. 我们可以把绿色线程和"所有方法都是async的, 所有await都是隐式的"的await方式相比.

In the early days of Linux, threads were initially implemented using an m:n model, where multiple “threads” were implemented per process. Likewise, in the early days of Java (1.x), threads were an in-process concept, like green threads.

These were eventually regarded as very bad decisions and reversed. I regard these as obviously bad ideas, if we have a clear mental model of parallelism vs concurrency. OS threads are a parallel programming model, they should be parallel. We were tricked into believing they’re about concurrency because we need them due to synchronous APIs.

But green threads, properly understood, are a concurrency programming model, not a parallel programming model. (They may or may not be parallel-capable, just as C# Task is, but Node Promise is not.) The semantics are different. The m:n threading model is a bad idea, but green threads can be a good idea. This is not a contradiction.

在Linux早期时代, 线程是按照m:n的模式实现的, 一个进程对应多个线程. 类似的, Java(1.x)的早期版本中, 线程也是进程内的概念, 就像绿色线程一样

最终这些决定被证明是糟糕的并且被推翻了, 如果我们对并发与并行的概念很清楚, 很容易可以看出这是有问题的. 因为系统线程是属于并行模型, 应该用来实现并行, 只是由于同步API的原因, 导致我们觉得线程用来并发的.

绿色线程, 则属于并发的模型(可以是C#里那样是并行的, 也可以是Node里一样不是并行的), 两者的语义是不同的. m:n的线程模型不是好主意, 而绿色线程是好主意.

There are a few drawbacks to this programming model:

It requires a heavier runtime. Rust ruled it out for this reason.

It easily lends itself to erroneous conflation with real threads. We all know what’s supposed to happen when you block, sure, but what happens when you enter a tight numerical loop for a few seconds? With green threads, you’re almost always co-operatively interrupted, which means this loop will in effect block execution of other tasks on this thread (which might be the only thread). This issue might be fine, or it might be a serious problem, or even a serious problem that’s subtly hidden until you reach a critical mass of “blocked” threads. This all depends on whether you’re using green threads for concurrency (ok) or believe they’re giving you parallelism (maybe uh oh).

Green threads are a leaky abstraction. Lots of system calls block. Many things like file I/O only present synchronous APIs, after all. But in practice, we get blocking behavior from even more sources. FFI calls, for instance, might internally make blocking system calls, so even if we try to wrap language-internal system calls cleverly, we can still run aground.

There’s no indication of blocking or non-blocking behavior. A function call is just a function call. It might hard-block in FFI, it might soft-block co-operatively, or it might just run to completion normally. To find out which… run it?

绿色线程这种编程模型有几个缺点:

需要有一个笨重的运行时, Rust也是因为这个原因放弃了绿色线程.
容易和系统线程错误地混淆起来. 当有阻塞操作的时候我们知道会会发生任务切换, 但如果有一个会执行数秒的循环操作呢? 由于绿色线程的调度通常是协作式的, 所以这意味着在这个线程上的其他任务都会被阻塞. 这个问题可大可小, 而且甚至在阻塞线程数到达临界值前都不会暴露出来. 如果需要的是并发, 也许问题不大, 但如果需要的是并行, 那就不一定了.
抽象层可能会有所遗漏. 系统调用, 像文件I/O相关的这类API我们知道是同步的, 但其他地方可能也有同步操作, 比如FFI调用内部有同步操作. 所以就算我们巧妙地把系统调用都包装了一遍, 依然可能会出现差错.
阻塞与非阻塞函数的区别不明显. 因为函数调用都是一样的写法, 但或许是FFI引起的硬阻塞, 或许是协作调度引起的软阻塞, 又或者是正常的执行过程, 这些情况不好区分, 难道要跑一下程序试试吗?

I believe this combination of drawbacks is what motivated the much more recent developments with async/await. (Promises, of course, go back a long time, too, although not entirely in their modern form…) In contrast to the above, what Rust appears to be getting (once the design process is finalized) with its version of explicit asynchronous programming:

A minimal and low-overhead runtime. In fact, likely even multiple runtime implementations. I haven’t investigated in depth yet, but I believe while the initial design efforts have focused on single-threaded concurrency, it may be a library-level change to support parallel work-stealing thread pools. The async programming model isn’t tightly tied to the implementation technique.

Since it’s clearly about asynchronous programming, and not easily confused with parallelism, you know what happens if you get into long-running compute loops. Nothing is hidden.

If a function is going to do I/O, then if it returns a result, it’s synchronous. If it returns a promise/future, it’s asynchronous. You can more easily understand your program just by reading it, instead of running it and investigating the result.

When faced with a synchronous API, it’s possible to build mechanisms (yourself or simply using a library) to use a sacrificial thread pool to “async-ize” the call.

我认为正是绿色线程的这些缺点推动了async/await(当然还有Promise)的发展, 与上面不同, Rust采用了显式的异步编程模型, 因而具有一下特点:

小而轻的运行时, 甚至可能有多个运行时实现. 虽然我没有仔细调研过, 但我相信一开始的设计侧重于单线程并发, 然后由库提供并行的多线程任务窃取型模式支持.
由于异步的代码需要显式声明, 并且不容易和并行混淆, 所以对于长时间执行的情况很清楚就可以知道发生了什么.
如果一个函数需要做I/O操作, 而且返回了一个结果, 则这个函数是同步的, 如果返回的是promise或者future, 那这个函数则是异步的. 同步还是异步, 看到代码就可以区分了, 而不用运行程序来区别.
当遇到同步API的时候, 可以通过线程池的方式把API包装成异步的形式.

In the end, I think the async/await/promise approach gets us the same benefits as green threads, with a different mix of drawbacks that I think comes out in favor of the explicit asynchronous model. Pedagogically, I’m also in favor of actually exposing programmers to asynchrony. The confusion that arises from a wholly synchronous world-view is real. Especially since everything is actually asynchronous.

我觉得async/await/promise的方式有绿色线程的那些优点, 也有一些显式异步所有的缺点. 从教育的角度看, 我希望程序员可以真正的接触到异步, 完全同步的世界观带来了很多混乱, 所有的东西都是异步的.

End notes

Coming up next week are my thoughts on “function coloring,” and then I might be done with concurrency for the moment. If you want my thoughts on a specific topic here, ask away! Patreon or Twitter or messenger pigeon, whatever floats your boat.

-> 如果文章有不足之处或者有改进的建议，可以在这边告诉我，也可以发送给我的邮箱

译文: 并发模型

The synchronous threaded model

The asynchronous event loop model

Can we abstract away from asynchony?

End notes