Answers to some IOCP questions
A reader of my IOCP performance post recently sent me some interesting questions. I actually found them really, really interesting and thought I'd share my answers here instead of just replying to his email.
Q: I understand that at any given point there will be N threads associated with an IO Completion Port, where N is the number of processors. If I call GetAvailableThreads it returns 25 for worker threads and 1000 for IO threads. My question is why do we have such a high value for IO threads compared to worker threads, when a Completion Port can only perform tasks in sequential order from a queue?
A: Don't confuse the numbers returned by GetAvailableThreads with the actual number of active IO worker threads. The numbers 25 and 1000 are merely tunables set by the runtime: they represent the maximum number of non-IO and IO worker threads that the thread pool can spin up concurrently. The theory behind these numbers goes like this. Normal thread pool worker threads cannot preserve the "one thread per core per execution quantum" invariant. So if you queue 10 presumably long-running items to the thread pool, each of the ten items may spin up a new thread, and all ten threads will be active in the pool at once. This also means that if you queue CPU-intensive work items, you will end up with poor performance due to the high amount of context switching and so on. That's why the runtime tries to mitigate this by setting the maximum number of threads in the thread pool to 25 by default. On the other hand, if you have 25 long-running IO operations, you are out of luck unless you change the defaults: your CPU will sit idle while a bunch of threads in the thread pool wait for IO.
IO worker threads, on the other hand, make sure that only N threads are active at any given time (thus preserving the one thread per core per execution quantum invariant), where N is the number of processors/cores. (In practice it might be slightly above N, as I explained in the original post, but that's negligible.) So the thread pool can spin up as many threads as it needs (up to 1000 by default) to wait on the IOCP for the next work item. Say you queue 25 work items to IO worker threads that wait on some long-running IO, and then you queue another piece of CPU-intensive work. You still get the CPU-intensive work done, because all 25 IO worker threads are blocked waiting for IO, which leaves the CPU idle.
Q: My understanding is that when a request is sent to a Completion Port, it will finish that task first and then move on to the next task. How would this scale if I make two requests to download 700MB files and a third request to download a 7MB file? Will the third request wait until requests 1 and 2 are finished? I hope not, but my understanding is that it will.
A: I think my answer to the first question explains this as well. The key thing here is that if you do some blocking work in an IO worker thread, the Windows scheduler notifies the IOCP that one active thread has become inactive, which lets the next thread waiting on the IOCP go and pick up the next available work item. So, in the scenario from the question, you can have IO worker threads downloading both 700MB files and the 7MB file at the same time. The third request does not have to wait for the first two, and the downloads complete in whatever order their IO finishes.
Q: How do we use the Overlapped object? I see we can specify offsets and an event handle, but where do we specify the IO handle on which the operation should be performed?
A: All this time I was talking about using IOCP to use the CPU efficiently, as opposed to how you would do this with regular thread pool worker threads. In other words, it can be thought of as an efficient scheduling method. However, if you want to do asynchronous IO, you can just use the async operations in the .NET IO API (i.e. the BeginXxx and EndXxx methods). Almost all of them use IOCP under the covers. And if you are still itching to get your C program up and running, you can use the CreateIoCompletionPort API and pass your IO handle as its first parameter. Then you can use the IO API (ReadFile, for example) to pass the necessary async IO information in an OVERLAPPED structure. (As far as I know, ReadFileEx reports completion through its own completion routine rather than through a completion port, so the plain ReadFile/WriteFile calls are the ones to use with a port.)
HANDLE CreateIoCompletionPort(
    HANDLE    FileHandle,
    HANDLE    ExistingCompletionPort,
    ULONG_PTR CompletionKey,
    DWORD     NumberOfConcurrentThreads
);
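To make the flow concrete, here is a minimal sketch of one overlapped read going through a completion port. It is only an illustration under a few assumptions: the names read_via_iocp and split_offset are made up for this post, the completion key value 1 is arbitrary, and the Win32 part is guarded so it only compiles on Windows. It uses ReadFile with an OVERLAPPED structure, and the file position goes into the Offset/OffsetHigh pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: OVERLAPPED stores the 64-bit file position as two
   32-bit halves (Offset = low half, OffsetHigh = high half). */
void split_offset(uint64_t pos, uint32_t *low, uint32_t *high)
{
    assert(low != NULL && high != NULL);
    *low  = (uint32_t)(pos & 0xFFFFFFFFu);
    *high = (uint32_t)(pos >> 32);
}

#ifdef _WIN32
/* Hypothetical function: reads up to len bytes at position pos through a
   completion port. Returns the byte count, or -1 on failure. */
long read_via_iocp(const char *path, uint64_t pos, void *buf, DWORD len)
{
    long result = -1;
    uint32_t low, high;
    OVERLAPPED ov;
    OVERLAPPED *done = NULL;
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    HANDLE file, port;

    /* The handle has to be opened for overlapped IO. */
    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return -1;

    /* The IO handle is the first argument. Passing NULL as the existing
       port creates a new one; 0 threads means one per processor. */
    port = CreateIoCompletionPort(file, NULL, (ULONG_PTR)1, 0);
    if (port == NULL) {
        CloseHandle(file);
        return -1;
    }

    ZeroMemory(&ov, sizeof ov);
    split_offset(pos, &low, &high);
    ov.Offset = low;
    ov.OffsetHigh = high;

    /* FALSE with ERROR_IO_PENDING means the read was queued successfully. */
    if (ReadFile(file, buf, len, NULL, &ov) ||
        GetLastError() == ERROR_IO_PENDING) {
        /* Block until the completion packet for this read arrives. */
        if (GetQueuedCompletionStatus(port, &bytes, &key, &done, INFINITE) &&
            done == &ov)
            result = (long)bytes;
    }

    CloseHandle(port);
    CloseHandle(file);
    return result;
}
#endif
```

On Windows you would call it along the lines of read_via_iocp("data.bin", 0, buf, sizeof buf), where data.bin is just a placeholder file name. A real program would keep one port around, associate many handles with it, and have several threads waiting in GetQueuedCompletionStatus instead of doing everything in one function.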
Q: Is IOCP just maintaining a queue, and after it dequeues an item, does it call the return function with offsets to perform the operation?
A: The underlying Windows API does not actually use a function pointer per se. The callback mechanism is implemented in the .NET Framework thread pool (AFAIK). The original API uses a key-based (ULONG_PTR) mechanism to correlate the work items when they return from the IOCP.
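To illustrate how a callback can be layered on top of a key, here is a toy, single-threaded model in portable C. It is not the Windows or .NET implementation: toy_port, toy_post, toy_dequeue and toy_drain are hypothetical stand-ins for the port, PostQueuedCompletionStatus, GetQueuedCompletionStatus and a worker loop, and using the context's address as the key is just one common convention I am assuming here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-handle context. Its address doubles as the completion key. */
typedef struct io_context {
    const char *name;
    void (*on_complete)(struct io_context *self, uint32_t bytes);
    uint32_t total_bytes;
} io_context;

/* One completion packet: the key plus the number of bytes transferred. */
typedef struct {
    uintptr_t key;
    uint32_t bytes;
} toy_packet;

#define TOY_CAPACITY 8

/* Fixed-size FIFO standing in for the port's completion queue. */
typedef struct {
    toy_packet items[TOY_CAPACITY];
    size_t head;
    size_t count;
} toy_port;

/* Enqueue a packet. Returns 1 on success, 0 if the queue is full. */
int toy_post(toy_port *p, uintptr_t key, uint32_t bytes)
{
    assert(p != NULL);
    if (p->count == TOY_CAPACITY)
        return 0;
    size_t tail = (p->head + p->count) % TOY_CAPACITY;
    p->items[tail].key = key;
    p->items[tail].bytes = bytes;
    p->count++;
    return 1;
}

/* Dequeue the oldest packet. Returns 1 on success, 0 if the queue is empty. */
int toy_dequeue(toy_port *p, uintptr_t *key, uint32_t *bytes)
{
    assert(p != NULL && key != NULL && bytes != NULL);
    if (p->count == 0)
        return 0;
    *key = p->items[p->head].key;
    *bytes = p->items[p->head].bytes;
    p->head = (p->head + 1) % TOY_CAPACITY;
    p->count--;
    return 1;
}

/* Example callback: accumulate the bytes reported for this context. */
void add_bytes(io_context *self, uint32_t bytes)
{
    self->total_bytes += bytes;
}

/* Worker loop: the queue only hands back a key, so the caller maps the key
   back to its context and invokes the callback stored there.
   Returns the number of packets dispatched. */
int toy_drain(toy_port *p)
{
    uintptr_t key;
    uint32_t bytes;
    int dispatched = 0;
    while (toy_dequeue(p, &key, &bytes)) {
        io_context *ctx = (io_context *)key;
        ctx->on_complete(ctx, bytes);
        dispatched++;
    }
    return dispatched;
}
```

The point of the model is that the queue itself never sees a function pointer. It stores and returns an opaque key, and whatever sits on top of it (a runtime thread pool, or your own worker loop) decides how to turn that key back into something callable.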