Thread Local: A Convenient Abomination.
Posted by Uncle Bob on 09/04/2007
ThreadLocal variables are a wonderfully convenient way to associate data with a given thread. Indeed, frameworks like Hibernate take advantage of this to hold session information. However, the practice depends upon the assumption that a thread is equivalent to a unit-of-work. This is a faulty assumption.
Thirteen years ago, while working on my first book, Jim Coplien and I were having a debate on the nature of threads and objects. He made a clarifying statement that has stuck with me since. He said: “An object is an abstraction of function. A thread is an abstraction of schedule.”
It has become the norm, in Java applications, to assume that there is a one-to-one correspondence between a thread and a unit-of-work. This appears to make sense since every Servlet request has its own particular thread. Framework authors have built on this assumption by putting unit-of-work related information (e.g. session) into ThreadLocal variables.
Clearly, the more ThreadLocal variables that hold unit-of-work related information, the more that the thread and the unit-of-work are related. While very convenient, the basic assumption is dead wrong.
A unit-of-work is a task to be performed. There is no rule that says that this task must be single threaded. Nor is there any rule that says that the components of the task must run at the same priority. Indeed, it is not uncommon for a task to have a high-priority event driven part, and a low priority compute-bound part.
Consider, for example, a unit-of-work that makes a request to a third-party service, and then processes the response with a complex number-crunching algorithm. We want the request to run at a high priority so that it is not waiting for other low-priority compute-bound tasks to finish. On the other hand, we want the number-cruncher to run at a low priority so that it yields the processor as soon as possible whenever one of the high-priority requests needs a few cycles.
Clearly these two tasks form a complete unit-of-work, but just as clearly they should run as two different threads—probably with a queue in between them in a standard producer-consumer model.
Where are the unit-of-work related variables? They can’t be kept in a ThreadLocal since each part of the task runs in a separate thread. They can’t be kept in static variables since there is more than one thread. The answer is that they have to be passed around between the threads as function arguments on the stack, and recorded in the data structures placed on the queue.
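To make the hand-off concrete, here is a minimal sketch of the two-thread arrangement described above. Everything named in it (QueueHandoffSketch, WorkContext, crunch, runUnitOfWork) is hypothetical and invented for illustration, and the "third-party response" is a hard-coded string. Thread priorities are only hints to the scheduler, so treat the priority calls as a statement of intent rather than a guarantee.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoffSketch {
    // Hypothetical unit-of-work data; it travels on the queue, not in a ThreadLocal.
    static final class WorkContext {
        final String sessionId;
        final String response;

        WorkContext(String sessionId, String response) {
            this.sessionId = sessionId;
            this.response = response;
        }
    }

    // Stand-in for the number-cruncher; the context arrives as a plain argument.
    static String crunch(WorkContext ctx) {
        return ctx.sessionId + ":" + ctx.response.length();
    }

    // Runs one unit of work across two threads and returns the cruncher's result.
    static String runUnitOfWork(String sessionId) {
        BlockingQueue<WorkContext> queue = new LinkedBlockingQueue<>();
        String[] result = new String[1];

        Thread requester = new Thread(() -> {
            // Pretend "payload" is what the third-party service returned.
            queue.add(new WorkContext(sessionId, "payload"));
        });
        requester.setPriority(Thread.MAX_PRIORITY);

        Thread cruncher = new Thread(() -> {
            try {
                result[0] = crunch(queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cruncher.setPriority(Thread.MIN_PRIORITY);

        cruncher.start();
        requester.start();
        try {
            requester.join();
            cruncher.join(); // join makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runUnitOfWork("session-42"));
    }
}
```

Neither thread ever asks "which unit of work am I in?"; the answer arrives with the queue entry.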
So, though convenient, ThreadLocal variables confuse the issue of separating function from schedule. They tempt us to couple function and schedule together. This is unfortunate since the correspondence of function and schedule is weak and accidental.
What we’d really like is to be able to create UnitOfWorkLocal variables.
Esko Luontola about 5 hours later:
Do you know of any programs or frameworks which have “UnitOfWorkLocal variables”?
Marc about 19 hours later:
Surely ThreadLocal storage of a unit of work probably isn’t the best choice. But it’s at least totally valid where you know a unit of work doesn’t span multiple threads. With this assumption in mind, holding the unit of work in a thread local variable spares you from passing around the unit of work to everyone who needs it.
I think what really can be ruled out in most cases is that multiple units of work are used in the same thread at the same time (there just may be nested units of work). That means that nothing keeps you from letting threads “inherit” a unit of work from other threads. So thread local storage definitely isn’t generally wrong, is it?
Roland Kaufmann 1 day later:
Since you actually taught us a long time ago to use processes and not necessarily data as the model for our objects, I guess that another name for UnitOfWorkLocal variables is “members” :-)
Sebastian Kübeck 4 days later:
Corba offers two ways of context (=session) propagation: thread-bound context propagation and custom context propagation. The latter means that it’s up to you to pass a context or session object around or have a ThreadLocal do the job. I think that the user of a library should be given the choice. However, not using ThreadLocals in the library code doesn’t prevent the user from implementing it anyway. Forcing the user to deal with ThreadLocals that linger around behind the scenes is definitely a smell (similar to mutable static members, aka globals in the library’s back yard).
From the design standpoint you are perfectly right not to recommend ThreadLocals. They raise a lot of problems such as testability issues and make code less understandable.
From a pragmatic standpoint I have to admit that I used them successfully to overcome shortcomings in third party APIs. They also helped me to reestablish transaction safety in lousy legacy applications.
My personal recommendation: As long as there is no urgent need, avoid them.
Martin Vilcans 6 days later:
I like your idea of UnitOfWorkLocal variables. They could be implemented as something similar to ThreadLocals but they should be automatically inherited by subthreads and somehow a worker thread should be able to attach to a UnitOfWorkLocal variable.
Code-wise I see something like this (invented in approximately 1 minute, so it’s not very thought out):
    // Declare UnitOfWorkLocal variable called mySession of type Session
    @UnitOfWorkLocal Session mySession;

    class ThisAndthat {
        void execute() {
            unit_of_work(mySession) {
                // mySession can now be used:
                mySession.doSomething();
                otherFunction();
            }
            // The following is an error since mySession is used
            // outside the unit_of_work block!
            mySession.doSomething();
        }

        void otherFunction() {
            // mySession is accessible here too
            // if it's called from within a unit_of_work block.
            mySession.doSomethingElse();
        }
    }
This looks like it should be simple to implement with a stack of ThreadLocals (except that if we assume that the above is Java, it would require language extensions), but I guess I just haven’t thought about it enough to see the devil in the details. What is nice about the idea is that it seems like an alternative to ThreadLocals, singletons and globals that maps better to the problem domain.
Or perhaps I’m just jetlagged… :-)
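For what it is worth, the "stack of ThreadLocals" idea can be roughly approximated in plain Java without language extensions, if a lambda stands in for the unit_of_work block. The sketch below is only one possible reading of the comment above: UnitOfWorkLocal, runWith and get are hypothetical names, not an existing API. It remains bound to a single thread, so it does nothing for the cross-thread case the article raises.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UnitOfWorkLocalSketch {
    // Hypothetical helper: a per-thread stack with one entry per nested block.
    static final class UnitOfWorkLocal<T> {
        private final ThreadLocal<Deque<T>> stack =
                ThreadLocal.withInitial(ArrayDeque::new);

        // Plays the role of the unit_of_work(...) { ... } block.
        // The value must be non-null, since ArrayDeque rejects null elements.
        void runWith(T value, Runnable body) {
            stack.get().push(value);
            try {
                body.run();
            } finally {
                stack.get().pop();
            }
        }

        // Fails at run time (not compile time) when used outside a block.
        T get() {
            T value = stack.get().peek();
            if (value == null) {
                throw new IllegalStateException("not inside a unit of work");
            }
            return value;
        }
    }

    static final UnitOfWorkLocal<String> SESSION = new UnitOfWorkLocal<>();

    // Sees whichever session the innermost enclosing block supplied.
    static String otherFunction() {
        return "using " + SESSION.get();
    }

    static String demo() {
        StringBuilder log = new StringBuilder();
        SESSION.runWith("outer", () -> {
            log.append(otherFunction()).append(';');
            SESSION.runWith("inner", () -> log.append(otherFunction()).append(';'));
            log.append(otherFunction()).append(';');
        });
        try {
            SESSION.get();
        } catch (IllegalStateException e) {
            log.append("outside: ").append(e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The main difference from the proposed syntax is that misuse outside the block surfaces as an exception at run time rather than as a compile error.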
Dennis Winter 6 days later:
I’m a German student and my English isn’t that good. It could be that I am aiming at the wrong spot, but as far as I understand the issue, I would agree with Bob.
Of course it is a lot more work separating data from functions. And I don’t really have a clue what difficulties can occur, but in my naive student way I would say the following:
Considering the fact that multithreading is becoming even more important than in the past, this could be a very interesting way to accelerate applications. I could imagine using data and functions separately to balance the load that a thread must work on. Depending on how the hardware is implemented, a multicore processor could split the work into a different number of pieces. With each function working in parallel on that data, all tasks are divided across the maximum number of processors the hardware can offer, or across the number of functions the application needs done.
My first entry
dmitry 6 days later:
@Martin, my immediate thought was the same: just create subthreads that will inherit the thread-local values. However, I think this approach will not work in thread-pooled environments (e.g. in application servers).
Will ThreadGroups help somehow? Can variables be stored in ThreadGroup objects? Perhaps we should start pooling threads at the group level instead?
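The pooling concern can be illustrated with the JDK's own InheritableThreadLocal, which copies the parent's value at the moment a child Thread is constructed. The sketch below is illustrative only: the class and method names are invented, and it relies on the standard single-thread executor creating its one worker during the first submit call. A reused worker keeps the value it inherited at creation, whichever unit of work submits to it later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritedContextSketch {
    static final InheritableThreadLocal<String> UNIT_OF_WORK =
            new InheritableThreadLocal<>();

    // A freshly created subthread inherits the creator's current value.
    static String seenBySubthread(String unitOfWork) {
        UNIT_OF_WORK.set(unitOfWork);
        String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = UNIT_OF_WORK.get());
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    // A pooled worker is created once, so the second unit of work is not seen.
    static String seenByPooledWorker(String first, String second) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> read = UNIT_OF_WORK::get;
        try {
            UNIT_OF_WORK.set(first);
            pool.submit(read).get();        // worker thread is created here
            UNIT_OF_WORK.set(second);
            return pool.submit(read).get(); // same worker, stale value
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(seenBySubthread("uow-1"));
        System.out.println(seenByPooledWorker("first", "second"));
    }
}
```

So inheritance helps only when each unit of work spawns its own threads; with a pool, the value has to be attached explicitly for each task.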
Martin Vilcans 7 days later:
The thread pool problem is what I meant with “a worker thread should be able to attach to a UnitOfWorkLocal variable”. In general, the solution seems simple. Just create a map that maps threads to units of work, like map<Thread, UnitOfWork> dataMap. The problem is just that the code to access a UnitOfWorkLocal variable will be a bit convoluted. I guess the Ruby folks would happily implement something like this transparently, but in Java the code to call a function on a UnitOfWorkLocal would be along the lines of:
dataMap.get(Thread.currentThread()).getSession().setFoo(1) (where the Session is the UnitOfWorkLocal)
instead of just
session.setFoo(1) (if session was a static variable).
Implementation details aside, I think UnitOfWorkLocal is an interesting idiom. One could think of it as static variables for a unit of work instead of for the whole process. Or one could think of it as ThreadLocals for a unit of work instead of for just one thread.
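A rough sketch of that thread-to-unit-of-work map, with the "attach" step made explicit, might look as follows. The Session and UnitOfWork classes and the attach, detach and session helpers are all hypothetical stand-ins for illustration. A small static accessor hides most of the convoluted lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadMapSketch {
    // Hypothetical session type with the setFoo method from the comment above.
    static final class Session {
        private int foo;
        void setFoo(int value) { foo = value; }
        int getFoo() { return foo; }
    }

    // Hypothetical unit of work that owns one session.
    static final class UnitOfWork {
        private final Session session = new Session();
        Session getSession() { return session; }
    }

    static final Map<Thread, UnitOfWork> dataMap = new ConcurrentHashMap<>();

    // A thread (pooled or not) joins a unit of work explicitly...
    static void attach(UnitOfWork unitOfWork) {
        dataMap.put(Thread.currentThread(), unitOfWork);
    }

    // ...and must leave it again, or the map entry leaks.
    static void detach() {
        dataMap.remove(Thread.currentThread());
    }

    // Hides dataMap.get(Thread.currentThread()).getSession() behind one call.
    // Throws NullPointerException if the calling thread is not attached.
    static Session session() {
        return dataMap.get(Thread.currentThread()).getSession();
    }

    static int demo() {
        UnitOfWork unitOfWork = new UnitOfWork();
        attach(unitOfWork);
        try {
            Thread worker = new Thread(() -> {
                attach(unitOfWork); // worker attaches to the same unit of work
                try {
                    session().setFoo(1);
                } finally {
                    detach();
                }
            });
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return session().getFoo(); // sees the worker's write after join
        } finally {
            detach();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the unit of work still has to reach the worker somehow (here it is captured by the lambda), so the map removes the argument passing inside each thread, not the hand-off between threads.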
Jeff Brown 11 days later:
A UnitOfWorkLocal variable is close enough to a session variable for most practical purposes. You just need to ensure that the session gets passed around across threads, asynchronous operations, callbacks, and other such things.
There are mechanisms for doing this. In .Net you can store session variables in the ExecutionContext. Queued work items on the thread pool acquire the execution context of the caller. You can also explicitly enter an execution context for a delimited region of a thread’s lifetime. You can use that to ensure that session data remains in scope across boundary transitions. In fact it can go a good deal further than that.
Many web frameworks provide a simpler mechanism in the form of an HttpContext for tracking the current session. You can often leverage the fact that the framework is already doing the work of passing this object around. That’s often enough to provide robust storage and scoping for UnitOfWorkLocals (and you can adopt more complex mechanisms when necessary).
Matteo Vaccari 12 days later:
I think it’s a bad idea for people writing business logic for business applications to dabble in threads. Problems of concurrency are best left to the database; finding concurrency primitives in business logic is imho a bad smell.
So there’s a good reason for assuming thread==unit of work. It makes business applications simpler… and more robust.
Davide Baroncelli 13 days later:
“It has become the norm, in Java applications, to assume that there is a one-to-one correspondence between a thread, and a unit-of-work.”
I wonder where you are deducing this from: it’s wrong. In my view, “it has become the norm, in Java applications” to assume that resources must come from a “scope”: this is what frameworks such as Spring and Seam lead programmers to think. A “scope”, then, may be anything convenient for your architecture, ranging between the more or less extreme approaches of a “singleton” and a “prototype” (in Spring terms).
A “thread local” scope is one possibility, as is a “web session” scope or a “flow scope”: this is what is actually becoming “the norm”, and it’s not far from being able to declare a “unitofwork” variable.
Phil 13 days later:
It may be an abomination to always use a ThreadLocal as a UoW “session” variable, but if you design an application based on the fact that every UoW will be conducted inside its own Thread, then you can thank the JVM devs for providing the feature.
In other words, ThreadLocal is not an abomination per se, it’s how people use it that makes it an abomination… Isn’t it?
ccs@embrace-software.co.uk 2 months later:
Drop the unit of work with a session token onto a queue and then, when the UoW is executed, create a session from the token details and use InheritableThreadLocal instances to store details…when execution finishes you can then store the result onto a result queue to use later or return to the user!?
That said…Thread locals cause problems with NIO since you need to do some sort of context switching with the UoW data…
JRockit was a good implementation of a JVM until they removed the pseudo threading code…basically one thread could be made to act like multiple threads because at the end of the day a JVM is a virtual machine so you can have virtual threads meaning less context switching!? Going off topic a bit, but the reason people want to move away from ThreadLocals is that representing UoW with a link to threads will, at some point, limit the number of user requests per second…moving away from thread locals will, at some point, make your code run quicker once true NIO solutions come into play…
It’s “convenient” to use InheritableThreadLocals, all said and done, but it may not be the most scalable solution; however, hardware is becoming cheaper and clustering is another solution to the number of requests per second problem!?