Clumpy Interface Is Changing

Posted at 10:50 pm on April 14th, 2013

A while back, a bug report helped me notice a design flaw in Clumpy. I’ve addressed the problem, but it involved changing how Clumpy’s for_in_loop method should be called.

// Here's how it used to work:
clumpy.for_in_loop(
    object,
    function (key) {
        // statements
    }
);

// Here's how it works now:
clumpy.for_in_loop(
    function () { return object; },
    function (key) {
        // statements
    }
);

Why

The potential problem with the old way is that Clumpy uses the keys from the object that was passed at the time when the loop was enqueued. If object gets reassigned during asynchronous Clumpy execution, the loop still has the old value:

// let's say "object" starts out describing foods
object = {
    'name': 'grape',
    'type': 'fruit'
};

clumpy.once(function () {
    // but somewhere along the line, we re-assign the variable
    // with a different type of object with different properties
    object = {
        'name': 'Thomas Peri',
        'url': 'http://www.tumuski.com/'
    };
}).
for_in_loop(
    object,
    function (key) {
        alert(key + ': ' + object[key]);
    }
);

You get two alerts, and the first is “name: Thomas Peri” as you’d expect. However, the second is not “url: http://www.tumuski.com/”. Instead, it’s “type: undefined”, because even though Clumpy is reading property values from the current object, it’s reading the property keys from the object the variable held when the method was called. This is not the intended behavior of Clumpy.
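
The effect can be reproduced without Clumpy at all. What follows is a minimal sketch, not Clumpy's actual implementation: `queue`, `enqueue`, `eagerForIn`, and `run` are hypothetical names made up for illustration, the queue is drained synchronously rather than asynchronously, and the results are collected and logged instead of being passed to `alert`.

```javascript
// Hypothetical stand-in for Clumpy's queue; not the real implementation.
var queue = [];
var output = [];

function enqueue(task) {
    queue.push(task);
}

// Old-style loop: the keys are read as soon as the loop is enqueued.
function eagerForIn(obj, body) {
    var keys = [], key;
    for (key in obj) {
        keys.push(key);
    }
    enqueue(function () {
        var i;
        for (i = 0; i < keys.length; i += 1) {
            body(keys[i]);
        }
    });
}

// Work through the queued tasks in the order they were added.
function run() {
    while (queue.length > 0) {
        queue.shift()();
    }
}

var object = { 'name': 'grape', 'type': 'fruit' };

enqueue(function () {
    object = { 'name': 'Thomas Peri', 'url': 'http://www.tumuski.com/' };
});
eagerForIn(object, function (key) {
    output.push(key + ': ' + object[key]);
});

run();
console.log(output.join('\n'));
// name: Thomas Peri
// type: undefined
```

The keys come from the object that existed when `eagerForIn` was called, while the values come from whatever `object` refers to by the time the loop body runs.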

The solution, available in the latest version of Clumpy.js, is to pass a function that returns object, instead of passing object itself:

// let's say "object" starts out describing foods
object = {
    'name': 'grape',
    'type': 'fruit'
};

clumpy.once(function () {
    // but somewhere along the line, we re-assign the variable
    // with a different type of object with different properties
    object = {
        'name': 'Thomas Peri',
        'url': 'http://www.tumuski.com/'
    };
}).
for_in_loop(
    function () { return object; },
    function (key) {
        alert(key + ': ' + object[key]);
    }
);

This way, Clumpy gets the object immediately before iterating through its properties, and the program displays the intended information.
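
Here is a rough sketch of why deferring the lookup helps. It is not Clumpy's source: `queue`, `deferredForIn`, and `run` are hypothetical names, the queue is drained synchronously for simplicity, and results are logged rather than shown with `alert`.

```javascript
// Hypothetical stand-in for Clumpy's queue; not the real implementation.
var queue = [];
var output = [];

// New-style loop: getObject is not called until the loop's turn comes.
function deferredForIn(getObject, body) {
    queue.push(function () {
        var obj = getObject(), key;
        for (key in obj) {
            body(key);
        }
    });
}

// Work through the queued tasks in the order they were added.
function run() {
    while (queue.length > 0) {
        queue.shift()();
    }
}

var object = { 'name': 'grape', 'type': 'fruit' };

queue.push(function () {
    object = { 'name': 'Thomas Peri', 'url': 'http://www.tumuski.com/' };
});
deferredForIn(
    function () { return object; },
    function (key) {
        output.push(key + ': ' + object[key]);
    }
);

run();
console.log(output.join('\n'));
// name: Thomas Peri
// url: http://www.tumuski.com/
```

Because the reassignment runs before the loop's task does, the getter returns the new object, so the keys and the values come from the same place.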

Old Behavior Deprecated

You don’t have to change all your old code just yet. If it’s working for you the old way, it will keep working with the updated Clumpy for now. If for_in_loop receives an object that isn’t a function, it uses the property names from that object, read when the method is called. If it receives a function (as it should), then it waits until the loop is about to run before calling the function to retrieve the object and its property names.

However, this is temporary. I plan to remove the old incorrect behavior in a future version.
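
One way such a transitional check could be written is sketched below. The helper names `listKeys` and `keySource` are hypothetical and the structure is an assumption for illustration; Clumpy's real code may look quite different.

```javascript
// Hypothetical helpers; names and structure are assumptions, not Clumpy's source.
function listKeys(obj) {
    var keys = [], key;
    for (key in obj) {
        keys.push(key);
    }
    return keys;
}

// Returns a function that the loop would call, when it is about to run,
// to find out which keys to iterate over.
function keySource(source) {
    var keys;
    if (typeof source === 'function') {
        // New behavior: defer both the call and the key lookup.
        return function () {
            return listKeys(source());
        };
    }
    // Deprecated behavior: read the keys right away.
    keys = listKeys(source);
    return function () {
        return keys;
    };
}

var object = { 'name': 'grape', 'type': 'fruit' };
var oldStyle = keySource(object);
var newStyle = keySource(function () { return object; });

object = { 'name': 'Thomas Peri', 'url': 'http://www.tumuski.com/' };

console.log(oldStyle().join(', ')); // name, type
console.log(newStyle().join(', ')); // name, url
```

The old-style source has already fixed its keys by the time `object` is reassigned, while the new-style source only looks at `object` when it is asked.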

Legacy Option

“But Thomas,” you may plead, “this breaks my code! I had my program iterating over a function object, and now that functions don’t get used as objects, everything is awful!”

Not to worry, I’m sensitive to your plight. When you create your Clumpy instance, set the legacy_for_in option, and that instance will behave the old way:

var clumpy = new Clumpy({
    'legacy_for_in': true
});

This is deprecated too, though, so please regard it as a temporary measure and don’t write any new code that relies on this option. The proper way to pass a function as the object to iterate over is to wrap it inside another function as if it were any other object, like so:

myFunction = function () {
    // a function with properties!
};
myFunction.color = 'yellow';
myFunction.cereal = 'crunchy';

clumpy.for_in_loop(
    function () { return myFunction; },
    function (key) {
        if (myFunction.hasOwnProperty(key)) {
            alert(key + ': ' + myFunction[key]);
        }
    }
);

4 Responses

  1. Bobby Street said:

    The changes you made will not fix your design flaw. There is no inherent flaw in your design per-se. It is relative to the fact that asynchronous calls (though not really threads) act as virtual threads of execution.

    Having said that, your solution to "return the object at the moment the for loop begins executing" does not stop another asynchronous method (i.e. virtual thread) from modifying the object after you call the "return object" function and before beginning (or in the middle of) the for loop.

    The ONLY way to actually solve this problem is to implement some form of mutual exclusion: semaphore, monitor, command queue. I believe a Command Pattern style object would probably best fix this issue. Where each Command (i.e. critical section of your code) is in a waiting list of sorts. They each try to execute their critical section and one of them wins. The others have to wait until there is a free time. No amount of return before, during, or after will fix this issue without mutual exclusion and critical sections being implemented or faked somewhere in javascript.

    Now, another approach would be to purposefully act on the data transferred AT THE TIME OF THE LOOP CALL. Thus, instead of the method being "return objects" it should be changed to "return objects.slice(0)" and thus the variable in the loop will be a local, unmodifiable copy of the old data. However, this would need slice() to be an atomic or critical code operation, which I do not believe it is unless implemented as such "under the hood" of a particular JS parser (i.e. I don't believe the JS spec requires slice to be atomic or mutually exclusive). Rather than depending on a proper slice() operation being implemented inside every browser's JS engine, it would still be best to rely on a good Mutex/Command type of logic using timers and events.

    Hope this makes sense and helps you along the way.

    Bobby

    April 29th, 2013 at 8:47 am
  2. You've misunderstood the problem.

    It's true that if other code running simultaneously has access to the same variable used in a Clumpy loop, then it could still change its value between the time the object's properties are read and the time they are used. Clumpy only provides asynchronous execution, though, and makes no attempt at solving the separate problem you describe. That is left to the user to solve as they see fit. If you were to develop a Command Pattern library that used Clumpy, I'd be honored.

    However, you are wrong when you say: "There is no inherent flaw in your design per-se. It is relative to the fact that asynchronous calls (though not really threads) act as virtual threads of execution." That's not true.

    The flaw in the original design of the for_in_loop method was that it did things out of order. It kept a reference to the value of the variable when the method created the data structure representing the loop, instead of waiting until that data structure was acted upon. This stood to cause problems even when no other code was executed simultaneously, as you can see in the example I provided. The new design fixes that flaw.

    Therefore, a Command Pattern approach alone would not fix that flaw. Even if outside access were blocked, the old-style for_in_loop method would still pose the same potential problem that the new design fixes.

    April 29th, 2013 at 9:35 am
  3. Bobby Street said:

    I think I do see what you mean, and there is every possibility that I misunderstood the underlying problem.

    Perhaps I am misunderstanding how Clumpy works underneath. If two asynchronous Clumpy calls are initiated and each interacts with the same object, then won't there still be room for the "return object" method's object to be modified by another Clumpy call? Or do all Clumpy calls, though asynchronous, execute in order?

    Example:

    //same object
    object = {
        'name': 'grape',
        'type': 'fruit'
    };
    
    //generate 1000 asynchronous Clumpy for_in_loop methods.
    //obviously ridiculous and this code would not itself
    //exist in practice.  But, a similar circumstance could
    //arise accidentally with a large enough website.
    //Half of the loops switch the object to type A,
    //the other half to type B.
    for (var i = 0; i < 1000; i++)
    {
        //wrap the body in a function so each callback keeps its own n
        //(a bare "i" would already be 1000 by the time the callbacks run)
        (function (n) {
            clumpy.once(function () {
                object = (n % 2) == 0 ?
                    {
                    'name': 'Thomas Peri',
                    'url': 'http://www.tumuski.com/'
                    } :
                    {
                       'name': 'grape',
                       'type': 'fruit'
                    };
            }).
            for_in_loop(
                function () { return object; },
                function (key) {
                    alert(key + ': ' + object[key]);
                }
            );
        }(i));
    }

    Obviously, no one would really write the above specific code intentionally. However, something similar could happen in real life given a large enough site. If Clumpy executes the asynchronous methods in order from a queue, there is no issue. If Clumpy executes multiple methods in parallel or ignores the order they are enqueued, the same problem could arise with the above code.

    I see what you are trying to fix: someone modifying the object inside the direct call to construct the for loop. But it seems like the same issue could happen anywhere in the code even just calling multiple Clumpy loops and yours is just a special case that you can handle without user intervention. It would be interesting to make each of the for_in_loops critical sections to solve the issue. I do like the concepts of Clumpy and wiring up a virtual multi-thread library out of it could be interesting. It may be interesting to talk with you more about it.

    April 29th, 2013 at 12:30 pm
  4. Ah, yes, I think that's the source of the confusion. As long as you're only using one instance of Clumpy, there's just one queue, and any loops or onces called will just get added at the end of the queue.[1] Your code above (though I haven't run it) will execute all the loop iterations in order, not in competition with one another.

    Chaining each method on the result of the last is the same as calling each method explicitly on your Clumpy instance, as seen here:
    http://www.tumuski.com/code/clumpy/usage/2/

    That example shows two ways of calling three loops in succession. Both ways work by enqueueing them all synchronously, and then working through them sequentially, not starting them all at the same instant. So it works the same way even if you enqueue them one after another inside a real loop.[2]

    (I started work a while ago documenting the implementation, which went into more detail than the comments in the code could. I don't think I ever posted that documentation though. I'll have to dig it up and decide whether it's ready for presentation yet.)

    About multi-threading: Each Clumpy instance basically emulates a thread. So to emulate multiple threads, you would use multiple instances of Clumpy. But doing so, of course, introduces the same complexities and pitfalls of actual multithreading.

    Notes:

    [1] Technically, it's implemented as a stack of queues rather than a single queue, so that you can nest Clumpy loops. But it operates as a single queue into which more nodes can be inserted on the fly. Each Clumpy instance promises to do things in the order in which it received them.

    [2] Using a real loop outside a Clumpy loop is generally a bad idea though, because it queues up 2000 nodes first, and then starts working through them. Using a Clumpy loop on the outside would probably save a little memory by allowing the two nodes inside to enqueue and dequeue each time the enqueueing methods are called.

    April 29th, 2013 at 2:43 pm

© Thomas Peri