Friday, 29 October 2010

Dusting the Sandbox #2: Mocking oldstyle lookup on newstyle classes.

“Dusting the Sandbox” is a series of blogposts where I pick scripts or modules of my sandbox and explain the process of creation and my motives behind it. The Sandbox can be found at Github.



With Python 2.7 being the last non-bugfix release in the 2.x branch of Python, attention is expected to gradually shift to the new 3.x branch. Although unloved by part of the developer community, discontinuing new features in the 2.x branch is the most effective way of forcing developers to begin embracing the new, backwards-incompatible version. The one major issue with Python 3.x is that many old, large libraries either do not want to port their code to the latest release or cannot, due to a lack of manpower.

If there are any faithful readers of my blog, and you are one of them, you may just experience a déjà vu. Yes - I have already written a post with roughly the same content as above, but: this time, I present a solution that I think is more clever.

These facts have inspired me to once again think of a way to get the old lookup behaviour with Python 3.x, where there are no more oldstyle classes. The piece of code I provided in my earlier post added some runtime overhead to every call of a magic method, whereas this radically different approach only adds overhead each time __setattr__ is called (at the cost of slightly higher memory usage). Please bear in mind, again, that the code provided here is not recommended for production use.

It all started with someone showing me a snippet that did something clever, though I do not remember what exactly, by replacing the __class__ member of an instance. I instantly thought that I might exploit this to fake old-style behaviour by creating a new class programmatically each time the application developer overrides a magic method. Because a new class is created, the new method ends up in the class struct and is thus consistently accessed, be it directly from Python code or from C code that reads it from said struct.

The trick is to create a new class using type and to replace the current __class__ member of the instance each time __setattr__ is called with a magic name. The new class is created by the following code, where attr is the function that overrides the magic method.
# Licensed under the terms of the X11-License.
d = dict(self.__class__.__dict__)
d.update({name: staticmethod(attr)})
ncls = type(
    self.__class__.__name__ + 'Magic', self.__class__.__bases__, d
)
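To see the class swap in isolation, here is a minimal sketch for Python 3. The names Plain and replace_magic are hypothetical and only serve the illustration. Dropping the '__dict__' and '__weakref__' entries from the copied namespace is an assumption of this sketch and not part of the snippet above; it lets type() create fresh descriptors for the new class.

```python
class Plain(object):
    def __int__(self):
        return 2


def replace_magic(obj, name, func):
    """Hypothetical helper: give obj a new class whose `name` entry wraps func."""
    cls = type(obj)
    namespace = dict(cls.__dict__)
    # These two descriptors belong to the old class (assumption of this
    # sketch: drop them so that type() creates new ones for the copy).
    namespace.pop('__dict__', None)
    namespace.pop('__weakref__', None)
    namespace[name] = staticmethod(func)
    obj.__class__ = type(cls.__name__ + 'Magic', cls.__bases__, namespace)


p = Plain()
before = int(p)                          # looked up on Plain
replace_magic(p, '__int__', lambda: 5)
after = int(p)                           # looked up on the generated class
```

Because the new class reuses the bases of the old one instead of inheriting from it, p is no longer an instance of Plain after the swap.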

The staticmethod wrapper is necessary because a function assigned to an instance is not bound to the object, whereas a normal function defined in a class body is bound automatically unless it is a staticmethod. It also works with non-callable attributes, because staticmethod hands back whatever was passed to it unchanged when it is looked up. The next logical step is to write a function that determines whether a name is magic or not; it turned out to be the following (the forbidden set is explained in the next paragraph):
# Licensed under the terms of the X11-License.
forbidden = set(('__setattr__', '__class__'))

def is_magic(name):
    return (
        name not in forbidden and name.startswith('__') and name.endswith('__')
    )
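A quick way to see what the check lets through is to run it over a handful of names. The sample list is made up for illustration; the function is repeated so that the snippet runs on its own.

```python
forbidden = set(('__setattr__', '__class__'))


def is_magic(name):
    return (
        name not in forbidden and name.startswith('__') and name.endswith('__')
    )


# Made-up sample names: two magic ones, the two forbidden ones, two ordinary ones.
samples = ['__int__', '__len__', '__setattr__', '__class__', '_private', 'value']
magic_names = [name for name in samples if is_magic(name)]
```

Only '__int__' and '__len__' end up in magic_names; the two forbidden names are filtered out although they have the double underscores.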

Because the mechanism itself relies on replacing __setattr__, that name needs to be treated specially. The same is true for __class__, because handling it as magic would have us hit infinite recursion (besides, __class__ is not a method anyway and thus need not be wrapped).

Now all we need to do is handle the case where the name is not magic. If __setattr__ was replaced earlier, use the member stored in the instance; if it was not, use the __setattr__ method provided by the parent class.
# Licensed under the terms of the X11-License.
import inspect
if (not inspect.ismethod(self.__setattr__) or
        self.__setattr__.__func__ != oldstyle):
    # Custom setattr is stored in the instance so that this method is
    # never overridden. Hence __setattr__ is not part of the global
    # magic set.
    self.__setattr__(name, attr)
else:
    # The method returned by super() is already bound to self.
    super(self.__class__, self).__setattr__(name, attr)


Now we combine these two cases (the one where we treat a magic member and the one where we do not) into one function and end up with the final code. Old style behaviour is applied to a class by overriding the __setattr__ method with the oldstyle function provided below.
# Licensed under the terms of the X11-License.
import inspect

forbidden = set(('__setattr__', '__class__'))

def is_magic(name):
    return (
        name not in forbidden and name.startswith('__') and name.endswith('__')
    )


def oldstyle(self, name, attr):
    if is_magic(name):
        d = dict(self.__class__.__dict__)
        d.update({name: staticmethod(attr)})
        ncls = type(
            self.__class__.__name__ + 'Magic', self.__class__.__bases__, d)

        # This assignment passes through oldstyle again and takes the
        # else branch, because '__class__' is in the forbidden set.
        self.__class__ = ncls
    else:
        if (not inspect.ismethod(self.__setattr__) or
                self.__setattr__.__func__ != oldstyle):
            # Custom setattr is stored in the instance so that this method is
            # never overridden. Hence __setattr__ is not part of the global
            # magic set.
            self.__setattr__(name, attr)
        else:
            # The method returned by super() is already bound to self.
            super(self.__class__, self).__setattr__(name, attr)


Now we can easily verify that it works by comparing the behaviour of a normal new-style class with that of one that has oldstyle as its __setattr__ member. To that end, we create a really simple class that allows us to verify whether the member has been replaced or not.
class A(object):
    __setattr__ = oldstyle

    def __int__(self):
        return 2

Then we create an instance thereof, convert it to an integer by using the builtin int, replace the member to return any other number and then verify that this number is returned upon passing the object to int.
a = A()
assert int(a) == 2
a.__int__ = lambda: 5
assert int(a) == 5

Removing the "__setattr__ = oldstyle" declaration, we can see that normal new-style classes do not support replacing magic methods on the instance.
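For comparison, a small sketch of the plain new-style behaviour. The class name B is made up for this example.

```python
class B(object):
    def __int__(self):
        return 2


b = B()
b.__int__ = lambda: 5     # lands in the instance dict, the class is untouched
explicit = b.__int__()    # ordinary attribute lookup finds the instance member
implicit = int(b)         # int() looks __int__ up on type(b) instead
```

The explicit call returns 5, while int(b) still returns 2, because the implicit lookup bypasses the instance.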

Saturday, 16 October 2010

Dusting the Sandbox #1: Get nth digit and other funny, but useless, solutions to problems no one will ever face



Those who are too impatient to read about the creation of the code can skip directly to the complete code.


The other day I got some inspiration on what to code from the #python.de channel on irc.freenode.net, namely a program that gets the nth digit of a number. (It later turned out that the channel was talking about getting one particular digit of small numbers, which does not require a sophisticated algorithm; in terms of performance, the naive method yields the best results there.) The naive method is:

str(nmb)[n]

The solution that yields results for large numbers in a reasonable time is to divide the number (using integer division) by $10^{\lfloor \log_{10}(x) \rfloor - n}$ and take the result modulo 10.
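As a rough sketch of this idea (not the original sandbox code), the function below assumes a positive integer x and a 0-based index n counted from the most significant digit. The name nth_digit and the two correction loops are assumptions of this sketch; the loops guard against math.log10 rounding up just below a power of ten.

```python
import math


def nth_digit(x, n):
    """Return the digit at 0-based index n of the positive integer x."""
    # Exponent of the leading digit. math.log10 works on floats, so the
    # estimate is corrected with exact integer comparisons.
    top = int(math.log10(x))
    while 10 ** top > x:
        top -= 1
    while 10 ** (top + 1) <= x:
        top += 1
    if not 0 <= n <= top:
        raise IndexError('digit index out of range')
    # Shift the wanted digit into the ones place, then cut off the rest.
    return (x // 10 ** (top - n)) % 10
```

For 12345, index 0 yields 1 and index 4 yields 5, matching str(12345)[n].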



This may be quite nice, but there is a lot of room for improvement. The first thing that came to my mind was allowing the user to specify how many digits they want to access (that is, how many of the digits in higher decimal places should be returned as well). That is easily done by raising the 10 after the modulo operator to the power of the number of digits the user wants to access.

A trick that has proven valuable throughout the time I have coded so far is to take a step back once a program produces the desired result and see how it could be adapted so that it can be applied to a superset of problems. My main instrument for doing so is to scan the source code for any constants that are unnecessary and merely specialize the problem without adding value. The only constant that can be found in the snippet shown above is 10 (two direct references and one subtle one in `log10`); by replacing all of them with a variable, we can effectively adapt the snippet to work with arbitrary number systems.
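A sketch of the generalized form, again under the assumption of a positive integer x and a non-negative index n. The signature nth(x, n, digits=1, base=10) is a guess at the shape of the function; only the names nth and digits appear in this post.

```python
import math


def nth(x, n, digits=1, base=10):
    """Return `digits` digits of x in `base`, ending at 0-based index n."""
    # Exponent of the leading digit, corrected for float rounding.
    top = int(math.log(x, base))
    while base ** top > x:
        top -= 1
    while base ** (top + 1) <= x:
        top += 1
    if not 0 <= n <= top:
        raise IndexError('digit index out of range')
    # The divisor moves index n into the lowest place; the modulus keeps
    # that digit plus (digits - 1) digits from higher places.
    return (x // base ** (top - n)) % base ** digits
```

With digits=2, nth(12345, 2) returns 23, i.e. the digit at index 2 together with its higher neighbour; nth(0xBEEF, 1, base=16) returns 0xE.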



Now non-negative indices are wholly implemented; the next step is self-evident: implementing negative indices. This is even easier because we need not know how long the number is (that is what the logarithm is for when accessing via non-negative indices); we just need to divide by $s^{|n| - 1}$, where s is the base of the number system. The -1 compensates for the fact that negative indices start at -1 rather than 0: because the number does not need to be divided at all (i.e. it is divided by 1) to access the last digit, we need to subtract 1 from the absolute value of the index.
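The negative case can be sketched without any logarithm. The function name nth_from_right and its signature are assumptions made for this example.

```python
def nth_from_right(x, n, digits=1, base=10):
    """Return `digits` digits of the positive integer x for a negative index n."""
    if n >= 0:
        raise ValueError('this sketch only handles negative indices')
    # Index -1 needs no shift at all, index -2 a shift by one place, ...
    return (x // base ** (abs(n) - 1)) % base ** digits
```

Note that an index beyond the most significant digit simply yields 0 here, since the integer division leaves nothing behind.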



The next logical thing to implement is slicing. Slicing is a bit more sophisticated because it involves two indices, which might differ in type (negative or non-negative). An easy way to implement it is to use the digits parameter of nth, which we defined before but have not yet used. The two indices need to be reduced to one index, the larger one, which is stop - 1 regardless of the sign of the index in question (the item at the index given in stop is not included in the slice, just as with lists, hence stop - 1). There is no way to return an “empty int”, which would be the obvious approach when modelling the list API, which returns an empty list when there are no digits between the two indices (that is, when start is greater than or equal to stop), so we need to throw an exception in this case. Including all input sanitizing, which handles the case where some parameters are not given and the case where one index is negative while the other is not, we end up with the following code:
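A minimal sketch of digit slicing, not the original sandbox code. It leans on the builtin slice.indices to fill in missing bounds and resolve negative indices, which is a shortcut chosen for this example. The names digit_count and digit_slice, and the choice of ValueError for empty slices, are assumptions.

```python
import math


def digit_count(x, base=10):
    """Number of digits of the positive integer x in the given base."""
    count = int(math.log(x, base)) + 1
    # Correct the float estimate with exact integer comparisons.
    while base ** (count - 1) > x:
        count -= 1
    while base ** count <= x:
        count += 1
    return count


def digit_slice(x, start=None, stop=None, base=10):
    """Return the digits start..stop-1 of x as one integer."""
    length = digit_count(x, base)
    # Same normalization as for lists: defaults, negative indices, clamping.
    start, stop, _ = slice(start, stop).indices(length)
    if start >= stop:
        raise ValueError('no digits between start and stop')
    # Shifting by (length - stop) puts index stop - 1 into the lowest place;
    # the modulus then keeps stop - start digits.
    return (x // base ** (length - stop)) % base ** (stop - start)
```

digit_slice(12345, 1, 3) returns 23, just like int(str(12345)[1:3]). Leading zeros inside a slice are lost, because the result is an integer.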



So the final code is:



A funny (but not at all useful, like the rest of the code shown here) addition is to write a wrapper class that supplies __getitem__ on int objects. The gotcha is that you need to inherit from long, rather than from int, and that you must not call long's __init__ in yours, lest you face a deprecation warning.
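A sketch of such a wrapper for Python 3, where long no longer exists, so the class inherits from int and needs no __init__ at all. The name DigitInt is made up, and the sketch is restricted to positive numbers in base 10 and to slices without a step.

```python
class DigitInt(int):
    """int subclass with digit indexing (hypothetical name, base 10 only)."""

    def _length(self):
        # Count the digits with exact integer arithmetic.
        count, rest = 1, int(self)
        while rest >= 10:
            rest //= 10
            count += 1
        return count

    def __getitem__(self, key):
        length = self._length()
        if isinstance(key, slice):
            start, stop, step = key.indices(length)
            if step != 1:
                raise ValueError('steps are not supported in this sketch')
            if start >= stop:
                raise ValueError('no digits between start and stop')
        else:
            start = key + length if key < 0 else key
            if not 0 <= start < length:
                raise IndexError('digit index out of range')
            stop = start + 1
        return (int(self) // 10 ** (length - stop)) % 10 ** (stop - start)


number = DigitInt(12345)
first, last, middle = number[0], number[-1], number[1:3]
```

Since DigitInt is still an int, arithmetic keeps working as usual; the results of such operations are plain ints again.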



That concludes this post; I hope you enjoyed it. And yes: I have tried implementing slicing with steps, but my implementation was heavily outperformed by the naive str and slice way (which does not work for all number systems, though).

asynchia 0.1.1

asynchia 0.1.1 was released yesterday, featuring a host of fixes for critical bugs that prevented the library from working in corner cases, plus a few feature additions. I have tried to put work into optimizing existing operations that wasted a lot of resources, e.g. by not pre-allocating the memory required for the buffer. Python's bytearrays are used for that purpose, and in 2.7 or later memoryview and recv_into are used as well, but I need to do some more benchmarking to see whether this results in a significant speed-up; the data I have collected so far is inconclusive.

Check asynchia out at Github