Assembly tasks vary greatly in complexity depending on application specifics. The task considered here is moderately complex: it implies about 10^60 possible states on the perception side, each involving about 30000 bits of action data. The paper presents two systems conceived in our lab to perform the task. In the first, perception is achieved in the classic sense, by a video camera and original vision software, and action is carried out by a robot. In the second, the input space is reduced from 10^60 possible states to one (1) by purely mechanical, a priori designed, task-oriented devices. The two approaches are compared in a discussion that refers to the coercive, adaptive and cognitive paradigms.
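As a rough illustration of the complexity figures quoted above, the sketch below converts them into information-theoretic terms; it is a back-of-envelope calculation based only on the numbers in the abstract (10^60 perceptual states, ~30000 bits of action data per state), not on anything internal to the two systems.

```python
import math

# Back-of-envelope sketch using only the figures quoted in the abstract.
# The constants are illustrative assumptions, not taken from the paper's internals.

PERCEPTUAL_STATES = 10 ** 60          # possible states on the perception side
ACTION_BITS_PER_STATE = 30_000        # action data associated with each state

# Bits needed to identify one perceptual state out of 10^60 equally likely ones
perception_bits = math.log2(PERCEPTUAL_STATES)   # ~199.3 bits

print(f"Perception: {perception_bits:.1f} bits to distinguish 10^60 states")
print(f"Action:     {ACTION_BITS_PER_STATE} bits per state")

# The purely mechanical design collapses the input space to a single state,
# so the perceptual information to be acquired drops to log2(1) = 0 bits.
print(f"Coerced:    {math.log2(1):.1f} bits after reducing the input space to 1")
```

Under these assumptions, the vision-based system must resolve on the order of 200 bits of perceptual information per task instance, whereas the mechanically coerced setup removes that requirement entirely, which is the contrast the paper's discussion develops.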