There is a need for semantic representations that can bridge the gap between linguistic input (verbs) and the corresponding visual knowledge, a gap that must be closed for a variety of tasks involving the automatic generation of 3D animation. Here we discuss the semantic representation of events in visual knowledge and the design of a knowledge base suited to the integration of linguistic and visual information. We describe a framework for representing action verb semantics in a visual knowledge base: visually observed events are described by establishing a correspondence between verbs and the visual depictions they evoke. The method proposed here is well suited to practical applications such as automatic language visualisation and intelligent storytelling systems. In particular, it will be useful within CONFUCIUS, a system which receives natural language stories as input and presents them with 3D animation, speech, and non-speech audio.
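
To make the verb-to-depiction correspondence concrete, the sketch below shows one possible shape such a visual knowledge base could take. It is a minimal illustration in Python, not the paper's actual schema; the names `VisualDepiction`, `VISUAL_KB`, and `depict` are hypothetical, and the entries are invented examples of how a verb might be linked to the animation resources it evokes.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a verb-to-visual-depiction mapping; the real
# CONFUCIUS knowledge base is not specified here, so all names and
# fields below are illustrative assumptions.

@dataclass
class VisualDepiction:
    """Visual knowledge evoked by a verb: animation resources and timing."""
    keyframes: list[str]                                      # names of pose/animation resources
    duration_s: float                                         # default event duration in seconds
    required_roles: list[str] = field(default_factory=list)   # e.g. agent, patient

# The verb -> depiction correspondence forming the knowledge base.
VISUAL_KB: dict[str, VisualDepiction] = {
    "run":  VisualDepiction(["run_cycle"], 2.0, ["agent"]),
    "push": VisualDepiction(["reach", "push_forward"], 1.5, ["agent", "patient"]),
}

def depict(verb: str) -> VisualDepiction | None:
    """Resolve a verb from parsed input to its visual depiction, if known."""
    return VISUAL_KB.get(verb)

if __name__ == "__main__":
    event = depict("push")
    print(event.keyframes if event else "no visual knowledge for this verb")
```

A lookup structure of this kind lets a language visualisation pipeline go from a parsed verb directly to the animation parameters needed to render the event, which is the integration of linguistic and visual information the framework targets.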