The Language Of Film

When you're writing something (an email, an essay, a report, a paper, etc.), you're using the rules of grammar to make your point. Your choice of words, the way you construct each sentence, your use of punctuation, and most importantly, what you have to say all contribute to the effectiveness of your message.

Cinema expresses ideas and emotions through a visual form. It's a visual language, and just as in any written language, your choice of words (what you put in the shot or frame), the way you construct the sentence (the sequence of shots), your use of punctuation (editing and continuity), and what you have to say (the story) are key factors in creating effective cinema. The comparison doesn't apply rigidly, but it is a good starting point for thinking about cinema as a language.


The most basic element of this language is the shot. There are many factors to consider while filming a shot. How big should the subject be? Should the camera be placed above or below the subject? How long should the shot be? Should the camera remain still or move with the subject? If it moves, should it follow the subject, or observe it from a fixed point while turning left and right or up and down? And should the movement be smooth or jerky? There are other major visual factors, such as color and lighting, but we'll restrict our scope to the factors listed here. A filmmaker chooses how to construct each shot based on what they want to convey, and then juxtaposes the shots to drive home the message.

Breaking Down A Film

The decisions behind the different elements of a shot (shot scale such as Long Shot or Wide Shot, camera movement, camera angle, shot length, shot composition, color, and lighting) are based on the message that the filmmaker wishes to convey. These shots are then juxtaposed meaningfully to tell a coherent visual story.

The analysis above is far from comprehensive, but it (hopefully) sheds some light on how intricate creating effective cinema can be. The curious reader may want to dig deeper and look at how other factors, such as composition, editing, and color, impact visual storytelling.

Breaking down this scene took a few hours of work. At the risk of being repetitive, this is where neural networks offer ample promise. With smart algorithms finding patterns like the ones shown above in a matter of seconds, your frame of reference would no longer be restricted to what you or your colleagues have watched. It could be all of cinema itself.

AI In Film

What follows is a gentle introduction to neural networks, followed by a description of the dataset, methodology, and results.

'AI' is most often a buzzword for deep learning, the field that uses neural networks to learn from data.

The key idea is that instead of explicitly specifying the patterns to look for, you specify rules that let the neural network detect patterns from data on its own. The data could be something structured, like a database of customers' purchasing decisions, or something unstructured, like images, audio clips, medical scans, or video. Neural networks are good at tasks like predicting which products a customer wants, telling an image of a dog from one of a cat, telling the mating calls of dolphins from those of whales, distinguishing a video of a goal being scored from one of the goalkeeper saving the day, or judging whether a tumor is benign or malignant.

With a large enough labelled dataset (say 1000 images of dogs and cats stored separately), you could use a neural network to learn patterns from these images. The network puts each image through a pile of computation and spits out two probabilities: P(cat) and P(dog). You calculate how wrong the network was using a loss function, then use calculus (the chain rule) to tweak this pile of computation so that it produces a lower loss (a more correct output). Training a neural network is nothing but a sophisticated mechanism for minimising this loss.
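To make the idea of a loss concrete, here is a minimal sketch in Python. It assumes the network's last step is a softmax over two raw scores and that the loss is cross-entropy. Both are common choices, but this article does not name them, and the scores and label order below are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss is small when the true class receives a high probability."""
    return -math.log(probs[true_index])

CLASSES = ["cat", "dog"]  # hypothetical label order

# Hypothetical raw scores for one image whose true label is "cat".
confident_right = softmax([3.0, 0.0])  # network leans towards cat
confident_wrong = softmax([0.0, 3.0])  # network leans towards dog

loss_right = cross_entropy(confident_right, CLASSES.index("cat"))
loss_wrong = cross_entropy(confident_wrong, CLASSES.index("cat"))

print(f"P(cat)={confident_right[0]:.3f}  loss={loss_right:.3f}")
print(f"P(cat)={confident_wrong[0]:.3f}  loss={loss_wrong:.3f}")
```

The confidently wrong prediction receives a much larger loss than the confidently right one, and that gap is the signal used to tweak the network.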

If the network's output is far off from the truth, the loss is larger, and so the tweak made is also larger. Tweaks that are too large are bad, so you multiply the tweaking factor by a tiny number known as the learning rate. One pass through the entire dataset is known as an epoch. You'd probably run through many epochs to reach a good solution. Across those epochs, it's a good idea to tweak the images non-invasively (for example, by flipping them horizontally), so that the network sees different numbers for the same image and can detect patterns more robustly. This is known as data augmentation.
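The learning rate, epochs, and data augmentation can be seen working together in a small sketch. It uses a single-layer logistic model as a stand-in for a real neural network, and the 2x2 "images", labels, and hyperparameter values are all invented for illustration.

```python
import math
import random

def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def flatten(image):
    return [pixel for row in image for pixel in row]

def predict(weights, bias, image):
    """Logistic model: probability that the image is 'bright'."""
    score = sum(w * p for w, p in zip(weights, flatten(image))) + bias
    return 1.0 / (1.0 + math.exp(-score))

def train(dataset, epochs=200, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    weights, bias = [0.0] * 4, 0.0
    for _ in range(epochs):  # one epoch = one pass over the whole dataset
        for image, label in dataset:
            if rng.random() < 0.5:  # data augmentation: random horizontal flip
                image = flip_horizontal(image)
            # The further the prediction is from the label, the bigger the tweak.
            error = predict(weights, bias, image) - label
            # Every tweak is scaled down by the learning rate.
            weights = [w - learning_rate * error * p
                       for w, p in zip(weights, flatten(image))]
            bias -= learning_rate * error
    return weights, bias

# Made-up 2x2 images: label 1 = mostly bright, label 0 = mostly dark.
DATASET = [
    ([[0.9, 0.8], [1.0, 0.7]], 1),
    ([[0.8, 1.0], [0.9, 0.9]], 1),
    ([[0.1, 0.2], [0.0, 0.3]], 0),
    ([[0.2, 0.0], [0.1, 0.1]], 0),
]

weights, bias = train(DATASET)
for image, label in DATASET:
    print(label, round(predict(weights, bias, image), 2))
```

Flipping changes the numbers the model sees without changing the label, which is what makes it a safe augmentation for this toy task.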

Neural networks can transfer knowledge from one project to another. It's very common to take a network that's been trained on ImageNet (a dataset of roughly 14 million labelled images, most often used via a subset covering a thousand common object categories), and then tweak it to adapt to your project. This works because the network has already learnt basic visual concepts like curves, edges, textures, and eyes, which come in handy for any visual task. This process is known as transfer learning.
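The sketch below illustrates one simple flavour of transfer learning: keep the already-trained part of the network frozen and train only a small new "head" on your own data. It is a toy under stated assumptions. The frozen weights are hand-written stand-ins rather than anything learnt from ImageNet, and the task, data, and function names are hypothetical.

```python
import math

# Stand-in for a pretrained network body. In practice these weights would come
# from training on a large dataset; here they are made up for illustration.
FROZEN_FEATURE_WEIGHTS = [
    [1.0, 1.0, 1.0, 1.0],    # responds to overall brightness
    [1.0, -1.0, 1.0, -1.0],  # responds to left/right contrast (a vertical edge)
]

def extract_features(pixels):
    """Frozen body: these weights are never updated for the new task."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_FEATURE_WEIGHTS]

def head_predict(head, features):
    """New head: a logistic layer on top of the frozen features."""
    score = sum(w * f for w, f in zip(head["weights"], features)) + head["bias"]
    return 1.0 / (1.0 + math.exp(-score))

def fine_tune_head(dataset, epochs=300, learning_rate=0.1):
    """Train only the head; the feature extractor stays untouched."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    for _ in range(epochs):
        for pixels, label in dataset:
            features = extract_features(pixels)
            error = head_predict(head, features) - label
            head["weights"] = [w - learning_rate * error * f
                               for w, f in zip(head["weights"], features)]
            head["bias"] -= learning_rate * error
    return head

# New task (made-up data): does a flattened 2x2 patch contain a vertical edge?
NEW_TASK = [
    ([1.0, 0.0, 1.0, 0.0], 1),
    ([0.9, 0.1, 0.8, 0.0], 1),
    ([0.5, 0.5, 0.5, 0.5], 0),
    ([0.1, 0.1, 0.2, 0.1], 0),
]

head = fine_tune_head(NEW_TASK)
for pixels, label in NEW_TASK:
    print(label, round(head_predict(head, extract_features(pixels)), 2))
```

Because the frozen features already respond to edges, the head has only three numbers to learn, which is why a handful of examples is enough in this toy setting.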

Rinse and repeat this process carefully, and you have in your hands an 'AI' solution to your problem.*

* Of course, this isn't all that deep learning is about. What I've described here is supervised learning.

There are several other sub-fields of deep learning that are more nuanced and cutting-edge, such as unsupervised learning, reinforcement learning, and generative modelling, to name a few.

Our Services

We offer a plug-and-play service for cataloging the film language of any length of film uploaded to our servers. Our charges depend on the length and resolution of the film. Once you sign up and upload a snippet of film in the dashboard, our systems will generate a quotation, and you can then pay to have the film analysed. Get in touch or sign up to use our services.