Today, we’re proud to announce that we led the investment in Valossa, a groundbreaking video AI company from Oulu, Finland, with offices in Helsinki and representation in New York.
Valossa has built a video AI technology, Valossa AI, which understands video the same way a human does. This takes the fields of artificial intelligence and machine vision a big step forward. Moving well beyond simple object recognition to describe scenes, the technology unlocks a whole new set of uses for customers that were not possible until now.
Valossa AI can be delivered as a cloud platform, as an on-premises installation, or even on board a piece of hardware!
To put into context just how significant a milestone this is: only recently, Mark Zuckerberg said in a podcast with Reid Hoffman, “So, in the next 5 years or so, maybe 10 years at the furthest out but I think it'll probably be closer to 5, I expect that we'll have machines that can perform better than people can at all of our primary senses. So, seeing... you know understanding what's in a scene, what's in a video, who people are, what things are, what's going on. Hearing, speaking, language and understanding around that... so it'll be able to read and have greater comprehension. And even if it's not much better comprehension it'll be much faster so a computer will be able to read a million times more things than a person can.”
Mark was right: it is nearer to 5 years away. Valossa can already achieve this level of full comprehension in almost real time.
OK, it’s ground-breaking… but how?
When we use our senses, our brains combine a huge number of skills to comprehend our reality. This is true both in real life and when we watch a video. We recognise faces (including partial faces) of those we know and don’t know, we pick up emotional cues, and we understand the context of our surroundings: we grasp the “feel” of the setting, not just the physical objects and places. We understand motion (walking vs. running vs. cartwheeling across a street) and we can even get a sense of someone’s heart rate (are they calm, nervous, excited?). Our brains don’t stop there. Through our hearing, we add audio cues (we know from a police siren that a police vehicle is around, even though we can’t see it) and emotional cues from the way people are speaking. And, most significantly, we understand the conversation itself and the content of the spoken words. When we watch a video, our senses stop here, and our brains combine all of these cues to let us understand the video. In real life, of course, we also use our other senses.
Valossa can do what we humans do when we watch a video. Within the field of machine vision (getting a machine to understand imagery), Valossa AI can do everything I described above: a laundry list of machine vision features (yes, including heart rate detection!) at world-leading accuracy. The team behind this comes from the cutting-edge machine vision research lab at the University of Oulu, building on years of research done there and elsewhere in academia. But crucially, in their mission to build something the world has not yet had, they built a multi-modal AI, which is geek for an AI that understands not just what is in the video, but also what is in the audio. Again, in the same way I described before, that means not just detecting objects or events via audio but also understanding the content of the conversation.
Why does this matter?
For a start, existing machine vision services stop at parsing videos into a series of pictures, which is fine for generating basic descriptions of each scene. This, however, is very limiting. Imagine a machine trying to watch a video of a football game that has been parsed into a sequence of static images: it’s pretty much all “people, person, ball, field, grass”. Valossa AI’s machine vision capabilities alone are already a step forward.
But their multi-modal AI approach is really where things get exciting. Take their first target customers in the media industry: with an understanding of video at scale, companies can let their users consume content dynamically. For example, users can explore a news topic across a variety of news shows, stitched together by Valossa AI based on the content being talked about. Or users can watch content about their favourite sports team or player, from build-up commentary to in-game footage and post-match analysis. This is huge for the media industry for two reasons: users have dramatically changed how they consume content, and it allows media companies to better utilise their existing content. Furthermore, as we consume content less programmatically, Valossa AI has been used very successfully to pick the most contextually relevant video ads to show the user.
Early adopters are using Valossa AI to enrich their video-on-demand streaming services, enhance archiving and retrieval of their video catalogues, allow users to engage more dynamically with their content universe and to better select video ads. This may be a good point to highlight that, not content with just building AI, the Valossa team have also developed their own deep video search capabilities.
And this is just the start of what people can do with Valossa AI technology. I am also personally really excited about its uses outside of media. Take robotics alone: Valossa’s AI can take human-machine interactions to the place envisaged by science fiction writers. Imagine a robot that can recognise that I’m getting agitated while interacting with it, or that another person is just a bit down, and adjust its interactions accordingly.
We are really excited to be working with the amazing men and women at Valossa and warmly welcome them to the 01Ventures universe. We look forward to being a part of their amazing story.
If you would like to find out more, please visit www.valossa.com.