I happened to catch, a bit late, Extractiv’s public launch dated 11/17 (story here). Whether they succeed remains to be seen, but I like to think they are focusing on the sweet spot of ‘X as a service’ with their structured data extraction offering (I’ve created a demo account and will share my experience with the tool in a later post). The general idea, by the way, is to scan unstructured data (web pages, documents, other content) to create meaningful structured data (think data model, rows, and columns). So why is this interesting?
When I speak to folks about ‘cloud,’ the conversation usually involves one or more of the three common constructs: Infrastructure, Platform, or Software ‘as a service.’ These three generally reflect the transition of what we know as distributed architecture in an enterprise environment into an Internet environment. All good – all makes sense. The focus is primarily on optimization, cost savings, improved efficiency and so on, with a theme of ‘how can we do the same thing we do today better, faster, cheaper on – or with – the web?’ It is an important sea change, with a lot of folks figuring it out…
Now, in addition to scale and cost effectiveness, the Internet also offers access to the largest collection of unstructured and structured data the world has ever known. For those who can develop software capable of working through the fragmentation of data sources to create new information (with appropriate filtering, inference, natural language processing, etc.), the opportunities will be endless. Again, a lot of folks are trying to figure it out (opinion mining, or sentiment analysis, is one example that comes to mind, but there are many applications). The winning formula will undoubtedly offer flexibility, breadth, and depth but, most importantly, will fill in the right gaps and/or provide results that are meaningful – or enlightening – and in the proper context for the inquirer.