What is Zanran?
Your source for data & statistics - graphs, charts and tables
Zanran helps you to
find ‘semi-structured’ data on the web. This is the numerical
data that people have presented as graphs and tables and charts. For
example, the data could be a graph in a PDF report, or a table in an
Excel spreadsheet, or a bar chart shown as an image in an HTML page.
This huge amount of information can be difficult to find using
conventional search engines, which are focused primarily on finding
text rather than graphs, tables and bar charts.
Put more simply:
Zanran is Google for data. Click the link to view examples,
e.g. African mobile phones.
How it works… technology overview
Zanran doesn't work by spotting wording in the
text and looking for images – it's the other way round. The system
examines millions of images and decides for each one whether it's a
graph, chart or table – whether it has numerical content.
The core technology is
a set of patented computer vision algorithms that decide whether an
image is numerical, and they're accurate (about 98%). But the vast
majority of images on the internet are not graphs, charts or tables,
so even a small error rate applied to that many non-numerical images
lets a noticeable number of them through. That is why, even though
the accuracy is high, you will still get some non-numerical images.
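To make the point about accuracy and rarity concrete, here is a small worked calculation. It is an illustrative sketch only, not Zanran's own method: it assumes the "about 98%" figure applies equally to numerical and non-numerical images, and the prevalence values (the share of web images that really are graphs or charts) are hypothetical.

```python
def precision(prevalence: float,
              sensitivity: float = 0.98,
              specificity: float = 0.98) -> float:
    """Share of images flagged as numerical that really are numerical.

    prevalence  -- assumed fraction of all images that are numerical (hypothetical)
    sensitivity -- assumed chance a numerical image is flagged
    specificity -- assumed chance a non-numerical image is rejected
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical prevalence values, chosen only to show the trend.
    for prevalence in (0.001, 0.01, 0.1):
        print(f"prevalence {prevalence:.1%}: "
              f"precision {precision(prevalence):.1%}")
```

Under these assumptions, if only 1 image in 100 were numerical, roughly a third of the flagged images would be genuine. The rarer numerical images are, the more the results are diluted by false matches.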
In comparison, looking
for tables is relatively simple. Once we've found a table we then
have to decide whether it's essentially numerical – and we have
algorithms for that.
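As a rough illustration of what "essentially numerical" could mean, the sketch below counts how many body cells of a table parse as numbers. This is a hypothetical heuristic, not Zanran's algorithm: the function names, the 50% threshold and the rule of skipping the first row and column as headers are all assumptions, and the sample values are made up.

```python
import re

# Optional sign, digits (with optional thousands separators),
# optional decimal part, optional trailing percent sign.
_NUMBER = re.compile(r"^[-+]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?%?$")


def is_number(cell: str) -> bool:
    """Return True if the cell text looks like a plain number."""
    return bool(_NUMBER.match(cell.strip()))


def is_numerical_table(rows, threshold=0.5):
    """Guess whether a table is mostly numbers.

    rows      -- list of rows, each a list of cell strings
    threshold -- assumed minimum share of numeric body cells
    """
    # Treat the first row and first column as headers and ignore them.
    body = [cell for row in rows[1:] for cell in row[1:] if cell.strip()]
    if not body:
        return False
    numeric = sum(is_number(cell) for cell in body)
    return numeric / len(body) >= threshold


# Made-up example tables.
figures = [["Region", "2008", "2009"],
           ["Region A", "16,300,000", "19.4"],
           ["Region B", "62.9", "73.1"]]
staff = [["Name", "Role"],
         ["Ann", "Editor"],
         ["Bob", "Writer"]]
```

With these inputs, `is_numerical_table(figures)` is true and `is_numerical_table(staff)` is false. A real system would also need to handle currencies, dates, footnote markers and merged cells.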
Our programs then
take suitable text near that image and build the search engine around
that text. At present, we extract tables and images from HTML, PDF
and Excel files and will be processing PowerPoint and Word documents
in the near future.
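The idea of building a search engine around the text found near each image or table can be sketched with a tiny inverted index. This is a minimal illustration under stated assumptions, not a description of Zanran's system: the item identifiers, the sample text and the rule that every query word must match are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(nearby_text):
    """Map each word to the set of items whose nearby text contains it.

    nearby_text -- dict of item id -> text extracted near that item
    """
    index = defaultdict(set)
    for item_id, text in nearby_text.items():
        for token in tokenize(text):
            index[token].add(item_id)
    return index


def search(index, query):
    """Return the items whose nearby text contains every query word."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical items and captions, for illustration only.
nearby_text = {
    "report.pdf#fig3": "Mobile phone subscribers in Africa",
    "sales.xls#sheet1": "Quarterly sales by region",
}
index = build_index(nearby_text)
```

Here `search(index, "mobile africa")` returns only the PDF figure, because both words appear in its nearby text. A production search engine would add ranking, stemming and phrase matching on top of this.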
It is also worth
mentioning that mapping the numerical content on the web would not
have been possible without the development of open-source software
and access to the vast processing power and cheap storage of cloud
computing.
Zanran has crawled
most of the internet. But if you think there is a good site
we've missed, please let us know.