A machine can recognize high-score tags of photos

PIXTA Advent Calendar 2017 Day 11.

As a stock photo provider, our passion is to help users find the photos they want with as little effort as possible. With millions of photos in our library, delivering what users truly need whenever they visit our website is one of our most challenging goals. We have used several strategies to increase user engagement, one of which is search optimization. Its sole purpose is to improve the search system so that users find the photos they want faster and more accurately. Optimizing the search system requires several techniques, and selecting good, precise high-score tags seems to affect the search system's effectiveness to a great extent.

[Image: search results for the keyword "baby and computer"]

Given the number of photos and the large number of tags added by contributors, tag selection is time-consuming and prone to subjectivity.

While shooting, a photographer knows beforehand which ideas he or she wants to express. Those ideas may be conveyed accurately in the photos. However, the photos may also be misleading and turn out to imply different ideas. Selecting tags is neither science nor art: we cannot define with 100% accuracy what the correct set of high-score tags is, yet several similar sets can be equally acceptable at the same time. It is a grey area. That explains why this process is relatively subjective and depends on the reviewer. The final high-score tags may differ between reviewers, or even for the same reviewer at different times of the day.

Idea

Since this is a time-consuming process, we want a machine to predict the high-score tags of a given photo and relieve reviewers of this tedious task.

"Science is what we understand well enough to explain to a computer. Art is everything else we do." (Donald Knuth)

We know that we cannot explain to a machine how to select good high-score tags, so we simulate the process the way humans do it. Let's go back to our childhood! When you were a little kid, everything was strange to you. As time went by, you grew up and became curious about the things around you. When you failed to recognize something, someone corrected you, and so you remembered it and tried not to make the same mistake again. This is a type of learning: learning from experience. With machine learning, we attempt to teach the computer to recognize the high-score tags of a photo in the same way humans do.

How we did it

In machine learning projects, data is a treasure. It gives us insight into how to build our model, and it gives the model enough information to learn on its own. Collecting, processing and understanding the data accurately takes up a large share of a project's lifetime.

In the Lab team, we built a model that simulates the learning process described above, using the data we collected. It took considerable effort and several experiments before we had a model that meets our requirements at an acceptable level. For this project, we applied all of our knowledge of computer vision and machine learning to create a machine that can understand photos and produce a set of high-score tags that is as precise as possible. The most effective model focuses on a single category: if a machine is trained to predict high-score tags for photos in the people category, it only has knowledge related to people. For this reason, we have to train multiple machines for multiple categories, but this approach gives high-quality results in the end.
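The one-model-per-category setup can be pictured as a simple lookup that sends each photo to the model trained for its category. The following is only a minimal sketch under stated assumptions: the function names, category names, and vocabularies are hypothetical, and the toy "tagger" filters contributor tags by keyword instead of looking at the image, so it is a stand-in for a trained model and not our actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical type: a "tagger" maps a photo's contributor tags to a short
# list of high-score tags. A real model would look at the image as well.
Tagger = Callable[[List[str]], List[str]]


def make_keyword_tagger(vocabulary: List[str], limit: int = 5) -> Tagger:
    """Build a toy stand-in for a trained model that only knows one category."""
    known = set(vocabulary)

    def tagger(contributor_tags: List[str]) -> List[str]:
        # Keep only tags inside this category's vocabulary, preserving order.
        return [tag for tag in contributor_tags if tag in known][:limit]

    return tagger


# One tagger per category (category names and vocabularies are made up).
TAGGERS: Dict[str, Tagger] = {
    "people": make_keyword_tagger(["boy", "baby", "child", "kid", "toddler"]),
    "technology": make_keyword_tagger(["computer", "laptop", "phone", "screen"]),
}


def predict_high_score_tags(category: str, contributor_tags: List[str]) -> List[str]:
    """Route a photo to the tagger trained for its category."""
    if category not in TAGGERS:
        raise KeyError(f"no model trained for category {category!r}")
    return TAGGERS[category](contributor_tags)


tags = ["computer", "boy", "work", "phone", "talking", "baby", "child", "laptop"]
print(predict_high_score_tags("people", tags))
print(predict_high_score_tags("technology", tags))
```

The same photo gets different tags depending on which category's model handles it, which is why a category without a trained model is rejected here instead of being passed to an unrelated one.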

As mentioned above, there are no fixed rules for determining the correct set of high-score tags for a particular photo, so we cannot write a program that evaluates our model accurately. After many rounds of evaluation by our team members, we selected photos at random from specific categories and used our model to produce sets of high-score tags for them. Our reviewers then checked these photos to ensure that the model's results satisfy the predefined requirements before the model goes to production. The reviewers accepted the results, so the model will be used on the PIXTA system for several selected categories at the end of December.
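Drawing the review batch described above could look roughly like this. It is a sketch, not our production code: the function name, the photo IDs, the category names, and the batch size are all made up for illustration, and the fixed seed is an assumption added so that the same batch can be drawn again.

```python
import random
from typing import Dict, List


def sample_for_review(
    photo_ids_by_category: Dict[str, List[int]],
    per_category: int,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Pick a random subset of photos from each category for human review."""
    rng = random.Random(seed)  # fixed seed so the same batch can be re-drawn
    batch = {}
    for category, photo_ids in photo_ids_by_category.items():
        # Never ask for more photos than the category contains.
        k = min(per_category, len(photo_ids))
        batch[category] = rng.sample(photo_ids, k)
    return batch


# Made-up photo IDs, for illustration only.
library = {
    "people": list(range(1000, 1100)),
    "food": list(range(2000, 2003)),
}
batch = sample_for_review(library, per_category=5)
for category, photo_ids in batch.items():
    print(category, photo_ids)
```

Sampling per category, and not from the whole library at once, keeps small categories from being left out of the review.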

[Image: 29963733 (evgenyatamanenko) / PIXTA]

High-score tags (by reviewers): computer, boy, work, phone, talking

High-score tags (by machine): computer, boy, baby, laptop, phone

Contributor's tags: computer, boy, work, phone, talking, baby, child, laptop, cute, kid, caucasian, technology, internet, people, happy, childhood, person, sitting, pc, beautiful, young, white, toddler, home, business, little, indoor, modern, office, night, small, happiness, infant, desk, dark, fun, screen, lifestyle, education, learning, male, notebook, typing, businesswoman, background, communication, son, concept, professional
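To make the comparison in this example concrete, the two sets of high-score tags can be compared with simple set operations. This is only a rough illustration: the Jaccard overlap used here is a choice made for this sketch, not the acceptance criterion our reviewers applied, and since several tag sets can be equally acceptable, a low overlap does not by itself mean the machine is wrong.

```python
# Tag sets copied from the example photo above.
reviewer_tags = {"computer", "boy", "work", "phone", "talking"}
machine_tags = {"computer", "boy", "baby", "laptop", "phone"}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


shared = reviewer_tags & machine_tags
machine_only = machine_tags - reviewer_tags
reviewer_only = reviewer_tags - machine_tags

print("shared:", sorted(shared))                # ['boy', 'computer', 'phone']
print("machine only:", sorted(machine_only))    # ['baby', 'laptop']
print("reviewer only:", sorted(reviewer_only))  # ['talking', 'work']
print(f"jaccard: {jaccard(reviewer_tags, machine_tags):.2f}")  # 0.43
```

Three of the five tags agree. The two tags chosen only by the machine, "baby" and "laptop", both appear in the contributor's tags and arguably describe the photo just as well as "work" and "talking".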

Further work

Currently, the PIXTA system has several dozen main categories, and yes, we plan to support all of them. In addition, our current model has problems that remain unresolved but that we are determined to solve in the future, e.g. how to keep up with trends, understand abstract meanings, and recognize locations. Given our team's achievements, this project's mission of understanding photos, and the recent big steps forward in machine learning and computer vision, this will definitely be a fascinating project and an inspiration for other awesome projects in the future.

About Trong(チョン): He joined Pixta Vietnam as a software engineer. He worked as a Ruby on Rails developer for almost a year before moving to the Lab team to keep challenging himself with new things and to get the chance to work on different machine learning projects.

About Lab team: The Lab team is a small team in PIXTA's Hanoi office. We focus on researching new technologies, especially machine learning and data science, and use them to solve difficult problems that could not be solved with traditional techniques. Our goal is to build a creative working environment where every member can be a self-starter who raises ideas, brainstorms, and contributes to the company's development.