From their early days at MIT, and even before that, Emma Liu ’22, MNG ’22, Yo-whan “John” Kim ’22, MNG ’22, and Clemente Ocejo ’21, MNG ’22 knew they wanted to do computational research and explore artificial intelligence and machine learning. “Since high school, I’ve been engaged in deep learning and involved in projects,” said Kim, who attended a Research Science Institute (RSI) summer program at MIT and Harvard University and started working on action recognition in videos using Microsoft’s Kinect.
As students in the Department of Electrical Engineering and Computer Science who recently graduated from the Master of Engineering (MEng) Thesis Program, Liu, Kim, and Ocejo have developed the skills to guide application-oriented projects. In collaboration with the MIT-IBM Watson AI Lab, they improved text classification with limited labeled data and designed machine learning models for better long-term forecasting of product purchases. For Kim, “it was a very smooth transition and … a great opportunity for me to continue working in deep learning and computer vision at the MIT-IBM Watson AI Lab.”
In collaboration with researchers from academia and industry, Kim designed, trained, and tested a deep learning model for recognizing actions across different domains, in this case video. His team focused specifically on using synthetic data from generated videos for training, then performed prediction and inference tasks on real data composed of different classes of actions. They wanted to see how models pre-trained on synthetic videos, especially simulations of, or game-engine-generated, human or humanoid actions, stacked up when transferred to real data: publicly available videos scraped from the internet.
The reason for this research, Kim says, is that real videos can be problematic: they can carry representational bias, copyright issues, or ethical and privacy concerns. For example, videos of a car hitting people would be difficult to collect, as would footage that uses faces, real addresses, or license plates without permission. Kim is experimenting with 2D, 2.5D, and 3D video models, with the goal of creating domain-specific or even large, generic synthetic video datasets that can be used for transfer to domains where data is missing. In the construction sector, for example, this could mean performing action recognition on a construction site. “I didn’t expect synthetically generated videos to perform as well as real videos,” he says. “I think that opens up a lot of different roles [for the work] in the future.”
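The recipe Kim describes is the standard transfer-learning pattern: pre-train a model on plentiful synthetic data, then fine-tune on the scarce real data. Here is a minimal, hypothetical sketch of that two-stage loop using a toy logistic-regression “model” and randomly generated data in place of a video network; none of the names or data reflect Kim’s actual code.

```python
# Sketch of the pre-train-on-synthetic / fine-tune-on-real recipe.
# A toy logistic regression stands in for a video action-recognition model.
import numpy as np

rng = np.random.default_rng(42)

def train(w, X, y, lr=0.1, steps=200):
    """Plain gradient descent on the logistic-regression loss."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # sigmoid predictions
        w = w - lr * X.T @ (p - y) / len(y)   # gradient step
    return w

# "Synthetic" data: plentiful, cheap to generate, fully labeled.
X_syn = rng.normal(size=(500, 4))
y_syn = (X_syn[:, 0] > 0).astype(float)

# "Real" data: scarce -- the situation synthetic pre-training targets.
X_real = rng.normal(size=(20, 4))
y_real = (X_real[:, 0] > 0).astype(float)

w = train(np.zeros(4), X_syn, y_syn)          # stage 1: pre-train on synthetic
w = train(w, X_real, y_real, steps=50)        # stage 2: fine-tune on real
acc = ((X_real @ w > 0) == y_real).mean()
```

The point of the sketch is only the structure: the fine-tuning stage starts from the pre-trained weights rather than from scratch, which is what lets a small real dataset suffice.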
Despite a shaky start to the project, collecting and generating data and running many models, Kim says he wouldn’t have done it any differently. “It was amazing how the lab members encouraged me: ‘It’s okay. All the experiments and the fun part are coming. Don’t worry too much.’” It was this encouragement that helped Kim take ownership of the work. “In the end, they gave me so much support and great ideas that helped me carry out this project.”
Data scarcity was also a theme in Emma Liu’s work. “The overarching problem is that there’s all this data out in the world, and for a lot of machine learning problems, you need that data to be labeled,” says Liu, “but then you have all this unlabeled data that’s available that you’re not really leveraging.”
Guided by her MIT and IBM group, Liu worked to put that data to use, training semi-supervised text classification models (and combining aspects of them) to attach pseudo-labels to the unlabeled data based on the model’s predictions and its probabilities about which category each previously unlabeled example belongs to. “Then the problem is that there’s been prior work that’s shown that you can’t always trust those probabilities; in particular, neural networks have been shown to be overconfident a lot of the time,” says Liu.
Liu and her team addressed this by evaluating the accuracy and uncertainty of the models and recalibrating them to improve her self-training framework. The self-training and calibration steps allowed her to have better confidence in the predictions. This pseudo-labeled data, she says, could then be added to the pool of real data, expanding the dataset; this process could be repeated over a series of iterations.
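The loop Liu describes, calibrate, pseudo-label only the confident predictions, fold them into the training pool, and repeat, can be sketched compactly. The snippet below is a hypothetical illustration, not Liu’s code: it shows one common calibration knob (temperature scaling of the logits) and a confidence threshold that filters out the overconfident model’s uncertain guesses.

```python
# Minimal self-training sketch: calibrated softmax + confidence threshold.
import numpy as np

def softmax(z, temperature=1.0):
    """Softmax over logits; temperature > 1 softens (shrinks) confidences."""
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label(logits, temperature=1.0, threshold=0.9):
    """Return indices and labels of predictions confident enough to trust."""
    probs = softmax(logits, temperature)
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.where(keep)[0], probs.argmax(axis=1)[keep]

# Toy logits for 4 unlabeled examples over 3 classes.
logits = np.array([[4.0, 0.1, 0.2],    # confident  -> pseudo-labeled
                   [1.0, 0.9, 0.8],    # uncertain  -> left unlabeled
                   [0.2, 5.0, 0.1],    # confident  -> pseudo-labeled
                   [0.3, 0.2, 0.4]])   # uncertain  -> left unlabeled
idx, labels = pseudo_label(logits)
# idx/labels would now be appended to the labeled pool and the model retrained.
```

Note how calibration and thresholding interact: if calibration finds the model overconfident (a fitted temperature above 1), the softened probabilities let fewer borderline examples slip past the threshold into the labeled pool.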
For Liu, her biggest takeaway was not the product but the process. “I learned a lot about being an independent researcher,” she says. As an undergraduate, Liu worked with IBM to develop machine learning methods to repurpose drugs already on the market, honing her decision-making ability along the way. After collaborating with academic and industry researchers to acquire skills to ask pointed questions, seek out experts, parse scientific papers for relevant content, and test ideas, Liu and her cohort of MEng students working with the MIT-IBM Watson AI Lab felt they had the confidence, freedom, and flexibility to dictate the direction of their own research. Taking on this key role, Liu says, “I feel like I owned my project.”
After his time at MIT and the MIT-IBM Watson AI Lab, Clemente Ocejo also came away with a sense of mastery, having built a strong foundation in AI techniques and time series methods, beginning with his MIT Undergraduate Research Opportunities Program (UROP), where he met his MEng advisor. “You really have to be proactive in decision-making,” says Ocejo, “owning it [your choices] as a researcher and letting people know that this is what you’re doing.”
Ocejo used his background in traditional time series methods in his collaboration with the lab, applying deep learning to better predict demand for products in the medical field. Here, he designed, wrote, and trained a transformer, a type of machine learning model most commonly used in natural language processing that can learn very long-range dependencies. Ocejo and his team compared target forecast demand across months, learning dynamic relationships and attention weights between product sales within a product family. They looked at features such as price and quantity, as well as account features about who is purchasing the goods or services.
“One product doesn’t necessarily affect the prediction of another product at the moment of prediction. It just affects the parameters during training that lead to that prediction,” says Ocejo. “Instead, we wanted to make it have a little more of a direct impact, so we added this layer that makes this connection and draws attention between all the products in our dataset.”
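The extra layer Ocejo describes can be pictured as scaled dot-product attention applied across products rather than across time steps: every product’s hidden representation attends to every other product in the family, so one product’s sales signal can directly mix into another’s forecast. The following is an illustrative numpy sketch under that interpretation; all names and shapes are hypothetical, not the team’s actual architecture.

```python
# Sketch of a cross-product attention layer (illustrative, not Ocejo's code).
import numpy as np

def cross_product_attention(H, Wq, Wk, Wv):
    """H: (n_products, d) hidden states, one row per product in the family."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (n_products, n_products)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                     # mixed reps + attention map

rng = np.random.default_rng(0)
d = 8
H = rng.normal(size=(5, d))                         # 5 products in a family
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
mixed, attn = cross_product_attention(H, Wq, Wk, Wv)
```

The attention map `attn` is exactly the kind of learned product-to-product weighting the article mentions: row i shows how much product i’s new representation draws on each other product.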
In the long run, over a one-year prediction, the MIT-IBM Watson AI Lab group was able to outperform the current model; even more impressively, it also did so in the short run (close to a fiscal quarter). Ocejo attributes this to the dynamic of his interdisciplinary team. “A lot of the people in my group weren’t necessarily very experienced in the deep learning aspect of things, but they had a lot of experience in supply chain management, operations research, and optimization, which is something that I don’t have as much experience in,” says Ocejo. “They gave a lot of great, high-level feedback on what to tackle next and knew what the industry wanted to see or improve on, so it was really helpful in streamlining my focus.”
For this work, a deluge of data didn’t make the difference for Ocejo and his team; rather, its structure and representation did. Often, large deep learning models require millions and millions of data points to draw meaningful conclusions; however, the MIT-IBM Watson AI Lab group demonstrated that outcomes and technique improvements can be application-specific. “It just goes to show that these models can learn something useful, in the right setting, with the right architecture, without needing an excess amount of data,” says Ocejo. “And then with an excess amount of data, it’ll only get better.”