Recognition of human motion using time series from mobile and wearable devices is commonly used as key context information for a range of applications, from health condition monitoring to sports activity analysis to user habit studies. However, collecting large-scale motion time series data remains challenging because of security and privacy concerns. In the motion time series domain, the lack of datasets and of an effective pre-training task makes it difficult to develop models that can operate with limited data. Typically, existing models are trained and tested on the same dataset, and they struggle to generalize across datasets because of three challenges specific to the motion time series domain. First, placing devices at different locations on the body (on the wrist versus the leg, for example) produces very different data, which makes it hard to reuse a model trained for one location on another. Second, devices can be held in various orientations, so a model trained with a device in one orientation often struggles when the device is held differently. Finally, different datasets often focus on different types of activities, which makes it hard to compare or combine the data effectively.
Conventional motion time series classification relies on a separate classifier for each dataset, using methods such as statistical feature extraction, CNNs, RNNs, and attention models. General-purpose models like TimesNet and SHARE aim for task versatility, but they require training and testing on the same dataset, which limits their adaptability. Self-supervised learning helps with representation learning, although generalization across datasets remains challenging. Pretrained models like ImageBind and IMU2CLIP consider motion and text data, but they are constrained by device-specific training. Methods that use large language models (LLMs) rely on prompts, but they have difficulty recognizing complex activities because they are not trained on raw motion time series.
A group of researchers from UC San Diego, Amazon, and Qualcomm proposed UniMTS as the first unified pre-training procedure for motion time series that generalizes across diverse device latent factors and activities. UniMTS uses a contrastive learning framework to align motion time series data with text descriptions enriched by large language models (LLMs). This helps the model learn the meaning behind different motions and allows it to generalize across activities. For large-scale pre-training, UniMTS generates motion time series data from existing detailed skeleton data, which covers many body parts. The generated data is then processed with graph networks to capture both spatial and temporal relationships across device locations, helping the model generalize to data from different device placements.
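The contrastive alignment between motion and text can be illustrated with a small sketch. The article does not state the exact loss, so this assumes a CLIP-style symmetric cross-entropy over cosine similarities. The function names, the toy two-dimensional embeddings, and the `temperature` value are all hypothetical and are not taken from the UniMTS code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(motion_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss (assumed CLIP-style, not confirmed by the article).

    motion_emb[i] and text_emb[i] are embeddings of a matching motion/text pair.
    """
    motion = [l2_normalize(v) for v in motion_emb]
    text = [l2_normalize(v) for v in text_emb]
    # Similarity of every motion clip to every text description.
    logits = [
        [sum(a * b for a, b in zip(m, t)) / temperature for t in text]
        for m in motion
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the motion-to-text and text-to-motion directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_transposed))

# Toy example: matched pairs give a low loss, mismatched pairs a high one.
aligned_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls each motion embedding toward the embedding of its own description and pushes it away from the other descriptions in the batch, which is what lets text labels for unseen activities be matched at inference time.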
The method begins by creating motion data from skeleton movements and adjusting it for different orientations. It also uses a graph encoder to model how joints connect, so that it works well across different devices. The text descriptions are enriched using large language models. To create motion data, UniMTS calculates the velocities and accelerations of each joint from its positions and orientations, then adds noise to mimic real-world sensor errors. To handle inconsistencies in device orientation, it applies rotation-invariant data augmentation, sampling random orientations during pre-training, which accounts for differences in device positioning and axis setup. By aligning motion data with text descriptions, the model can adapt well to different orientations and activity types. Using the HumanML3D dataset for pre-training, UniMTS was evaluated on 18 real-world motion time series benchmark datasets, with performance improvements of 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting over the respective best-performing baselines, which include ImageBind and IMU2CLIP. UniMTS outperformed the other models, particularly in the zero-shot setting, and statistical tests confirmed that the improvements were significant.
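The simulation and augmentation steps described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it derives only linear acceleration from joint positions with a second-order finite difference, applies one random rotation per sequence, and adds Gaussian noise. All function names, the sampling interval `dt`, and `noise_std` are assumptions made for the example.

```python
import math
import random

def finite_difference_acceleration(positions, dt):
    """Central second difference: a[t] = (p[t+1] - 2*p[t] + p[t-1]) / dt^2."""
    return [
        tuple((nxt[k] - 2.0 * cur[k] + prev[k]) / (dt * dt) for k in range(3))
        for prev, cur, nxt in zip(positions, positions[1:], positions[2:])
    ]

def random_rotation_matrix(rng):
    """Random 3D rotation built from a normalized 4D Gaussian sample (unit quaternion)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def rotate(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(matrix[r][c] * vec[c] for c in range(3)) for r in range(3))

def simulate_sensor(positions, dt, noise_std=0.0, seed=0):
    """Turn one joint's position track into a noisy, randomly oriented acceleration track."""
    rng = random.Random(seed)
    rotation = random_rotation_matrix(rng)  # one device orientation per sequence
    readings = []
    for acc in finite_difference_acceleration(positions, dt):
        rotated = rotate(rotation, acc)
        readings.append(tuple(a + rng.gauss(0.0, noise_std) for a in rotated))
    return readings

# Toy example: a joint accelerating at 2 m/s^2 along x, sampled every 0.1 s.
dt = 0.1
track = [(0.5 * 2.0 * (i * dt) ** 2, 0.0, 0.0) for i in range(6)]
clean_acc = finite_difference_acceleration(track, dt)
rotated_acc = simulate_sensor(track, dt, noise_std=0.0, seed=42)
```

Because a rotation changes the direction of each reading but not its magnitude, training on many randomly rotated copies of the same motion encourages the model to rely on features that do not depend on how the device happens to be oriented.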
In conclusion, the proposed pre-trained model UniMTS is based solely on physics-simulated data, yet it shows remarkable generalization across diverse real-world motion time series datasets featuring different device locations, orientations, and activities. Although it outperforms traditional methods, UniMTS has some limitations as well. More broadly, this pre-trained motion time series classification model can serve as a potential foundation for future research in human motion recognition.
Check out the Paper, GitHub, and Model on Hugging Face. All credit for this research goes to the researchers of this project.
Divyesh is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a Data Science and Machine Learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve challenges.