By Jesse Li
The talk I attended today was held in the School of Computer Science at Carnegie Mellon University. Professor Li described her current research projects and explained the data mining techniques her group uses. Although I do not have a research interest in spatial data mining, I expected this talk to be the most relevant to my background among those held this Friday. A summary of the talk and my reflections follow:
The first application domain in which they study the semantic understanding of spatial trajectories is animal movement (migration) behavior. The question to be answered here is why animals change their paths while migrating. She illustrated the issue with the spatial trajectories of migrating eagles. As another example, she mentioned the baboon dataset they collected. In this dataset, baboons move together, and the aim of the spatial data mining is to understand which animal moves first and which ones follow it. Predicting the spread of diseases is another application domain that uses the same techniques as the examples above.
Her second research area is understanding human movement behavior. As she noted, human movement differs from animal movement in that people are constrained to the road network. In one study, they collected 30 days of trajectory data to understand the connection between the places individuals visit and their health status. This connection is mostly mediated by socio-economic variables. For example, if a person usually goes to low-end grocery stores, it can be inferred that the person may be unhealthy. For their second work in this category, they collected a traditional taxi trip dataset (by traditional, she meant trajectory data only, not detailed travel data such as who called the taxi or who drove it, as collected by Uber). Using these data, they profile users based on their trips. This kind of data attracts companies because they can use it to present personalized advertisements to users. For example, if you go shopping very frequently, then you presumably like shopping, and ads are displayed accordingly.
The third category of their research projects is about understanding urban dynamics. They examine data collected in New York City such as taxi trips, POIs (points of interest), crime statistics, geo-tagged tweets, noise data, collision records and so on. Here, they are concerned with the following question: can we understand one type of data by using other types of data (for example, understanding crime by using taxi trip data and POI data)?
The last category of their research projects is about understanding environmental data. I could not follow much of this part, but she mainly talked about advances in shale gas development in Pennsylvania. Shale gas development lets us extract gas much more efficiently and cheaply. They examine the environmental impacts of this method, such as increased methane in the groundwater and in the air.
After a detailed introduction to their research projects, she spent the second part of the talk elaborating on the data mining techniques and their novel contributions in each application domain. I do not want to go into all of those low-level details, but I will describe the general ideas behind one of the research projects, trajectory data mining. An individual trajectory is affected by:
- Self-periodic behaviors
- Relationships with other moving objects
- Correlation with external context
I will summarize the techniques they used for the second factor, relationships with other moving objects. First, the problem they investigate is: given two trajectories R and S, measure their relationship strength by using 1) Euclidean distance and 2) meeting frequency. Distance alone is not enough, because two objects being close on average does not mean that they move together all the time. Hence, they also look at how often the two objects meet. In other words, the more frequently you meet another person, the stronger the relationship is. Second, the global background should be considered while mining the relationship between moving objects. As an example, she contrasted two cases: two individuals being present together in Central Park in New York, and the same two individuals being together in an uncrowded private residence. The former may occur mostly by chance, because Central Park is quite crowded, so being in Central Park at the same time as another person does not necessarily mean that the two people have a relationship. For this kind of analysis, we need to use the semantic context of locations, here the public park and the private residence.
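To make the idea concrete for myself, here is a minimal Python sketch of the two measures and of a crowd-aware variant. It is my own illustration, not the method presented in the talk. It assumes that the two trajectories are sampled at the same time steps, that a "meeting" means being within a chosen radius, and that a hypothetical list `crowd` gives the number of other people present at each time step. The function names and the 1 / (1 + crowd) weighting are also my assumptions.

```python
from math import hypot

def mean_distance(r, s):
    """Average Euclidean distance between time-aligned points of R and S."""
    pairs = list(zip(r, s))
    return sum(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in pairs) / len(pairs)

def meeting_frequency(r, s, radius):
    """Fraction of time steps at which R and S are within `radius` of each other."""
    pairs = list(zip(r, s))
    meetings = sum(
        1 for (x1, y1), (x2, y2) in pairs if hypot(x1 - x2, y1 - y2) <= radius
    )
    return meetings / len(pairs)

def crowd_weighted_meetings(r, s, radius, crowd):
    """Sum of meetings, each down-weighted by how crowded the place was.

    `crowd[i]` is the (hypothetical) number of other people present at
    time step i. The weight 1 / (1 + crowd[i]) is an illustrative choice:
    a meeting in an empty place counts as 1, a meeting among 99 others as 0.01.
    """
    score = 0.0
    for ((x1, y1), (x2, y2)), c in zip(zip(r, s), crowd):
        if hypot(x1 - x2, y1 - y2) <= radius:
            score += 1.0 / (1 + c)
    return score

# Toy trajectories: (x, y) positions at four shared time steps.
R = [(0, 0), (1, 0), (2, 0), (3, 0)]
S = [(0, 1), (1, 5), (2, 0.5), (3, 9)]
crowd = [99, 0, 0, 0]  # the first meeting happens in a very crowded place

print(mean_distance(R, S))
print(meeting_frequency(R, S, radius=1.0))
print(crowd_weighted_meetings(R, S, radius=1.0, crowd=crowd))
```

In this toy example the two objects meet at two of the four time steps, but the meeting in the crowded place contributes almost nothing to the weighted score, which mirrors the Central Park versus private residence contrast.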
I was impressed by how interdisciplinary their research projects are and by how data mining is used in real-world applications that serve the well-being of the public. However, the talk did not discuss big-data-specific issues as much as the abstract had promised. I also found it a little too detailed. It ran longer than the scheduled time, which made it hard for me to stay focused after the first half.