Xu Dong and Nazrul I. Shaikh
Department of Industrial Engineering
Networks exist everywhere in the world. The Internet, transportation networks, social networks, IoT networks, and biological networks are typical instances. These real-world networks usually have millions of nodes and edges and a complex structure, since the interactions among nodes are irregular, or stochastic. Measuring the structure of real-world networks helps us better understand the dynamic processes and mechanisms that take place on them, such as the spread of viruses, cascades of failures, and the propagation of fake news. However, traditional tools are rarely scalable or efficient enough to handle the scale and complexity of real-world networks. We need more efficient ways to represent network data and easier ways to analyze networks. This poster presents an open-source network analytics tool called fastnet (https://cran.r-project.org/web/packages/fastnet/index.html), aimed at the fast simulation and analysis of large-scale relational data via sampling techniques. By utilizing multi-core processing and node-/edge-wise sampling strategies, fastnet can speed up graph analytics by more than 10X compared with previous network analysis tools. In addition, this R-based tool works seamlessly with other data acquisition and analytics packages in the R environment, giving more users access to the analytical power of both the tool and the R environment. fastnet is attracting attention from both academic and industrial domains.
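The node-wise sampling idea behind such tools can be illustrated with a minimal sketch: instead of scanning every node of a massive graph, estimate a structural statistic from a uniform sample of nodes. This is an illustrative Python sketch of the general technique only, not the fastnet R API; the function and graph below are hypothetical.

```python
import random

def estimate_mean_degree(adjacency, sample_size, seed=0):
    """Estimate the mean degree of a graph by sampling nodes
    uniformly at random, rather than visiting every node."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    sampled = rng.sample(nodes, min(sample_size, len(nodes)))
    return sum(len(adjacency[n]) for n in sampled) / len(sampled)

# Tiny example graph as an adjacency list (a 4-node star).
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(estimate_mean_degree(graph, sample_size=4))  # samples all 4 nodes: 1.5
```

On a real million-node graph the sample would be a small fraction of the nodes, trading a little estimation error for a large reduction in computation; sampling calls for independent node subsets can also be spread across cores.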
Department of Industrial Engineering
Theorists in operations research (OR) and management science (MS) have favored analytical tools and mathematical models, although many manufacturing and service environments are too complex to model and solve with such techniques. On the other hand, firms incorporating (big) data analytics have reported 5–6% higher productivity and profitability. Surely, the role of data and the effective use of data are very exciting matters. However, as theorists in OR/MS, we cannot set aside the premises of mathematical modeling techniques, because they provide the most acceptable idealizations in a utopian-world setting. But we also must not develop theoretically correct models that ignore what is actually being done in real-world settings. Data is indeed where to search for such information in order to build computing algorithms, based on statistics and mathematics, for solving real-world business problems. In this context, this presentation will cover four topics in data science: the philosophy of data, the four operations of data science (descriptive, diagnostic, predictive, and prescriptive), the compass of data science (reasoning, searching for facts and factors, and learning), and exploratory examples from industry (e.g., Amazon, eBay, Netflix) to point out where we are today in OR/MS.
Yudong Tao, Yuexuan Tu, Mei-Ling Shyu
Department of Electrical and Computer Engineering
Data analysis for real-world applications, such as disaster information management, usually encounters data with various modalities. Disaster information management applications have always attracted much attention due to their impact on society and government. To enhance these applications, it is essential to adequately analyze all information extracted from the different data modalities, while most existing learning models focus on only a single modality. This poster presents a multimodal deep learning framework for content analysis of disaster-related videos. First, several deep learning models are utilized to extract useful information from multiple modalities; among them, pre-trained Convolutional Neural Networks (CNNs) for visual and audio feature extraction and a word embedding model for textual analysis are utilized. Next, our novel fusion technique is applied to integrate the data representations at different levels. The proposed multimodal framework can reason about a missing data type using the other available data modalities. It is then evaluated on a web-crawled disaster video dataset and compared with several state-of-the-art single-modality and fusion techniques.
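The simplest form of the fusion step described above can be sketched as feature-level concatenation of per-modality vectors, with a zero vector standing in for a missing modality so the downstream model always receives a fixed-length representation. This is a minimal conceptual sketch, not the authors' fusion technique; the function name and feature dimensions are assumptions for illustration.

```python
import numpy as np

def fuse_features(visual, audio, text):
    """Feature-level ('early') fusion: concatenate per-modality
    feature vectors. A missing modality (None) is replaced by a
    zero vector of the expected size, so the fused vector always
    has the same length."""
    dims = {"visual": 4, "audio": 3, "text": 5}  # illustrative sizes
    parts = []
    for name, vec in (("visual", visual), ("audio", audio), ("text", text)):
        parts.append(np.zeros(dims[name]) if vec is None else np.asarray(vec, float))
    return np.concatenate(parts)

fused = fuse_features(np.ones(4), None, np.ones(5))  # audio modality missing
print(fused.shape)  # (12,)
```

Richer schemes fuse at later stages (e.g., averaging per-modality predictions) or learn the imputation of a missing modality from the available ones, as the framework above does.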
Haluk Damgacioglu1, Nurcin Celik1, Emrah Celik2
1Department of Industrial Engineering
2Department of Mechanical and Aerospace Engineering, University of Miami, Coral Gables
Epigenetics refers to all heritable alterations that occur in a given gene's function without any change in the DNA sequence. DNA methylation, i.e., the addition of a methyl group to cytosine, is a very common type of epigenetic alteration. Identifying the idiosyncratic DNA methylation profiles of different tumor types and subtypes can provide invaluable insights for
- Accurate diagnosis,
- Early detection,
- Tailoring of the related treatment for cancer.
This profiling has led to extensive use of conventional distance-based clustering algorithms such as hierarchical clustering, k-means clustering, etc. Despite their speed in conducting high-throughput analyses, these methods commonly result in suboptimal solutions and/or trivial clusters due to their greedy search nature. Hence, methodologies are needed that improve the quality of the clusters formed by these algorithms without sacrificing their speed. We introduce three algorithms for a complete high-throughput methylation analysis:
- A variance-based dimension reduction algorithm to reduce the number of dimensions of the methylation data before it is processed for clustering,
- A distance-based outlier detection algorithm to detect the outliers and micro-clusters of the methylation data with reduced dimensionality,
- An advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm.
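The first step, variance-based dimension reduction, can be sketched as keeping only the columns (e.g., CpG methylation sites) with the highest variance across samples, since near-constant sites carry little information for separating tumor subtypes. This is a hedged illustration of the general idea, not the authors' algorithm; the function name and data are hypothetical.

```python
import numpy as np

def top_variance_features(X, k):
    """Keep the k columns of X (samples x features) with the
    highest variance across samples; returns the reduced matrix
    and the indices of the retained columns."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])  # top-k, original order
    return X[:, keep], keep

# Toy methylation matrix: 3 samples x 3 sites; only site 1 varies.
X = np.array([[0.1, 0.9, 0.5],
              [0.1, 0.1, 0.5],
              [0.1, 0.5, 0.5]])
X_reduced, kept = top_variance_features(X, k=1)
print(kept)  # only column 1 has nonzero variance
```

Clustering then operates on `X_reduced`, which is both faster and less prone to distances being dominated by uninformative sites.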
- Explore: learn the reward of arms
- Exploit: pull the arm that has yielded the highest reward in the past
- Maximize cumulative reward over time horizon by balancing exploration and exploitation.
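A standard way to balance the two, epsilon-greedy, can be sketched as follows: with small probability explore a random arm, otherwise exploit the arm with the best average reward so far. This is a generic illustration of the explore/exploit trade-off, with hypothetical Bernoulli reward probabilities; it is not tied to any specific method from the abstract above.

```python
import random

def epsilon_greedy(true_means, steps=10000, eps=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit: rewards are
    Bernoulli(true_means[arm]). Returns cumulative reward and
    the pull count of each arm."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    totals = [0.0] * n
    cumulative = 0.0
    for _ in range(steps):
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(n)                              # explore
        else:
            avgs = [totals[i] / counts[i] for i in range(n)]
            arm = avgs.index(max(avgs))                         # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        cumulative += reward
    return cumulative, counts

reward, pulls = epsilon_greedy([0.2, 0.8])
# After many steps, the better arm (index 1) is pulled far more often.
```

Decaying `eps` over time shifts the balance from exploration toward exploitation as the reward estimates become reliable.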