Data Analytics and Cybersecurity Poster Session

Fastnet: A revolutionary tool to handle big data from real-world networks

Xu Dong and Nazrul I. Shaikh

Department of Industrial Engineering

Networks exist everywhere in the world. The Internet, transportation networks, social networks, IoT networks, and biological networks are typical examples. These real-world networks often have millions of nodes and edges and a complex structure, because the interactions among nodes are irregular or stochastic. Measuring the structure of real-world networks helps us better understand the dynamic processes and mechanisms that take place on them, such as the spread of viruses, cascading failures, and the propagation of fake news. However, traditional tools rarely scale efficiently to the size and complexity of real-world networks. We need more efficient ways to represent network data and easier ways to analyze networks. This poster presents an open-source network analytics tool, fastnet (https://cran.r-project.org/web/packages/fastnet/index.html), designed for fast simulation and analysis of large relational data via sampling techniques. By using multi-core processing and node-/edge-wise sampling strategies, fastnet can speed up graph analytics by more than 10X compared with previous network analysis tools. In addition, this R-based tool works seamlessly with other data acquisition and analytics packages in the R environment, giving more users access to the analytical power of both the tool and the R environment. fastnet is attracting attention from both academia and industry.
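The abstract credits node-/edge-wise sampling for much of fastnet's speedup. As a rough illustration of the idea, and not fastnet's actual R API, the Python sketch below simulates a G(n, p) random graph and estimates its mean degree from a uniform random sample of nodes instead of visiting every node. The function names `erdos_renyi`, `exact_mean_degree`, and `sampled_mean_degree` are hypothetical.

```python
import random


def erdos_renyi(n, p, seed=0):
    """Simulate a G(n, p) random graph as an adjacency-set dictionary.

    Uses an O(n^2) loop over node pairs, which is fine for a small demo.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def exact_mean_degree(adj):
    """Visit every node: O(n) work."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def sampled_mean_degree(adj, sample_size, seed=0):
    """Node-wise sampling: average the degrees of a uniform random sample of nodes."""
    rng = random.Random(seed)
    nodes = rng.sample(list(adj), sample_size)
    return sum(len(adj[v]) for v in nodes) / sample_size


if __name__ == "__main__":
    graph = erdos_renyi(1000, 0.01, seed=42)
    print("exact  :", exact_mean_degree(graph))
    print("sampled:", sampled_mean_degree(graph, 200, seed=7))
```

The cost of the sampled estimator depends on the sample size rather than on the number of nodes, which is why this kind of strategy pays off as graphs grow; the trade-off is sampling error, which shrinks as the sample grows.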

Poster

Facial Analytics: 3D Reconstruction from Video

Olga Jumbo1, Dr. Mohamed Abdel-Mottaleb2, Dr. Shihab Asfour2, Maolin Pang3

1Department of Industrial Engineering, 2Department of Electrical and Computer Engineering, 3University of Science and Technology of China

Poster

The Role of Data Analytics in Operations Research and Management Science

Busra Keles

Industrial Engineering

Theorists in operations research (OR) and management science (MS) have favored analytical tools and mathematical models, although many manufacturing and service environments are too complex to model and solve with such techniques. On the other hand, firms that incorporate (big) data analytics have reported being 5–6% more productive and profitable. Clearly, data and its effective use are very exciting matters. However, as OR/MS theorists, we cannot set aside mathematical modeling techniques, because they provide the most acceptable idealizations in a utopian-world setting. But we also must not develop theoretically correct models that ignore what is actually done in real-world settings. Data is a rich source of such information, which can be used to build computing algorithms grounded in statistics and mathematics for solving real-world business problems. In this context, this presentation will cover four topics in data science: the philosophy of data; the four operations of data science (descriptive, diagnostic, predictive, and prescriptive); the compass of data science (reasoning, searching for facts and factors, and learning); and exploratory examples from industry (e.g., Amazon, eBay, Netflix) that show where OR/MS stands today.

Poster

Multimodal deep representation learning for disaster information management

Yudong Tao, Yuexuan Tu, Mei-Ling Shyu

Department of Electrical and Computer Engineering

Data analysis for real-world applications, such as disaster information management, usually involves data of various modalities. Disaster information management applications have long attracted attention because of their impact on society and government. To enhance these applications, it is essential to adequately analyze all information extracted from the different data modalities, whereas most existing learning models focus on a single modality. This poster presents a multimodal deep learning framework for content analysis of disaster-related videos. First, several deep learning models are used to extract useful information from multiple modalities: a pre-trained Convolutional Neural Network (CNN) for visual and audio feature extraction and a word embedding model for textual analysis. Next, our novel fusion technique is applied to integrate the data representations at different levels. The proposed multimodal framework can reason about a missing data type using the other available modalities. It is evaluated on a web-crawled disaster video dataset and compared with several state-of-the-art single-modality and fusion techniques.
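The fusion step can be pictured with a much simpler stand-in than the authors' technique: weighted late fusion of per-modality class scores, where a missing modality is skipped and the remaining weights are renormalized. This Python sketch assumes each modality already produces a list of class scores; the modality names, weights, and the function `late_fusion` are hypothetical, and unlike the proposed framework it does not reason about the missing modality itself.

```python
def late_fusion(scores, weights):
    """Weighted average of per-modality class scores, skipping missing modalities.

    scores:  dict modality -> list of class scores, or None if that modality is missing
    weights: dict modality -> non-negative weight
    """
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        raise ValueError("at least one modality must be present")
    total = sum(weights[m] for m in available)
    if total <= 0:
        raise ValueError("available modalities must have positive total weight")
    n_classes = len(next(iter(available.values())))
    fused = [0.0] * n_classes
    for m, s in available.items():
        w = weights[m] / total  # renormalize over the modalities actually present
        for i, value in enumerate(s):
            fused[i] += w * value
    return fused


# Hypothetical example: the audio track is missing for this video clip.
weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}
scores = {"visual": [0.8, 0.2], "audio": None, "text": [0.4, 0.6]}
print(late_fusion(scores, weights))  # approximately [0.65, 0.35]
```

Late fusion is chosen here only because it makes the missing-modality case easy to show; fusing learned representations at several levels, as the abstract describes, is a different and richer design.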

Poster

Understanding high-throughput DNA methylation data via computationally affordable analytics

Haluk Damgacioglu1, Nurcin Celik1, Emrah Celik, Ph.D.2

1Department of Industrial Engineering

2Department of Mechanical and Aerospace Engineering, University of Miami, Coral Gables

Epigenetics refers to all heritable alterations in gene function that occur without any change in the DNA sequence.

DNA methylation, i.e., the addition of a methyl group to cytosine, is a very common type of epigenetic alteration. Identifying the idiosyncratic DNA methylation profiles of different tumor types and subtypes can provide invaluable insights for

• Accurate diagnosis,
• Early detection,
• Tailoring of the related treatment for cancer.

This profiling has led to the extensive use of conventional distance-based clustering algorithms such as hierarchical clustering and k-means clustering. Despite their speed in high-throughput analysis, these methods commonly produce suboptimal solutions and/or trivial clusters because of their greedy search nature. Hence, methodologies are needed that improve the quality of the clusters these algorithms form without sacrificing their speed. We introduce three algorithms for a complete high-throughput methylation analysis:

i) a variance-based dimension reduction algorithm that reduces the number of dimensions of the methylation data before clustering,
ii) a distance-based outlier detection algorithm that detects outliers and micro-clusters in the reduced-dimensional methylation data, and
iii) an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) that reduces the impact of initial solutions on the performance of the conventional k-medoids algorithm.

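The first two stages of this pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithms: dimension reduction keeps the k CpG sites (columns) with the highest variance, and the outlier score is the mean Euclidean distance from a sample to its k nearest neighbours. The T-CLUST stage (Tabu-based k-medoids) is not shown, and all function names and data values are hypothetical.

```python
import math
import statistics


def top_variance_features(X, k):
    """Indices of the k columns (e.g., CpG sites) with the largest population variance."""
    n_features = len(X[0])
    variances = [statistics.pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected columns of each sample."""
    return [[row[j] for j in cols] for row in X]


def knn_outlier_scores(X, k):
    """Mean Euclidean distance from each sample to its k nearest neighbours.

    Larger scores indicate more isolated samples (candidate outliers or micro-clusters).
    """
    scores = []
    for i, a in enumerate(X):
        dists = sorted(math.dist(a, b) for j, b in enumerate(X) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical beta values (rows = samples, columns = CpG sites).
X = [
    [0.5, 0.10, 0.20],
    [0.5, 0.12, 0.22],
    [0.5, 0.11, 0.19],
    [0.5, 0.13, 0.21],
    [0.5, 0.90, 0.95],
]
cols = top_variance_features(X, 2)  # the constant first site is dropped
scores = knn_outlier_scores(project(X, cols), k=2)
print(cols, max(range(len(scores)), key=scores.__getitem__))  # prints [1, 2] 4
```

Filtering by variance first keeps the quadratic pairwise-distance step cheap, since distances are computed only over the retained sites.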
Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward

Lixing Chen and Jie Xu
Department of Electrical and Computer Engineering

In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework tailored for volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual and combinatorial bandits: it aims to select a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By "volatile arms," we mean that the set of arms available for selection may change in each round; by "submodular rewards," we mean that the total reward achieved by the selected arms is not a simple sum of individual rewards but exhibits diminishing returns determined by the relations between the selected arms (e.g., relevance and redundancy). Volatile arms and submodular rewards arise in many real-world applications, e.g., recommender systems and crowdsourcing, in which multi-armed bandit (MAB) based strategies are extensively applied. Although existing works investigate these issues separately based on the standard MAB, jointly considering all of them in a single MAB problem requires a very different algorithm design and regret analysis. Our algorithm CC-MAB provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proved to achieve $O(cT^{\frac{2\alpha+D}{3\alpha + D}}\log(T))$ regret after T rounds. The performance of CC-MAB is evaluated through experiments on a real-world crowdsourcing dataset, and the results show that our algorithm outperforms the prior art.
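To make the setting concrete, the Python sketch below shows one round of arm selection in the spirit of CC-MAB, under assumptions that go beyond the abstract: contexts lie in [0, 1]^D and are grouped into a uniform grid of hypercubes; arms whose hypercube has been observed at most K(t) = t^(2α/(3α+D)) log t times are explored first (a hypothetical threshold chosen to echo the exponent in the regret bound); and the remaining slots are filled greedily with a redundancy-discounted gain that shrinks as similar arms are selected, mimicking diminishing returns. It is not the authors' exact algorithm, all names are hypothetical, rounds start at t = 1, and the updates of `counts` and `means` after rewards are observed are omitted.

```python
import math
from collections import defaultdict


def cube_index(context, h):
    """Map a context in [0, 1]^D to its hypercube in a uniform grid with h cells per dimension."""
    return tuple(min(int(c * h), h - 1) for c in context)


def exploration_threshold(t, alpha, dim):
    """Hypothetical control function K(t) = t^(2*alpha/(3*alpha+dim)) * log(t)."""
    return t ** (2 * alpha / (3 * alpha + dim)) * math.log(t)


def select_arms(arms, t, k, counts, means, h, alpha, similarity):
    """Choose up to k of the arms available in round t (the arm set may change every round).

    arms: dict arm_id -> context tuple; counts/means: per-hypercube statistics (defaultdicts).
    """
    cubes = {a: cube_index(x, h) for a, x in arms.items()}
    dim = len(next(iter(arms.values())))
    threshold = exploration_threshold(t, alpha, dim)
    # Exploration: arms whose hypercube has been observed too few times.
    chosen = [a for a in arms if counts[cubes[a]] <= threshold][:k]
    remaining = [a for a in arms if a not in chosen]

    def gain(a):
        # Redundancy-discounted estimate: shrinks as similar arms join `chosen`.
        redundancy = max((similarity(arms[a], arms[b]) for b in chosen), default=0.0)
        return means[cubes[a]] * (1.0 - redundancy)

    # Exploitation: greedily add the arm with the largest estimated marginal gain.
    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    sim = lambda x, y: 1.0 - abs(x[0] - y[0])  # hypothetical 1-D similarity
    arms = {"a": (0.1,), "b": (0.15,), "c": (0.9,)}
    counts = defaultdict(int, {(0,): 100, (1,): 100})
    means = defaultdict(float, {(0,): 0.8, (1,): 0.6})
    print(select_arms(arms, 10, 2, counts, means, 2, 1.0, sim))
```

In the demo, arms "a" and "b" share a well-explored hypercube with the higher estimated reward, but once "a" is chosen the near-duplicate "b" is heavily discounted, so the greedy step prefers the dissimilar arm "c".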

Poster

Efficient Computation of Belief Theoretic Conditionals for Time Sensitive Uncertainty Reasoning Applications

Lalintha G. Polpitiya, Dr. Kamal Premaratne, Dr. Manohar N. Murthi, Dr. Dilip Sarkar

Department of Electrical and Computer Engineering

Artificial Intelligence (AI) applications in data analytics are growing rapidly in a wide range of critical and sensitive domains. However, expert systems remain prone to failure because of the difficulty of accommodating uncertainties and replicating complex environments in many real-world domains. Dempster-Shafer (DS) belief theory plays a major role in modeling these uncertainties and data imperfections. A major limitation in applying DS theoretic techniques to reasoning under uncertainty is the absence of a feasible computational framework that overcomes the prohibitive computational burden of the conditional operation, a problem known to be NP-hard. We address this critical challenge via two novel generalized conditional computational models, DS-Conditional-One and DS-Conditional-All, which allow the conditional mass and belief to be computed with significantly lower time and space complexity. They provide valuable insight into the DS theoretic conditional itself and can be used as a tool for visualizing the conditional computation. We provide implementations for computing both Dempster's conditional and the Fagin-Halpern conditional, the two most widely used DS theoretic conditional strategies. A new computational library, which we refer to as DS-COCA (DS-Conditional-One and DS-Conditional-All), is developed and used in the simulations.
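For reference, both conditionals named above can be computed by brute force over the focal elements of a mass function; this is the kind of baseline whose cost grows with the number of focal elements (up to 2^|Θ| for a frame Θ). The Python sketch uses the standard textbook definitions: Dempster's conditional transfers each focal element's mass to its intersection with B and renormalizes by Pl(B), and the Fagin-Halpern conditional belief is Bel(A ∩ B) / (Bel(A ∩ B) + Pl(B − A)). It is not DS-Conditional-One or DS-Conditional-All, the function names are hypothetical, and the mass function is assumed to assign no mass to the empty set.

```python
def belief(m, A):
    """Bel(A): total mass of non-empty focal elements contained in A."""
    return sum(v for S, v in m.items() if S and S <= A)


def plausibility(m, A):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(v for S, v in m.items() if S & A)


def dempster_conditional(m, B):
    """Dempster's conditional mass m(. | B): move each focal mass to S & B, then renormalize by Pl(B)."""
    pl_b = plausibility(m, B)
    if pl_b <= 0.0:
        raise ValueError("B is incompatible with the evidence (Pl(B) = 0)")
    cond = {}
    for S, v in m.items():
        A = S & B
        if A:
            cond[A] = cond.get(A, 0.0) + v / pl_b
    return cond


def fagin_halpern_belief(m, A, B):
    """Fagin-Halpern conditional belief: Bel(A & B) / (Bel(A & B) + Pl(B - A))."""
    if belief(m, B) <= 0.0:
        raise ValueError("the Fagin-Halpern conditional is undefined here (Bel(B) = 0)")
    bel_ab = belief(m, A & B)
    return bel_ab / (bel_ab + plausibility(m, B - A))


if __name__ == "__main__":
    F = frozenset
    m = {F("a"): 0.2, F("b"): 0.3, F("ac"): 0.2, F("abc"): 0.3}
    A, B = F("a"), F("ab")
    print(belief(dempster_conditional(m, B), A))  # approximately 0.4
    print(fagin_halpern_belief(m, A, B))          # approximately 0.25
```

On this small example the two strategies disagree (about 0.4 for Dempster's conditional versus 0.25 for Fagin-Halpern), with Fagin-Halpern the more conservative here, which illustrates why a library would offer both.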

Poster

Convolutional Neural Network Transfer for Automated Glaucoma Detection

Manal Ghamdi, Linhao Luo, Arda Efe Okay, Mohamed Abdel-Mottaleb, Mohamed Abou Shousha

Department of Electrical and Computer Engineering, Umm Al-Qura University, Harbin Institute of Technology, Bascom Palmer Eye Institute

Retracted by the authors.

Characterization of Perfluoroalkyl and Polyfluoroalkyl Substances (PFAS) in Landfill Leachate and Preliminary Evaluation of Leachate Treatment Processes

Athena Jones, Hekai Zhang, Helena Solo-Gabriele

Department of Civil, Architectural, and Environmental Engineering

Perfluoroalkyl and Polyfluoroalkyl Substances (PFAS) are fluorine-containing chemicals found in many non-stick and stain-resistant products. The most common PFASs are perfluorooctanoic acid (PFOA), which is used to make Teflon, and perfluorooctane sulfonate (PFOS), a breakdown product of a common water-resistant chemical known as Scotchgard. Although these chemicals are widely used, their human health impacts have been recognized only recently. Studies have linked PFOA and PFOS to thyroid and liver diseases, immune system diseases, and cancer. Because of their wide-ranging use in consumer products, landfills are a logical end-of-life reservoir for PFASs. The objectives of this study are to evaluate the concentrations of PFASs in leachates from Florida landfills and to assess the capacity of current treatments to remove PFASs from leachate. Leachate samples will be collected from landfills in the State of Florida and from the effluent of leachate treatment facilities, and analyzed for PFASs with LC-MS/MS. Data on leachate volumes and treatment have been compiled for landfills in the State of Florida. This literature information, coupled with the leachate measurements, will be used to make a preliminary assessment of the effectiveness of existing leachate treatment strategies in reducing PFOA and PFOS levels. To broadly assess the health risks associated with PFASs, results from the leachate measurements will be compared with the U.S. Environmental Protection Agency's PFCs health advisory of 0.07 parts per billion. Regulators can use the results to assess whether treatment systems are needed to remove PFASs from landfill leachates in Florida.
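The comparison against the advisory and the treatment-effectiveness assessment both come down to simple arithmetic. The Python sketch below assumes concentrations are reported in ng/L, uses 1 ppb ≈ 1 µg/L = 1000 ng/L (a water-like sample density), and, as an assumption, applies the 0.07 ppb advisory to the combined PFOA + PFOS concentration. The function names and example values are hypothetical, not study data.

```python
NG_PER_L_PER_PPB = 1000.0  # assumes 1 ppb ≈ 1 µg/L = 1000 ng/L (water-like density)
ADVISORY_PPB = 0.07        # advisory level quoted in the abstract


def ng_per_l_to_ppb(conc_ng_per_l):
    """Convert a concentration in ng/L to parts per billion."""
    return conc_ng_per_l / NG_PER_L_PER_PPB


def removal_efficiency(influent, effluent):
    """Fraction of a compound removed by treatment: (C_in - C_out) / C_in."""
    if influent <= 0:
        raise ValueError("influent concentration must be positive")
    return (influent - effluent) / influent


def exceeds_advisory(pfoa_ng_per_l, pfos_ng_per_l, advisory_ppb=ADVISORY_PPB):
    """Assumption: compare the combined PFOA + PFOS concentration with the advisory."""
    return ng_per_l_to_ppb(pfoa_ng_per_l + pfos_ng_per_l) > advisory_ppb


# Hypothetical influent/effluent values in ng/L, not measurements from the study.
pfoa_in, pfos_in = 1000.0, 400.0
pfoa_out, pfos_out = 600.0, 100.0
print(removal_efficiency(pfoa_in, pfoa_out))  # 0.4
print(exceeds_advisory(pfoa_out, pfos_out))   # True
```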
