(1) Large Language Model and Gen-AI Research
Our research in generative AI spans novel statistical techniques for understanding large language models (LLMs), particularly in natural language processing for AI-assisted programming [1-3]. We currently focus on human-in-the-loop unsupervised learning, using optimization-theoretic and information-theoretic techniques to develop robust solutions for AI fairness and safety. We also study the benchmarking [4] and ranking of LLM agents, such as copilots and chatbots, and explore retrieval-augmented generation (RAG) to evaluate and rank their performance.
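As a toy illustration of ranking agents from pairwise human preferences, a Bradley-Terry model can be fit by minorization-maximization; this is a simpler maximum-likelihood stand-in for the Bayesian inference of [1], with hypothetical comparison data:

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Rank items from pairwise outcomes via Bradley-Terry MM updates.

    comparisons: list of (winner, loser) index pairs.
    Returns positive skill scores normalized to sum to 1; a higher
    score means the item is preferred more often.
    """
    wins = [0.0] * n_items
    for w, _ in comparisons:
        wins[w] += 1.0
    p = [1.0] * n_items
    for _ in range(iters):
        # Hunter's MM update: p_i <- W_i / sum over i's matches of 1/(p_i + p_j).
        denom = [0.0] * n_items
        for w, l in comparisons:
            d = 1.0 / (p[w] + p[l])
            denom[w] += d
            denom[l] += d
        p = [wins[i] / denom[i] if denom[i] > 0 else p[i]
             for i in range(n_items)]
        s = sum(p)
        p = [x / s for x in p]
    return p
```

For example, feeding in head-to-head outcomes among three hypothetical copilots yields scores whose ordering is the inferred ranking.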
[1] M. F. Wong and C. W. Tan, Aligning Crowd-sourced Human Feedback for Code Generation with Bayesian Inference, IEEE Conference on Artificial Intelligence, 2023 (Honorable Mention Award in Foundation Models and Generative AI).
[2] M. F. Wong, S. Guo, C.-N. Hang, S.-W. Ho and C. W. Tan, Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review, Entropy, 25(6): 888, 2023.
[3] C. W. Tan, S. Guo, M. F. Wong and Y. Chen, Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models, arXiv, 2023.
[4] Md. T. R. Laskar, S. Alqahtani, M. S. Bari, M. Rahman, Md. A. M. Khan, H. Khan, I. Jahan, A. Bhuiyan, C. W. Tan, Md. R. Parvez, E. Hoque, S. Joty and J. X. Huang, A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations, The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
(2) Network Science of Online Mis/Disinformation
The pervasive spread of online rumors and misinformation in social networks can lead to disastrous outcomes (e.g., the Twitter Files, the COVID-19 infodemic). Addressing this challenge involves deciphering stochastic spreading processes to identify malicious sources. In [1-5], we study rumor source detection using large-scale optimization and network centrality as tools for statistical inference. Can we predict online sentiment by engineering viral messaging? Our work in [6] delves into this question through an analysis of Twitter data from the 2012 U.S. Presidential Election. We are now studying large language model-driven data analytics and its impact on AI Trust and Safety.
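As a minimal illustration of network centrality as statistical inference, the rumor centrality of Shah and Zaman, the maximum-likelihood source estimator for SI spreading on regular trees that the work in [2] builds on, can be computed by subtree counting. This is a self-contained sketch, not the algorithms of [1-5]:

```python
import math
from collections import defaultdict

def rumor_center(edges, n):
    """Rumor centrality on an infected tree with nodes 0..n-1.

    For candidate source v, R(v) = n! / prod of subtree sizes when the
    tree is rooted at v; the maximizer is the ML source estimate.
    Computed in the log domain to avoid overflow on large trees.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def log_centrality(root):
        # Iterative DFS to get subtree sizes under this root.
        size = [1] * n
        order, stack, parent = [], [root], {root: None}
        while stack:
            u = stack.pop()
            order.append(u)
            for w in adj[u]:
                if w != parent[u]:
                    parent[w] = u
                    stack.append(w)
        for u in reversed(order):  # children before parents
            if parent[u] is not None:
                size[parent[u]] += size[u]
        return math.lgamma(n + 1) - sum(math.log(size[u]) for u in order)

    scores = {v: log_centrality(v) for v in range(n)}
    return max(scores, key=scores.get), scores
```

On a star graph the center is returned, matching the intuition that the most "balanced" node is the likeliest source.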
[1] C. Hang, P.-D. Yu, C. W. Tan, TrumorGPT: Query Optimization and Semantic Reasoning Over Networks for Automated Fact-Checking, The 58th Annual Conference on Information Sciences and Systems (CISS), 2024.
[2] C. W. Tan and P.-D. Yu: Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms, Foundations and Trends in Networking, Vol. 13, No. 2-3, pp. 107-251, 2023.
[3] P. D. Yu, C. W. Tan and L. Zheng, Unraveling Contagion Origins: Optimal Estimation through Maximum-Likelihood and Starlike Graph Approximation in Markovian Spreading Models, under review, 2023.
[4] Z. Wang, W. Dong, W. Zhang and C. W. Tan, Rumor Source Detection with Multiple Observations: Fundamental Limits and Algorithms, ACM SIGMETRICS 2014; journal version in IEEE Journal of Selected Topics in Signal Processing, Vol. 9, No. 4, pp. 663-677, 2015.
[5] W. Dong, W. Zhang and C. W. Tan, Rooting out the Rumor Culprit from Suspects, Proc. of IEEE Intl. Symp. on Information Theory 2013.
[6] F. M. F. Wong, C. W. Tan, S. Sen and M. Chiang, Quantifying Political Leaning from Tweets, Retweets and Retweeters, IEEE Transactions on Knowledge and Data Engineering, Vol. 28, No. 8, pp. 2158-2172, 2016; conference version in the International AAAI Conference on Weblogs and Social Media (ICWSM).
(3) Large-Scale Optimization and Edge Learning
Our recent work advances distributed optimization and federated learning, pushing the boundaries of secure and scalable edge learning for Edge AI. In [1], we develop efficient recovery from poisoning attacks; [2] introduces a blockchain-based framework for credible and fair federated learning; [3] surveys communication-efficient edge computing via coded federated learning; [4] presents FedReMa, which enhances personalized learning by selecting the most relevant clients; [5] integrates network coding for efficient hierarchical federated learning; and [6] accelerates federated unlearning with the Polyak heavy-ball method.
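For readers unfamiliar with the federated learning primitive underlying these works, here is one round of federated averaging (FedAvg) for linear regression; this is a generic sketch of the standard primitive, not the specific method of any paper above:

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One FedAvg round: each client runs local gradient steps from the
    global weights on its own (X, y) data, then the server averages the
    local models weighted by client dataset size."""
    updates, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
            w -= lr * grad
        updates.append(w)
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.asarray(sizes, float))
```

Iterating such rounds on synthetic clients whose data share one ground-truth model recovers that model without any client sharing raw data.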
[1] Y. Jiang, J. Shen, Z. Liu, C. W. Tan and K.-Y. Lam, Towards Efficient and Certified Recovery from Poisoning Attacks in Federated Learning, IEEE Transactions on Information Forensics & Security, accepted, 2024.
[2] L. Chen, D. Zhao, L. Tao, K. Wang, S. Qian, X. Zeng and C. W. Tan, A Credible and Fair Federated Learning Framework Based on Blockchain, IEEE Transactions on Artificial Intelligence, 2024.
[3] Y. Zhang, T. Gao, C. Li and C. W. Tan, Coded Federated Learning for Communication-Efficient Edge Computing: A Survey, IEEE Open Journal of the Communications Society, Vol. 5, pp. 4098-4124, 2024.
[4] H. Liang, Z. Zhan, W. Liu, X. Zhang, C. W. Tan and X. Chen, FedReMa: Improving Personalized Federated Learning via Leveraging the Most Relevant Clients, 27th European Conference on Artificial Intelligence, 2024.
[5] T. Gao, J. Lin, C. Li and C. W. Tan, Federated Learning Meets Network Coding: Efficient Coded Hierarchical Federated Learning, IEEE Information Theory Workshop, 2024.
[6] Y. Jiang, C. W. Tan and K.-Y. Lam, Accelerating Federated Unlearning via Polyak Heavy Ball Method, IEEE Information Theory Workshop, 2024.
(4) AI in Education
Recent research highlights innovative applications of AI in education. Our studies in [1] and [3] explore how ChatGPT and LLM-driven tools can foster self-regulated learning, enhance science education, and generate personalized multiple-choice questions (MCQs). In [2], we present Nemobot, a strategic gaming agent for K-12 AI education, showcasing the potential of game-based learning. We also propose LLM-driven innovations in flipped classrooms that enhance peer instruction and just-in-time teaching, transforming interactive and adaptive learning [4, 5].
[1] D. T. K. Ng, C. W. Tan and J. K. L. Leung, Empowering student self-regulated learning and science education through ChatGPT: A pioneering pilot study, British Journal of Educational Technology, 55(4), pp. 1328-1353, 2024.
[2] Y. Wang, S. Guo, L. Ling and C. W. Tan, Nemobot: Crafting Strategic Gaming LLM Agents for K-12 AI Education, ACM Learning at Scale, 2024.
[3] C. N. Hang, C. W. Tan and P. D. Yu, MCQGen: A Large Language Model-Driven MCQ Generator for Personalized Learning, IEEE Access, Vol. 12, pp. 102261-102273, 2024.
[4] C. W. Tan, Large Language Model-Driven Classroom Flipping: Empowering Student-Centric Peer Questioning with Flipped Interaction, CoRR abs/2311.14708, 2023.
[5] J. Li, L. Ling and C. W. Tan, Blending Peer Instruction with Just-In-Time Teaching: Jointly Optimal Task Scheduling with Feedback for Classroom Flipping, ACM Learning at Scale, 2021.
(5) Health Informatics for Epidemics/Infodemics
The rapid spread of infectious diseases and of online rumors exhibits similar contagion patterns, yet the two have traditionally been studied separately. The COVID-19 pandemic revealed the devastating consequences of a simultaneous epidemic and misinformation crisis. Our research leverages network science, big data analytics, and machine learning for large-scale challenges such as digital contact tracing and infodemic risk management using COVID-19 data [1-4]. Our algorithms also have broader applications against future epidemics, including a potential Disease X.
[1] C. W. Tan and P.-D. Yu: Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms, Foundations and Trends in Networking, Vol. 13, No. 2-3, pp. 107-251, 2023.
[2] P. Yu, C. W. Tan and H. Fu, Epidemic Source Detection in Contact Tracing Networks: Epidemic Centrality in Graphs and Message-Passing Algorithms, IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 2, pp. 234-249, 2022 (COVID-19 special issue).
[3] Z. Fei, Y. Ryeznik, O. Sverdlov, C. W. Tan and W. K. Wong, An Overview of Healthcare Data Analytics with Applications to the COVID-19 Pandemic, IEEE Trans. on Big Data, Vol. 8, No. 6, pp. 1463-1480, 2022.
[4] C. Hang, P.-D. Yu, S. Chen, C. W. Tan, G. Chen, MEGA: Machine Learning-Enhanced Graph Analytics for Infodemic Risk Management. IEEE Journal of Biomedical and Health Informatics, Vol. 27, No. 12, pp. 6100-6111, 2023.
(6) Automated Reasoning by Optimization in Information Theory

Information-theoretic problems often possess surprisingly beautiful structures associated with optimization theory that shed light on the design of optimal systems. In [1-2], we propose Automated Reasoning by Convex Optimization to automate the proving or disproving of large-scale information inequalities, pushing the limits of knowledge discovery by mathematical optimization and cloud computing. Applications include determining fundamental limits of interference channels and secure diversity coding systems [3-4]. Our AITIP open-source code was featured in the IEEE Information Theory Society Newsletter in June 2020.
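The core idea of [1-2], checking whether a linear information inequality is implied by the elemental Shannon inequalities via linear programming, can be sketched for three random variables as follows. This is a toy stand-in for AITIP, assuming SciPy's LP solver is available; entropies of nonempty subsets are indexed by bitmask minus one, and a Shannon-type inequality has LP optimum zero while an unprovable one makes the LP unbounded:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def shannon_provable(b, n=3):
    """True iff b . h >= 0 for every h satisfying the elemental
    Shannon inequalities, checked by minimizing b . h over that cone."""
    full = (1 << n) - 1
    rows = []

    def add(row, mask, c):
        if mask:                      # H(empty set) = 0, no variable
            row[mask - 1] += c

    for i in range(n):                # H(X_i | rest) >= 0
        row = [0.0] * full
        add(row, full, 1)
        add(row, full & ~(1 << i), -1)
        rows.append(row)
    for i, j in itertools.combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for K in itertools.combinations(rest, r):
                km = sum(1 << k for k in K)   # I(X_i;X_j|X_K) >= 0
                row = [0.0] * full
                add(row, km | (1 << i), 1)
                add(row, km | (1 << j), 1)
                add(row, km | (1 << i) | (1 << j), -1)
                add(row, km, -1)
                rows.append(row)
    res = linprog(c=b, A_ub=-np.array(rows), b_ub=np.zeros(len(rows)),
                  bounds=[(None, None)] * full, method="highs")
    return res.status == 0 and abs(res.fun) < 1e-9
```

For instance, the inequality H(X1,X2) + H(X1,X3) + H(X2,X3) >= 2 H(X1,X2,X3) is verified as Shannon-type, while the negation of I(X1;X2) >= 0 is rejected.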
[1] S. Ho, L. Ling, C. W. Tan and R. W. Yeung, Proving and Disproving Information Inequalities: Theory and Scalable Algorithms, IEEE Transactions on Information Theory, Vol. 66, No. 9, pp. 5522-5536, 2020 (AITIP Open-source Code)
[2] S. W. Ho, C. W. Tan and R. W. Yeung, Proving and Disproving Information Inequalities, Proc. of IEEE Intl. Symp. on Information Theory, 2014.
[3] C. Li, X. Guang, C. W. Tan and R. W. Yeung, Fundamental Limits on a Class of Secure Asymmetric Multilevel Diversity Coding Systems, IEEE Journal on Selected Areas in Communications, Vol. 36, No. 4, pp. 737-747, 2018.
[4] Y. Zhao, C. W. Tan, A. S. Avestimehr, S. N. Diggavi and G. J. Pottie, On the Maximum Achievable Sum-rate with Successive Decoding in Interference Channels, IEEE Transactions on Information Theory, Vol. 58, No. 6, pp. 3798-3820, 2012.
(7) Computing at Scale: How to Solve Large Problems Faster

The increasing digitalization of many aspects of life is driving a proliferation of data analytics, while the emergence of cloud computing brings economies of scale to distributed computation. We are developing mathematical theories of edge computing driven by the supply and demand of big data analytics. In [1-3], we design algorithms for users to bid for high-value, low-cost computation resources in online secondary markets such as virtual ISPs and Amazon Web Services real-time spot pricing. In [4], we develop novel scalable graph algorithms based on MapReduce as part of the DARPA-MIT-Amazon Graph Challenge 2018. We are currently studying the interplay between economics and large-scale computing, with applications to scientific machine learning and AI. Our MapReduce/Hadoop source code for ACM SIGCOMM 2015 and ACM SIGMETRICS 2016 is available as open source on GitHub.
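To give a flavor of the graph kernel behind [4], triangle counting with the degree-ordered "forward" algorithm, the sequential kernel that parallel variants distribute, can be sketched in a few lines; this is an illustrative sketch, not the pruning and hierarchical clustering algorithms of [4]:

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles by orienting each edge from its lower-degree to
    its higher-degree endpoint (ties broken by label) and intersecting
    out-neighbor sets; each triangle is counted exactly once, in
    O(m^(3/2)) time."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    return sum(len(out[u] & out[v]) for u in out for v in out[u])
```

The per-edge set intersections are independent, which is what makes the computation embarrassingly parallel in a MapReduce setting.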
[1] S. Liu, C. Joe-Wong, J. Chen, C. Brinton, C. W. Tan, L. Zheng, Economic Viability of a Virtual ISP, IEEE/ACM Transactions on Networking, Vol. 28, No. 2, pp. 902-916, 2020.
[2] L. Zheng, C. Joe-Wong, C. W. Tan, M. Chiang and X. Wang, How to Bid the Cloud?, ACM SIGCOMM 2015.
[3] L. Zheng, C. Joe-Wong, C. Brinton, C. W. Tan, S. Ha and M. Chiang, On the Viability of a Cloud Virtual Service Provider, ACM SIGMETRICS 2016.
[4] C. Kuo, C. N. Hang, P. Yu and C. W. Tan, Parallel Counting of Triangles in Large Graphs: Pruning and Hierarchical Clustering Algorithms, IEEE High Performance Extreme Computing, MIT-Amazon Graph Challenge 2018 (Honorable Mention).
(8) Network Optimization by Perron-Frobenius Theory
A basic question in wireless networking is how to design optimal resource allocation schemes that maximize network utility and guarantee fairness in the presence of interference. In a series of works [1-5], we advance a nonlinear Perron-Frobenius theory to overcome the notorious non-convexity barriers in wireless utility maximization. Our approach resolved several previously open problems in the literature, e.g., one posed by Kandukuri and Boyd (TWC 2002). More generally, nonlinear Perron-Frobenius theory can address the solvability of a broad class of non-convex optimization problems and guide the design of computationally fast algorithms for optimal resource allocation and interference management in wireless networks (surveyed in the monograph [1]). We have developed algorithms for max-min SINR optimization (2009 INFOCOM/2013 ToN), reliability fairness optimization under stochastic fading (2011 INFOCOM/2015 ToN), and sum rate maximization using outer approximation (2011 JSAC) and convex relaxation (2013 JSAC), all based on nonlinear Perron-Frobenius theory and global optimization techniques. Matlab code is available online.
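The flavor of the nonlinear Perron-Frobenius approach shows up already in max-min SINR power control under a total power budget: a normalized fixed-point iteration converges to a power vector at which all SINRs are equal. A minimal numerical sketch with synthetic channel gains (not the exact algorithms of [1-5]):

```python
import numpy as np

def maxmin_sinr_power(G, noise, p_total=1.0, iters=500):
    """Normalized fixed-point iteration p <- p_total * f(p) / ||f(p)||_1,
    where f(p)_l = (noise_l + interference_l) / G[l, l]. By nonlinear
    Perron-Frobenius theory the iterates converge geometrically, and at
    the fixed point every link attains the same (max-min) SINR."""
    n = len(noise)
    p = np.full(n, p_total / n)
    for _ in range(iters):
        interference = G @ p - np.diag(G) * p + noise
        f = interference / np.diag(G)
        p = p_total * f / f.sum()
    sinr = np.diag(G) * p / (G @ p - np.diag(G) * p + noise)
    return p, sinr
```

Running it on a random channel matrix with dominant diagonal gains shows the SINRs equalizing while the powers sum to the budget.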
[1] C. W. Tan, Wireless Network Optimization by Perron-Frobenius Theory, Foundations and Trends in Networking, Now Publishers, 2015.
[2] C. W. Tan, Optimal Power Control in Rayleigh-fading Heterogeneous Wireless Networks, IEEE/ACM Transactions on Networking, Vol. 24, No. 2, pp. 940-953, 2016.
[3] C. W. Tan, M. Chiang and R. Srikant, Fast Algorithms and Performance Bounds for Sum Rate Maximization in Wireless Networks, IEEE/ACM Transactions on Networking, Vol. 21, No. 3, pp. 706-719, 2013.
[4] L. Zheng and C. W. Tan, Cognitive Radio Network Duality and Algorithms for Utility Maximization, IEEE Journal on Selected Areas in Communications, Vol. 31, No. 3, pp. 500-513, 2013.
[5] C. W. Tan, S. Friedland and S. H. Low, Nonnegative Matrix Inequalities and Their Application to Nonconvex Power Control Optimization, SIAM Journal on Matrix Analysis and Applications, Vol. 32, No. 3, pp. 1030-1055, 2011.
(9) Next Generation Multiple Access (NGMA)
We have developed novel signal processing and resource allocation algorithms for wireless networks. A space-time communication scheme proposed in [1] uses quaternion vectors as building blocks to create robust, energy-efficient transmissions; the idea is to select the best code from a family of codes induced by the geometry and statistics of quaternion vectors (code diversity). Energy-efficient resource allocation algorithms for cognitive radio networks were studied using convex optimization and iterative algorithm tools in [2-3] and in a recently edited book.
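For concreteness, the classical Alamouti building block referenced in [1] encodes two symbols over two antennas and two time slots, and a simple linear combiner decouples the symbols while collecting diversity gain |h1|^2 + |h2|^2. A noiseless sketch for a single receive antenna:

```python
import numpy as np

def alamouti_transmit(s1, s2):
    """Alamouti space-time block code: rows are antennas, columns are
    time slots. Slot 1 sends (s1, s2); slot 2 sends (-s2*, s1*)."""
    return np.array([[s1, -np.conj(s2)],
                     [s2,  np.conj(s1)]])

def alamouti_combine(r1, r2, h1, h2):
    """Linear combining at a single receive antenna (noiseless sketch):
    recovers each symbol scaled by |h1|^2 + |h2|^2, here normalized."""
    g = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (np.conj(h1) * r1 + h2 * np.conj(r2)) / g
    s2_hat = (np.conj(h2) * r1 - h1 * np.conj(r2)) / g
    return s1_hat, s2_hat
```

With noise present, the same combiner followed by symbol-by-symbol slicing is maximum-likelihood, which is what makes the scheme so attractive in practice.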
[1] C. W. Tan and A. R. Calderbank, Multiuser Detection of Alamouti Signals, IEEE Transactions on Communications, Vol. 57, No. 7, pp. 2080-2089, 2009.
[2] X. Zhai, L. Zheng and C. W. Tan, Energy-Infeasibility Tradeoff in Cognitive Radio Networks: Price-driven Spectrum Access Algorithms, IEEE Journal on Selected Areas in Communications, Vol. 32, No. 3, pp. 528-538, 2014.
[3] C. W. Tan, D. P. Palomar and M. Chiang, Energy-Robustness Tradeoff in Cellular Network Power Control, IEEE/ACM Transactions on Networking, Vol. 17, No. 3, pp. 912-925, 2009.
(10) Network Utility Maximization (NUM)
A fundamental question in Internet TCP/IP engineering is: “What is the cost of routing TCP flows over a single path or multiple routes?” The answer can guide the decision on whether to support multipath routing, an expensive operation, in the Internet. We show that optimal routing can in fact be achieved using a randomized TCP/IP algorithm that load-balances a large proportion of Internet traffic with only a small group of users sparsely routed over more than one path [2]. Our work in [1, 2] lends theoretical evidence that current single-path routing, e.g., the OSPF Internet protocol, can be improved to scale up to multipath routing in next-generation software-defined networks.
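The NUM framework behind [1, 2] can be illustrated by Kelly's classic dual decomposition for log utilities: links price their congestion, and each source reacts with its utility-maximizing rate. A minimal sketch on a synthetic network (not the path-cardinality-constrained formulation of [2]):

```python
import numpy as np

def num_dual(R, c, steps=5000, alpha=0.01):
    """Dual (price-based) decomposition for NUM with log utilities:
    maximize sum_s log x_s subject to R x <= c, where R is the
    link-by-source routing matrix. Links run projected subgradient
    updates on prices; sources set x_s = 1 / (path price)."""
    n_links, n_src = R.shape
    lam = np.ones(n_links)
    for _ in range(steps):
        q = R.T @ lam                    # price along each source's path
        x = 1.0 / np.maximum(q, 1e-12)   # argmax of log x - q x
        lam = np.maximum(lam + alpha * (R @ x - c), 0.0)
    return x, lam
```

On a two-link line network where one source traverses both links, the iteration recovers the proportionally fair allocation (the through flow gets half the rate of the single-link flows).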
[1] M. Wang, C. W. Tan, W. Xu and A. Tang, Cost of Not Splitting in Routing: Characterization and Estimation, IEEE/ACM Transactions on Networking, Vol. 19, No. 6, pp. 1849-1859, 2011.
[2] Y. Bi, C. W. Tan and A. Tang, Network Utility Maximization with Path Cardinality Constraints, IEEE INFOCOM, 2016.
(11) Power Network Optimization

The power network is undergoing an architectural transformation from transmission-centric operation to new Smart Grid services. A key question is how to move energy through space and time, e.g., by using electric vehicles (EVs) as mobile batteries that load-balance energy usage. In [1-3], we study the optimal power flow problem and EV charging optimization using data from a Southern California utility. See also our article in UGC Research Frontiers.
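The valley-filling structure of optimal EV charging studied in [2] admits a simple water-filling computation: pick a level L, charge max(0, L - base load) in each time slot, and set L by bisection so the total charging meets the fleet's energy requirement. A minimal sketch with a hypothetical load profile:

```python
def valley_fill(base_load, energy, tol=1e-9):
    """Valley-filling EV charging schedule: find the water level L such
    that u_t = max(0, L - base_load[t]) sums to the required energy.
    This flattens the aggregate profile, minimizing load variance for a
    deferrable aggregate demand."""
    lo = min(base_load)
    hi = max(base_load) + energy   # level guaranteed to fit all energy
    while hi - lo > tol:
        mid = (lo + hi) / 2
        delivered = sum(max(0.0, mid - b) for b in base_load)
        if delivered < energy:
            lo = mid
        else:
            hi = mid
    level = (lo + hi) / 2
    return [max(0.0, level - b) for b in base_load]
```

For a base load of [3, 1, 2] and 2 units of EV energy, the schedule fills the overnight valley first, charging [0, 1.5, 0.5].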
[1] C. W. Tan, D. W. H. Cai and X. Lou, Resistive Network Optimal Power Flow: Uniqueness and Algorithms, IEEE Transactions on Power Systems, Vol. 30, No. 1, 2015.
[2] N. Chen, C. W. Tan and T. Quek, Optimal Charging of Electric Vehicles in Smart Grid: Characterization and Valley-filling Algorithms, IEEE Journal of Selected Topics in Signal Processing, Vol. 8, No. 6, pp. 1073-1083, 2014.
[3] X. Lou and C. W. Tan, Optimization Decomposition of Resistive Power Networks with Energy Storage, IEEE Journal on Selected Areas in Communications, 2014.