• Recent and Selected Research Projects

Internet of Things (IoT) devices are increasingly deployed in smart homes for automation. Unfortunately, extensive recent research shows that external on-path adversaries can infer and fingerprint users’ sensitive in-home activities by analyzing IoT network traffic rates alone. Most recent traffic padding-based defenses cannot sufficiently protect user privacy with reasonable traffic overhead. In addition, these defenses typically assume that additional hub hardware is installed in the smart home to host them.

To address these problems, we design a new open-source traffic reshaping system, privacy as a router operating system service (PAROS), that enables smart home users to significantly reduce the private information leaked through IoT network traffic rates. PAROS does not require any additional hardware device. We evaluate PAROS on a virtual machine running the open-source router operating system (OS) OpenWrt and on two real best-selling home routers. We find that PAROS can effectively prevent a wide range of state-of-the-art adversarial machine learning-based attacks that infer user in-home activities, with near-zero additional system overhead.

[ICCCN’23] PAROS: The Missing “Puzzle” in Smart Home Router Operating Systems.
Keyang Yu, Dong Chen.
In Proc. of the 32nd International Conference on Computer Communications and Networks (ICCCN 2023), July 24 – July 27, 2023, Waikiki Beach, Honolulu, HI, USA. Acceptance Rate = 30.38%.

People are increasingly deploying Internet of Things (IoT) devices in smart homes to monitor and control their environments. Unfortunately, extensive recent research has shown that IoT devices are vulnerable to multiple adversarial attacks that analyze their network traffic to reveal a wide range of sensitive private information about user in-home activities. As a result, smart home users have recently shown keen interest in using virtual private networks (VPNs) to obscure the private information in their IoT network traffic. Our key insight is that VPN-encrypted IoT network traffic data is not anonymous, since the aggregate traffic can still be disaggregated into individual IoT device traffic, and each device’s traffic may carry an identifiable signature that embeds detailed sensitive user information.

To explore the severity and extent of this privacy threat, we design TrafficSpy, a new factorial hidden Markov model (FHMM)-based smart home network traffic disaggregator that separates VPN-encrypted whole-house IoT network traffic data into individual IoT device network traffic data. We evaluate TrafficSpy using VPN network traffic data from three smart homes and find that it disaggregates VPN traffic data into individual IoT device data accurately. We also show that the disaggregated traffic traces can be further attacked by smart and adaptive adversaries, revealing sensitive user information. TrafficSpy represents a serious privacy threat, but also a potentially useful tool that provides important contextual information for smart home monitoring and automation.

[CNS’22] TrafficSpy: Disaggregating VPN-encrypted IoT Network Traffic for User Privacy Inference.
Qi Li, Keyang Yu, Dong Chen, Mo Sha and Long Cheng.
In Proc. of the 10th IEEE Conference on Communications and Network Security (CNS 2022), 3-5 October 2022, Austin, Texas, USA. Acceptance Rate = 35.25%.

Internet of Things (IoT) devices are increasingly deployed in smart homes and smart buildings to monitor and control their environments. The Internet traffic data produced by these IoT devices is collected by Internet Service Providers (ISPs) and IoT device manufacturers, and is often shared with third parties to maintain and enhance user services. Unfortunately, extensive recent research has shown that on-path adversaries can infer and fingerprint users’ sensitive private information, such as occupancy and in-home activities, by analyzing IoT network traffic traces. Most recent defenses against these malicious IoT traffic analytics cannot sufficiently protect user privacy with reasonable traffic overhead. In particular, many approaches do not consider practical limitations in their design, e.g., network bandwidth, maximum packet injection rate, or actual user in-home behavior.

To address this problem, we design PrivacyGuard, a new low-cost, open-source, user-“tunable” defense system that enables users to significantly reduce the private information leaked through IoT device network traffic data, while still permitting the sophisticated data analytics or control necessary for smart home management. In essence, our approach combines deep convolutional generative adversarial network (DCGAN)-based IoT device traffic signature learning, long short-term memory (LSTM)-based artificial traffic signature injection, and partial traffic reshaping to obfuscate private information that can be observed in IoT device traffic traces. We evaluate PrivacyGuard using IoT network traffic traces of 31 IoT devices from 5 smart homes. We find that PrivacyGuard can effectively prevent a wide range of state-of-the-art machine learning-based and deep learning-based attacks that detect occupancy and 9 other user in-home activities. We release the source code and datasets of PrivacyGuard to the IoT research community.

[IPSN’21] PrivacyGuard: Enhancing Smart Home User Privacy.
Keyang Yu, Qi Li, Dong Chen, Mohammad Rahmann, and Shiqiang Wang.
In Proc. of the 20th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN’21, May 18–21, 2021, Nashville, TN, USA. Acceptance Rate = 24.76%. (Source Code and Data)

[ICDCS’18] Private Memoirs of IoT Devices: Safeguarding User Privacy in the IoT Era.
Dong Chen, Phuthipong Bovornkeeratiroj, David Irwin and Prashant Shenoy.
In Proc. of the 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18), July 2 – 5, 2018, Vienna, Austria.

    The Internet of Things (IoT) has spread rapidly around the world over the past decade. Smart home and smart building owners are increasingly deploying IoT devices to monitor and control their environments due to the rapid decline in the price of IoT devices. Intensive recent research has shown that the network traffic traces of IoT devices raise significant cybersecurity and privacy issues, and has produced sophisticated defense techniques to ensure security and preserve user privacy. However, because different approaches are evaluated using their own datasets, their own security and privacy attack models, and their own evaluation metrics, it is very difficult to make a fair and comprehensive comparison among different IoT security-strengthening and privacy-preserving approaches, or to understand IoT security issues and end-user benefits.

    To address this problem, we present SmartAttack, a deep learning-based adversarial attack model framework that provides a set of sophisticated adversarial attack models that researchers and industrial users in the IoT security community can use to better evaluate their work. In essence, we use the most widely used unsupervised machine learning and deep learning models to design and implement these attack models. SmartAttack also lets users select the detailed configuration of each attack model, such as the kernel, dataset splitting, cross-validation states, and evaluation metrics. We evaluate the performance of SmartAttack using two different datasets. In addition, we make the source code and related datasets of SmartAttack publicly available on our research website so that researchers can use SmartAttack to benchmark their security-strengthening and privacy-preserving approaches.

    [BigDataCPS’20] SmartAttack: Open-source Attack Models for Enabling Security Research in Smart Homes.
    Keyang Yu, Dong Chen.
    In Proceedings of the 2nd IEEE International Workshop on Big Data Analytics of Cyber-Physical Systems (BigDataCPS’20), Oct 19, 2020, co-located with IGSC’20.

    [Milcom’19] IoTSpot: Identifying the IoT Devices Using Their Anonymous Network Traffic Data.
    Liangdong Deng, Yuzhou Feng, Dong Chen, and Naphtali Rishe.

      Electric utilities are rapidly deploying smart meters that record and transmit electricity usage in real time. As prior research shows, smart meter data indirectly leaks sensitive, and potentially valuable, information about a home’s activities. An important example of the sensitive information smart meters reveal is occupancy: whether or not someone is home, and when. As prior work also shows, occupancy is surprisingly easy to detect, since it highly correlates with simple statistical metrics, such as power’s mean, variance, and range. Unfortunately, prior research that uses chemical energy storage, e.g., batteries, to prevent appliance power signature detection is prohibitively expensive when applied to occupancy detection.

      To address this problem, we propose preventing occupancy detection using the thermal energy storage of large elastic heating loads already present in many homes, such as electric water and space heaters. In essence, our approach, which we call Combined Heat and Privacy (CHPr), controls the power usage of these large loads to make it look like someone is always home. We design a CHPr-enabled water heater that regulates its energy usage to mask occupancy without violating its objective, e.g., to provide hot water on demand, and evaluate it in simulation and using a prototype. Our results show that a 50-gallon CHPr-enabled water heater decreases the Matthews Correlation Coefficient (a standard measure of a binary classifier’s performance) of a threshold-based occupancy detection attack in a representative home by 10x (from 0.44 to 0.045), effectively preventing occupancy detection at no extra cost.
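The Matthews Correlation Coefficient reported above can be illustrated with a minimal Python sketch. The function names, the 500 W threshold, and the power readings are hypothetical values chosen for illustration; they are not taken from the CHPr evaluation, and the masking shown is a toy stand-in rather than the CHPr controller itself.

```python
import math

def threshold_occupancy(power_watts, threshold_watts):
    """Toy attack: label an interval occupied (1) if power exceeds the threshold."""
    return [1 if p > threshold_watts else 0 for p in power_watts]

def mcc(truth, predicted):
    """Matthews Correlation Coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical ground truth and meter readings (watts) for six intervals.
truth = [0, 0, 1, 1, 1, 0]
raw_power = [100, 120, 900, 1500, 130, 110]
# Hypothetical readings after a large load fills in the unoccupied intervals.
masked_power = [800, 950, 900, 1500, 870, 820]

mcc_raw = mcc(truth, threshold_occupancy(raw_power, 500))
mcc_masked = mcc(truth, threshold_occupancy(masked_power, 500))
print(round(mcc_raw, 3), mcc_masked)
```

An MCC near 1 means the attack tracks occupancy closely, while an MCC near 0 means its predictions are no better than chance. In this toy data the masked trace makes every interval look occupied, so the attack's MCC drops to 0.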

      Homeowners are increasingly deploying grid-tied solar systems due to the rapid decline in solar module prices. The energy produced by these solar-powered homes is monitored by utilities and third parties using networked energy meters, which record and transmit energy data at fine-grained intervals. Such energy data is considered anonymous if it is not associated with identifying account information, e.g., a name and address. Thus, energy data from these “anonymous” homes is often not handled securely: it is routinely transmitted over the Internet in plaintext, stored unencrypted in the cloud, shared with third-party energy analytics companies, and even made publicly available over the Internet. Extensive prior work has shown that energy consumption data is vulnerable to multiple attacks, which analyze it to reveal a range of sensitive private information about occupant activities. However, these attacks are useless without knowledge of a home’s location.

      Our key insight is that solar energy data is not anonymous: since every location on Earth has a unique solar signature, it embeds detailed location information. To explore the severity and extent of this privacy threat, we design SunSpot to localize “anonymous” solar-powered homes using their solar energy data. We evaluate SunSpot on publicly-available energy data from 14 homes with rooftop solar. We find that SunSpot is able to localize a solar-powered home to a small region of interest that is near the smallest possible area given the energy data resolution, e.g., within a ~500m and ~28km radius for per-second and per-minute resolution, respectively. SunSpot then identifies solar-powered homes within this region using crowd-sourced image processing of satellite data before applying additional filters to identify a specific home.
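To give a sense of why solar data carries location information, here is a rough Python sketch of one ingredient: estimating longitude from the time of solar noon. It is an illustrative simplification, not SunSpot's actual algorithm, which this page does not detail. The function name and the sample data are hypothetical, solar noon is approximated as the midpoint between the first and last non-zero generation samples, and the equation of time is ignored, so the estimate can be off by a few degrees.

```python
def estimate_longitude(utc_hours, generation):
    """Estimate longitude in degrees (east positive) from one day of solar data.

    utc_hours:  sample times in hours since midnight UTC, in ascending order
    generation: solar output at each sample (any unit, zero at night)
    """
    lit = [t for t, g in zip(utc_hours, generation) if g > 0]
    if not lit:
        raise ValueError("no generation observed")
    # Approximate solar noon as the midpoint of the generating period.
    solar_noon_utc = (lit[0] + lit[-1]) / 2
    # Earth rotates 15 degrees per hour; solar noon at 12:00 UTC means 0 degrees.
    return (12.0 - solar_noon_utc) * 15.0

# Hypothetical hourly data: generation between 11:00 and 23:00 UTC.
hours = list(range(24))
output = [0] * 11 + [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]
print(estimate_longitude(hours, output))
```

Finer time resolution narrows the estimate of solar noon, which is consistent with the smaller search radius reported for per-second data than for per-minute data.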

      [BuildSys’16] SunSpot: Exposing the Location of Anonymous Solar-powered Homes.
      Dong Chen, Srinivasan Iyengar, David Irwin, Prashant Shenoy.
      In Proceedings of the 2016 ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys’16), Stanford, CA, USA, 2016. Accept Rate = 24.48%.

      Smart energy meters record electricity consumption and generation at fine-grained intervals, and are among the most widely deployed sensors in the world. Energy data embeds detailed information about a building’s energy efficiency, as well as the behavior of its occupants, which academia and industry are actively working to extract. In many cases, either inadvertently or by design, these third parties only have access to anonymous energy data without an associated location. The location of energy data is both highly useful and highly sensitive: it can provide important contextual information to improve big data analytics or interpret their results, but it can also enable third parties to link private behavior derived from energy data with a particular location.

      In this paper, we present Weatherman, which leverages a suite of analytics techniques to localize the source of anonymous energy data. Our key insight is that energy consumption data, as well as wind and solar generation data, largely correlates with weather, e.g., temperature, wind speed, and cloud cover, and that every location on Earth has a distinct weather signature that uniquely identifies it. Weatherman represents a serious privacy threat, but also a potentially useful tool for researchers working with anonymous smart meter data. We evaluate Weatherman’s potential in both areas by localizing data from over one hundred smart meters using a weather database that includes data from over 35,000 locations. Our results show that Weatherman localizes coarse (one-hour resolution) energy consumption, wind, and solar data to within 16.68km, 9.84km, and 5.12km, respectively, on average, which is more accurate using much coarser resolution data than prior work on localizing only anonymous solar data using solar signatures.
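The idea of matching energy data against weather signatures can be sketched in Python as follows. This is a toy illustration under simplifying assumptions, not Weatherman's suite of analytics: it uses a single weather variable, a plain Pearson correlation, and hypothetical location names and readings.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no signature
    return cov / math.sqrt(var_a * var_b)

def best_matching_location(energy, weather_by_location):
    """Return the candidate whose weather series correlates most strongly."""
    return max(
        weather_by_location,
        key=lambda name: abs(pearson(energy, weather_by_location[name])),
    )

# Hypothetical hourly energy use (kWh) and temperatures (degrees C) per candidate.
energy = [5, 4, 3, 2, 3, 5]
candidates = {
    "site_a": [30, 28, 26, 24, 26, 30],
    "site_b": [20, 25, 20, 25, 20, 25],
}
print(best_matching_location(energy, candidates))
```

The absolute value is used because energy use may rise or fall with a weather variable (for example, heating load falls as temperature rises).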

      [BigData’17] Weatherman: Exposing Weather-based Privacy Threats in Big Energy Data.
      Dong Chen, David Irwin.
      In Proceedings of 2017 IEEE International Conference on Big Data (BigData’17), Boston, MA, USA, Dec 11-14, 2017. Accept Rate = 87/437 = 20%.

      Distributed solar energy resources (DSERs) in smart grid systems are rapidly increasing due to the steep decline in solar module prices. This DSER penetration has prompted utilities to balance the real-time supply and demand of electricity proactively. A direct consequence of this is virtual power plants (VPPs) that enable solar generated energy trading to mitigate the impact of the intermittent DSERs while also benefiting from distributed generation for more reliable and profitable grid management. However, existing energy trading approaches in residential VPPs do not actually allow DSER users to trade their surplus solar energy independently and concurrently to maximize benefit potential; they typically require a trusted third-party to play the role of an online middleman. Furthermore, due to a lack of fair trading algorithms, these approaches do not necessarily result in “fair” solar energy saving among all the VPP users in the long term.

      We propose SolarTrader, a new solar energy trading system that enables unsupervised, distributed, and long-term fair solar energy trading in residential VPPs. In essence, SolarTrader leverages a new multi-agent deep reinforcement learning approach that enables peer-to-peer solar energy trading among different DSERs to ensure that both the DSER users and the VPPs maximize benefit. We implement SolarTrader and evaluate it using both synthetic and real smart meter data from 4 U.S. residential VPP communities comprising ~229 residential DSERs in total. Our results show that SolarTrader can reduce the aggregated VPP energy consumption by 83.8% when compared against a non-trading approach. Furthermore, SolarTrader achieves a ~105% average saving in VPP residents’ monthly electricity cost. We also find that SolarTrader-enabled VPPs can achieve a fairness of 0.05, as measured by the Gini Coefficient, a level equivalent to that achieved by the fairness-maximizing Round-Robin approach.
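For readers unfamiliar with the fairness metric, the Gini Coefficient can be computed with a short Python sketch. The function name and the savings figures below are hypothetical; they are not SolarTrader results.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 is perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical monthly savings for four households.
equal_savings = [10, 10, 10, 10]
skewed_savings = [0, 0, 0, 40]
print(gini(equal_savings), gini(skewed_savings))
```

A value close to 0, such as the 0.05 reported above, indicates that savings are spread almost evenly across users. For n users the maximum is (n - 1)/n, reached when one user receives everything.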

      [BuildSys’20] SolarTrader: Enabling Distributed Solar Energy Trading in Residential Virtual Power Plants.
      Yuzhou Feng, Qi Li, Dong Chen, and Raju Rangaswami.
      In Proc. of the 7th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys 2020), Acceptance Rate = 24.3%, November 18–20, 2020, Virtual Event, Japan. (Source Code and Data). The Best Paper Award at ACM BuildSys’20.

      Homeowners are increasingly deploying rooftop solar photovoltaic (PV) arrays due to the rapid decline in solar module prices. However, homeowners may have to spend up to ∼$375 to diagnose a damaged rooftop solar PV system. Thus, there is rising interest in inspecting solar PV arrays for potential damage automatically and passively. Unfortunately, recent approaches that leverage machine learning techniques have difficulty distinguishing solar PV array damage from other causes of solar degradation (e.g., shading, dust, snow).

      To address this problem, we design SolarDiagnostics, a new system that can automatically detect and profile damage on rooftop solar PV arrays at a lower cost using rooftop images. In essence, SolarDiagnostics first uses a K-means algorithm to isolate rooftop objects and extract the contours in which solar panels reside. Then, SolarDiagnostics employs a convolutional neural network to accurately identify and characterize the damage on each solar panel contour. We evaluate SolarDiagnostics by building a lower-cost prototype and using 60,000 damaged solar PV array images generated by deep convolutional generative adversarial networks. We find that SolarDiagnostics is able to detect damaged solar PV arrays with a Matthews correlation coefficient (MCC) of 1.0. In addition, pre-trained SolarDiagnostics yields an MCC of 0.95, which is significantly better than other re-trained machine learning-based approaches and similar to the MCC of re-trained SolarDiagnostics. We make the source code and datasets that we use to build and evaluate SolarDiagnostics publicly available.

      Smart cities, utilities, third parties, and government agencies are under pressure to manage stochastic power generation from distributed rooftop solar photovoltaic (PV) arrays, for example by predicting and reacting to variations in the electric grid. Recently, there has been rising interest in identifying solar PV arrays automatically and passively. Traditional approaches, such as online assessments and utility interconnection filings, are time-consuming, costly, and limited in geospatial resolution, and thus do not scale to every location. Significant recent work focuses on using aerial imagery to train machine learning or deep learning models to automatically detect solar PV arrays. Unfortunately, these approaches typically require Very High Resolution (VHR) images and human handcrafted solar PV array templates for training, which have a minimum cost of $15 per km² and are not always available at every location.

      To address this problem, we design SolarFinder, a new system that can automatically detect distributed solar PV arrays in a given geospatial region without any extra cost. SolarFinder first automatically fetches regular-resolution satellite images within the region using publicly available imagery APIs. Then, SolarFinder uses a multi-dimensional K-means algorithm to automatically segment solar arrays on rooftop images. Finally, SolarFinder employs a hybrid linear regression approach that integrates support vector machine (SVM) modeling with deep convolutional neural networks (CNNs) to accurately identify solar PV arrays and characterize each solar deployment. We evaluate SolarFinder using 269,632 satellite images that include 1,143,636 contours from 13 geospatial regions in the U.S. We find that pre-trained SolarFinder yields a Matthews Correlation Coefficient (MCC) of 0.17, which is 3 times better than the most recent pre-trained CNN approach and the same as a re-trained CNN approach.

      [IPSN’20] SolarFinder: Automatic Detection of Solar Photovoltaic Arrays.
      Qi Li, Yuzhou Feng, Yuyang Leng, and Dong Chen.
      In Proc. of the 19th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN’20, April 21-24, 2020, Sydney, Australia, Acceptance Rate = 21.33%. (Source Code and Data)