Facebook. Seattle, WA
07/2020 - Present. Engineering Manager. Ads Delivery Infra - Ads ML Data & Training Infra.
I lead teams of 25+ software engineers, research scientists, and engineering managers, responsible for the core machine learning backend services powering Facebook Ads Delivery. Areas include ads training data infrastructure, ads data warehouse, and ads realtime model training infrastructure.
- Drive the team vision and strategy to build a state-of-the-art realtime ML data and training platform, in order to enable more incremental Facebook revenue across all products (Facebook, Instagram, Shops, etc.).
- Build an effective organization of strong engineers, scientists, and managers to tackle the challenges of powering revenue growth by scaling our ML data and realtime training systems.
- Deliver a double-digit cumulative improvement in incremental Facebook revenue through our teams' work.
- Improve revenue stability through better ML data and model quality.
07/2018 - 06/2020. Staff Software Engineer / Engineering Manager. Ads Delivery Infra - Ads Training Data Infra.
I led a team of 10+ software engineers, responsible for ads machine learning training data infrastructure and ads data warehouse.
- Drove the team vision and strategy to scale the ads training data platform, in order to meet exponentially growing ML demand for incremental Facebook revenue.
- Shipped a new generation of training data systems, delivering a >10X efficiency gain and making all data generation 100% realtime.
- Led company-wide initiatives in data warehouse solutions to power machine learning workloads efficiently and reliably.
- Delivered a double-digit cumulative reduction in Facebook cost of revenue.
- Grew strong engineers to innovate and lead at the exciting confluence of machine learning and distributed systems.
- Created the machine learning infra hiring pipeline in the Seattle area.
07/2017 - 06/2018. Senior Software Engineer and Tech Lead. Ads Core ML Team - Ads Ranking Infra.
I led and mentored a team of 6-8 engineers responsible for the training data infrastructure, powering model training for ads ranking.
- Drove the roadmap, direction, and prioritization for building the new generation of the ads training data platform to scale up the end-to-end machine learning model training systems for ads.
- Designed and built the framework to create long-aggregation features in order to capture the ranking signals in a longer history of ads data, delivering +1% incremental Facebook revenue.
- Built asynchronous model selection logic in the core ads ranking business logic flow, cutting latency by 3% in serving relevant ads on Facebook.
- Orchestrated the migration of petabyte-scale training data pipelines onto a new training data service, working with 5+ machine learning engineers, which resulted in +2% incremental Facebook revenue.
- Supported 5+ ads product teams and 20+ machine learning engineers for training data needs.
- Ranked among the top 5% of senior engineers at the company based on performance reviews.
08/2016 - 06/2017. Software Engineer. Ads Core ML Team - Ads Ranking Infra.
- Built the unified calculation of ads' bidding prices in mobile news feed for ads ranking products, enabling easier customization and faster development of new ads products for our machine learning engineers.
- Designed and built a new offline-data-based ads training data joiner to enable long-attribution-window ads conversion ranking products, which account for >10% of Facebook ads revenue.
- Built and launched the shadow-traffic-based testing framework for the ads training data stream joining service, shortening the development/test cycle of this service to <10 minutes.
VMware. Cambridge, MA
06/2015 - 08/2016. Senior Software Engineer. Cloud Foundation.
I worked on the VMware Cloud Foundation (a.k.a. EVO Software-Defined Datacenter) team. Our team built a scalable, datacenter-wide distributed management system to deliver on the promise of the Software-Defined Datacenter (SDDC) for private/hybrid cloud.
- Project: Platform as a Service (PaaS) for Hybrid Cloud (11/2015 - 08/2016)
- Served as tech lead for the design and development of a PaaS system for enterprises in hybrid-cloud environments.
- Built the PaaS system to automatically deploy and manage virtual machines and Docker container applications across EVO-based private cloud and Amazon Web Services.
- Project: Automatic Server Provisioning in Datacenters (06/2015 - 05/2016)
- Built software to automatically discover, bootstrap, and provision servers for datacenter expansion.
- Shortened the server addition process from several days to <20 minutes.
- Released the software as part of the VMware Cloud Foundation software suite.
Microsoft. Redmond, WA
06/2012 - 05/2014. Software Engineer (in absentia from Princeton University). Azure Cloud Networking Team (a.k.a. Bing Autopilot Team).
I was in absentia from Princeton University for two years to do degree-related work at Microsoft Azure, in a collaboration between the Azure networking team (a.k.a. Bing Autopilot) and our research group at Princeton.
- Project: Network-State Management Service (03/2013 - 05/2014)
- Designed and built a network operating system for automatic and safe infrastructure management.
- Deployed the system in over 10 Azure datacenters across the globe, managing over 10 thousand devices.
- Reduced the development cost of management applications by a factor of 5.
- The system is now the foundation layer of Azure's networking services to its customers (part of Azure Service Fabric).
- Project: Automatic Latency Diagnosis in Multi-tier Web Services (06/2012 - 02/2013)
- Designed and built a self-adapting real-time diagnosis system for high search-query latencies.
- Integrated into Bing's index service, document service, and internal big-data analysis platform.
- Accurately pinpointed the services and servers that caused high search latencies in several incidents.
- Contributed to improving Bing's user-perceived latency.
Princeton University. Princeton, NJ.
12/2010 - 05/2012 & 06/2014 - 05/2015. Research Assistant & PhD student. Prof. Jennifer Rexford's Group.
- Project: Sprite: Scalable Programmable Inbound Traffic Engineering (06/2014 - 05/2015)
- Designed and built a scalable distributed system to directly control the entrance point of traffic from cloud services to enterprises at per-connection granularity.
- Built in collaboration with, and being deployed on, the campus network of Princeton University.
- Project: HONE: Programmable Host-Network Traffic Management (06/2011 - 05/2012)
- Realized more effective datacenter traffic management by extending it to include the Linux network stack.
- Integrated with Verizon/Overture Networks' products for business cloud services at the edge.
- Project: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring (12/2010 - 05/2011)
- Built a TCP profiler in the Linux network stack for performance diagnosis of web services.
- Deployed in the Coral content distribution network.
- Improved the CDN's overall latency by 10% by locating and resolving its bottlenecks.
Microsoft Research Asia. Beijing, China.
12/2009 - 03/2010. Research Intern. Datacenter Network Group of Dr. Chuanxiong Guo.
- Project: Virtualized Datacenter Network
- Built the control module of a virtualized datacenter network architecture with bandwidth guarantees.
- Carried out experiments in datacenters to validate link failure handling and adjustment of bandwidth guarantees.
University of Southern California. Los Angeles, CA.
07/2009 - 08/2009. Research Intern. Prof. Urbashi Mitra's Group.
- Project: Underwater Acoustic Communication System
- Designed a channel estimator to improve underwater acoustic wireless communication.
- Deployed and tested in a Navy testbed in Southern California.