Henry Tan

Greater Seattle Area
2K followers 500+ connections

About

Working on the coolest LLM technology at one of the most famous search engines on the…

Experience

  • Amazon

    Seattle, Washington, United States

  • -

    Mountain View, California

  • -

    Singapore

  • -

    Mountain View, California, United States

  • -

    Seattle/Cupertino

Education

  • University of Technology Sydney

    Graduated in March 2008 with the thesis entitled "Tree Model Guided (TMG) Enumeration as the Basis for Mining Frequent Patterns from XML Documents".

    One-year leave of absence in 2006-2007 while working at Microsoft.

  • -

    Obtained his Bachelor of Computer System Engineering with First Class Honours from La Trobe University, VIC, Australia, in 2003. During his undergraduate studies, he was nominated as the most outstanding Honours student in Computer Science. He also received the 2003 ACS Student Award.

  • Received acceptance to the prestigious Oxford EMBA program (2024) with a Director Awards scholarship.

  • -

    Activities and Societies: Music club, Science club, OSIS

    SMAK 1 BPK PENABUR Jakarta (also known as SMUK 1, SMAK 1, or SMA Kristen 1 PENABUR - Jakarta, nicknamed "Smukie") is a private Protestant high school in Jakarta, Indonesia. It is located in Tanjung Duren, a financial and residential district in West Jakarta.

    SMAK 1 is considered one of the most prestigious high schools in Indonesia. It sends its students to local and international competitions, most notably the International Science Olympiads.[1] In 2007 then-student Jonathan Pradana Mailoa (now studying at MIT) won the International Physics Olympiad in Singapore.

    SMAK 1 is one of 50 schools in Jakarta, Banten, Lampung and West Java managed by BPK PENABUR, a Christian-based organization.

    http://en.wikipedia.org/wiki/SMUK_1_Jakarta

Volunteer Experience

  • Tech Activist

    Tech Activist in 2014 Indonesian Presidential Election (www.pilpres2014.org)

    - Present 9 years 11 months

    Politics

    A monumental day in Indonesia, when the General Elections Commission (KPU) will announce who the next president of Indonesia is, based on the official vote tally. In the two weeks since the vote took place, both candidates have declared that they won based on different quick count results, and neither of them is backing down from their claims today. Because of this, many people in the country have turned to tech, creating initiatives such as online crowdsourced vote counts that aim to make the contested count more transparent.

    Volunteered as a tech activist during the 2014 Indonesian presidential election, building a cloud-based (Azure) end-to-end pipeline that automatically crawled the Election Commission site and displayed the vote tally in near real time. I led the initiative to develop pilpres2014.org through a 24-hour hackathon and invited other developers who shared the same mission to help build the site to crawl, collect, and display the election vote count in real time. The site was created as an independent watchdog to help safeguard a clean democratic process.
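    A minimal sketch of the tally step of such a pipeline, assuming the crawler has already turned each scraped page into a mapping from candidate to vote count. The record layout and the names `aggregate_tally`, `candidate_1`, and `candidate_2` are hypothetical; the actual pilpres2014.org code is not described here.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_tally(region_results: Iterable[Mapping[str, int]]) -> dict:
    """Sum crawled per-region counts and return each candidate's share in percent.

    Each item is a hypothetical crawled record mapping candidate -> votes
    for one region (for example, one scanned tally form).
    """
    totals: Counter = Counter()
    for region in region_results:
        totals.update(region)  # adds vote counts candidate by candidate
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: 100.0 * votes / grand_total for name, votes in totals.items()}


if __name__ == "__main__":
    crawled = [
        {"candidate_1": 120, "candidate_2": 80},
        {"candidate_1": 30, "candidate_2": 170},
    ]
    print(aggregate_tally(crawled))
```

    Re-running the aggregation after every crawl keeps the displayed percentages in step with whatever regions have been published so far.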

    The project overview can be read at http://www.pilpres2014.org/AboutUs.html.

    Press coverage by TechInAsia:
    https://www.techinasia.com/pilpres2014-open-source-indonesia-president-election-vote-counting-site/

    Wikipedia:
    http://id.wikipedia.org/wiki/Pemilihan_umum_Presiden_Indonesia_2014

    Press coverage by Kompas.com (Indonesia's largest online media outlet):
    http://tekno.kompas.com/read/2014/07/20/15310027/peneliti.microsoft.ikut.awasi.hitung.suara.pilpres.2014
    http://tekno.kompas.com/read/2014/07/23/10405767/bikin.bangga.semangat.kolaborasi.teknologi.untuk.pilpres.2014
    http://tekno.kompas.com/read/2014/07/23/10405767/bikin.bangga.semangat.kolaborasi.teknologi.untuk.pilpres.2014

Publications

  • Maguro, a System for Indexing and Searching over Very Large Text Collections

    ACM (WSDM Conference)

    Maguro is a system for efficiently searching very large collections of text content, up to 1 trillion documents, at low cost. Search engines span content ranging from highly dynamic pages richly augmented with metadata to the tail content of the web. A long-tail distribution of content calls for different trade-offs in the design space to achieve good efficiency across the entire index range. Maguro is designed for the long tail of content, which is less dynamic and carries less metadata, and it delivers very good cost efficiency. Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.

  • Mining of Data with Complex Structures

    Series: Studies in Computational Intelligence, Vol. 333

    The primary audience is third- and fourth-year undergraduate students, Master's and PhD students, and academics. The book can be used for both teaching and research. The secondary audience is practitioners in industry, business, commerce, government, and consortiums, alliances, and partnerships who want to learn how to introduce the techniques for mining data with complex structures into their applications and use them efficiently. The scope of the book is both theoretical and practical, and as such it will reach a broad market both within academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for the efficient analysis of knowledge stored in various domains.

  • Tree model guided candidate generation for mining frequent subtrees from XML documents

    ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 2, Issue 2, July 2008, Article No. 9. ACM, New York, NY, USA.

    Due to the inherent flexibility in both structure and semantics, XML association rule mining faces several challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables an efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and efficiency of the proposed techniques.
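    To illustrate the idea behind an array-based tree representation and data-guided candidate generation, here is a minimal sketch, not the MB3-Miner implementation: each tree is flattened into a preorder array where every node records its depth and `scope` (the index of its last descendant), and 2-node embedded candidates are generated only from ancestor/descendant pairs that actually occur, so every candidate has non-zero support. The nested-tuple tree format and all function names are assumptions for illustration; the published method extends candidates to arbitrary size, which this sketch does not attempt.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical tree format: (label, [child_tree, ...]).
Tree = Tuple[str, list]


def to_preorder(tree: Tree) -> List[Tuple[str, int, int]]:
    """Flatten a tree into a preorder array of (label, depth, scope).

    `scope` is the preorder index of the node's last descendant, so the
    descendants of node i are exactly the positions i+1 .. scope.
    """
    nodes: List[Tuple[str, int, int]] = []

    def visit(node: Tree, depth: int) -> int:
        label, children = node
        index = len(nodes)
        nodes.append((label, depth, index))  # scope is fixed up below
        last = index
        for child in children:
            last = visit(child, depth + 1)
        nodes[index] = (label, depth, last)
        return last

    visit(tree, 0)
    return nodes


def size2_candidates(tree: Tree) -> set:
    """Embedded 2-node subtrees (ancestor label, descendant label) that occur in the tree."""
    nodes = to_preorder(tree)
    found = set()
    for i, (label, _depth, scope) in enumerate(nodes):
        for j in range(i + 1, scope + 1):  # scan only node i's descendants
            found.add((label, nodes[j][0]))
    return found


def support(database: List[Tree]) -> Counter:
    """Number of trees in which each size-2 candidate occurs at least once."""
    counts: Counter = Counter()
    for tree in database:
        counts.update(size2_candidates(tree))
    return counts
```

    For a database of `a(b(c), c)` and `a(c)`, the pair `(a, c)` gets support 2 and `(b, c)` support 1, while a pair such as `(c, a)` is never generated because no `c` node has an `a` descendant.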

  • IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding

    Pacific-Asia Conference on Knowledge Discovery and Data Mining

    Tree mining has recently attracted a lot of interest in areas such as bioinformatics, XML mining, and Web mining. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, mining such embedding relationships can unfortunately be very costly. In this paper, we propose an efficient approach to tackle the complexity of mining embedded subtrees by utilizing a novel Embedding List representation and Tree Model Guided enumeration, and by introducing the Level of Embedding constraint. Thus, when it is too costly to mine all frequent embedded subtrees, one can gradually decrease the level of embedding constraint down to 1, at which point all the obtained frequent subtrees are induced subtrees. Our experiments with both synthetic and real datasets against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and efficiency of the technique.
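    The level-of-embedding constraint can be sketched for the simplest case of 2-node candidates. This sketch assumes the level of embedding of an ancestor/descendant pair is their depth difference and takes trees as preorder lists of `(label, depth)`; `constrained_pairs` is a hypothetical helper, not the IMB3-Miner code. Lowering `max_level` to 1 keeps only parent/child pairs, matching the abstract's point that a level of 1 yields induced subtrees.

```python
from typing import List, Set, Tuple

# Hypothetical input: a tree as a preorder list of (label, depth) pairs.
Preorder = List[Tuple[str, int]]


def constrained_pairs(nodes: Preorder, max_level: int) -> Set[Tuple[str, str]]:
    """Ancestor/descendant label pairs whose level of embedding is <= max_level.

    The level of embedding is taken here as the depth difference between
    descendant and ancestor; max_level = 1 keeps only parent/child pairs.
    """
    pairs: Set[Tuple[str, str]] = set()
    for i, (label, depth) in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            d_label, d_depth = nodes[j]
            if d_depth <= depth:  # preorder has left node i's subtree
                break
            if d_depth - depth <= max_level:
                pairs.add((label, d_label))
    return pairs
```

    For the chain `a-b-c-d` with an extra child `e` under `a`, `max_level=1` returns `(a, b)`, `(a, e)`, `(b, c)`, and `(c, d)`; raising it to 2 adds `(a, c)` and `(b, d)`, so the search space grows with the allowed level.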

  • MB3-Miner: Mining eMBedded subTREEs using tree model guided candidate generation

    Proceedings of the 1st International Workshop on Mining Complex Data (MCD’05), Houston, TX, USA, pp. 103-110.

Patents

  • DYNAMIC QUERY MASTER AGENT FOR QUERY EXECUTION

    Issued US 9195745

    A preliminary segment root and a final segment root are selected for each segment. Each time a search query is received, a set of nodes in each segment that will be used to resolve the search query is identified. A preliminary segment root is selected from the set of nodes. Based on statistical data from each node in the set of nodes indicating each node's capability to act as a final segment root that assembles query-execution data, the preliminary segment root algorithmically selects the final segment root. The other nodes in the set of nodes are notified regarding the identity of the final segment root.
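    A minimal sketch of the selection flow described in the patent abstract. The abstract does not specify which statistics are collected or how they are weighed, so the `NodeStats` fields, the scoring formula, choosing the first node as the preliminary root, and notifying every node except the preliminary root are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStats:
    """Hypothetical per-node statistics; the real fields are not specified."""
    name: str
    cpu_load: float    # fraction of CPU in use, 0.0 to 1.0
    queue_length: int  # queries currently waiting on this node


def select_final_root(nodes: List[NodeStats]) -> str:
    """Run on the preliminary root: pick the node best able to assemble results."""
    def score(n: NodeStats) -> float:
        # Hypothetical scoring: lower load and a shorter queue are better.
        return n.cpu_load + 0.1 * n.queue_length
    return min(nodes, key=score).name


def handle_query(segment_nodes: List[NodeStats]) -> Dict[str, str]:
    """Select a preliminary root, let it pick the final root, and notify the others."""
    preliminary = segment_nodes[0]  # hypothetical: first node acts as preliminary root
    final_root = select_final_root(segment_nodes)
    # Each remaining node learns which node will assemble the query-execution data.
    return {n.name: final_root for n in segment_nodes if n.name != preliminary.name}
```

    With loads of (0.9, 5), (0.2, 1), and (0.5, 0) for nodes `n1`, `n2`, and `n3`, node `n2` has the lowest score and becomes the final segment root.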

  • Processing data obtained from a presence-based system

    Issued US 11772111

    Functionality is described for collecting data from a presence-based system, such as an instant messaging system. The functionality can extract information from the collected data based on one or more rules. For instance, the functionality can identify presence data and/or message data that includes predetermined key words. The functionality can formulate result information based on the extracted information for presentation to a recipient. Based on these operations, the collected data supports a data-mining operation, as well as the traditional role of facilitating communication among the participants of the presence-based system. The result information can correspond to a report that presents aggregated findings, optionally organized into one or more demographic categories, or the result information can correspond to an advertisement, etc.
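    A minimal sketch of the rule-based extraction and demographic aggregation described above, assuming each collected record has already been reduced to a `(demographic category, text)` pair. The keyword rule, the record shape, and the function name `extract_report` are hypothetical; real presence data (status changes, message metadata) would need richer parsing.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (sender's demographic category, presence status or message text).
Record = Tuple[str, str]


def extract_report(records: Iterable[Record], keywords: List[str]) -> Dict[str, Counter]:
    """Apply a simple keyword rule and aggregate matches per demographic category."""
    report: Dict[str, Counter] = defaultdict(Counter)
    lowered = [k.lower() for k in keywords]
    for category, text in records:
        words = set(text.lower().split())  # whitespace tokenization only
        for keyword in lowered:
            if keyword in words:  # rule: the text contains a predetermined key word
                report[category][keyword] += 1
    return dict(report)  # categories with no matches are omitted
```

    The resulting per-category counts could feed either an aggregated report or a choice of advertisement, the two kinds of result information the abstract mentions.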


Projects

  • Multilayer Perceptron (Neural Network) as the basis for Gaming Motion Generation (Honours Thesis)

    The computer gaming industry is large and growing rapidly. Consumers demand realistic motion in computer games. Consequently, the ability to realistically simulate motion is vital in computer game development. Currently, most simulation of motion is done through a physics engine, which essentially involves numerically solving the set of differential equations that describe the corresponding physical situation in the real world. This paper proposes an alternative strategy that uses a multilayer perceptron to generate these simulations. Benchmarking against traditional physics engines shows that the new methodology has considerable advantages.
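    To make the contrast concrete, here is a minimal sketch, assuming a 2-D projectile whose state is `[x, y, vx, vy]`: `physics_step` is one explicit Euler step of the kind a physics engine performs, and `MLPStepper` is a one-hidden-layer perceptron with the same input/output shape that could replace it once trained. The state layout, layer sizes, residual output, and random weights are assumptions; the thesis's actual network and training setup are not described here.

```python
import math
import random
from typing import Callable, List

State = List[float]  # hypothetical layout: [x, y, vx, vy]


def physics_step(state: State, dt: float = 0.01, g: float = -9.81) -> State:
    """Baseline: one explicit Euler step of dx/dt = v, dv/dt = (0, g)."""
    x, y, vx, vy = state
    return [x + vx * dt, y + vy * dt, vx, vy + g * dt]


class MLPStepper:
    """One-hidden-layer perceptron mapping state_t to state_{t+1}.

    Weights are random here; in practice they would be trained on
    (state, next_state) pairs, e.g. produced by a physics engine.
    """

    def __init__(self, n_in: int = 4, n_hidden: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def __call__(self, state: State) -> State:
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Residual output: the network predicts the change in state.
        return [s + sum(w * h for w, h in zip(row, hidden)) + b
                for s, row, b in zip(state, self.w2, self.b2)]


def rollout(step: Callable[[State], State], state: State, n_steps: int) -> State:
    """Advance a state n_steps times with either stepper."""
    for _ in range(n_steps):
        state = step(state)
    return state
```

    Because both steppers share one signature, a game loop can call `rollout` with either, which is what makes the perceptron a drop-in alternative to the numerical solver.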

  • Google Cloud Video Intelligence

    The Video Intelligence API has pre-trained machine learning models that automatically recognize a vast number of objects, places, and actions in stored and streaming video. It’s highly efficient for common use cases and improves over time as new concepts are introduced.

  • Project Adam v2.0

    Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require a large amount of compute. We describe the design and implementation of a distributed system called Adam, composed of commodity server machines, to train such models; it exhibits world-class performance, scaling, and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the accuracy of trained models. Adam is significantly more efficient and scalable than was previously thought possible: it used 30x fewer machines to train a large 2-billion-connection model to 2x higher accuracy in comparable time on the ImageNet 22,000-category image classification task than the system that previously held the record for this benchmark. We also show that task accuracy improves with larger models. Our results provide compelling evidence that a distributed-systems-driven approach to deep learning using current training algorithms is worth pursuing.
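    The asynchrony the abstract mentions can be illustrated with a toy parameter server: several workers pull a possibly stale weight, compute a gradient on their own data shard, and push updates without waiting for each other. This is a minimal sketch under stated assumptions (a 1-D least-squares model, Python threads standing in for machines, a lock only to make each single update atomic); it does not reproduce Adam's model partitioning or its actual update protocol.

```python
import random
import threading
from typing import List, Tuple


class ParameterServer:
    """Holds the shared weight; workers push updates without coordinating."""

    def __init__(self, w: float = 0.0) -> None:
        self.w = w
        self._lock = threading.Lock()  # makes each single update atomic, nothing more

    def pull(self) -> float:
        return self.w  # may already be stale when the gradient is applied

    def push(self, grad: float, lr: float) -> None:
        with self._lock:
            self.w -= lr * grad


def worker(server: ParameterServer, shard: List[Tuple[float, float]],
           steps: int, lr: float, seed: int) -> None:
    """Asynchronous SGD on a 1-D least-squares model y ~ w * x."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(shard)
        w = server.pull()
        grad = (w * x - y) * x  # d/dw of 0.5 * (w * x - y) ** 2
        server.push(grad, lr)


def train(data: List[Tuple[float, float]], n_workers: int = 4,
          steps: int = 2000, lr: float = 0.1) -> float:
    """Shard the data, run the workers concurrently, and return the final weight."""
    server = ParameterServer()
    shards = [data[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(server, shard, steps, lr, i))
               for i, shard in enumerate(shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.w
```

    With noise-free data such as `y = 3x`, the weight still converges to 3 despite stale reads; Python threads share the GIL, so this demonstrates the update pattern rather than a speedup.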

  • Tree Model Guided (TMG) Enumeration as the Basis for Mining Frequent Patterns from XML Documents

    Association mining consists of two important problems, namely frequent pattern discovery and rule construction. Because of its importance and application in a number of data mining tasks, it has become the focus of many studies. A substantial amount of research has gone into the development of efficient algorithms for mining patterns from large structured or relational data. Compared with the fruitful achievements in mining structured data, mining in the semi-structured world still remains at a preliminary stage. The most popular representative of semi-structured data is XML. Mining frequent patterns from XML poses more challenges than mining frequent patterns from relational data because XML data is tree-structured and has an ordered data context. Moreover, XML data is in general larger due to richer content and more metadata. The growth of XML data and the need for mining semi-structured data have sparked a lot of interest in finding frequent rooted trees in forests.

    In this thesis, we aim to develop a framework to mine frequent patterns from XML documents. The framework utilizes a structure-guided enumeration approach, Tree Model Guided (TMG), for efficient enumeration of tree structures, and it makes use of novel structures for fast enumeration and frequency counting. By utilizing a novel array-based structure, an embedded list (EL), the framework offers a simple, sequence-like tree enumeration technique. The effectiveness and extendibility of the framework are demonstrated by the fact that it can be utilized not only for enumerating ordered subtrees but also for enumerating unordered subtrees and subsequences. The framework tackles the unprecedented complexity of mining frequent tree-structured patterns by generating only valid candidates with a non-zero frequency count and by employing a constraint-driven approach.

  • Google Cloud TPU

    Empowering businesses with Google Cloud AI
    Machine learning has produced business and research breakthroughs ranging from network security to medical diagnoses. We built the Tensor Processing Unit (TPU) in order to make it possible for anyone to achieve similar breakthroughs. Cloud TPU is the custom-designed machine learning ASIC that powers Google products like Translate, Photos, Search, Assistant, and Gmail. Here’s how you can put the TPU and machine learning to work accelerating your company’s success, especially at scale.

  • Maguro, a system for indexing and searching over very large text collections

    Maguro is a system for efficiently searching very large collections of text content, up to 1 trillion documents, at low cost. Search engines span content ranging from highly dynamic pages richly augmented with metadata to the tail content of the web. A long-tail distribution of content calls for different trade-offs in the design space to achieve good efficiency across the entire index range. Maguro is designed for the long tail of content, which is less dynamic and carries less metadata, and it delivers very good cost efficiency. Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.

  • TensorFlow for TPU

    The core open source library to help you develop and train ML models. Get started quickly by running Colab notebooks directly in your browser.

Languages

  • English

    Native or bilingual proficiency

  • Indonesian

    Native or bilingual proficiency
