“Henry is an 'A player'. He is a well-rounded engineer who knows how to communicate well, creative and has great understanding of software engineering, math and physics. He is a must-have in the team (if you can convince him to work with you). I wholeheartedly recommend him.”
About
Working on the coolest LLM technology in one of the most famous search engines on the…
Activity
-
My team Rufus is hiring Applied Scientists! We work on LLMs, RLHF, alignment, and other awesome things! DM me if you are interested. :)…
Liked by Henry Tan
-
Very happy to see my team firing on all cylinders and launching critical updates to products which help our customers in life sciences make clinical…
Liked by Henry Tan
-
Today, I am thrilled to share the release of #SnowflakeArctic, an enterprise focused foundation model from #Snowflake, that is efficiently…
Liked by Henry Tan
Experience
Education
-
University of Technology Sydney
-
Graduated in March 2008 with a thesis entitled "Tree Model Guided (TMG) Enumeration as the Basis for Mining Frequent Patterns from XML Documents".
Took a one-year leave of absence in 2006-2007 while working at Microsoft.
-
-
-
Obtained his Bachelor of Computer System Engineering with first-class honours from La Trobe University, VIC, Australia, in 2003. During his undergraduate studies, he was nominated as the most outstanding Honours student in Computer Science. He also received the 2003 ACS Student Award.
-
-
Received acceptance to the prestigious Oxford EMBA program for 2024 with a Director Awards scholarship.
-
-
-
Activities and Societies: Music club, science club, OSIS
SMAK 1 BPK PENABUR Jakarta (also known as SMUK 1, SMAK 1, or SMA Kristen 1 PENABUR - Jakarta, nicknamed "Smukie") is a private Protestant high school in Jakarta, Indonesia. It is located in Tanjung Duren, a financial and residential district in West Jakarta.
SMAK 1 is considered one of the most prestigious high schools in Indonesia. It sends its students to local and international competitions, most notably the International Science Olympiads.[1] In 2007 then-student Jonathan Pradana Mailoa (now studying at MIT) won the International Physics Olympiad in Singapore.
SMAK 1 is one of 50 schools in Jakarta, Banten, Lampung and West Java managed by BPK PENABUR, a Christian-based organization.
http://en.wikipedia.org/wiki/SMUK_1_Jakarta
Volunteer Experience
-
Tech Activist
Tech Activist in 2014 Indonesian Presidential Election (www.pilpres2014.org)
- Present 9 years 11 months
Politics
It is a monumental day in Indonesia: the General Elections Commission (KPU) will announce who the next president of Indonesia is, based on the official voting tally. In the two weeks since the vote took place, both candidates have declared victory based on different quick count results, and neither is backing down from that claim today. Because of this, many people in the country have turned to tech, creating initiatives such as online crowdsourced vote counts that aim to make the contested count more transparent.
I volunteered as a tech activist during the 2014 Indonesian election, building a cloud-based (Azure) end-to-end pipeline that automatically crawls the Election Commission's site and displays the vote tally in near real time. I led the initiative to develop pilpres2014.org through a 24-hour hackathon and invited other developers who shared the same mission to help build the site, which crawls, collects, and displays the election vote count. The site was created as an independent watchdog to safeguard a clean democratic process.
The project overview can be read from http://www.pilpres2014.org/AboutUs.html.
Press coverage by TechInAsia:
https://www.techinasia.com/pilpres2014-open-source-indonesia-president-election-vote-counting-site/
Wikipedia:
http://id.wikipedia.org/wiki/Pemilihan_umum_Presiden_Indonesia_2014
Press coverage by Kompas.com (Indonesia's largest online media outlet):
http://tekno.kompas.com/read/2014/07/20/15310027/peneliti.microsoft.ikut.awasi.hitung.suara.pilpres.2014
http://tekno.kompas.com/read/2014/07/23/10405767/bikin.bangga.semangat.kolaborasi.teknologi.untuk.pilpres.2014
Publications
-
Maguro, a System for Indexing and Searching over Very Large Text Collections
ACM (WSDM Conference)
Maguro is a system for efficiently searching very large collections of text content of up to 1 trillion documents at low cost. Search engines span across content that is very dynamic and highly augmented with metadata to the tail content of the web. A long tail distribution of content calls for different trade-offs in the design space for good efficiency across the entire index range. Maguro is designed for the long tail of content with less dynamics and less metadata, but very good cost efficiency. Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.
-
Mining of Data with Complex Structures
Series: Studies in Computational Intelligence, Vol. 333
The primary audience is 3rd year, 4th year undergraduate students, Masters and PhD students and academics. The book can be used for both teaching and research. The secondary audiences are practitioners in industry, business, commerce, government and consortiums, alliances and partnerships to learn how to introduce and efficiently make use of the techniques for mining of data with complex structures into their applications. The scope of the book is both theoretical and practical and as such it will reach a broad market both within academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for efficient analysis of knowledge stored in various domains.
-
Tree model guided candidate generation for mining frequent subtrees from XML documents
ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 2, Issue 2, July 2008, Article No. 9. ACM, New York, NY, USA
Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.
-
IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding
Pacific-Asia Conference on Knowledge Discovery and Data Mining
Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, unfortunately mining such embedding relationships can be very costly. In this paper, we propose an efficient approach to tackle the complexity of mining embedded subtrees by utilizing a novel Embedding List representation, Tree Model Guided enumeration, and introducing the Level of Embedding constraint. Thus, when it is too costly to mine all frequent embedded subtrees, one can decrease the level of embedding constraint gradually up to 1, from which all the obtained frequent subtrees are induced subtrees. Our experiments with both synthetic and real datasets against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and the efficiency of the technique.
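To illustrate the level-of-embedding constraint described above, here is a minimal sketch. It is not the IMB3-Miner algorithm itself. It assumes a rooted tree stored in a hypothetical `parent` dictionary and takes the level of embedding between an ancestor and a descendant to be the number of edges on the path between them, so that a maximum level of 1 leaves only parent-child (induced) relationships.

```python
def ancestor_pairs(parent, max_level):
    """Return sorted (ancestor, descendant, level) triples with level <= max_level.

    `parent` maps each node to its parent (the root maps to None).
    `level` is the number of edges between ancestor and descendant
    (an assumed definition used only for this illustration).
    """
    pairs = []
    for node in parent:
        ancestor, level = parent[node], 1
        # Walk up towards the root, stopping once the constraint is exceeded
        while ancestor is not None and level <= max_level:
            pairs.append((ancestor, node, level))
            ancestor, level = parent[ancestor], level + 1
    return sorted(pairs)


# Hypothetical example tree: a -> b -> c, and a -> d
tree = {"a": None, "b": "a", "c": "b", "d": "a"}

induced = ancestor_pairs(tree, max_level=1)   # parent-child pairs only
embedded = ancestor_pairs(tree, max_level=2)  # also allows grandparent pairs
```

With `max_level=1` the pair `("a", "c")` is excluded; raising the limit to 2 admits it, which is the sense in which relaxing the constraint moves from induced towards embedded relationships.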
-
MB3-Miner: Mining eMBedded subTREEs using tree model guided candidate generation
Proceedings of the 1st International Workshop on Mining Complex Data (MCD’05), Houston, TX, USA, pp. 103-110.
Patents
-
DYNAMIC QUERY MASTER AGENT FOR QUERY EXECUTION
Issued US 9195745
A preliminary segment root and a final segment root are selected for each segment. Each time a search query is received, a set of nodes in each segment that will be used to resolve the search query is identified. A preliminary segment root is selected from the set of nodes. Based on statistical data from each node in the set of nodes indicating each node's capability to act as a final segment root that assembles query-execution data, the preliminary segment root algorithmically selects the final segment root. The other nodes in the set of nodes are notified regarding the identity of the final segment root.
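The selection step in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `NodeStats` fields, the scoring heuristic, and the function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical per-node statistics reported for one query."""
    node_id: str
    cpu_load: float        # 0.0 (idle) to 1.0 (saturated); assumed metric
    expected_results: int  # results the node expects to contribute; assumed metric


def capability_score(stats: NodeStats) -> float:
    # Assumed heuristic: prefer lightly loaded nodes that already hold many results
    return (1.0 - stats.cpu_load) * (1 + stats.expected_results)


def select_final_root(nodes: list[NodeStats]) -> tuple[str, str, list[str]]:
    """Return (preliminary_root, final_root, notified_nodes) for one segment."""
    if not nodes:
        raise ValueError("segment has no nodes for this query")
    # Assumption: the first node in the set acts as the preliminary root
    preliminary = nodes[0].node_id
    # The preliminary root picks the node best suited to assemble the results
    final = max(nodes, key=capability_score).node_id
    # The other nodes are told which node is the final root
    notified = [n.node_id for n in nodes if n.node_id != preliminary]
    return preliminary, final, notified


segment = [
    NodeStats("n1", cpu_load=0.9, expected_results=10),
    NodeStats("n2", cpu_load=0.2, expected_results=40),
    NodeStats("n3", cpu_load=0.5, expected_results=5),
]
preliminary, final, notified = select_final_root(segment)
```

In this toy segment the heavily loaded `n1` acts as preliminary root but hands the final role to `n2`, which has the highest assumed capability score.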
-
Processing data obtained from a presence-based system
Issued US 11772111
Functionality is described for collecting data from a presence-based system, such as an instant messaging system. The functionality can extract information from the collected data based on one or more rules. For instance, the functionality can identify presence data and/or message data that includes predetermined key words. The functionality can formulate result information based on the extracted information for presentation to a recipient. Based on these operations, the collected data supports a data-mining operation, as well as the traditional role of facilitating communication among the participants of the presence-based system. The result information can correspond to a report that presents aggregated findings, optionally organized into one or more demographic categories, or the result information can correspond to an advertisement, etc.
Projects
-
Multilayer Perceptron (Neural Network) as the basis for Gaming Motion Generation (Honours Thesis)
The computer gaming industry is large and growing rapidly. Consumers demand realistic motion in computer games. Consequently, the ability to realistically simulate motion is vital in computer games development. Currently, most simulation of motion is done through the use of a physics engine, which essentially involves numerically solving the set of differential equations that describe the corresponding physical situation in the real world. This paper proposes an alternative strategy using a multilayer perceptron to generate these simulations. Benchmarking against traditional physics engines shows that there are considerable advantages to the new methodology.
-
Google Cloud Video Intelligence
-
Video Intelligence API has pre-trained machine learning models that automatically recognize a vast number of objects, places, and actions in stored and streaming video. It’s highly efficient for common use cases and improves over time as new concepts are introduced.
-
Project Adam v2.0
-
Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately such models are extremely time consuming to train and require large amount of compute cycles. We describe the design and implementation of a distributed system called Adam comprised of commodity server machines to train such models that exhibits world-class performance, scaling and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the accuracy of trained models. Adam is significantly more efficient and scalable than was previously thought possible and used 30x fewer machines to train a large 2 billion connection model to 2x higher accuracy in comparable time on the ImageNet 22,000 category image classification task than the system that previously held the record for this benchmark. We also show that task accuracy improves with larger models. Our results provide compelling evidence that a distributed systems-driven approach to deep learning using current training algorithms is worth pursuing.
-
Tree Model Guided (TMG) Enumeration as the Basis for Mining Frequent Patterns from XML Documents
-
Association mining consists of two important problems, namely frequent pattern discovery and rule construction. Because of its importance and application in a number of data mining tasks, it has become the focus of many studies. A substantial amount of research has gone into the development of efficient algorithms for mining patterns from large structured or relational data. Compared with the fruitful achievements in mining structured data, mining in the semi-structured world still remains at a preliminary stage. The most popular representative of semi-structured data is XML. Mining frequent patterns from XML poses more challenges than mining frequent patterns from relational data because XML is tree-structured data and has an ordered data context. Moreover, XML data is generally larger in size due to richer contents and more metadata. The increase of XML data and the need for mining semi-structured data have sparked a lot of interest in finding frequent rooted trees in forests.
In this thesis, we aim to develop a framework to mine frequent patterns from XML documents. The framework utilizes a structure-guided enumeration approach, Tree Model Guided (TMG), for efficient enumeration of tree structures, and it makes use of novel structures for fast enumeration and frequency counting. By utilizing a novel array-based structure, an embedded list (EL), the framework offers a simple sequence-like tree enumeration technique. The effectiveness and extensibility of the framework are demonstrated in that it can be utilized not only for enumerating ordered subtrees but also for enumerating unordered subtrees and subsequences. The framework tackles the unprecedented complexity in mining frequent tree-structured patterns by generating only valid candidates with non-zero frequency count and employing a constraint-driven approach.
-
Google Cloud TPU
-
Empowering businesses with Google Cloud AI
Machine learning has produced business and research breakthroughs ranging from network security to medical diagnoses. We built the Tensor Processing Unit (TPU) in order to make it possible for anyone to achieve similar breakthroughs. Cloud TPU is the custom-designed machine learning ASIC that powers Google products like Translate, Photos, Search, Assistant, and Gmail. Here’s how you can put the TPU and machine learning to work accelerating your company’s success, especially at scale.
-
Maguro, a system for indexing and searching over very large text collections
-
Maguro is a system for efficiently searching very large collections of text content of up to 1 trillion documents at low cost. Search engines span across content that is very dynamic and highly augmented with metadata to the tail content of the web. A long tail distribution of content calls for different trade-offs in the design space for good efficiency across the entire index range. Maguro is designed for the long tail of content with less dynamics and less metadata, but very good cost efficiency. Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.
-
TensorFlow for TPU
-
The core open source library to help you develop and train ML models. Get started quickly by running Colab notebooks directly in your browser.
Languages
-
English
Native or bilingual proficiency
-
Indonesian
Native or bilingual proficiency
Recommendations received
7 people have recommended Henry
More activity by Henry
-
Our next event at the King's Entrepreneurship Lab is tomorrow (25th of April), where Cheryl Misak, author of the widely acclaimed biography of Frank…
Liked by Henry Tan
-
Today, Gemini model can access Google Search for grounded response that are more trustworthy, helpful and factual. We are releasing this API in…
Liked by Henry Tan
-
As somebody that listens to a lot of music, and that has to have it playing almost non-stop (no matter what I’m doing), it’s useful to have playlists…
Liked by Henry Tan
-
👏👏👏 Great news!!!! Welcome to the team Andrew Ng !!!!! Dr. Andrew Ng has been appointed to Amazon’s Board of Directors. I’m super fan of Andrew,…
Liked by Henry Tan
-
Our goal this year at CS in Schools is to have 100,000 students take one of our coding courses in 2024 as part of their Australian school day; last…
Liked by Henry Tan
-
Thrive at the forefront of innovation and ambition with the MIT Executive MBA.
Liked by Henry Tan
-
Great catching up with our amazing University of Cambridge Executive #MBA students (2022 cohort) who are on their last teaching week! And what an…
Liked by Henry Tan
-
The #EU has the most advanced #green legislation in the world. But the bloc is not on track to meet its #climate targets, even as it approaches…
Liked by Henry Tan
-
Congrats to my brilliant PhD student Rhys Williams (Cambridge Judge Business School & King's College, Cambridge)! 🎉 🎈 🥂 Did you know that Rhys…
Liked by Henry Tan
-
There is an explosion of total nonsense around LLM auto-eval in the space these days... be wary of approaches that claim to kick the human out of the…
Liked by Henry Tan
-
Fantastic to host my co-author Xianwei Shi and Xingkun Liang who finally made it to St Edmund's College, Cambridge (Xingkun completed his PhD here)…
Liked by Henry Tan
-
Happy #Easter! We have just published the latest #Mindsets piece (the King's Entrepreneurship Lab blog series) by our Senior Advisory Board Chair…
Liked by Henry Tan
-
Mistral Large, Mistral AI’s newest and most advanced LLM, is now avail on Amazon Bedrock. Mistral Large is terrific at a wide range of tasks across…
Liked by Henry Tan
-
We just open-sourced Thunder, a new compiler for PyTorch! In LLM training tasks (e.g., Llama 2 7B), it can achieve a 40% speedup compared to regular…
Liked by Henry Tan