As the title implies, a new add-in has been developed by Microsoft (as a garage project actually) to help transform data in seconds. It’s mainly targeting Excel users who face problems in data wrangling; that is transforming raw data into a different format so that it can be used for further analysis. Usually this task is one of the most tedious tasks for any Excel user since the data could be coming from different sources in different formats and the cleaning process to make it all "homogeneous" could be a little bit daunting with too many functions and formulas included. With the "Transform Data by Example" add-in, you can automatically find the relevant data transformation functions and compose them to solve your data wrangling task by giving Excel a few examples. How does it work? Let’s have a look at an example to better understand how it works. Given a column of dates in your spreadsheet as shown in column E in the figure below, suppose you want to extract the day-of-week for each date so that you can subsequently filter or group your data by day-of-week. Note that the input data might appear in different formats, and the day of week may not even exist as part of the input data. Excel today does not include a built-in function that does this transformation. Although suitable functions do exist in open source repositories such as .NET, GitHub and Stack Overflow, or perhaps in a repository within your own organization, it is not easy for you to discover the right function for your task. With the Transform Data by Example, all you need to do is to provide a few examples of the desired output (as shown in column F below). Once you click on Get Transformations, Transform Data by Example will return a ranked list of programs that are automatically generated using a large collection of transformation functions it indexes to perform your desired transformation tasks. When you click on a result, it applies that program and fills the output column with the transformed data (see Figure below). In this example, a program automatically generated using a standard .NET function (System.DateTime.Parse) was used in completing the task. What is happening in the backend? Transform Data by Example is powered by a backend service that runs on Azure. Like a web search engine, this service crawls and indexes many open source data transformation functions from sources such as .NET, GitHub and Stack Overflow and web services such as Bing Maps. By exploiting the powerful capabilities of the Azure platform, and using technology developed in Microsoft Research, the engine can quickly search through this large index and find functions best suited for the given task specified by input-output examples. What is next? While Transform Data by Example currently targets information workers in Excel who need to transform data residing within their spreadsheets, in the future the project members plan to make it easy to deploy the resulting transformations on large datasets residing in databases or data lakes. So, bring on your examples and let Transform Data by Example build the transformations you need! You can download the add in for FREE from this link Check out this video to learn more about the add-in. |
|