Important Announcement Regarding YUI

The Yahoo User Interface library (YUI) has been in use at Yahoo since 2005, and was first announced to the public on February 13, 2006. Although it has evolved tremendously since then, YUI has always served the same overarching purpose: providing a comprehensive toolkit to make it easier for developers to create rich web applications. As such, YUI is an important part of Yahoo’s history: millions of lines of code relying on YUI have been written and are still in use at Yahoo today. However, it has become clear to us that the industry is now headed in a new direction.

As most of you know, the web platform has been undergoing a drastic transformation over the past few years. JavaScript is now more ubiquitous than ever. The emergence of Node.js has allowed JavaScript to be used on the server side, opening the door to isomorphic single-page applications. New package managers (npm, Bower) have spurred the rise of an ecosystem of third-party, open source, single-purpose tools that complement each other, embracing the UNIX philosophy and enabling very complex development use cases. New build tools (Grunt and its ecosystem of plugins, Broccoli, Gulp) have made it easier to assemble those tiny modules into large, cohesive applications. New application frameworks (Backbone, React, Ember, Polymer, Angular, etc.) have helped architect web applications in a more scalable and maintainable way. New testing tools (Mocha, Casper, Karma, etc.) have lowered the barrier to entry for building a solid continuous delivery pipeline. Standards bodies (W3C, Ecma) are standardizing what the large JavaScript frameworks have brought to the table over the years, making those capabilities available natively on a larger number of devices. Finally, browser vendors are now committed to making continuous improvements to their browsers while aligning more closely with standards. With so-called “evergreen” web browsers, which make it easier for users to run the latest stable version of a browser, we can expect a significant reduction in the amount of variance across user agents.

The consequence of this evolution in web technologies is that large JavaScript libraries, such as YUI, have been receiving less attention from the community. Many developers today look at large JavaScript libraries as walled gardens they don’t want to be locked into. As a result, the number of YUI issues and pull requests we’ve received in the past couple of years has slowly reduced to a trickle. Most core YUI modules do not have active maintainers, relying instead on a slow stream of occasional patches from external contributors. Few reviewers still have the time to ensure that the patches submitted are reviewed quickly and thoroughly.

Therefore, we have made the difficult decision to immediately stop all new development on YUI in order to focus our efforts on this new technology landscape. This means that, going forward, new YUI releases will likely be few and far between, and will only contain targeted fixes that are absolutely critical to Yahoo properties.

The mission of the YUI team at Yahoo continues to be to deliver the best next-generation presentation technologies with an initial focus on internal developers. We remain optimistic about the future of web presentation technologies and are eager to continue working with the external frontend community to share and learn together.

Julien Lecomte, Director of Engineering, Yahoo Presentation Technologies

How Math Helped Me Write Less Code and Made Our App Faster

When we set out to design the first two digital magazines at Yahoo, Yahoo Food and Yahoo Tech, we knew we wanted an immersive experience with the content front and center. We saw these traits in the design for Flickr, where large images are shown in an expansive grid view:

image

We decided to build the base of our magazines around this grid, but our designers wanted to take it one step further. Just as the content in our magazines is curated, we also wanted the layouts to feel hand chosen. Our designers explored a series of options, and we finally arrived at a new row type with one large tile that seamlessly punches through two normal rows (our team has become accustomed to calling these sorts of tiles “double talls”). Here’s an example of this new row type as seen in Yahoo Tech:

image

In order to maintain the feeling of a magazine-inspired layout, we knew we couldn’t take a shortcut and programmatically crop images to fit in our layout. Instead, we had to come up with a method to perfectly size and position tiles to fit in our grid.

I knew what my goal was, but I wasn’t quite sure how I was going to end up there. So I sat down and started drawing (I actually picked up a pen!) the double tall layout on paper. Our base grid was easy enough to reason about that a simple iterative algorithm fell out: add tiles to the row until the row is too wide, then resize the tiles to fit perfectly. But this layout was different — there were too many interdependencies between the sizes of the tiles. For example, increasing the size of the big tile causes the rest of the tiles to change size, but by different amounts depending on which row they’re in and their particular aspect ratios. This new layout seemed complicated enough that I wouldn’t be able to stare at it and come up with a layout algorithm in my head.
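For reference, the base grid’s iterative algorithm can be sketched like this (a simplified sketch, not our production code; the parameter names and the sample values in any test are made up):

```javascript
// Greedy justified-row layout: add tiles at a base height until the
// row reaches the target width, then scale the whole row by a single
// factor so it fits the target width exactly.
function layoutRow(aspectRatios, targetWidth, baseHeight) {
    var row = [];
    var rowWidth = 0;
    var i = 0;

    // Add tiles until the row is at least as wide as the target.
    while (i < aspectRatios.length && rowWidth < targetWidth) {
        row.push(aspectRatios[i]);
        rowWidth += aspectRatios[i] * baseHeight;
        i += 1;
    }

    // Scale every tile by the same factor so the row is fully justified.
    var scale = targetWidth / rowWidth;
    return row.map(function (ratio) {
        return {
            width: ratio * baseHeight * scale,
            height: baseHeight * scale
        };
    });
}
```

Because every tile in the row is scaled by the same factor, all tiles keep their aspect ratios and share one height, which is exactly what makes this base layout easy to reason about.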

I wanted to feel like I was making progress, so I made my next goal to write as many facts about the layout as I could. I looked at my diagram and started writing down equations, until I realized I had written down enough information to obtain a closed-form solution for the layout. A few hours later, I typed out these 5 lines of code:

var h_c  = (q_b - q_a + r_a * p + (r_b + r_a) *
             ((p + q_a - w) / r_a - p)) /
           (-r_b - r_b / r_a * c - c);
var h_b  = (p + q_a - w) / r_a - p + (1 + c / r_a) * h_c;
var h_a  = -p + h_c - h_b;
var w_c  = c * h_c;
var w_ab = q_a + r_a * h_a;

Don’t worry about the specific variables on the right side of the equations; the details are in the linked paper below. The important point is that the five equations represent the height of the double tall tile, the height of the bottom row, the height of the top row, the width of the double tall tile and the width of both of the subrows. With these equations solved, I could now implement the new layout using the same logic as in our existing Flickr-style layout code.

So how exactly did I get to these equations? I realized when I was sketching and writing down constraints, that I had enough constraints to solve the system of equations for all of the variables. I’ve written up this approach with a full explanation of the variables and constraints in a paper that you can find here (warning: it contains a small amount of linear algebra): Breaking Row Boundaries in a Fully Justified Tile Layout
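For completeness, the five lines above can be wrapped in a single function. The input parameters are the grid quantities whose exact meanings are given in the linked paper; any sample values used to exercise it are made up. One identity that holds by construction (from the third equation): h_c = h_a + h_b + p.

```javascript
// The closed-form solution from above, packaged as a function.
// The inputs encode the grid geometry (see the linked paper for
// their precise meanings).
function solveDoubleTall(p, w, c, q_a, q_b, r_a, r_b) {
    var h_c  = (q_b - q_a + r_a * p + (r_b + r_a) *
                 ((p + q_a - w) / r_a - p)) /
               (-r_b - r_b / r_a * c - c);
    var h_b  = (p + q_a - w) / r_a - p + (1 + c / r_a) * h_c;
    var h_a  = -p + h_c - h_b;
    var w_c  = c * h_c;
    var w_ab = q_a + r_a * h_a;
    return { h_a: h_a, h_b: h_b, h_c: h_c, w_c: w_c, w_ab: w_ab };
}
```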

An Alternative: a Constraint-Based Layout System

Instead of manually coming up with a closed form solution, I could have avoided solving these equations altogether and expressed the constraints directly in code using a constraint-based layout system. To make this easier to explain, here’s the diagram of our new layout from the paper:

image

For example, to enforce that all tiles of a row are a given height, we would say that a1.height must equal a2.height, and a2.height must equal a3.height, and so on. We can even introduce other quantities like padding into the equations: a1.left should equal c.right plus padding. Once you’ve declared all of these relationships, the constraint solver will tell you the numerical values of things like a2.height and a3.left.
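As a toy illustration of the idea (this is not the Cassowary API, just a sketch), each constraint can be modeled as a predicate over a candidate layout; a real solver searches for numerical values that satisfy all of them, while here we only check a hand-picked candidate:

```javascript
// Each constraint is a predicate over a candidate layout. A real
// solver (e.g. Cassowary) would search for values satisfying all of
// them; this sketch only verifies a given candidate.
var padding = 8; // made-up value

var constraints = [
    // All tiles of the top row share one height.
    function (l) { return l.a1.height === l.a2.height; },
    function (l) { return l.a2.height === l.a3.height; },
    // a1 sits to the right of the double tall tile, plus padding.
    function (l) { return l.a1.left === l.c.right + padding; }
];

function satisfies(layout) {
    return constraints.every(function (check) { return check(layout); });
}
```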

This is a very powerful idea, and is particularly useful when there are many subtle layout variations and edge cases. In fact, Apple added a new constraint-based layout system in iOS 6 that’s based on several papers from the Cassowary constraint solving toolkit.

In the end, I decided that using a full constraint-based layout system would be overkill for this project. First of all, our default Flickr-like grid isn’t expressed well with constraints; we dynamically choose how many tiles to put in a row while we’re calculating the layout. A constraint-based approach would have us “guess” at different row compositions and then choose the best one. Second, we have extremely tight time budgets to calculate layout: we need our app to feel extremely responsive across a wide variety of devices and we need to lay out new rows while the user is scrolling.

As an experiment, I compared my closed-form layout solution to a constraint-based solution that used a JavaScript implementation of Cassowary. On my local iMac, the constraint-based solution was able to lay out 171 rows per second, while my closed-form solution was able to lay out 735,000 rows per second. Results like these aren’t surprising; a constraint solver uses numerical methods to solve for arbitrary constraints, while our double tall rows have one very specific set of constraints. While 171 rows per second seems like plenty, we currently have plans to dynamically perform different candidate layouts and then choose the ideal layout. By ensuring that we use a very performant layout mechanism, we’ll be able to achieve more visually pleasing results in the future. Also, a constraint system is not a small dependency: minified, the JavaScript version of Cassowary is 47 KB and adds substantial complexity to our codebase.

Results

We’re very happy with how this new type of layout looks in our apps. Go check out Yahoo Food and Yahoo Tech to see double talls in action! And if you’re interested in working on problems like these, we’re looking for more front-end developers at Yahoo Media.

Code coverage for executable Node.js scripts

The YUI team at Yahoo is serious about automated testing. YUI is a foundational part of Yahoo websites, so it’s very important that we keep quality high. Our test automation system has run over 31 million tests in the last 6 months across over a dozen challenging browser environments, averaging more than 188,000 tests run every day.

We build programs with Node.js that help build and test YUI, such as the open-source Yeti, our unique test result viewer, and various small command-line utilities that assist every step of our automated testing. Of course, these test automation programs themselves have their own tests with high code coverage to ensure quality.

Yahoo’s Istanbul makes it very easy for your Node.js project to benefit from code coverage statistics—often as easy as adding istanbul test to your npm test script.

If you use Mocha, your package.json might look like this with Istanbul:

{
  "name": "my-awesome-lib",
  "version": "1.0.0",
  "bin": "bin.js",
  "scripts": {
    "test": "istanbul test _mocha"
  },
  "devDependencies": {
    "chai": "~1.8.1",
    "istanbul": "~0.2.4",
    "mocha": "~1.17.1"
  }
}

Using npm test simply runs _mocha, but npm test --coverage will output handy coverage information with on-the-fly instrumentation.

> npm test --coverage

> mock-utf8-stream@0.1.0 test /Users/reid/Development/mock-utf8-stream
> istanbul test _mocha


  ...........

  11 passing (18ms)

=============================================================================
Writing coverage object [/Users/reid/Development/mock-utf8-stream/coverage/coverage.json]
Writing coverage reports at [/Users/reid/Development/mock-utf8-stream/coverage]
=============================================================================

=============================== Coverage summary ===============================
Statements   : 92% ( 46/50 )
Branches     : 75% ( 9/12 )
Functions    : 100% ( 10/10 )
Lines        : 92% ( 46/50 )
================================================================================

You also get nice HTML reports (example) that let you know exactly what code you’re testing.

It’s great. You should really use Istanbul.

Why _mocha?

Normally you’d run mocha to run Mocha tests, but astute observers may have noticed that my package.json uses _mocha instead. That’s because mocha is merely a small wrapper script that spawns the real test runner, _mocha, as a child process. Since Istanbul works by hooking into Node’s module loader, it has no influence over that subprocess. So, we call _mocha directly, which works fine for the purposes of npm-test(1).

Code coverage for a child_process

The problem of crossing process boundaries comes up when attempting to test and collect code coverage for executable Node scripts—the command-line interface to the rest of the program.

These should be tested like everything else, but testing them can be a challenge. The obvious way to test these scripts would be to use child_process.

var path = require("path");
var fs = require("fs");
var tmp = require("tmp");
var chai = require("chai");
var child_process = require("child_process");
var assert = chai.assert;

tmp.setGracefulCleanup();

describe("executable script", function () {
    it("should write data on success", function (done) {
        tmp.dir({
            unsafeCleanup: true, // remove contents on exit
        }, function onDirCreate(err, dir) {
            if (err) throw err;
            child_process.exec([
                path.join(__dirname, "../bin.js"), // script being tested
                path.join(dir, "success.txt")
            ].join(" "), function (err, stdout, stderr) {
                assert.isNull(err);
                assert.isTrue(fs.existsSync(path.join(dir, "success.txt")), "did not find success.txt");
                assert.match(stdout, /completed/, "Expected a success message");
                done();
            });
        });
    });
});

While these tests come very close to testing what an actual user would do, using child_process means that Istanbul cannot instrument the code used by the executable script.

Move it to lib

My solution for this problem is to make the executable script as small as possible. It usually looks like this, adapted from Yeti’s cli.js:

#!/usr/bin/env node

"use strict";

var bin = require("./lib/cli");

bin({
    stdin:  process.stdin,
    stdout: process.stdout,
    stderr: process.stderr,
    argv: process.argv
});

This file will not have code coverage reporting. But since we moved everything to the lib/cli module, we can now test the majority of the CLI by passing in a mock stdin, stdout, and stderr.

Instead of using console.log and related methods, we switch to using stdout.write.

// Old way.
module.exports = function bin(opts) {
    console.log("Command success!");
};

// New way.
module.exports = function bin(opts) {
    opts.stdout.write("Command success!\n");
};

Here’s a simple example of the new test using mock streams:

var path = require("path");
var fs = require("fs");
var tmp = require("tmp");
var chai = require("chai");
var bin = require("../lib/bin");
var stream = require("mock-utf8-stream");
var assert = chai.assert;

tmp.setGracefulCleanup();

describe("executable script lib", function () {
    it("should write data on success", function (done) {
        tmp.dir({
            unsafeCleanup: true, // remove contents on exit
        }, function onDirCreate(err, dir) {
            if (err) throw err;
            var stdout = new stream.MockWritableStream();
            var stderr = new stream.MockWritableStream();
            stdout.captureData();
            bin({
                stdout: stdout,
                stderr: stderr,
                argv: [
                    "node",
                    path.join(__dirname, "../bin.js"), // simulate real process.argv
                    path.join(dir, "success.txt")
                ]
            }, function (err) {
                assert.isNull(err);
                assert.isTrue(fs.existsSync(path.join(dir, "success.txt")), "did not find success.txt");
                assert.match(stdout.capturedData, /completed/, "Expected a success message");
                done();
            });
        });
    });
});

mock-utf8-stream

Mocking text streams is something I do often, so I published mock-utf8-stream to make this easier. It’s the same code that’s been used by Yeti’s own tests and now I’m using it for other projects to increase code coverage. View it on GitHub. Happy testing!

Scheduled Rendering and Pipelining in Latency Sensitive Web Applications

Mojito Pipeline

Rendering views for a web app with non-trivial backends can quickly become problematic, as the app may end up holding onto resources while waiting for the full response from all backends. This can become a strategic pain point that monopolizes memory and blocks progress, resulting in seconds of nothing but a blank page and an idle connection. At Yahoo Search, our dependence on several backends has forced us to become creative to step up end-to-end performance. To decrease perceived latency, even out bandwidth usage, and free up frontend memory faster, we’ve adopted an approach similar to what Facebook detailed in this post.

Web page pipelining: the basics

Our goal is to be able to render sections of the page as soon as the corresponding data becomes available and concurrently flush rendered sections to the client, so the total time between transmitting the first to last byte of the response is significantly shorter. The process can be roughly decomposed as follows:

Step 1. When a request arrives, the page is divided into small, coherent sections (called mojits) and data is requested from the backends if necessary.

Step 2. At the same time, the frontend sends the client a ‘skeleton’ of the page containing empty placeholders to be filled by sections as they get flushed. Something like this:

<html>
<head><!-- static assets, etc.. --></head>
<body>
<script type="text/javascript">
/**
* pipeline client that knows how to insert incoming markup.
*/
var pipeline = ...
pipeline.push = function (section) { ...
</script>
<div id="section1"><!-- this is an empty slot --></div>
<div id="section2"><!-- this is an empty slot --></div>

Notice how the <body> tag is not closed.

Step 3. The backends start responding with the requested data. Once the data that a section needs arrives, the section is rendered and serialized within a <script> block, which is flushed to the client as soon as possible. Something like:

    <script>
        pipeline.push({
            id: "section1"
            markup: "<div>Hello, this is rendered section 1!!</div>"
        });
    </script>

Step 4. The client receives the script block containing the serialized section, and executes the script, which inserts the section into its corresponding placeholder on the page. Below is the simplified pipeline.push function that is called by the executed script.

    pipeline.push = function (section) {
        document.getElementById(section.id).innerHTML = section.markup;
    };

Step 5. Once the frontend is done rendering all the sections and there are no more <script> blocks to send, it sends the closing tags to the client and closes the connection:

    </body>
</html>
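Putting steps 3 through 5 together, the server-side flushing logic can be sketched roughly like this (a simplified sketch, not mojito-pipeline’s actual implementation; the fetchData, render, and flush hooks are hypothetical stand-ins):

```javascript
// Serialize a rendered section into a <script> block that the
// client-side pipeline can execute (step 3). Note: a production
// implementation must also escape "</script>" inside the markup.
function serializeSection(id, markup) {
    return "<script>pipeline.push(" +
        JSON.stringify({ id: id, markup: markup }) +
        ");</script>";
}

// Flush each section as soon as its backend data arrives, then send
// the closing tags once every section has been rendered (step 5).
function pipelinePage(sections, flush) {
    var pending = sections.length;
    sections.forEach(function (section) {
        section.fetchData(function (data) {
            flush(serializeSection(section.id, section.render(data)));
            pending -= 1;
            if (pending === 0) {
                flush("</body>\n</html>");
            }
        });
    });
}
```

Because flush is called per section, sections reach the client in completion order rather than document order; the client-side pipeline.push places each one into its placeholder regardless of arrival order.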

So what do you get for free (almost)?

This approach has several advantages.

  • It decreases the user perceived latency: now sections such as the logo, the search box and all the static stuff that appears on the page all the time can be sent immediately and the user can start typing instead of staring into cold nothingness.
  • It maximizes bandwidth usage: by flushing sections as soon as they are ready, we ensure that the available bandwidth is fully utilized when there is data to be sent.
  • It optimizes last-byte latency: periodic flushing also means that the last flush is a lot smaller and faster to transmit since it contains only the last section.
  • It optimizes memory usage: by rendering and flushing as soon as possible, we don’t have to hold onto the data of faster sections, and so less memory is used at any given point.

Okay, I’m pretty sure this is awesome - how do I get it?

You’re in luck: we made this stuff open-source and free (as in beer and as in speech) for our favorite Node.js app framework at Search: Mojito. Mojito makes it easy to divide your page into sections or “mojits” (which is also useful for reusability, maintainability, and overall spiritual wellness). We made it a package called mojito-pipeline that you can get on GitHub and npm, and you will be able to see how to get ninja powers with complex scheduling, dependencies between sections, conditional rules and more!

Jacques Arnoux (arnoux [at] yahoo-inc [dot] com) & Albert Jimenez (jimenez [at] yahoo-inc [dot] com) for Yahoo! Search Front-End Platform

The Spaghetti Problem of Low Coverage Features in Industrial Web Applications

Large applications deployed globally face an engineering complexity problem that directly stems from their size. As they become older, bigger, and go through more iterations, they tend to accumulate legacy code that eventually leads to paralysis. The initial owners have left, 7000 people have worked on it, and nobody holds its overall architecture in mind anymore.

Lately, we at Yahoo Search decided to experiment (bucket test) at all levels of the app, much more rapidly and nimbly than before (think every day with a team of 6 developers). Each bit/feature of the page may be served in a different flavor for a small subset of the users, which lets us see how users react to a change in our app and retain only the winning changes. Since each user sees many of those page bits/features, each user is served with a combination of experiments. Where, then, should we write those flavors so they can be combined dynamically, yet have no dependence on one another, and be easily versionable, testable, maintainable and reusable? Depending on the architecture, it can be easy to just stitch up a feature (think if-statement) on top of the big bowl of spaghetti rather than thinking of a modular, reusable, scalable way of writing it. Here at Yahoo, we use Mojito and the answer lies in how a Mojito app is structured.

The tree that masks the forest

Mojito is a Node.js app framework that helps you structure, route, and configure your webapps. A Mojito app is roughly a Node.js package with a config file (in JSON or YAML) plus a set of directories, each corresponding to an independent widget called a “mojit”. Each mojit has resources (files) for its model, views, controller, and client-side assets.

search-apps-news
|-- mojits
|   |-- SearchBox/
|   |   |-- models/
|   |   |-- views/
|   |   |    `-- index.html
|   |   |-- controller.js
|   |   |-- assets/
|   |   ...
|   |-- SearchResult/
|   |   |-- models/
|   |   |-- views/
|   |   |-- controller.js
|   |   |-- assets/
|   |   ...
|   ...
|-- package.json
|-- ...
// the configuration file
`-- application.json

That’s the base application, that’s the forest. Now, say you want to try changing the view of the search box to add a button for some users to see how they react, but you also want to keep the mainline version to be served to most of the users, so you can compare both sets of users simultaneously. You will want to change search-apps-news/mojits/SearchBox/views/index.html. Right?

Well no.

This is a recipe for a spaghetti code disaster when you have 40 experiments on that search box. Besides, the logic that decides what user should get what view should be reusable and in the app framework (not your app). If you believe that, then mojito-dimensions-base just became your best friend. By including that package in the package containing your experiments, you can then create “mask packages” that mimic the structure of your app only for those files that you want to override for that experiment.

So to come back to experimenting on that extra button for some users, make a node package for your searchbox experiments that looks like this:

mojito-dimensions-experiment_searchbox
 |-- extrabutton
 |   |-- mojits/
 |   |   `-- SearchBox/
 |   |      `-- views/
 |   |          `-- index.html
 |   `-- application.yaml
 ...
 |-- otherExperiment/
 ...
 |-- node_modules
 |   `-- mojito-dimensions-base/
 `-- package.json

Done. As you can see, the extrabutton/ directory structure mirrors your base app but only replaces one file: mojits/SearchBox/views/index.html. An experiment can involve many file substitutions, but in this case, we only need to change a single file. application.yaml is the config that tells your app when to trigger that “mask” (i.e., who the “some users” are).

Et voila! If you now define mojito-dimensions-experiment_searchbox as a dependency of your app, the file loaded when a request matches the extrabutton configuration will be the one from the extrabutton package, not the baseline! You can now easily develop experiment packages that you can test and maintain outside your baseline app, activate and deactivate at will, and merge easily once you determine you have a winner.
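Conceptually, the override works like path masking: for each resource, the loader consults active experiment packages before falling back to the base app. A rough sketch of that lookup (not mojito-dimensions-base’s actual implementation; the names here are hypothetical):

```javascript
// Resolve a resource path: the first active experiment package that
// provides the file wins; otherwise fall back to the base app.
function resolveResource(relPath, activeExperiments, baseDir) {
    for (var i = 0; i < activeExperiments.length; i += 1) {
        var exp = activeExperiments[i];
        if (exp.files.indexOf(relPath) !== -1) {
            return exp.name + "/" + relPath;
        }
    }
    return baseDir + "/" + relPath;
}
```

Because only matching files are masked, everything the experiment package doesn’t mention continues to come from the baseline app untouched.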

Jacques Arnoux (arnoux [at] yahoo-inc [dot] com) & David Gomez (dgomez [at] yahoo-inc [dot] com) for Yahoo! Search Front-End Platform