View my Software Development Blog

New Job

I’ll be working on system performance as a Sr. Staff Software Engineer at LinkedIn. Make it fast, / Make it stout, / Out of things we know about.


Back to School

I’m a student again! Here’s why. (I’m keeping my day job.)


Electronic Crime Investigation in Python

If you allow user-generated content, some of that content will be evil. Your “eCrime” team will need tools to fight evil, both to keep users happy and to satisfy three-letter government agencies. In this talk, you’ll find out how one of the biggest gaming companies uses the power of Python to build an eCrime investigation system that runs reliably, autonomously, and economically.

PowerPoint slides with notes
PDF slides with notes
Video

How to Speed Up a Python Program by 114,000 Times

Today’s big buzzword is “scalability.” Users who flock to the hot app of the month and just as quickly move on cause heartburn for Ops, late nights for developers, and revenue loss for our corporate masters. Our purpose as engineers is to make trade-offs among competing goals such as performance, reliability, maintainability, and extensibility. An over-emphasis on scalability has pushed those other goals aside. Tonight’s talk tries to move the pendulum back to the center by showing you how one not-terribly-smart guy sped up a critical Python program 114,000 times AND YOU CAN TOO. The resulting system handles the data volumes predicted for several years out, avoiding the need to run on a cluster and the additional failure modes a cluster brings. It is maintainable, extensible, and reliable, having run for more than a year with no unscheduled downtime.

PyCon 2013 slides (PowerPoint format)
PyCon 2013 slides (PDF format)
Video from the presentation’s beta test at the San Francisco Python Meetup, Nov. 2012.

Escape the Curse of the Cluster and the Headache of Hadoop
or: How I (Re)learned to Write Fast, Reliable Code

On-line games produce lots of data. Extracting meaning from that data is a typical “Big Data” problem. Doing it in Python, on a single machine, with high reliability... is unusual. In this presentation, you’ll see how to build a large-scale, highly parallel, continuous-flow processing system that handles billions of events per day. By design, it never crashes, and new functionality can be added without downtime, all without the cost and maintenance problems of clustered systems. There is no magic here, just straightforward engineering and YOU CAN DO IT TOO!

PyData 2013 slides (PowerPoint format)
PyData 2013 slides (PDF format)
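For readers who want a feel for the design, here is a minimal Python sketch of the two properties the talk above claims: the flow survives bad events, and it picks up new code without a restart. The module name handlers and the reload-per-event policy are illustrative assumptions, not details from the talk.

    # Minimal sketch: an event loop that never dies and hot-reloads its logic.
    # "handlers" is a placeholder module exposing process(event); edit and
    # redeploy it while the loop runs and the new code takes effect.
    import importlib
    import logging

    import handlers  # assumed plugin module, not from the talk

    def run(events):
        for event in events:
            try:
                importlib.reload(handlers)  # pick up newly deployed logic
                handlers.process(event)
            except Exception:
                # A bad event or a buggy handler is logged and skipped;
                # the flow itself keeps running.
                logging.exception("event failed; continuing")

Reloading on every event is shown only for brevity; a production system would reload on a deploy signal or a file-change notification instead.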

Writing Fast Code

Zynga slides (PowerPoint format)
Zynga slides (PDF format)

List Comprehensions for Fun and Profit

SF Python Meetup lightning talk slides (PowerPoint format)
SF Python Meetup lightning talk slides (PDF format)

Python Tips for PHP Programmers

Zynga is mostly a PHP shop, with a smattering of Java and other languages. To help a PHP programmer review my Python code, I wrote a little primer, Python Tips for PHP Programmers.


Letter to an Aging Engineer

When I was leaving Zynga, a co-worker asked me for advice on pursuing his career in high tech. Here it is.

Scalability at Wooga

Jesper Richter-Reichhelm, Head of Engineering at Wooga, gave an interesting presentation on scalability. I took notes.

Performance and Reliability at Twitter

In early 2009, I noted that Twitter’s performance and reliability problems came from using a LAMP architecture spread across many servers in multiple tiers. The design was a poor match for the requirement, which was to implement a simplified version of a 1950s-era Telex switch. I wrote and submitted a spec proposal to solve the problems with a single-server model, a “Twitter oneBox.”

The company chose a different direction, adding more architectural tiers and tens of thousands of servers. Performance and reliability are still in question. On the plus side, Twitter’s spending on hardware and support staff is helping the economy.

Here’s my proposal: MS Word  PDF


Sleepycat/Oracle Professional Services

Sleepycat Software owned the well-known “Berkeley DB” database and the more interesting XQuery-based “Berkeley DB XML” database. About thirty hours before the news hit the wires that Oracle had bought Sleepycat, I gave a presentation on Berkeley DB. I knew that word of the acquisition would probably leak, so I practiced answering questions that might come up. See the last two slides: PowerPoint  PDF.

Sure enough, a MySQL guy sitting in the front row and grinning ear to ear asked if the rumor was true. “What rumor is that?” I asked. That let him have the floor for a minute, so he could feel good about his contribution. My answer was that the odds I’d be authorized to break major news at a weekend Linux convention in a hotel near the airport were low, so either the rumor was false and I’d say so, or it was true and I’d lie; no matter how you asked, the answer would be the same. The audience enjoyed the entertainment.

Keith Bostic and I did a Professional Services engagement with a client in the Northeast. There was dysfunction on the client team and a lack of knowledge about how compilers work. The main goal of the one-week engagement was accomplished within a day, so I put together an impromptu session on compiler implementation to help the client make better use of the technology. The slide content is OK, though the graphics are rough: PowerPoint  PDF. I used the experience to write a technical tip as a Dashiell Hammett parody, just to see if I could. MS Word  PDF.

I was asked to give some presentations on Berkeley DB products at Oracle’s 2006 convention in San Francisco. One presentation went well; the other was crippled by management snafus. That’s life in a tiny acquisition at an enormous company; we were told that Oracle spends more on paperclips than it did to buy us. Anyway, here are the slides from the good presentation: PowerPoint  PDF. And here are the slides from the presentation that didn’t go so well: PowerPoint  PDF.


Hewlett Packard Software Job Skills

From 1994 through 1999, I taught HP’s Software Job Skills course. The course material, originally created by Mark Rose, needed continual updating as the economy entered the dot-com bubble. SJS was a two-year college curriculum with a strong industrial bias, compressed into seventeen weeks of 40 hours of classroom time and 40 hours of homework per week. The course description is here and the course material, all seven volumes of it, starts here.

For the last course section, July-Dec. 1999, we ran two classrooms simultaneously. I taught the Greeley, CO class, with about 28 students. Christine Stamper and Larry Lustig were my teaching assistants and Varda Blum tutored a student with special needs. The other class, in Palo Alto, CA, had a former student from my 1995 Colorado Springs class, Judy Brodhead, as the teacher and Gerhard Paseman as a co-instructor and assistant. That class had, I believe, 14-18 students.

Robert W. Miller took over the class in 2000 so that I could start a new company, Synthespia. When Carly Fiorina, HP’s new CEO, took the company in a different direction, SJS was canceled. Bob went to graduate school; his dissertation investigated the long-term effects of SJS on the 135 students he and I taught. Many of the students had made a successful career switch to software engineering but felt they had to leave Hewlett Packard to be taken seriously.


Pen^2 Computing and Belmont Associates (with Richard C. Roistacher, Ph.D.)

Working in the too-early tablet computing market of 1992-1994, I proposed a system for Computer-Aided Drama to speed up film and TV production while reducing costs. The system was called “ScriptoMagic.” The available technology wasn’t adequate, and too much customer education seemed to be needed.

Now, with tablets in common use and nearly ubiquitous connectivity, those obstacles might no longer apply. So in early 2014, I recruited a team to build a prototype. In parallel, I talked to a dozen potential users and customers in 25 hours of interviews. The interviews showed that people who liked the idea had no money, and people with money had no use for the idea, as they have an endless supply of interns who work for free. I cancelled further development. More information at scriptomagic.tv.


Precision Standard Time, Inc.

PSTI made computer-controlled radio clocks and cluster synchronization software for VAX and MS-DOS systems. (This predates NTP and SNTP.) Our clocks were used to synchronize about 1,200 intersections in Los Angeles. According to the Wall Street Journal, this kept 300-700 tons of pollution out of the air annually.

I replaced the engineer who had built the prototype and rewrote his software into industrial form. With only 8 KB of ROM and 8 KB of RAM connected to an 8-bit microcontroller, real-time signal analysis was a fun challenge. I implemented an Artificial Intelligence algorithm, writing hypotheses about the time on a “blackboard” and removing failed hypotheses as new data came in, until the only remaining hypothesis was strong enough to be declared correct. The definition of “strong enough” depended on the noise level, so the likelihood of a false lock was low.
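The original 6303 assembly is in the source archive below; as a rough illustration, here is the same blackboard idea in Python. All names and threshold values are illustrative, not taken from the firmware.

    def required_support(noise_level):
        # Noisier signal -> demand more confirming evidence before locking.
        # The constants are illustrative, not from the original firmware.
        return 10 + int(20 * noise_level)

    def update_blackboard(hypotheses, consistent, sample, noise_level):
        """One blackboard step: erase contradicted hypotheses, credit the rest.

        hypotheses: dict mapping a candidate time to its support count.
        consistent: predicate(candidate_time, sample) standing in for the
                    real signal-decoding logic.
        Returns the candidate time once it is the sole survivor with enough
        support to be declared correct, else None.
        """
        for candidate in list(hypotheses):
            if consistent(candidate, sample):
                hypotheses[candidate] += 1
            else:
                del hypotheses[candidate]  # failed hypothesis: erase it
        if len(hypotheses) == 1:
            candidate, support = next(iter(hypotheses.items()))
            if support >= required_support(noise_level):
                return candidate  # strong enough: declare it correct
        return None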

There were no analog engineers on staff, only three digital guys, and they couldn’t figure out how to build an analog decoder to pick up a usable AM signal amidst the muck on the 2.5-20 MHz band. I suggested a hysteresis detector, and within a few hours we had a viable analog front end. Cool. There was also no documentation, so I took on the task of writing some that was clear and usable. Two customers said it was the best firmware documentation they had seen.
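A hysteresis detector (in essence, a Schmitt trigger) uses two thresholds instead of one, so noise jittering around a single threshold can’t produce a burst of false transitions. A minimal Python sketch, with made-up threshold values:

    def hysteresis_detect(samples, low=0.3, high=0.7):
        """Turn a noisy analog sample stream into a clean binary stream.

        The output goes high only when the input rises above `high` and
        goes low only when it falls below `low`; noise wiggling between
        the two thresholds cannot flip it. Thresholds are illustrative.
        """
        state = False
        out = []
        for s in samples:
            if s >= high:
                state = True
            elif s <= low:
                state = False
            out.append(state)
        return out

    # Jitter around a single 0.5 threshold would chatter; here the
    # detector reports one clean rising edge and one clean falling edge.
    print(hysteresis_detect([0.1, 0.45, 0.55, 0.48, 0.75, 0.65, 0.72, 0.2]))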

Running the assembler, linker, locator, and EPROM blaster took too long, sapping productivity. I moved the toolchain and output files to a 128 KB RAM disk on my Toshiba T1100+ laptop (serial #5) and wrote a serial-port downloader to run the code out of RAM, eliminating the blaster step. Turnaround time dropped from 10 minutes to under 30 seconds. The Version 4.01 source code (Hitachi 6303 assembler), build chain, and documentation (XYWrite format) are here.
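The original downloader ships with the source archive linked above; as a present-day analogue, here is roughly what the host side of such a tool looks like in Python with pyserial. The port name, baud rate, image file name, and the length-prefixed protocol are all assumptions for illustration, not the original design.

    # Host-side sketch of a serial downloader: push a freshly built binary
    # into the target's RAM instead of burning an EPROM.
    import serial  # pyserial

    def download(image_path, port="/dev/ttyUSB0", baud=9600):
        with open(image_path, "rb") as f:
            image = f.read()
        with serial.Serial(port, baud, timeout=5) as link:
            link.write(len(image).to_bytes(2, "big"))  # announce the size
            link.write(image)                          # stream the code
            ack = link.read(1)                         # wait for the target
        if ack != b"\x06":                             # ASCII ACK
            raise RuntimeError("target did not acknowledge the download")

    download("clock.bin")  # hypothetical image name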


Biography

David Schachter studied Electrical Engineering and Computer Science at Princeton University and has 36 years of industry experience at Fortune 500 companies (Oracle and HP) and numerous startups, including two successful exits. He enjoys designing and implementing fast, scalable systems for analyzing large data flows in real time.

In recent projects, Mr. Schachter designed and implemented a “Big Data” continuous-flow log analysis system in 99.44% pure Python (Disney Interactive) and an “eCrime Investigation Tool” in pure Python (Zynga). He is currently Vice President, Hadoop Principal Data Modeling Architect at the Bank of New York Mellon. His first project for the bank was a real-time information-security analytics system using IBM streaming technology similar to Apache Storm and Twitter Heron. Recently, he was part of the bank’s “Digital Pulse” initiative. His current project is developing generalized chatbot technology and specific chatbot interfaces at the bank’s Innovation Center Silicon Valley as part of the NEXEN initiative.

Sample Electric Car Placard (PowerPoint)
Profile on LinkedIn