View my Software Development Blog
I’ll be working on system performance as a Sr. Staff Software Engineer at LinkedIn. Make it fast. / Make it stout, / Out of things we know about.
I’m a student again! Here’s why. (I’m keeping my day job.)
If you allow user-generated content, some of that content will be evil. Your “eCrime” team will need tools to fight evil, both to keep users happy and to satisfy three-letter government agencies. In this talk, you’ll find out how one of the biggest gaming companies uses the power of Python to build an eCrime investigation system that runs reliably, autonomously, and economically.
PowerPoint slides with notes
Today’s big buzzword is “scalability.” Users who flock to the hot app of the month and just as quickly move on cause heartburn for Ops, late nights for developers, and revenue loss for our corporate masters. Our purpose as engineers is to make trade-offs between competing goals such as performance, reliability, maintainability, and extensibility. An over-emphasis on scalability has pushed that aside. Tonight’s talk tries to move the pendulum back to the center by showing you how one not-terribly-smart guy sped up a critical Python program 114,000 times AND YOU CAN TOO. The resulting system handles predicted data volumes for several years out, avoiding the need to run on a cluster and the resulting additional failure modes. It is maintainable, extensible, and reliable, running for more than a year with no unscheduled downtime.
PyCon 2013 slides (PowerPoint format)
On-line games produce lots of data. Extracting meaning from that data is a typical “Big Data” problem. Doing it in Python, on a single machine, with high reliability... is unusual. In this presentation, you’ll see how to build a large-scale, highly parallel, continuous flow processing system to handle billions of events per day. By design, it never crashes and can add new functionality without downtime, all without the cost and maintenance problems of clustered systems. There is no magic here, just straightforward engineering and YOU CAN DO IT TOO!
PyData 2013 slides (PowerPoint format)
Zynga is mostly a PHP shop, with a smattering of Java and other languages. For a PHP programmer to review my Python code, I wrote a little primer, Python Tips for PHP Programmers.
The company chose a different direction, adding additional architectural tiers and tens of thousands of servers. Performance and reliability are still in question. On the plus side, Twitter’s spending on hardware and support staff is helping the economy.
Here’s my proposal: MS Word PDF
Sure enough, a MySQL guy sitting in the front row and grinning ear to ear asked if the rumor was true. “What rumor is that?” I asked. That let him have the floor for a minute, so he could feel good about his contribution. My answer was that the odds I’d be authorized to break major news at a weekend Linux convention in a hotel near the airport were low, so either the rumor was false and I’d say so or it was true and I’d lie, but no matter how you ask, the answer’s the same. The audience enjoyed the entertainment.
Keith Bostic and I did a Professional Services engagement with a client in the Northeast. There was dysfunction on the client team and a lack of knowledge about how compilers work. The main goal of the one-week engagement was done within a day, so I prepared an impromptu session on compiler implementation to help the client make better use of the technology. The slide content is OK, though the graphics are rough: PowerPoint PDF. I used the experience to write a technical tip as a Dashiell Hammett parody, just to see if I could: MS Word PDF.
I was asked to make some presentations on Berkeley DB products at Oracle’s 2006 convention in San Francisco. One presentation went well, the other was crippled by management snafus. That’s life in a tiny acquisition at an enormous company; we were told that Oracle spends more on paperclips than it did to buy us. Anyway, here are the slides from the good presentation: PowerPoint PDF. And here are the slides from the presentation that didn’t go so well: PowerPoint PDF.
For the last course section, July-Dec. 1999, we ran two classrooms simultaneously. I taught the Greeley, CO class, with about 28 students. Christine Stamper and Larry Lustig were my teaching assistants and Varda Blum tutored a student with special needs. The other class, in Palo Alto, CA, had a former student from my 1995 Colorado Springs class, Judy Brodhead, as the teacher and Gerhard Paseman as a co-instructor and assistant. That class had, I believe, 14-18 students.
Robert W. Miller took over the class in 2000, so I could start a new company, Synthespia. When Carly Fiorina, HP’s new CEO, took the company in a different direction, SJS was canceled. Bob went to graduate school. His dissertation was an investigation of the long-term effects of SJS on the 135 students he and I taught. Many of the students had made a successful career switch to software engineering but felt that they had to leave Hewlett Packard to be taken seriously.
Now, with tablets in common use and nearly ubiquitous connectivity, this might not be the case. So in early 2014, I recruited a team to build a prototype. In parallel, I talked to a dozen potential users and customers in 25 hours of interviews. The interviews showed that people who liked the idea had no money and people with money had no use for the idea, as they have an endless supply of interns who work for free. I cancelled further development and stopped pursuing the matter. More information at scriptomagic.tv.
I replaced the engineer who had built the prototype, to rewrite his software into an industrial form. With only 8 kB of ROM and 8 kB of RAM connected to an 8-bit microcontroller, real-time signal analysis was a fun challenge. I implemented an Artificial Intelligence algorithm, writing hypotheses about the time on a “blackboard” and removing failed hypotheses as new data came in, until the only remaining hypothesis was strong enough to be proclaimed correct. The determination of “strong enough” depended on the noise level, so the likelihood of a false lock was low.
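The blackboard idea can be sketched in a few lines of Python. This is an illustration only, not the original 6303 assembler code: the 60-tick period, the `max_misses` tolerance, the `required_hits` formula, and all names are assumptions made up for the example. Each hypothesis is a guess at which tick carries the time marker; hypotheses that disagree with the data too often are erased, and a lock is declared only when one hypothesis is left and it has enough confirmations for the current noise level.

```python
def required_hits(noise_level):
    """Hypothetical rule: the noisier the signal, the more confirmations needed."""
    return 2 + int(round(10 * noise_level))


class Blackboard:
    def __init__(self, period=60, max_misses=1):
        self.period = period
        self.max_misses = max_misses
        # hypothesis (candidate marker position) -> [hits, misses]
        self.scores = {h: [0, 0] for h in range(period)}

    def post(self, tick, saw_marker):
        """Score every surviving hypothesis against one new observation."""
        for h in list(self.scores):
            predicted = (tick % self.period == h)
            if predicted and saw_marker:
                self.scores[h][0] += 1          # observation confirms h
            elif predicted != saw_marker:
                self.scores[h][1] += 1          # observation contradicts h
                if self.scores[h][1] > self.max_misses:
                    del self.scores[h]          # erase the failed hypothesis

    def locked(self, noise_level):
        """Return the winning hypothesis, or None if there is no lock yet."""
        if len(self.scores) != 1:
            return None
        h, (hits, _misses) = next(iter(self.scores.items()))
        return h if hits >= required_hits(noise_level) else None


bb = Blackboard()
for tick in range(120):
    bb.post(tick, saw_marker=(tick % 60 == 7))  # clean signal, marker at 7
```

With a clean signal, `bb.locked(0.0)` reports position 7 after the second marker, while `bb.locked(0.3)` still returns `None` because a noisier channel demands more confirmations before a lock.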
There were no analog engineers on staff, only three digital guys. They couldn't figure out how to build an analog decoder to pick up a usable AM signal amidst the muck on the 2.5-20 MHz band. I suggested a hysteresis detector and within a few hours, we had a viable analog front end. Cool. There was no documentation, so I took on the task of writing clear, usable documentation. Two customers said it was the best firmware documentation they had seen.
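The real detector was an analog circuit, but the principle of hysteresis is easy to show in software. The sketch below is a stand-in under assumed values: the function name, the thresholds, and the sample data are all hypothetical. The output switches on only when the input rises above a high threshold and switches off only when it falls below a lower one, so noise that wobbles between the two thresholds cannot make the output chatter.

```python
def hysteresis_detect(samples, low, high):
    """Convert a noisy signal envelope into clean 0/1 states (Schmitt-trigger style)."""
    state = 0
    out = []
    for x in samples:
        if state == 0 and x >= high:
            state = 1      # rose above the upper threshold: switch on
        elif state == 1 and x <= low:
            state = 0      # fell below the lower threshold: switch off
        out.append(state)
    return out


# Hypothetical envelope samples; values between 0.3 and 0.65 leave the state unchanged.
states = hysteresis_detect([0.1, 0.6, 0.4, 0.7, 0.5, 0.2, 0.4], low=0.3, high=0.65)
```

A single-threshold comparator at 0.5 would flip on the 0.6 and 0.5 samples; the two-threshold version ignores them.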
Running the assembler, linker, locater, and EPROM blaster took too long, sapping productivity. I moved the tool chain and output files to a 128 kB RAM disk on my Toshiba T1100+ laptop (serial #5) and wrote a serial port downloader to run the code out of RAM, eliminating the blaster step. Turnaround time dropped from 10 minutes to under 30 seconds. The Version 4.01 source code (Hitachi 6303 assembler), build chain, and documentation (XYWrite format) are here.
David Schachter studied Electrical Engineering and Computer Science at Princeton University and has 36 years of industry experience at Fortune 500 companies (Oracle and HP) and numerous startups, including two successful exits. He enjoys designing and implementing fast, scalable systems for analyzing large data flows in real time.
In recent projects, Mr. Schachter designed and implemented a “Big Data” continuous flow log analysis system in 99.44% pure Python (Disney Interactive) and an “eCrime Investigation Tool” in pure Python (Zynga). Mr. Schachter is currently Vice President, Hadoop Principal Data Modeling Architect at the Bank of New York Mellon. David’s first project for the bank was a real-time information security analytics system using IBM streaming technology similar to Apache Storm and Twitter Heron. Recently, he was part of the “Digital Pulse” initiative at the bank. His current project is developing generalized chatbot technology and specific chatbot interfaces at the bank’s Innovation Center Silicon Valley as part of the NEXEN initiative.
Sample Electric Car Placard (PowerPoint)
Profile on LinkedIn