Welcome

My name is John Hinnegan.  Currently located in Southern California. 

I'm a software engineer.  I'm currently running the tech side of the house at ThinkNear, changing the way small businesses find and attract customers.

I like scaling big, distributed systems, finance, politics, crossfit, football, being productive, learning new things, and mastering old things.

Profile

Co-Founder & Head of Software Engineering at ThinkNear
Computer Software | Greater Los Angeles Area, US

Summary

Officially hiring software engineers. http://careers.thinknear.com

Experience

  • Oct 2010 - Present
    Head of Software Engineering / ThinkNear
  • Feb 2007 - Present
    Software Development / Amazon.com
    Engineering experience building a targeted marketing, content management, and rendering engine for all of Amazon's credit businesses. The system supported real-time customer identification and segregation based on business supplied heuristics, delivering personalized ads and supporting the ensuing credit application workflow. Also built a full, scalable solution supporting instant credit approval. Developed Amazon's next generation payment instrument management system. Responsible for securely storing credit cards and bank accounts, and aggregating access to dozens of methods of payment. Oversaw the scaling from 0 to dozens of clients, from tens to millions of transactions per hour while maintaining 99.99% availability. Managed the Merchant Ordering Experience team and owned the 11 related services. The team's scope was to support third party order fulfillment on the Amazon platform worldwide; responsibilities included team, project, and product management as well as technical leadership. In addition to the back end services and databases, the primary application had 14 merchant facing pages which together are the highest trafficked part of the sellercentral.amazon.com site. The primary application managed seller interactions with Amazon during order fulfillment: tracking orders, confirming shipments, and managing returns/refunds. Third party orders had an annual run rate in excess of $10 Billion. Primary technology experience was in Java, both Jetty and Tomcat platforms. My teams designed and built massively scalable and incredibly redundant database architectures using Oracle as our primary data stores. Gained lots of experience with concurrency, bottleneck and performance analysis, distributed systems architecture, and high pressure trouble-shooting. Other noteworthy technologies used included: EHCache, Hibernate, Perl, Mason, SQL, and a significant number of Amazon proprietary systems.
  • May 2005 - Present
    Software Engineer / Armonicos Co. Ltd., Hamamatsu, Japan
    Developed solutions for CAD products in C++. Implemented a new file format for handling large files which reduced file size by 75%, save time by 90%, and load time by 98%. Introduced scripting capability based on COM standards, allowing users to script operations in VBScript.
  • 2004 - Present
    Developer / CenterPoint GmbH, Villach, Austria
    Developed WSDL/SOAP-based distributed communication solutions in C++. Retrofitted a proprietary RMI library to support WSDL-defined communication across platforms. Upon completion, the library was capable of dynamically offering, discovering, and consuming web services.

Education

  • 2001 - 2006
    University of Waterloo
    B. Math in Honors Computer Science
    Activities: Sigma Chi Fraternity -- extensive involvement including President, Vice President, Treasurer, and more.
  • 1996 - 2001
    South Secondary School

Additional Information

Websites:
Interests:
CFA Candidate

Posts

January 05, 04:58 PM

Prediction: Eric Kessler will change his views on cable cutters, or he will no longer work at HBO.  Only a matter of time.  

He epitomizes how out of touch media companies are with technology.  The quote:
"At that time, Kessler also said his company sees cable-cutting as no more than a temporary austerity measure that will cease as soon as the economy takes a turn for the better."

Anyone have a quote from telecoms a decade ago talking about cell phones not being a threat to phone land lines?

November 25, 01:53 PM

CS 193H High Performance Websites

Good work Stanford! Producing graduates with applicable skills — so valuable.

I’ve thought Universities needed more applicable tracks for a long time. I’ve heard people argue against me, too, and I’ve never felt their arguments held much water. One that’s been brought up a few times is that “they teach concepts applicable to lots of scenarios, not specific implementations”. I think you can learn how real-world people have applied the concepts to real tools, debate their decisions, and come off with a well-rounded understanding of the theory plus an understanding of a specific example. As an employer, I’d prefer you to know how to tune a garbage collector (any garbage collector) than not. The concepts port well, but if you’ve never done it before, it takes a long time.

I think the real reason there aren’t more applicable classes at a lot of Universities is that the profs can’t teach them. A lot of profs have never worked. They got a Bachelors, a Masters, a PhD, and now they teach. Or else when they did work it was decades ago, or maybe in a research center, or wherever. Regardless, I don’t believe there are many professors around with Twitter, Facebook, Amazon, or Google experience. By the end of their PhD program, they’re under-qualified to teach anything practical, because they may not know what is practical and what isn’t. (To be sure, this does not apply to all profs.)

Don’t get me wrong, PhD work is incredibly valuable, and leads to amazing real-world implementations of breakthrough theories, but the professional implementors are generally underrepresented at Universities. Students graduating with a Masters or a PhD are often less qualified for practical software engineering jobs. Stanford seems to be addressing this, and I hope other Universities follow.

Other classes I’d love to see:

CS 19X Latency-Constrained, High-Volume Services

  • load testing
  • tuning the JVM
  • detecting and determining bottlenecks
  • scale out, not up
  • estimating hardware needs
  • service redundancy
  • DB sharding
  • DB failover strategies and implications
  • DB growth analysis and strategies
  • seamless deployments and rolling back
  • threading horror stories
  • server security and authentication

CS 19Y Unix Administration for Developers

  • shell programming and the profile/zshrc/etc
  • applications run as users
  • what’s listening on that port? (and network strategies)
  • server tuning and hardware selection
  • logging
  • AWS-applied (from zero to service stack)
  • backups, archives, storage, and recovery
  • DNS
  • connectivity
  • keys, passwords, tunneling and other security concerns

CS 19Z Practical Development Practices and Tools

  • source control
  • unit testing
  • integration testing
  • continuous build & test, code coverage
  • dependency management 1, POM Hell, an introduction to Maven
  • dependency management 2, bundler internals
  • deployments
  • REST vs SOAP
  • real-world commenting, READMEs, and technical blogging
  • bug tracking & sprint planning

November 23, 01:52 PM

I just read a blog post off hacker news: Why loading third party scripts async is not good enough. It reminded me of someone I used to work with at Amazon who would regularly find errors in our applications. This was quite a feat at Amazon because we instrument everything. We have regex’s constantly parsing logs looking for errors, we have a dozen kinds of monitors collecting host metrics, server metrics, client metrics, business metrics, coffee temperature metrics, etc. all constantly checking “is your cpu load high?”, “do you have enough free memory?”, “how many times did you show pictures of the Twilight case?”, etc.

This one engineer (on a team of exceptional engineers) was consistently the only one to find errors. It was definitely very healthy for the team but …. engineers secretly hate this because, by definition, finding errors means he’s pointing out faults in your work. Managers less secretly hate this because it means he’s ‘creating’ high priority work that gets addressed ahead of their projects.

So with all these metrics and monitors on a team of high achievers, how did this one person on our team keep finding errors? He looked at the logs.

That was his secret weapon; reading logs! It’s like grade 1 of service maintenance. With all our monitoring, regex’s, and features we thought we were too good to ‘just’ read logs. The rest of the team would release features, put regex’s to detect our errors, trace a few requests after launching, and then move on to the next project. I honestly don’t know how much time he spent on it, but every week or two, he’d come in and explain how our programs were messing something up.

Things like:

  • requests to a dependency fail. We monitor overall failures, and accept failures of less than 0.1% (just hiccups and connection problems, right?). Turns out our dependency never worked for 0.1% of our customers.
  • we have a dependency known to have errors, but retries often succeed. We will retry every request once before raising an error. Our dependency makes a change which we don’t notice, but our retry rate goes from 2% to 50%.
  • you have ‘targeting’ params which you consume if available (e.g. the HTTP Referer header). You make a change which loses this data in the course of a request and now you’re never using it to target.
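
The first case is easy to miss because an aggregate failure rate hides who is failing. Here is a minimal sketch of breaking failures down per customer. The record shape (customer id, success flag), the function name, and the sample data are all made up for illustration; this is not Amazon's tooling.

```python
from collections import defaultdict


def always_failing_customers(records, min_requests=3):
    """Return the customers whose every request failed.

    records: iterable of (customer_id, succeeded) pairs, a hypothetical
    log format assumed for this sketch.
    min_requests: skip customers with too few requests to judge.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for customer_id, succeeded in records:
        totals[customer_id] += 1
        if not succeeded:
            failures[customer_id] += 1
    return sorted(
        customer
        for customer, count in totals.items()
        if count >= min_requests and failures[customer] == count
    )


# Made-up data: 1000 customers, 5 requests each; one customer never succeeds.
records = [(f"c{i}", True) for i in range(999) for _ in range(5)]
records += [("c999", False)] * 5

# The aggregate looks like harmless noise...
overall_failure_rate = sum(1 for _, ok in records if not ok) / len(records)
print(f"overall failure rate: {overall_failure_rate:.2%}")

# ...but the per-customer view shows one customer is completely broken.
print(always_failing_customers(records))
```

The overall number sits right at the "acceptable" 0.1% while one customer has a 100% failure rate, which is the kind of thing reading the logs turns up.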

There are three morals to this story:

  1. Drill down into your metrics and understand where they are coming from (and their deficiencies)
  2. Your monitoring will never be perfectly reliable — you regularly need to just randomly re-verify things are working
  3. Every time you catch a problem, install the proper monitoring to make sure it never happens again
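
As one way to act on the third moral, here is a rough sketch of a check for the retry example above (retry rate going from 2% to 50%). The function names, the 2% baseline, and the 3x tolerance are assumptions for illustration; a real monitor would read its counts from whatever metrics system you already have.

```python
def retry_rate(first_attempts, retries):
    """Fraction of requests that needed a retry."""
    if first_attempts == 0:
        return 0.0
    return retries / first_attempts


def check_retry_rate(first_attempts, retries, baseline=0.02, tolerance=3.0):
    """Return an alert message if the retry rate is well above baseline.

    baseline and tolerance are hypothetical values for this sketch:
    alert when the observed rate exceeds tolerance times the baseline.
    """
    rate = retry_rate(first_attempts, retries)
    if rate > baseline * tolerance:
        return (
            f"retry rate {rate:.0%} exceeds "
            f"{tolerance:g}x baseline of {baseline:.0%}"
        )
    return None


print(check_retry_rate(10_000, 200))    # normal: at the 2% baseline
print(check_retry_rate(10_000, 5_000))  # the dependency changed under us
```

The point of alarming on the retry rate rather than on final errors is that the retries were masking the change: the error count stayed flat while the dependency got much worse.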

In my experience, the most likely error is one you’ve seen before.

November 20, 02:41 PM

Applying for a job as a software engineer?

The odds are you have a bad resume (since >50% of the resumes I’ve seen are bad)

The Resume

Objective

No! Waste of space. It can only hurt. Your objective is to get a phone call or email. Don’t apply for jobs you don’t want. If you’re unsure, just apply like you want it and ask questions in the first conversation.

Length

2 pages max, 1 page if possible

Format

Prefer PDF — it’s more universal. Windows is losing market-share in development circles. Word is not universal.

Check the job description — if they indicate a format, follow it.

If you do use MS Word:

  • email it to yourself and verify it looks good in Google Docs (I’m not going to download it)
  • email it to a friend with a Mac and verify it looks good (some people do download it)

Credentials

  • CS Degree at the top
  • Certificates at the bottom or omitted
  • Github account at the top (as a link)

Experience

  • One entry per company
  • Include years worked there
  • Multiple roles => more bullet points
  • What was your role? What did you do?

Technologies

  • SMALL LIST (wow, you know XML? Really? Cause that’s hard to find and hard to teach … not)
  • technology stacks and platforms are enough
  • don’t list things you would be uncomfortable interviewing in (with 1 day’s notice)
  • Seriously! You list it, the interviewer can ask about it, and expect you to code in it

Good Examples

  • Java: Tomcat on AWS with MySQL
  • Ruby: Rails on Heroku with Postgres
  • Web UI: HTML/CSS/JS on PHP

Bad Examples

  • Java: Tomcat, Jetty, Maven, Ant, JUnit, XML, SOAP, Hibernate, JSP, blah blah blah
  • Ruby: Rails, Rake, JSON, Devise, Cucumber, Webrat, RSpec, VCR, FactoryGirl, ….

Small caveat

If you’re applying to big companies that are tech ignorant, you might need a laundry list to get past HR.

Extra Curriculars

  • yes if you’re just graduating from college
  • yes if they’re significant
  • for non-new grads, at most one line (unless you’re putting in 5+ hours per week)
  • don’t list every club you’ve ever been a part of
  • don’t list activities that include demographic information (Gay and Lesbian orgs, Christian Missionaries, etc)

Don’t list things that some people “don’t get”. No need to mention your involvement with D&D, Video Game communities, or Twilight fan clubs

Cover Letter

Generally not required.

When is it a good idea?

  • when you don’t have a CS degree
  • when you’re applying out of your depth (UI developer applying to be DBA?)
  • when you really want the job and can speak intelligently about the company

When is it a bad idea?

  • when you copy paste the same cover letter every time
  • when you have nothing to say (i.e. you’re summarizing your resume)

Coding

Active on the job market? You should have one of three things on your resume

  • a great school (Stanford, MIT, Waterloo, UIUC, Carnegie Mellon, etc)
  • a great company (Google, Amazon, Apple, Facebook, or a known startup)
  • a Github account with a bunch of stuff in it

November 11, 11:16 PM

There's a very undervalued skill I've observed over the years writing software, and that is complaining (to be valuable, you do have to be good at it).  I've observed it in Amazon's trouble ticket system, in open source communities, asking questions on stack overflow, and pretty much anywhere else engineers interact collaboratively.  Learning to complain accurately and precisely is an incredibly valuable skill and is a hallmark of great software engineers.

Creating and working tickets at Amazon was great experience for developing this skillset, and one I greatly under-appreciated at the time.  At Amazon, every team has a ticket queue (or 5), and every problem can be assigned to a ticket queue.  Tickets can move between queues on a whim, but tickets must always remain in a queue and every queue has an owner -- thus, every ticket has an owner.  Additionally, there's pressure from management to have fewer tickets in your queue at all times, and to have fewer 'high severity' tickets pass through your queue.  The result of this environment is that when you open a ticket against another team, you want to be careful it doesn't come back to your own queue (more on how this works below).  Your software quality is measured indirectly through how many tickets you receive.  (It's more complicated than that, but the way I've seen it used it is a valid proxy.)

Let's take an example.  I'm working on some software, and I find what I believe to be a defect in someone else's code (Gasp!).  I might open a ticket against the owner of that software to request a fix.  Should they determine that their systems are working as designed (read as "you're the one who made the mistake, check your own code"), they might reassign the ticket back to my queue -- after all, they have determined they have no action to take to resolve the supposed issue.  Yup, now both our metrics show that ticket passing through, but I own the next step.  This pattern results in at least a moral incentive to not have tickets you create come back to your own queue (from firsthand knowledge, it is rather embarrassing when it happens).  The only way to get them to 'accept' the defect report is to demonstrate or provide evidence of the error.  The result is that you learn to write great tickets.  This same skill applies to (some) questions on StackOverflow or issues on GitHub.  Being able to precisely and accurately diagnose the issue by identifying expected behavior and contrasting it with the observed behavior takes time and diligence.  Furthermore, it requires you to eliminate possible sources of error in code or systems you control before concluding the error lies outside of your domain.

Great complaints usually include enough detail to find and trace the issue in question (request id for log searching along with approximate time and date of request), logs clearly demonstrating the symptom in question, and/or code that can deterministically reproduce the issue.  (If it's non-deterministic, it's an order of magnitude harder to diagnose ... good luck).  

Once you've gone through this exercise a few times (and had other people politely reject your complaints due to lack of information), you start to get pretty good at creating tickets.  Also, the exercise is valuable in itself, and often leads to a resolution of the issue before you can complain.  On numerous occasions I've started asking a question on StackOverflow or on Github and in the course of asking the question and creating reproduction steps I have solved my own problem.  I will see a behavior that appears to be an error in someone else's code; I will start working on an issue; go looking for an example, some code, or logs to make my point, and discover the issue lies elsewhere.  I'd say this happens 80% of the time I start a question or issue.

In my interactions with other engineers, it's become clear that the ability to speak precisely about what is happening in and around code is correlated with the ability to complain clearly.  It's hard to establish causality, but improved complaint quality appears to be highly correlated with engineering talent.  Not sure if I could actually screen for it, but it's an interesting question.

On a final note, while on one of my previous teams, we used to joke that we would get a large stuffed bear.  Before bothering another engineer for help, you would have to get the stuffed bear and explain your issue to the bear.  If you were unable to demonstrate your issue to the bear, you couldn't bother another engineer.  This would simply force the thought process of stepping back from an issue to a higher level, explaining the inputs and outputs, and tracing the code through holistically -- things like saying out loud what different variables contained.  90% of the time, you'd find your issue was a misassigned variable, or an instance of a class you were not expecting, or some other 'menial' error.  

I swear, if we had actually gotten the bear, he would have immediately become the most helpful member of the team and surely improved our ability to troubleshoot issues.

Note: Just so there's no confusion, I'm discussing one of many metrics used by Amazon internally, and even this one isn't applied as bluntly as I might have described.  The abstraction works for my use case, but please don't jump to conclusions about Amazon's management practices based on this.  Also worth noting, it's been more than a year since I last worked at Amazon, so it's possible the anecdote is no longer accurate.   

October 30, 03:22 PM

Part 1: Waterloo fails a lot of students majoring in Computer Science.

A lot!

When I was attending, you needed to maintain a 65% major average, and every class kept its averages under 70%.  In my first year, I was enrolled in Software Engineering (the major), which was basically composed of the top people from Computer Science and Computer Engineering in one class.  We had a midterm once with an average around 78%, and were told it was too high.  I've also heard profs say that 72% is too high for an average.  Averages under 60% are the only ones I've ever heard of as being too low.  From what I gathered as an undergrad, anywhere between 60-70% was okay, and profs tended to shoot for ~67%.

That's an average only slightly higher than the required major average.  The marks were not normally distributed, so the median was actually lower.  There is a group of exceptionally bright people at Waterloo that creates a small hump somewhere up in the 80-90% range, then the larger hump is right around 60%.  This is certainly generalizing, but I believe it to be close to truth for most classes.  (A 50% is a pass.  It's an 'honors program', so if you graduate, you get honors -- you can't graduate with CS and not get honors at Waterloo.  In theory, this makes a 65% equivalent to an 80% at a lot of other schools.)  

Every class I took was essentially graded on a curve.  If the midterm was easy, you'd get slaughtered on the final.  If everyone failed the midterm, the final would be not so bad.  I had many profs resort to, what I consider, underhanded techniques to lower marks.  Assignments without marking ledgers, exams where 25% of the marks came from 1/2 of 1 lecture, etc.  

Failing about 5% of the class every term results in survival of the fittest.  Waterloo CS grads have a good reputation partially because they're the top half of their class by definition of being a graduate.  In retrospect, this actually makes a Waterloo CS degree that much more valuable, and improves the reputation of graduates that much more.  The only downside is that Waterloo's incredible bias towards lower marks (especially when compared to our neighbors in the US) makes it that much harder to go into grad work.  Your 65% at Waterloo might be equivalent to an 80% at most US schools, but no one knows it and so your applications will just get filtered out of consideration.

Part 2:  Am I right?

So, here we are.  I'm years out of school, and I've had this intuitive belief that Waterloo CS fails a lot of people, certainly way more than their advertised 12% drop out rate.  I've always believed the number to be closer to 40%. (http://analysis.uwaterloo.ca/statistics/cudo/cudo_2010/htmlSectionK.php#sectionk2a).

So I went and found the number of students enrolled as FTE in computer science: http://analysis.uwaterloo.ca/statistics/cubes/regist_page.php
And the number of degrees awarded in computer science:  http://analysis.uwaterloo.ca/statistics/cubes/degree_page.php

In the table below, student count is # of full time students in the start year.  By the FTE definition, this should capture every first year student in CS.  The Degrees Awarded is in the same major (CS) from 5 years forward.  So the 2005/2006 start year has the degrees awarded in 2010, which is the latest complete data available.  

Start Year   Student Count   Degrees Awarded   Percentage
2005/06          280.8             222            79.06%
2004/05          416.8             215            51.58%
2003/04          548.2             339            61.84%
2002/03          667.5             354            53.03%
2001/02          604               448            74.17%
2000/01          668.5             514            76.89%
1999/00          618.5             462            74.70%
1998/99          548.5             421            76.75%
1997/98          641               387            60.37%
1996/97          513               310            60.43%

The average CS graduation rate over 10 years was 66.88%.  This is almost exactly failing 5% of the class every term, or failing 10% of the class every year.  Which works out to roughly what I believed intuitively.  
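
The arithmetic behind that comparison can be checked with a short script. It assumes 8 academic terms (4 years' worth of classes) and a constant attrition rate, which is a simplification of mine; the cohort numbers are the ones from the table above.

```python
# (start year, first-year FTE student count, degrees awarded five years later)
cohorts = [
    ("2005/06", 280.8, 222),
    ("2004/05", 416.8, 215),
    ("2003/04", 548.2, 339),
    ("2002/03", 667.5, 354),
    ("2001/02", 604.0, 448),
    ("2000/01", 668.5, 514),
    ("1999/00", 618.5, 462),
    ("1998/99", 548.5, 421),
    ("1997/98", 641.0, 387),
    ("1996/97", 513.0, 310),
]

# Unweighted average of the ten per-cohort graduation rates.
rates = [degrees / students for _, students, degrees in cohorts]
average_rate = sum(rates) / len(rates)

# Assumption: 8 academic terms over 4 years of classes, constant attrition.
survive_per_term = 0.95 ** 8  # lose 5% of the remaining class each term
survive_per_year = 0.90 ** 4  # lose 10% of the remaining class each year

print(f"average graduation rate:          {average_rate:.2%}")
print(f"survive 8 terms at 5% attrition:  {survive_per_term:.2%}")
print(f"survive 4 years at 10% attrition: {survive_per_year:.2%}")
```

Both compounding models land at roughly 66%, within about a percentage point of the observed average.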

Some notes on the method:  

I tried looking through year-by-year to see if I could get a handle on the annual drop out rate, but it wasn't really possible because the definition of a full time student is not compatible with the co-op program (in which >80% of CS majors participate); and because the program has 4 years' worth of classes, but is a 5 year degree.  So we just look at starts and degrees.  Also, regular students (not co-op) graduate 1 year sooner than their co-op counterparts, and I also want to capture (without penalty) those graduating late.  For these reasons, I wouldn't interpret 2004 to have been the hardest year to start and 2005 as the easiest, but just that there was probably some skew in numbers here between co-op and regular (though they got much more selective in 2005, which could also explain the graduation rate).  

It's worth pointing out that in 2001, the Software Engineering program started, which potentially drew students to it who would otherwise have been in the CS program.  

Around 2005, Waterloo started offering a Bachelor of Computer Science to which it was easy to transfer from the existing CS program.  It's unclear to me how to account for the difference between the new CS degree and the old BMath/CS.  The interface for the data did not make it clear to me how to distinguish them.  That said, the Software Engineering program took in 103 students in 2004, and then awarded 52 degrees (50% success rate).  I suspect transfers from that program may have goosed the 2005 graduation rate numbers.  In theory, you fail out of SE, and finish a CS degree 1-2 terms behind your start year.  I don't have the data to substantiate this theory.

Since students transfer from Software Engineering into CS, but CS students (generally) cannot transfer the other way, it's possible I've overestimated the graduation rates (at least as they apply to your chance of graduating, given that you're in first year).  On the other hand, if Waterloo is accounting for the BCS separately, my numbers could be off -- maybe the people who started in '02 and '03 switched to the BCS and earned that degree?  But then, how do I account for the difference between BMath/CS and BCS undergrads?  Did the BCS take over the 'CS' tag starting in '05?  Are the BMath/CS undergrads now lumped in with the BMath kids?  And finally, I'm not sure how students failing out during first year impact the numbers.  They definitely existed, but I'm getting my first year counts based on the number of students who took 2 full terms in their first year.  

Since these questions are still floating around, it's probably best to take my result with a grain of salt.  However, given that (AFAIK) there was a lot more program stability prior to 2001, and assuming that Waterloo hasn't fundamentally changed its policies on grades and pass rates, it would seem that my conclusion is not significantly far from reality.

And reality is that it's damn hard to get a degree from Waterloo with the words Computer or Software printed on it, and that's okay.

October 30, 01:18 PM

As I get deeper into AI, I realize that there was a math class I never had which would have been incredibly valuable as an undergrad.  This class should have been day 1 of my undergrad, perhaps repeated every year, and certainly addressed for 20-30 min of each math course I took.  That class would simply be a survey of mathematics, or maybe just, math from 10 000 feet.

As I dig deeper into unsupervised learning approaches, the solutions combine statistics with calculus and also require algebra to resolve.  The intuition for many of the solutions comes from graph theory and geometry.  Practical approaches require computational mathematics, statistics again (for estimations and acceptable error), and signal processing.  Finally, truly practical applications often require computational resources and data stores that are enabled by (to be grossly general) computer science while delivering them on schedule and in a maintainable state is software engineering.  

I disliked most of my math classes through college.  Many of my professors were very poor teachers which was compounded by their poor command of the English language.  The courses were poorly organized, exams poorly written, and notoriously hard to study for.  As an example, my Stats 231 (stats 2 for math majors) final exam was worth up to 80% of my mark, but marked out of just 38.  It was written by another prof, and required the cumulative learnings from stats 1 and stats 2, but advertised as not retesting material from stats 1.  A very unpleasant experience at the time, but quite successful at failing at least 10% of the class, which is about right.  (For reference, I started with somewhere in the neighborhood of 1000 students in my major, and somewhere around 600 graduated.)

I feel that keeping the bigger picture in view -- applications of the math we were learning, or maybe starting with 1-day intro to AI or other advanced topics and just point out the math needed to go into them -- would have been immensely valuable.

October 29, 09:45 PM

When we interview for technical ability here at ThinkNear, we are seeking raw talent. Raw, not to mean that you’re lacking experience, or applied ability, but to mean that underneath everything you can do, there is real, innate talent. The thought process is: anyone can get pretty good at something if they put in long hours, but some people have learned the fundamentals so well that they can learn new tricks much easier than others.

Technically, this means we want you to know a few languages, just because knowing multiple languages helps you be better at all of them, but you should know one really really well. The punchline is that I don’t care which one. Whichever one you choose, though, I hope you know it really really well.

If you know Lisp really really well, then I think you can pick up Java and be better than a lot of Java programmers quickly. Or Ruby, or Python, or C++, it doesn’t really matter, because you’ve learned one language so well. You’ve gone deep, optimized, written your own lists to suit your needs, or written your own packages to upgrade a toolset.

When we look for technical smarts at ThinkNear, we want to see evidence of your skill and past accomplishments. Today, on a blog post about .Net, my point was made well a couple of times. Here’s the post: http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-...

If you’ve tuned .Net, you are probably well qualified to tune Tomcat, or Django, or Rails. You just know what all the types of bottlenecks are.

I’ve had this on my mind for a long time, but Joel finally gave me a concrete example to make my point.

Obligatory Old Timey Story: in the 1980s garbage collection was a real performance problem on Lisp machines, which were lucky to have a meg of memory. There were a few elite Lisp hackers who practiced what they called “cons-less programming” – cons is the only thing that allocates memory in Lisp – to avoid GC. Since cons is really the fundamental building block of just about everything you do in Lisp this was kinda difficult, but the technique is almost identical to what you’re describing here. — Joel Spolsky

Roughly translated, if you were good at something a couple of decades ago, there’s a good chance you’d have the right intuition to help on this problem. The specific skills have become obsolete, but the raw talent would continue to pay dividends.

October 25, 07:58 PM

I just went to pay my bill on the LADWP (don't worry about it), and I didn't know what my username was.  I have had a few over the years.  Turns out the one I had used for this site was 'jehinnegan'.  In order to be reminded of this, I had to supply my email address and then answer one of my security questions.  

A security question?  Really?  For my username?  What, exactly, do you see about my username that makes you think I need to take a rather meaningless step in verifying my identity?  Given that my email address is just my name at gmail, it's unlikely that any of the superficial questions you asked me 'for security' would really do anything.

So that brings me to my real question -- why do I need a username?  Why can't I simply log in with my email address and password?  In this case I'm paying a bill, so I sure wouldn't want anyone to hack my account and pay it for me.   

This goes for lots of other sites -- Reddit and all my banks, to name a few.  Some sites, like Reddit, have a pseudonym that is surfaced to other users, but there's no reason I need to know that in order to log in.  I can set it once as part of signing up, and then be able to edit it somewhere.  I should be able to log in with my email.

If my banks think that there's any security in having a username over an email, then I need to move my money to somewhere that understands internet security.  If they're really worried, they could just send extra emails anytime a substantial change happens -- in fact, I hope they're doing it already.  

Seriously, though, these logins are just taking up space in my head!  They add zero value to me, my experience, your website, or the world.  They are extra bytes in your databases and my brain, making both less efficient.

(I regularly clear browsers' memory and use multiple browsers in the course of my job.  Maybe everyone else has just trusted their browsers to remember all this for them?)


October 19, 06:03 PM

I posted on Hacker News today about trying to hire AI/big-data people, and what we're looking for (shameless plug: we're hiring careers.thinknear.com).  Someone sent me an email asking about how to get into the field from college.  Here was his question (slightly edited).

I'm just under two quarters away from finishing my Masters in Computer Science (..). I'm (..) in the process of evaluating my post-graduation opportunities at the moment, and have found it hard to have any real goals beyond wanting to be good at machine learning / data mining at some point in the future. I was wondering if you could offer me any advice about any particular things I should be focusing on at first, or if there is a particular progression that you think would make the journey easier?

My response was a lot longer than I had intended, so I'll share it here.  It's not terribly well structured, but hopefully someone else will get some value.

First and foremost, you need to be able to code.  There's a level of competency that a lot of engineers can hit within 2-3 years of graduation, that is basically 'professional competency'. It's important to hit this early.  People with your skillsets have to be able to model on their own (that is, get something working) and then be able to supervise/advise teams that implement full scale versions.  In a lot of companies, if you can't prototype, you're not terribly useful.

I would not shy away from big companies, but be very specific about what you will be doing or what team you will join.  A lot of big companies (Google, Amazon, etc) may allocate new grads somewhat randomly or where they need to put them.  If you go the route of a big company, just keep in mind that you have all the power -- you can choose where you want to work down to a team (as long as that team is hiring), because you can always go somewhere else.  They will accommodate your special requests if they like you.  Having a big name on your resume early on will really set the tone for your career.  Working at Google is kind of like going to Stanford -- it matters as much that you got in and didn't get fired as it does what you did there.

On the flip side, you will probably learn more practical skills and have more fun going to a smaller company like ours that needs hands on data people.  There's a lot of 'career infrastructure' to get in place, and the sooner you can do that, the sooner you can work on the cool stuff.  The 'infrastructure' is stuff like: getting good at setting up hosts and environments and experience writing software well (using git, editors, code reviews, troubleshooting bugs from production).  Compared to academia, there's also something very alien about working on code for years, working on a product that is tens of thousands of lines of code running on dozens of hosts, and maintaining code that was written 5 years ago by someone you've never met.  These are just valuable skills wherever you go.

If you pick the right small company, it's better career wise than the wrong one, but the big companies that everyone knows are probably safer (possibly not as good career-wise, but pretty much guaranteed not to be bad).  If you have debt, it's hard to go wrong going to the big companies which definitely pay a premium (I think it's because they pay you to write software AND to tolerate a bunch of bureaucracy).  If you go to the startups, really meet and like your boss and the founders.  You're basically betting a lot of money on their success, so be sure you want to do that.  Don't buy hype or sales.  Look at their resumes -- look for experience and a track record of success.  (Or apply to YC or TechStars and try to start your own -- you'll probably fail, but you'll probably learn the most about yourself and life this way.)

Side interests are really important.  I just searched you, and didn't find anything on Github.  Create an account today.  Just upload assignments you have from school, or anything that you've coded (provided you're allowed).  Get a bunch of code samples out there.  Go grab a great text book and solve random problems, and post your solutions.  The goal is really just to show activity and interest more than that employers will look at specific code samples.  If you can, get involved in an open source project -- just fix a bug or two and get your name out there as a contributor.  If you're too busy, then just read blogs and ask questions.  People love it when you comment on their blogs, and are more than willing to help, even if the question is not 'a good one'.

If you're a writer, write a blog.  This is also a good idea if you know you suck at writing, as it's an opportunity to practice.  Pick an algorithm or something and just discuss it or explain it in your own words.  If you ever implement anything out of papers, discuss that.  Here's about the best example of a niche blog I've ever seen:  http://blog.echen.me/  Most of what he does is over my head, but I know he's really interested in AI and data analysis.  If you're not a writer, just be active in the comments of other people's blogs.  You have a very unique name, which is a great asset in the world with Google, but when I search, I get 1 page.  There's no excuse for the first 3 pages of Google not to all be about you -- on twitter, commenting on other people's blogs, asking or answering questions on stack overflow.

Okay, I wrote much more than I had intended.  The bottom line is that, as a new grad, you need to get the 'software engineer' check box out of the way so you can get into AI and Data work.  Ideally, find a company where you can do that in the context of where you want to go.  Analytics groups at Google or Amazon, Search at Google, or even DuckDuckGo.  Personalization and recommendations at Facebook, Linkedin, or Amazon.  This early on, the wrong startup can be a land mine for your career (though the right one can be a gold mine).  Once you get 2-3 years of solid software experience, if you're not onto data-mining and AI problems at your current job, then it's time to shop around.  Just don't be in a hurry.  Realize that you'll be going up against people with decades of experience and PhD's.  You can leapfrog them over a few years, but don't think you'll do it 1 year out of college or underestimate how hard some people work on improving themselves on an ongoing basis.

Be humble, go where smart people are, and learn from them as much as you can.  Realize that learning is ongoing.  The most talented people out there are spending hours every week in study and practice not to mention their professional endeavors.  Strive to join their ranks.  

All the best.

John


October 18, 04:33 PM

A public service message to all colleges and universities:

On your main landing page, please provide a place/link/message for prospective employers.  A link like 'hire', 'recruit', something that takes employers to a page that teaches us how to get in touch.  As an employer at a small company, my thought process is "let's hire a college grad", then "such and such school has a good reputation, so let's see if we can get a student from there".  Then I go to your website, click around a few pages about how great your campus looks, how great your alumni are, and why, were I a high school student, I should consider applying.  Then I get a call that our servers have caught fire and never come back.

Please make it easy!  At the very least, please have a general way to contact you front and center.

A public service message to all high school students deciding between colleges and universities:

If you are deciding where to go, please look at the schools' websites through the eyes of a potential employer.  If you can't figure out what you'd do to hire a graduate from that school, the odds are other employers can't either.  That's a good indication that that school is out of touch with employers or may not be very helpful in helping you find a job after graduation.  If there's just a phone number, call it during regular business hours, just say something generic like, "I'm with a company and we're interested in hiring" and see what happens.  If the person you reach is unable to help you (and it's quite possible the operator won't be able to), then expect that real employers are having similar experiences.

As a shameless plug, this is something my alma mater does exceptionally well:  University of Waterloo http://uwaterloo.ca/ .  Right at the top, there's a link for me.  A very simple feature to emulate, and there's no excuse for schools not doing it.


September 04, 04:41 PM

It seems that every few months another article about how to ‘hire a technical cofounder’ comes out — written for the multitude of would-be CEO-type co-founders. From the other perspective, the would-be CTO’s, I’ve not seen a lot of advice on how to choose a CEO. This is potentially a more difficult decision because you probably have more options and evaluating CEO talent is less objective than evaluating CTO talent (which is still not really objective, but more objective than evaluating CEO’s). The following are some criteria that I used, and would use again. I would say you need to answer an unqualified yes to each point, or you shouldn’t start the company.

0) Take your time. Don’t pick a CEO in a weekend, or a week. Take at least a month, if not more. You’re betting your time, energy, and (through opportunity cost) thousands of dollars on your partnership — don’t make the decision quickly.

1) Potentially the most obvious: is the CEO passionate about the idea? and are you just as passionate? If the idea is originally the CEO’s and he’s the first person you hear about it from, give yourself a month or two before committing. Wait for the ‘honeymoon’ to wear off, and make sure you still really care about your would-be customers and how you plan to serve them.

2) Do you agree on the premise of the business? It’s important that you and the CEO are on the same page with regards to the premise or overall goal of the business vs a specific idea or implementation. Your first attempts will not work.

3) Have the conversation about funding and exiting. Are you going for VC money or planning to bootstrap? Are you open to being acqui-hired by Facebook or Google? Or are you only trying to be the next Google?

4) Would you bet $100,000 that the CEO would find a way to succeed, with or without you? This is essentially what being a technical cofounder is doing. Presuming you’re well qualified, it is unlikely you will be able to pay yourself more than half what Google, Microsoft, or Amazon would be willing to pay for your services. You will effectively exchange the difference in liquid compensation for illiquid founder’s equity. Over a couple of years, the cash-equivalent difference is likely to exceed $100,000. So, before you choose to work with a CEO, make sure you would be prepared to bet $100K on his success. (If you think it’s a good bet, but not without you, then why would you partner with him?)

5) Is your CEO distinctly qualified in multiple ways? Before starting ThinkNear, Eli had run his own business, gotten his MBA from Harvard, and worked as a product manager at Amazon. If he had just one or two of these 3 ‘resume bullet points’, I’d have been far less likely to join him.

6) How great are his references? Get references. Get lots of references. It’s the easiest thing to check on, and a great CEO should be able to produce dozens of great references on demand. If he doesn’t have a bunch of (non-technical) friends or colleagues who would start a business with him at the drop of a hat, something’s not right. Yes, at the micro level, each reference is hand-picked as someone who strongly endorses the person. You’re looking for the macro: does he have a lot of people with great resumes who would strongly endorse him?

7) Can he sell? Your CEO needs to be able to sell your product to customers, sell your company to investors, sell you to potential hires. You need someone who’s great at selling on the founding team, and if it’s the technical cofounder, you’ll be in trouble.

8) How do investors perceive him? This matters whether you’re fundraising or bootstrapping your way to success. You need a CEO who investors like, get along with, or at the least respect. Lacking respect will kill your company if you’re planning to fundraise (if investors don’t like your CEO but still respect him, it will just be harder but probably not impossible). Why does this matter if you’re bootstrapping? Because investors (by and large) know what they’re doing. If they don’t respect him, there’s probably a reason. At least one of the CEO’s references should be an investor.

9) Do you get along? Go grab some beer, and hang out for a couple of hours. Do this a few times. Bring up contentious issues — religion, politics, company values, etc. You’re trying to find something you’d argue over and see how you handle different opinions. Talk.

10) Does the CEO have similar economic standing (not off by, say, more than an order of magnitude) as you do? This should probably come up in prior points, but if selling for $30 mil vs $300 mil is the difference between life-altering sums of money for one of you and not the other, I would seriously consider not partnering.

11) Does the CEO have a track record of success? I would be more inclined to go work with someone on their first attempt at a company than I would someone who has started and failed at two or three businesses. There’s something to be said for persistence, but there’s also the track record. Without having any idea who the person was or anything about the companies, someone who has started more than a couple companies (which are not just consultancies or other lifestyle type businesses) and failed to grow or exit any of them successfully is either lacking some skill, competency, or trait; or is unlucky. Either way, I don’t want to bet on them.

There’s my list. Hopefully someone finds value in it.


August 08, 01:56 PM

Hack their products to do something cool.

"Mansour says that Hangouts are currently powered using the legacy Google Wave technology and jQuery using Google Shared Spaces. To build the extension, he imitated the YouTube Hangout Gadget and intercepted all messages coming from Google Wave to the extension. The extension broadcasts a unique ID which is stored in Wave that every participant can read when they join. That ID is persistant for that Hangout session only."

I predict, if he hasn't already been interviewed or approached by Google in the past, he will be soon.


August 06, 04:18 PM

Software Engineering is a skill. It’s like playing the piano, playing a sport, or being a surgeon. It takes a lot of practice and experience to reach a level of competency, and then an order of magnitude more to reach a level of mastery (or become a ‘rockstar’). I’ll be borrowing the musician analogy.

People who do not practice a skill may not be able to discern the difference between good and great. Do people who are not trained musicians expect to be able to tell the difference between good and great musicians? Probably. Can they? No. The same holds true for Software Engineering. Even within a group of people with a skill, deciding who is ‘better’ is at least partially a subjective process.

Just like other skills, software engineering takes a lot of time and effort to gain mastery. How much time? Probably in the neighborhood of 10,000 hours, maybe more. On top of that, it must be the right kind of effort.

Non-professionals, and even software engineers early in their career, may see some of the “kids” coming out of college who are rockstars and think: “it must not be that hard”, or “I can do that”, or even “that’s what I do”. The reality is that many of the “kids” coming out of college have over a decade of real practice behind them. Many start coding before high school, and spend evenings and weekends practicing.

To think you will transition into software engineering / computer science and then go get a job at Google/Facebook/Amazon/etc in a couple of years is a lot like picking up the guitar and planning to star in a rock band in a couple of years. It’s been done, but success is uncommon. From personal experience, I also believe the effort required to become a great software engineer is systemically underestimated, and personal abilities are systemically overestimated.

I do not say this to dissuade those from entering the profession or dabbling on the weekends, but I think they should be realistic about the size of the task they are undertaking. That’s also not to say that being a rockstar is required to realize success. Pairing moderate programming skills with specific domain knowledge is certainly valuable, and, unlike musicians, there is significant demand for those who have not yet reached a level of mastery.

Through most of this article, I use the term ‘software engineering’. Don’t I mean programming? No. This is another piece of the puzzle that is non-obvious, but there’s a very real difference. To keep the music analogy going, programming might be like playing an instrument, and software engineering is writing music. With music, you probably need a mastery of the former to be good at the latter, but it’s not necessarily a requirement (you can sell your music without having to play it). However, in Software Engineering, you usually have to play your own music and are judged on the final outcome. Great programmers without Software Engineering talent will play bad music exceptionally well, but it’s still bad music. Great Software Engineers who program poorly will have fantastic music in front of them but be unable to play it well. And rockstars are great at both.

The point of all this discussion is that we need to treat Software Engineering more like a skill and less like a profession. The skill must be developed and grown over time. You can lease out your skills for money, but doing that too much on jobs without challenges will cause your skills to stagnate unless you’re putting in extra practice. Working on the best challenges will help you improve significantly.

Also, a degree in computer science does not mean you are good at the skill, it just means you understand or have an overview of the skill, and (in the case of most post-grad degrees) that you have at one time demonstrated the capacity to competently solve problems in the space. A degree in computer science considerably improves your chances of being or becoming a ‘rockstar’, but is neither necessary nor sufficient.

This line of thought yields some advice that I think is particularly relevant to those in college or early in their career:

  1. Get really good at a programming language. I mean, really, really good. It probably doesn’t matter which one, but you need at least one. Getting really good probably takes 2-5 years of daily use.
  2. Don’t use an IDE when practicing. If there is a correlation between mastery of programming and mastery of an IDE, it’s negative (though mastery of an IDE is certainly an asset when not actively learning / practicing).
  3. Don’t use built in libraries you can’t write. If you can’t write your own hash/list/set/whatever, then you shouldn’t use the built in ones. The only way to know you can, is to do.
  4. Always be learning. Regularly learn new programming languages, algorithms, libraries, and technologies.
  5. Find a niche early, develop it, and then branch to other areas if you wish. Similar to 1, but conceptual. (A niche might be distributed systems, databases, UI, etc.) Also, realize that once you’ve been in a niche for 10 years, it’s very hard to change.
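
As a rough illustration of what point 3 is asking for, here is a minimal sketch of a hand-written hash map in Python. The class name `SimpleHashMap`, its method names, and the resize rule are all made up for this example. It still leans on Python's built-in `list` and `hash()`, so treat it as a practice exercise rather than a replacement for `dict`.

```python
class SimpleHashMap:
    """A practice hash map using separate chaining (illustrative only)."""

    def __init__(self, bucket_count=8):
        # Each bucket is a list of (key, value) pairs that hashed to the same slot.
        self._buckets = [[] for _ in range(bucket_count)]
        self._size = 0

    def _bucket_for(self, key):
        # Map the key's hash onto one of the available buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        # Assumed rule of thumb: grow once buckets average more than 2 entries.
        if self._size > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self):
        # Double the bucket count and re-insert every pair into its new bucket.
        old_items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in old_items:
            self._bucket_for(key).append((key, value))

    def __len__(self):
        return self._size
```

Writing something like this once makes the trade-offs concrete: why lookups are usually fast, what a collision costs, and why resizing exists at all.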


June 19, 02:33 PM

The best part of a startup is the opportunity to realize your own potential.  There is no upper limit to the amount you can accomplish, so it's an exercise to see how far you can go.  It's like entering a race, but not knowing if you're in a sprint, a marathon, or an Ironman.  The result is that you have to treat it like an Ironman, but push as hard as you can at every opportunity you get just in case it turns out to be a sprint.  And, as anyone who does crossfit or other endurance sports can identify, you need to be pushing hard 5 min into a workout in order to have a good time at the 20 min mark -- just not quite so hard that you fall over at 10 min.


To pick an analogy, being in a software startup is like running a marathon over unknown terrain, unpredictable weather, many paths to take and not knowing where the finish line is.  Maybe you take the path that is lined with cheering fans, but you can't see a finish line ahead.  Do you keep going down the path with the fans? Or do you turn and take the deserted path?  Is it harder when some of those fans cheering for you have been in the race before?  What if you know they've bet on you to win?  How does that change your perception of the advice and the path you select?  What if you decide on a path that the fans disapprove of, can you continue without their support?


What if you come to a mountain, and you think there's a chance of a finish line at the summit?  Do you run up the mountain?  Up thousands of feet of elevation through sleet and snow?  Or do you run through the sunny meadow next to the mountain?  There's nothing to say that there is not a finish line in the meadow -- maybe even a better finish line.  If you choose the mountain, and reach the top, but find nothing, do you have the energy to try another?  Was running up your mountain still worth it?


Through all this, there are other racers around you.  Some are climbing mountains, some are in the meadows, and some are spending a lot of time trying to choose the right shoes to wear.  Some are faster, some are slower, some have whole relay teams helping them get there, some have been training for years, while others can barely walk.  Do you like to run in crowds?  with friends?  or solo?  Will the other racers be there to help you, supporting other people running a race?  Or will they be trying to get to their finish line at the expense of those around them?


When you come to a fork in the road, you can see a runner ahead of you down one of the paths.  Do you follow the runner you see?  After all, they were recently where you are, and chose that path as the better one. If the other runner chose a path leading to a finish line then you're going to have to catch up to them before they get there.  Or you could take the path not traveled.  Is doing something different than others valuable in and of itself?

All finish lines are not created equal, nor are they placed with much reason.  After years of struggle, are you the kind of person who can find a finish line and choose to go find another hoping it will be better?  Some finish lines will win you rewards beyond your wildest dreams, others might offer a warm place to rest -- most will be in between the two.  How much do you want the warm place to rest?


To sum up my thoughts on the analogy:
  • Run as hard as you can every day.  Rest when you must, but you might be in a sprint.
  • All fans are not created equal: listen to those who have seen many races, have run the race before, and have bet money on you to win.
  • Lots of people have run in meadows, few people have climbed mountains.  Learning to climb mountains faster than anyone else is a valuable skill.
  • Focus on the race.  You're unlikely to reach the finish line ahead of the leaders -- so make sure you're a leader.
  • Don't spend too much time choosing your shoes.  Get in the race.
  • The more you run, the better runner you become.  Taking the wrong path 100 times will leave you a stronger runner than when you set out. 


Finally, be in it for the race, not the finish line.  If you enter a marathon because you want to cross the finish line, get the medal, and some free Gatorade along the way, you'll probably give up after a few miles when you realize how hard it is.  The people who compete in marathons do it because they love running, competing with themselves, and the journey.  Crossing a finish line is just a goal to help with motivation.



June 15, 01:08 PM

Let's start out by stating that I have a lot to do.  I am not in the situation I once found myself in college where I had 3 tasks to do by Friday, and I could do them today or Thursday night.  That's procrastination, and not what I want to talk about.  I have orders of magnitude more work on my plate than I could hope to complete in months and I have a very real incentive to complete them as fast as possible.  I am highly motivated and focused on maximizing my "engineering throughput".  The only real governing factor is avoiding burnout.

What I'd like to talk about is a state that I enter when I have a tonne of work, I sit down to do it, and end up spending a lot of time on the web (on non-productive sites).  For me, the symptoms are that every time I run tests or install gems (tasks that take a variable amount of time from 20s-2m), I check if anything new is on TechCrunch/HackerNews/GoogleReader, and may end up getting side-tracked for anywhere from 5-15min.  For others, it might be chatting on IMs, or they might get sidetracked fixing a bug in an open source project, whatever.  Individually, these distractions are not that important, but in aggregate, they can kill a whole day or longer.  (On a related note, 'procrastinated' is also what happens to developers working in corporate environments that have three 30 min meetings equally spaced throughout the day.)

So why is what I'm describing not just "procrastinating" or "not working"?  First off, unlike procrastination, being procrastinated is not voluntary.  You don't leave the office thinking you put off the task, you leave thinking you had an unproductive day.  Identifying when you've been procrastinated, and getting out of that state, is very important.

I've found that the cause of me getting stuck in this unproductive state (becoming procrastinated) is that I don't really have a clear idea of what I should be doing, and don't realize it.  When I know that I need to add a page to our dashboard that pulls specific data from our database and shows it to the user, I can execute that in a quick and efficient manner.  However, when I know that I need to "report on visits and conversion metrics", I risk becoming procrastinated.  In this case, I think I know exactly what to do and so I set out to do it.  But I don't actually know exactly what to do, so I start out by writing the code to save the visit, then realize I haven't done the database work yet, so I switch to that, and then I remember I need a view for the new database table, etc.  I end up not focusing because, it turns out, I don't know what I'm doing.  I'm just doing "reporting on visits and conversion metrics".

The problem with becoming procrastinated is that it's hard to catch.  "Reporting on visits and conversion metrics" is something I've done before, it's not a tough problem, so I feel like I know what I'm doing.  I also think I could do this in a few hours.  The reality is that I need to decide what "conversion metrics" actually means, create new tables in the database and then the classes to access them, create new code in the controllers to catch all the visits, create new controllers for the new metrics pages, and then ... oh yeah, actually calculate the metrics.  There are even more pieces if you're operating at scale.  Each of those things I listed, I can do quickly and efficiently.  None of them is hard. 

What is hard is that there are too many parts to "reporting on visits and conversion metrics" to keep in my head, to estimate the work required accurately, or to be productive.  Worse, since it's such an easy problem to solve, I treat it like it's easy and so don't break it down in detail -- I just set out to do it.  And the worst part is, this leads me to become Procrastinated!

When I'm procrastinated, I have a fuzzy idea of what I'm doing, but I think I know exactly what I'm doing.  I'm not sure exactly what needs to be done right now, but I don't realize it.  Not being able to tell when you've been procrastinated is the worst part.  I'm still writing little pieces of code, my tests aren't breaking, I'm making progress -- it's just not efficient at all.

The reality is that I'm procrastinating on answering the question I must answer which is "exactly what do I need to accomplish to finish 'reporting on visits ...'?"  "And in what order do I need to do them?"  I'm avoiding detailed planning because that's hard.  Reading blogs and news is like watching TV, my brain has basically checked out -- no need for active thought.  Just read the headlines, see something shiny, click, skim, repeat.  Planning out a task in detail requires your brain to engage and think. 

What do I do to avoid being procrastinated?  I try to track all of my time. Sort of like changing your eating or spending habits, I try to track and review all of my time to look for inefficiencies.  What I've found working for me in the last few months is using Pivotal Tracker for everything technical I do, and entering tasks retro-actively.  So, when I get sidetracked on fixing some bug I find while implementing my new feature, then I add that task in separately after the fact.  This lets me look back and see everything technical I did in a day, week, month, and I can review it. 

The most important part of using Pivotal Tracker (or otherwise tracking everything you're doing) is that it makes me write down everything I'm going to do, and forces me to take bite-size pieces.  I'd never get away with a task in pivotal called "reporting on visits and conversion metrics", it would just look wrong to me.  Am I going to have that as a comment on a checkin?  No way!  It would force me to break that down to its component parts, and suddenly I'd be flying.

I suspect that being procrastinated is such a widespread problem, that, over time, it consumes huge percentages of potentially productive time, and that this is where pair-programming came from -- as a means to combat becoming procrastinated.  I wonder if it would pay to hire an intern whose job it was to sit beside me all day, and, every half hour, just ask "what are you working on?" and "is that the highest priority item right now?"

Now that I think of it, a lot of the agile methodologies seem to be well-suited to address becoming procrastinated.

Lessons to take away:
1) Don't get Procrastinated
2) Learn to identify when you are Procrastinated, and identify why
3) Do that thing you're not doing that you identified in 2
4) Experiment with systems to either reduce the chance of you being procrastinated, and/or
5) Experiment with systems that will allow you to more quickly identify when you have become procrastinated

(I have so much more work than I can handle, that we're hiring engineers ;).  http://careers.thinknear.com/)


May 25, 05:47 PM

When the job title reads:  "Software Engineer - Rails focus"; and the job description includes passages such as:  "When most people rank themselves from 1-10 on a language, they overestimate by at least 2-5 points -- YOU are legitimately an 8-9 in Ruby, preferably on Rails."  You should probably have some sort of Ruby and/or Rails experience somewhere on your resume ... especially when there are other jobs listed by the same company that don't list knowing Ruby as a hard requirement. 

We have 4 positions officially open.  We're looking only for rockstar engineers.  3 of them don't require Ruby on Rails (just a willingness to learn and a track record of learning new technologies quickly).  If you don't know Ruby, but would like to apply, please apply to one of the others.

http://careers.thinknear.com

May 17, 05:22 PM

This is another post targeted at newly graduating college students, though this might also be relevant to self-taught, "career-transitioning", and "incompetent" professionals.  This could also be considered a brief overview of the important foundations of a CS education.

I've done a lot of technical phone screens and interviews.  While at Amazon.com, I was involved in a lot of hiring decisions and participated a lot in their college recruiting programs.  As an interviewer, you have anywhere from 30 minutes to 1 hour to make a hiring decision.  As an interviewee, you have 2-6 interviewers you must impress in 30 minutes to 1 hour each so that they unanimously decide to hire you.  How do you prepare?  What are they looking for?  What should you read?

Part 1:  Content you should be intimately familiar with

1.  Implementations of standard library data structures.  It doesn't matter what language you prefer to work in; either the libraries or the language itself will offer common data structures like maps, arrays, lists, stacks, and queues.  You should be familiar with their APIs and implementations.  I say again, you should be familiar with their APIs and implementations.

Common interview questions are: implement a stack using a queue, implement a map using arrays, implement a list using arrays, etc.  Other interview questions might start out with:  what are the 2 main functions of a queue?  what methods would always be present on a stack?
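
To make the first of those concrete, here is a minimal sketch of a stack built on a single queue, in Java.  The class name QueueBackedStack is made up for illustration, and rotating on push is only one valid answer (you could make pop the expensive operation instead).

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

/** A LIFO stack built on one FIFO queue (hypothetical class, for practice only). */
class QueueBackedStack<T> {
    // ArrayDeque is used purely as a queue here: add() at the tail, remove() at the head.
    // Note that ArrayDeque rejects null elements.
    private final Queue<T> queue = new ArrayDeque<>();

    /** O(n): enqueue the new item, then rotate every older item behind it. */
    void push(T item) {
        queue.add(item);
        // The size stays constant while rotating, so this runs size - 1 times.
        for (int i = 0; i < queue.size() - 1; i++) {
            queue.add(queue.remove());
        }
    }

    /** O(1): the most recently pushed item is always at the head of the queue. */
    T pop() {
        if (queue.isEmpty()) {
            throw new NoSuchElementException("stack is empty");
        }
        return queue.remove();
    }

    boolean isEmpty() {
        return queue.isEmpty();
    }
}
```

The follow-up question is usually about cost: here push is O(n) and pop is O(1), and you should be able to say why.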

2.  How the standard data structures are represented in your language.  I won't care that you don't know Java, but you have to know something, and you have to be familiar with enough languages or abstractions to discuss your language in abstract terms.  We need to be able to converse using terms like map, stack, struct, pointer regardless of the language you choose to interview in.  (You do know what makes a pointer a pointer, right?)  The alternative is that I make everyone interview in the languages I know, but then I'm unable to interview great PHP, Python, JavaScript, and Lisp developers.

This is really just terminology, but if you only know one or two languages (and especially if those languages are very similar like C# and Java), then you need to learn abstract lingo (or, better yet, another programming language).  Wikipedia the data structures from 1 and know the difference between your language and their general representations (does Java have a key-value map? or just a HashMap? how might they be different?)

3.  You should be able to implement all the methods on common data structures.  Yes, third point in a row about data structures, but you really do need to know these cold.  Things like dynamically resizing arrays need to be easy for you.  If you can't write code to insert an element into the middle of an array in under 2 min, practice until you can.  In C++, start with an Array and practice implementing Vector.  In Java, start with an array and practice implementing ArrayList.  See where I'm going with this?  Satisfy the whole interface, then do it again faster; repeat.

You probably won't be asked this directly in an interview, but lots of solutions might require you to resize arrays or count odd elements starting with the letter t.  It's just something that comes up in a lot of problems.
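
As a starting point for that practice, here is a minimal sketch of a growable int array with insert-in-the-middle.  IntArrayList is a hypothetical name, the initial capacity of 4 and the doubling policy are arbitrary choices, and the real ArrayList interface is much larger than this.

```java
import java.util.Arrays;

/** A tiny dynamically resizing array of ints (hypothetical class, for practice only). */
class IntArrayList {
    private int[] data = new int[4]; // arbitrary starting capacity
    private int size = 0;

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    /** Appends to the end; the same as inserting at position size. */
    void add(int value) {
        insert(size, value);
    }

    /** Inserts value at index, shifting everything at or after index one slot right. */
    void insert(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (size == data.length) {
            // Full: double the backing array so appends stay amortized O(1).
            data = Arrays.copyOf(data, data.length * 2);
        }
        // arraycopy handles overlapping ranges within the same array correctly.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        size++;
    }

    /** Returns a copy trimmed to the elements actually stored. */
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Once this feels easy, write the shift by hand with a loop instead of System.arraycopy, then add remove and indexOf.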

4.  Non-standard data-structures:  namely, trees.  You should be familiar with binary search trees, balanced binary search trees, B-trees, and tries.  These are favorites in interviews because solutions often involve recursion, and recursion is a good way to test in 30 min if someone has potential to actually be a great engineer.  Lots of people believe Joel's assertion that you need to know pointers and recursion, whether they're familiar with Joel or not.  http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html

Interview questions might include:  what's a binary search tree?  What's it good for?  How does it compare to a balanced binary search tree?  etc.  

4b.  Traversals of trees.  Questions requiring tree-traversal are favorites because they require some deep thought but not a lot of code to be written -- perfect to completely judge someone's technical worth in 30 minutes.  You should be familiar with in-order (and the other depth-first orders, pre-order and post-order) and breadth-first traversals.  This is another one worth repeating: you should be able to write any of the traversals in less than 5 min, and once you have one traversal, the others should take less than 1 min.

There are a million and a half interview questions around trees.  Just know your traversals cold because the questions will invariably require you to do something extra with them.  (Traversals really aren't that interesting on their own.)
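
For reference, here is roughly what "knowing them cold" looks like: a sketch of in-order, pre-order, and breadth-first traversal over a bare-bones binary tree.  TreeNode and Traversals are hypothetical names, and collecting values into a list stands in for whatever "something extra" the interviewer asks for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Minimal binary tree node (hypothetical, just enough for the traversals). */
class TreeNode {
    final int value;
    final TreeNode left;
    final TreeNode right;

    TreeNode(int value, TreeNode left, TreeNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class Traversals {
    /** Depth-first, left subtree, then node, then right subtree. Sorted order on a BST. */
    static List<Integer> inOrder(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    private static void inOrder(TreeNode node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.value);
        inOrder(node.right, out);
    }

    /** Depth-first, node first. Only the position of out.add changes. */
    static List<Integer> preOrder(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        preOrder(root, out);
        return out;
    }

    private static void preOrder(TreeNode node, List<Integer> out) {
        if (node == null) return;
        out.add(node.value);
        preOrder(node.left, out);
        preOrder(node.right, out);
    }

    /** Breadth-first: level by level, using a queue instead of recursion. */
    static List<Integer> breadthFirst(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        Queue<TreeNode> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            TreeNode node = pending.remove();
            out.add(node.value);
            if (node.left != null) pending.add(node.left);
            if (node.right != null) pending.add(node.right);
        }
        return out;
    }
}
```

Notice how little changes between the depth-first versions; that is why the second and third should take you a minute.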

5.  Code.  Yes, you need to be familiar with code.  You need to be able to write (near-)perfect code without the help of Google or an IDE.  On the whiteboard, on a piece of paper, in notepad, wherever.  If you can't write code that would compile, and be able to be confident of that by eye-balling it, I assert that you don't really know how to code.  If you can't do this, or think you might need practice, then practice.  Take one of my problems from above and practice coding it on paper or in notepad.  Then copy what you wrote on paper into something that compiles and see if it works.  The trick is to really try to be 100% done before checking if it compiles or not.  

(Note that I didn't say what language you need to code in.  Most places worth working at will let you code in whatever you feel most comfortable in.  So, just choose your strongest and make sure you're comfortable with it.)

6.  Algorithms.  When I say algorithms, I mean the ones you learned in first or second year of University, not the ones you learned in third and fourth year.  Graph Theory, minimal coverings, dynamic programming are all great classes of problems, but not often applicable to the real world, and hard to test in an interview format (unless you have a PhD in Algorithms and are interviewing at Google, then I might expect you to know this stuff.)  What is fair game are things we've already covered around data structures and traversals.  

Also, think about what else a lot of rich standard libraries offer.  Things like indexOf, sort, min, max, first(n), shift, set union, and set difference.  Ruby actually bakes a lot of these things into its main classes, so check out http://www.ruby-doc.org/core/classes/Array.html.  If I were going to interview in Java, I would practice building a data structure in Java that had all the functions that Ruby's Array class does (maybe skip the meta-programming methods).
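
A sketch of what the start of that exercise might look like, covering four of the operations named above.  ArrayOps is a hypothetical name, and leaning on HashSet and LinkedHashSet is a shortcut; for real practice you would write those parts by hand too.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** A few Ruby-Array-style helpers over Java lists (hypothetical, for practice only). */
class ArrayOps {
    /** Index of the first element equal to target, or -1 (Ruby's index returns nil instead). */
    static <T> int indexOf(List<T> items, T target) {
        for (int i = 0; i < items.size(); i++) {
            if (Objects.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    /** Like Ruby's first(n): up to n leading elements, fewer if the list is shorter. */
    static <T> List<T> first(List<T> items, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative size: " + n);
        }
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }

    /** Like Ruby's a | b: every distinct element, in order of first appearance. */
    static <T> List<T> union(List<T> a, List<T> b) {
        Set<T> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    /** Like Ruby's a - b: the elements of a that do not appear anywhere in b. */
    static <T> List<T> difference(List<T> a, List<T> b) {
        Set<T> excluded = new HashSet<>(b);
        List<T> out = new ArrayList<>();
        for (T item : a) {
            if (!excluded.contains(item)) {
                out.add(item);
            }
        }
        return out;
    }
}
```

For each one, be ready to state the running time and what happens on empty input.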

Okay.  There's obviously a bunch more interesting stuff out there: compilers, OS, concurrency, networking, distributed programming, etc.  But, this is pretty much the core.  You absolutely must be able to do these things.  Though I can't predict what others will ask in their interviews, I feel pretty confident in predicting that an engineer without a solid foundation in these 6 areas would not pass interviews at Amazon, Google, Microsoft, Facebook, or any other company known as a good place to work.

Hidden #7:  If you don't get asked about this stuff when you interview, then the odds are your peers didn't either.  I don't know about you, but I don't want to work somewhere that employs software engineers that might not know this stuff.  And you do need to know this stuff at careers.thinknear.com.

May 10, 03:00 PM

We've officially started hiring here at ThinkNear (http://careers.thinknear.com), and that means for the next month or so I'm 30%-50% technical recruiter/interviewer.  As I'm sure anyone who's been hiring knows, getting resumes out of college is easier than out of industry.  With that in mind, I've been interviewing a few college grads.  Here are some tips they should take to heart, both for when applying to ThinkNear, but also for other companies.

1) Do read the job description, and follow instructions.  If the job posting asks for your resume in a specific format, you should submit your resume in that format.  I ask specifically for a PDF because I'm crazy and quirky, and I still get a lot of Word docs which I never open.

2) Do read emails and prepare accordingly.  If you're invited for a tech screen and told we'll be working in Google Docs, and you've never used Google Docs before or don't know what it is, you should probably do a bit of preparation.  I had a candidate take 20 min to 'create a doc and share it with me'.  I didn't use to think of this as part of my technical interview.

3) Don't apply to multiple jobs at the same company.  We officially have 4 positions open, and all the resumes come to me.  Even at big companies like Google, they probably have one recruiter for your school who gets all the resumes.  If you have skills relevant to multiple positions, apply to the one you think you're the strongest candidate for, and then mention in your cover letter that you're also interested in the others.

4) Don't exaggerate your skills.  As a college graduate, you are most likely not 'vastly experienced', 'an expert', or 'incredibly talented' at anything.  That's okay.  We posted on your college job board, just put emphasis on attitude.  Corollary:  don't apply to any jobs with 'Sr.', 'SR', or 'Senior' in the title.   

5) Don't misrepresent your skills.  Any computer language you list on your resume is fair game in a phone screen.  If you say you know Perl, you should know it well enough to interview in it.

6) Do mention skills in your education and experience.  Having a list of languages or acronyms along the top of the resume like XML, WSDL, SOAP, etc is fine, but supposing that's what I'm looking for (it's not), I'm going to look through the rest of your resume to find out whether you used this in school or in a job, and how long ago it was.  If I can't tell where you picked that up, then I might conclude you don't really know it; see 5.

7) Do look up the company you've applied to before phone calls.  We are the easiest example:  if you go to www.thinknear.com, there's just a video to watch that takes all of 1 min.  That's pretty much all there is other than some press.  I understand the 'spray and pray' approach to applying to jobs when graduating from school -- but if a company wants to follow up, you should take some more time to learn about them.

8) Do prepare a personalized cover letter, or don't send one at all.  If the job description is asking about 'front end experience' and 'how cool are your HTML/CSS pages?', then cover letters describing your experience writing complex db access code in Java are not particularly relevant.  Reuse elements, have different 'standard' cover letters for different types of positions, whatever works -- but don't send cover letters that aren't relevant.  Corollary:  when applying by email, cover letters go in the body of the email or as the first page of your resume.  Sending cover letters as separate docs is a great way to confuse me.

9) Do be humble and patient.  Yes, you're a rockstar.  You can code circles around all your classmates.  Why don't employers get it and just hire you?  Because your employers are interviewing all your friends that somehow got through a masters program without learning how to code.  Seriously, happens all the time.  The result is that the person you're interviewing with is doing a lot of these phone screens, and a lot of them end in failure.  Please be patient.  If you're awesome, they'll notice and appreciate that you jumped through their hoops to prove it to them.

10) Do keep open lines of dialogue with employers.  If you need to reschedule interviews, you can do it once, and it doesn't impact you negatively at all.  If you get an offer from another company, but are even mildly interested in others, it's okay to tell them you have an offer and see if they'll hop to get your interview done faster (unless you know you're going to accept the offer anyway, then please don't waste our time.)

And finally:
11) Do interview your interviewer.  If you don't ask any questions, I assume you don't have a lot of options.  What technologies are you using?  What are the working conditions like?  Do you read JoelOnSoftware.com?  How close is the nearest Starbucks?  All acceptable questions.  If it's the first time we've spoken, and you have no questions, that is a negative.

Now, go forth and become employed. 

May 06, 01:35 PM

This past winter at TechStars NYC, I worked harder and longer than I ever had before in my life, and I sustained it for weeks on end.  We were building a product, facing new priorities and crises every week, and launching features faster than I ever had before.  We were building a business at lightning speed.

ThinkNear is entering a second phase of its growth.  This phase won't be like the first phase:  we'll be building up from a base we've established (and growing -- http://careers.thinknear.com).  In the first phase, we had to build the base as fast as we could, and we had a real sense of urgency -- the end of TechStars, and then the end of our runway, was just a few months away.

In this second phase, there is no more TechStars, and the end of our runway will be more than just a couple of months away.  We will have the same pressing deadlines with our partners, but we will need to establish our own milestones.  More importantly, we need to find the right balance of work and not-work so that we don't burn ourselves out.  We need to find a balance between the urgency we need to make phase 2 a great success, ensuring there is a phase 3, and being alive and ready to face phase 3.

I'm calling whatever this point is, Maximum Sustainable Effort, and I'm trying to find what mine is.  I've never had anything push me as hard as this before, never had anything ask me to give everything I had, then ask why there isn't more.  This, I'm told, is the thrill of being at a startup.

I've established that my Maximum Sustainable Effort is greater than 60 hours per week.  Going home at 6pm on a Friday feels like taking half a day off.  A quiet morning reading with coffee reenergizes me more than a whole weekend used to.  I've also established that it is less than 90.  There were a few 100+ hour weeks during TechStars, and I came dangerously close to burning out for more than a few days.  I'm starting to think that my MSE exists in the 70's.

I'm also starting to believe that MSE is like muscle -- you can train it.  Books I've read about training talent agree with me -- if raising your MSE is a talent.  You can train yourself to focus on tasks for longer, and you can train your routine to become more efficient. 

One of my personal motivators has always been to stretch myself to realize my full potential.  What I'm working on here has been a great motivator, a great training ground, and a great field on which to play.  I'm really excited both about the future of ThinkNear and the personal growth I will get to realize by being part of an amazing team going after such an interesting problem in a huge market.

What's your Maximum Sustainable Effort?  How did you reach it?

Tangent Story: We were moving so fast that at one point this past winter, we got an email asking for a feature we didn't have.  As soon as this partner of ours asked for it, it was immediately obvious to me that we should have had it, and replying that we did not have it would have been embarrassing and probably ruined the partnership.  So I did the only logical thing I could think of:  I coded and launched the feature before replying to the email so that I could say we did indeed have it.  This took a bit more than a day, so he still got a reply in a timely manner.

Posts

March 11, 08:12 PM

When I code, I often have an instance of the Rails console running in another window. Any time I get to a part of my code that I’m not 100% sure about, I just play around with it. It’s especially helpful when calling APIs or dissecting large structures of hashes.

One problem I have is that I code in multiple projects, and each of those projects has multiple environments on Heroku. Sometimes when troubleshooting I have my local Rails console open, and a couple of consoles of my app running on Heroku (am I getting the same results on Heroku that I am getting locally? how about sandbox vs. prod?). Sometimes it gets worse: our apps talk to each other. So maybe I have even more windows open for different apps.

My solution was to monkey patch IRB to let me drop in my own prompt config that works for Rails. It appears that there are no hooks into IRB to allow a custom commandline, so for right now, monkey patching appears to be the only way to go.

My code:

require 'irb'

module IRB
  class << self
    alias :orig_init_config :init_config

    def init_config(ap_path)
      # run the stock init first so IRB is still configured if our prompt setup fails
      orig_init_config(ap_path)
      begin
        puts "loading init config: #{Rails.env}"
        # Set up the prompt to be slightly more informative
        rails_env = Rails.env
        current_app = Rails.application.class.parent_name

        # a format string whose trailing %c is filled in depending on the context
        prompt_string_root = "#{current_app}(#{rails_env})%%3n%c> "
        # calculate manually since we don't count the trailing '> ' and %3n pads to 3 chars
        normal_prompt_string_length = current_app.length + 1 + rails_env.length + 1 + 3
        empty_string_root = "#{' ' * normal_prompt_string_length}%c> "

        # http://tagaholic.me/2009/05/29/exploring-how-to-configure-irb.html#prompt
        prompt_config = {
          # Normal prompt (%zn: line number with optional number z for printf width)
          :PROMPT_I => sprintf(prompt_string_root, '>'),
          # indent prompt
          :PROMPT_N => sprintf(empty_string_root, ' '),
          # string continue prompt
          :PROMPT_S => sprintf(empty_string_root, '"'),
          # statement continue prompt
          :PROMPT_C => sprintf(empty_string_root, '?'),
          # prefix output (leaves a space between the side effect output and return val)
          :RETURN => "\n=> %s\n"
        }

        # override the defaults with our preferred prompt
        @CONF[:PROMPT].reverse_merge!(:RAILS_ENV => prompt_config)
        @CONF[:PROMPT_MODE] = :RAILS_ENV
      rescue StandardError => e
        # a broken prompt should never take the console down with it
        puts "Error loading IRB PROMPT", e
      end
    end
  end
end

Anyone know how to do this without monkey patching?

References:

Putting config into an .irbrc or .railsrc doesn’t work for my use case since I wanted this to work on Heroku (which doesn’t support irbrc, right?) and IRB isn’t defined when Rails/config/initializers run.

While Googling around, I found some more tools I’m excited to try:

February 16, 09:42 PM

Update:  I haven't figured out how to recover the elements in order using the Synchronized Priority Queue, so I'm using the TreeSet implementation.

I needed a data structure to store a series of elements in order.  They were just numbers, but I wanted to retain all elements (so the traditional definition of a Set wouldn't work, since I don't want to lose duplicates).  I decided that collecting them in order and paying the computational price on each insert was probably going to work better for my use case than sorting at the end.

A PriorityQueue would be natural, but I need it to be threadsafe.  The docs on the PriorityQueue recommend using a PriorityBlockingQueue for concurrent use (I don't care about the "blocking" part of it).  So I used that.  Decided to test it after the fact against a couple of other options (with test results):  

1) PriorityBlockingQueue
java -Xmx1024m TestHarness  14.23s user 22.78s system 110% cpu 33.351 total

2) TreeSet passed through Collections.synchronizedSortedSet*
java -Xmx1024m TestHarness  5.51s user 1.32s system 171% cpu 3.992 total

3) PriorityQueue passed through Collections.synchronizedCollection
java -Xmx1024m TestHarness  1.07s user 0.48s system 150% cpu 1.025 total

* In the case of the TreeSet, I pass a custom comparator that prevents items from being evaluated as equal (i.e. the comparator never returns 0).

So, pretty telling numbers there.  Of note, it's clear that the PriorityBlockingQueue has an internal resource causing contention, since basically only one of my CPUs was able to be involved most of the time.  Moving the TreeSet and the PriorityQueue onto Round 2, I add a check to the size on every insertion.

1) TreeSet
java -Xmx1024m TestHarness  6.19s user 1.11s system 158% cpu 4.598 total

2) PriorityQueue
java -Xmx1024m TestHarness  1.39s user 0.70s system 151% cpu 1.379 total

And I have my winner.
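
For anyone wanting to reproduce options 2 and 3, here is a minimal sketch of both containers.  This is not the code from my test harness; SortedBags, SynchronizedHeapBag, and every method name are hypothetical.  On the ordering problem from the update at the top: iterating a PriorityQueue does not visit elements in sorted order (only repeated poll() calls do), and the Collection returned by Collections.synchronizedCollection has no poll(), so this sketch keeps a reference to the underlying queue and drains it while holding the wrapper's lock.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.SortedSet;
import java.util.TreeSet;

/** Option 2: a synchronized TreeSet that keeps duplicates (hypothetical helper). */
class SortedBags {
    // Ascending order, but ties report "greater", so compare never returns 0.
    // This deliberately breaks the Comparator contract: contains() and remove()
    // can no longer find anything, so only use it for add-then-iterate.
    static final Comparator<Integer> NEVER_EQUAL = (a, b) -> a < b ? -1 : 1;

    static SortedSet<Integer> newTreeSetBag() {
        return Collections.synchronizedSortedSet(new TreeSet<Integer>(NEVER_EQUAL));
    }
}

/** Option 3: a PriorityQueue behind Collections.synchronizedCollection (hypothetical). */
class SynchronizedHeapBag {
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();
    // All concurrent access goes through this wrapper, which locks on itself.
    private final Collection<Integer> shared = Collections.synchronizedCollection(heap);

    void add(int value) {
        shared.add(value);
    }

    int size() {
        return shared.size();
    }

    /** Empties the bag and returns its elements in ascending order. */
    List<Integer> drainSorted() {
        synchronized (shared) { // the same monitor the wrapper uses for add/size
            List<Integer> out = new ArrayList<>(heap.size());
            Integer next;
            while ((next = heap.poll()) != null) {
                out.add(next);
            }
            return out;
        }
    }
}
```

Draining is destructive and costs O(n log n) at the end, which is part of the price you pay for cheap inserts.  Iterating the TreeSet version also needs a manual synchronized block on the returned set.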

February 16, 05:47 PM

We are using CloudWatch for a couple of custom metrics, but we collect the metrics at a rate beyond what CloudWatch will handle.  So I wrote a class on our side to collect stats for us, which we can then periodically publish to CloudWatch as a StatisticSet.

Under our old system, each thread would create their own Datum, and then just add that datum to a shared, synchronized data structure -- just one point of contention across threads, and, what should be, a very fast one without the chance of conflicts.  

However, writing our new implementation, we want to count along several dimensions for each thread.  This will be a highly contested piece of code -- we serve hundreds of requests per second inside a single host and our latency constraints are very tight.  I opted for a solution using several atomic counters (the java.util.concurrent.atomic classes).

I didn't just synchronize the method because I thought that that might introduce bigger (and more costly) points of contention.  Having thought on this more, I don't have any confidence in saying that one implementation would perform better than another.  Better to test both and figure it out.

Test:

10 threads, each one performing 20 billion runs.  Each run increments 2 values, 25% of the time we increment a third counter,  6% of the time we increment a fourth counter, and we track a maximum.  This roughly mirrors the logic I'm using in my production implementation, at least the part that's under contention.

Source code is on github:  https://github.com/softwaregravy/AtomicVsSynchronized
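
The real classes are in the repo above.  For readers who just want the shape of the comparison, here is a minimal sketch of the two approaches, not a copy of that code: the field names (sum, slowCount, errorCount) and the record signature are assumptions made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Lock-free version: each field is its own atomic and is updated independently. */
class AtomicRecorder {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();
    private final AtomicLong slowCount = new AtomicLong();  // hypothetical "25% of the time" counter
    private final AtomicLong errorCount = new AtomicLong(); // hypothetical "6% of the time" counter
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void record(long value, boolean slow, boolean error) {
        count.incrementAndGet();
        sum.addAndGet(value);
        if (slow) slowCount.incrementAndGet();
        if (error) errorCount.incrementAndGet();
        // CAS loop: retry until our value is installed or a larger one already is.
        long seen = max.get();
        while (value > seen && !max.compareAndSet(seen, value)) {
            seen = max.get();
        }
    }

    long count() { return count.get(); }
    long sum() { return sum.get(); }
    long slowCount() { return slowCount.get(); }
    long errorCount() { return errorCount.get(); }
    long max() { return max.get(); }
}

/** Lock-based version: plain fields, one monitor guarding every access. */
class SynchronizedRecorder {
    private long count;
    private long sum;
    private long slowCount;
    private long errorCount;
    private long max = Long.MIN_VALUE;

    synchronized void record(long value, boolean slow, boolean error) {
        count++;
        sum += value;
        if (slow) slowCount++;
        if (error) errorCount++;
        if (value > max) max = value;
    }

    synchronized long count() { return count; }
    synchronized long sum() { return sum; }
    synchronized long slowCount() { return slowCount; }
    synchronized long errorCount() { return errorCount; }
    synchronized long max() { return max; }
}
```

One difference the timings don't show: the synchronized version updates all fields as a unit, while a reader of the atomic version can observe a new count alongside an old sum.  For periodically published stats that is usually acceptable, but it is a real difference in guarantees.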

I ran it a few times just using the unix time command.  The results were consistently close with the two samples shown below:

AtomicRecorder

java TestHarness  16.58s user 0.09s system 190% cpu 8.756 total

SynchronizedRecorder

java TestHarness  12.87s user 10.12s system 165% cpu 13.845 total

So, with the results in hand, I feel justified with my decision to use the Atomic values vs the synchronized method.  Under heavy contention, which I expect in this particular component, they perform significantly better (overall).  

That said, the synchronized version is using less cpu than the atomic version.  The synchronized version is also clearly the simpler of the two.  This becomes more apparent as you grow the class beyond trivial.  And maybe that particular piece of my code wouldn't be as contended as I think it will be?

Another test, this time with zero contention:  1 Thread performing 200 billion runs.

AtomicRecorder

java TestHarness  7.75s user 0.06s system 100% cpu 7.738 total

SynchronizedRecorder

java TestHarness  7.16s user 0.06s system 100% cpu 7.181 total

Synchronized is slightly faster with no contention, which makes sense intuitively.  

Overall, it's a tradeoff.  Using Atomics is significantly faster, but only under extremely high contention.  They also make your code more complex.  For 99% of the use cases out there, synchronized is probably the much better choice.  For that other 1%, I only suggest making sure you're in the 1% to make sure it's worth it.  Otherwise you're wasting optimization effort.

January 19, 04:07 PM

Subscribing your service to an SNS topic via HTTP? SNS sends a confirmation message in a POST. I couldn’t find this documented anywhere, so here’s what I got (a couple of random characters removed from signatures and endpoints):

{
    "Type" : "SubscriptionConfirmation",
    "MessageId" : "84508b4c-be5d-4ac4-9c1d-96eab2b6fe6e",
    "Token" : "2336412f37fb687f5d51e6e241d09c805f2fe718d402ad3291fbbf65d1089c48a1573ff939ca4584a571e9aa496256c6ec9a73bc8e450e4fea07d51d3ed3bf2cd1814e095f19d3671c5566850da313940d0b006a00ccc6226e8f0fa774831c5aabc015eb563f6418c01855f144c1453",
    "TopicArn" : "arn:aws:sns:us-east-1:35308450730:ThisIsATest",
    "Message" : "You have chosen to subscribe to the topic arn:aws:sns:us-east-1:35308450730:ThisIsATest.\nTo confirm the subscription, visit the SubscribeURL included in this message.",
    "SubscribeURL" : "https://sns.us-east-1.amazonaws.com/?Action=ConfirmSubscription&TopicArn=arn:aws:sns:us-east-1:35308450730:ThisIsATest&Token=2336412f37fb687f5d51e6e241d09c805f2fe718d402ad3291fbbf65d1089c48a1573ff939ca4584a571e9aa496256c6ec9a73bc8e450e4fea07d51d3ed3bf2cd1814e095f19d3671c5566850da313940d0b006a00ccc6226e8f0fa774831c5aabc015eb563f6418c01855f144c14535",
    "Timestamp" : "2012-01-19T20:13:34.281Z",
    "SignatureVersion" : "1",
    "Signature" : "bpfXFxjcDPYDxAikOCiYrYHWgcmJKDelkrckTtYaM6IuBZgOcHedP3bxuCONwHVQRGnBMPk6/RT8nOjkX54ntWz3/2Z7YZNprDE1qJUplF0AcVPd2dPYcwy+mbE2qCs6PtqPAJ10Qz475BqFF9nHE07A9MSG8RXHQh1t0GMs=",
    "SigningCertURL" : "https://sns.us-east-1.amazonaws.com/SimpleNotificationService-f3ecf7224c7233fe7bb5f59f96de52f.pem"
}

December 14, 02:18 PM

I’ve been working with a lot of AWS lately, and I’ve decided it’s a great background that engineers should have. Not AWS, per se, but just a deep understanding of infrastructure and infrastructure problems. Most graduates of CS programs lack this. To my knowledge, colleges are not offering AWS 101 nor a theoretical counterpart (scalable architecture 101). Maybe they should. Not because AWS is great in itself, but it exposes people to all the complex infrastructure decisions that go on at some level under every organization. Also, if you ‘get’ AWS, the skills will be very transferable to other cloud providers.

If there were an AWS course, I think it would cover the following:

  • spin up an EC2 instance (bare Amazon 64-bit AMI), and log into it
  • mount EBS volumes (8 1TB drives in RAID0 anyone?)
  • install Sun Java 6_latest
  • create your own AMI
  • build a .war with maven
  • install Tomcat on EC2 and run a simple webapp
  • customize Tomcat’s server.xml (just make a simple change)
  • set up a Mongo replica set on their own AMI
  • set up Mongo sharding
  • be able to have simple writes from your Tomcat webapp into Mongo
  • simulate failures of mongo instances
  • take backups (snapshots) of your mongo data
  • restore from your backups without losing things
  • run elastic load balancer against many instances in EC2
  • simulate failure of an availability zone — your app and db’s should continue to run
  • run autoscaling (your apps are all your ami and start as a service, right?)
  • run a load test against your set up with JMeter, also running in the cloud
  • use ElastiCache to store the last 10000 read values from Mongo, and rerun the load test
  • tune Tomcat, the JVM, and Mongo to improve results
  • build and deploy your code with 1 command

Do all of the above from the command-line tools from Amazon. Write your own scripts in Ruby.

Document every step on a blog.

Put all your scripts and stuff in Github.

You now have an amazing resume item for any new grad (and pretty good one for most professionals, too). Total time invested will be less than most University courses, cost is most likely under tuition for 1 course (most of this is free), and you now have a great understanding of a very common web architecture. In fact, if a new or soon to be graduate of a CS program did all this, I would promise an interview or, if you’re still a ways away from graduating, an internship at ThinkNear. (I’m also looking to talk to anyone with experience doing this in industry as well, but this is particularly impressive for someone early in their career.)

Complete this list and get an internship or interview with ThinkNear

Note: Maven, Tomcat, Mongo, and JMeter were selected ‘at random’ because they’re open source and well-known with lots of docs out there and good communities. I think learning any technologies is valuable, so feel free to sub in your favorites: Ant, Rake, Jetty, Rails, Django, etc.

For bonus:

  • run inside a VPC
  • use Route53 to give all your hosts friendly names
  • have new hosts that start up auto-register themselves with Route53 and load balancers to start taking load
  • run a load test, and watch your fleet autoscale up to meet demand (you’ve got autoscaling working, right?)
  • simulate an availability zone failure, and watch your fleet autoscale up to meet demand

October 26, 12:20 AM

It’s taken a few years, but I am firmly on the Test-Driven Development bandwagon. Testing is everywhere in the Ruby community. I can honestly say that RSpec changed the way I work. The way it ties language into tests just makes coding up tests really, really useful. They truly are the spec, and you can read them. How about that! I can seriously print out my tests, give them to our business, and just say: that’s what I’ve built.

Now, when I code, I start with the specs. The odd time I code a method first, if I’m figuring something out, but once I’ve got it I comment it out and go back to the specs.

  • I organize my specs by method
  • make heavy use of contexts and before blocks
  • It blocks should be short and sweet, I abuse the commentless variety

Step 0) sanity

  • latest code from origin
  • pivotal task started
  • all specs pass
  • guard running

Step 1) specs

describe "#method_name" do
  context "when the user does not exist" do
    it "should post to Airbrake"
    it "should return a 500"
    it "should render the 'does not exist' page"
  end
end

Step 2) setup

describe "#method_name" do
  context "when the user does not exist" do
    let(:user_id) { 500 }
    it "sanity test" do
      User.find_by_id(user_id).should be_nil
    end
    it "should post to Airbrake"
    it "should return a 500"
    it "should render the 'does not exist' page"
  end
end

  • use the before block to make your context statement true
  • where I make assumptions, I sanity test them before the other tests

Step 3) fill in the specs

it "should post to Airbrake" do
    Airbrake.should_receive(:notify)
    get :method_name, :user_id => user_id
end
it "should return a 500" do
    get :method_name, :user_id => user_id
    response.code.should == '500'
end
it "should render the 'does not exist' page" do
    get :method_name, :user_id => user_id
    response.should render_template("error")
end

Step 4) Write the code

Exercise for the reader :)

General principles

  • specs before code
  • when you have a bug, get a test to make it fail before fixing it

Your specs are an asset. They specify your system, and the better your code coverage, the better they specify it. Forcing yourself to write specs first forces you to really think through what you’re doing and see clearly how you will solve the problem before solving it. It also means a mistake only survives if, god forbid, you write the same mistake down in 2 places. To introduce a bug, you either need to neglect testing, overlook something, or write the same error twice. Testing leaves only overlooking things, which is actually a smaller percentage of bugs, at least for me, than off-by-ones, nil variables, and other errors.

If there were another way to force the diligence that tests put you through (the thought process, the error checking, the edge case consideration), tests would be less valuable. They’d still be valuable for the sake of the spec. However, I’ve found no such methodology, procedure, or tool which can force upon me the same discipline of thought that actually writing out hundreds of lines of testing does.

Permalink | Leave a comment  »

October 16, 10:36 PM

Git is great. After years of being a Perforce user, it took a while to adapt, but now I’m very happy with git. Git is powerful, flexible, and generally awesome. The one thing git doesn’t have going for it is that if you ask several developers ‘how do you use git?’, you can get several different answers. The good news is that git is flexible enough to have ‘good’ workflows for teams from 1 to 100000; the bad news is that those workflows are often different, not well standardized, and it’s easy to bludgeon your way to a mostly working workflow that is badly suited to your needs (or downright wrong).

As our engineering team grows, I’ve been doing some research on the ‘right’ workflow for us to use going forward. I looked at a number of resources and have settled on the ‘rebase workflow’. There are some good write-ups out there, but the simplest and best suited to our needs that I could find was on Rein Henrichs’ blog (http://reinh.com/blog/2009/03/02/a-git-workflow-for-agile-teams.html). The following is how we’ll use git at ThinkNear, and we borrow heavily from Rein’s model.

Workflow

0) Get a story or bug from Pivotal Tracker. If you need to make a change that’s not in Pivotal, create a story in Pivotal to track your work.

  • mark a task as started as soon as you start it so others know you’ve started it

1) Get a local, up to date copy of master

git fetch origin
git checkout master
git merge origin/master

At this point, git diff master..origin/master should be empty.

2) Create a branch for the feature with a relevant name

git checkout -b fixin_broken_stuff

3) Do work on your branch

  • check in often, lots of small working commits
  • check for/get the latest from origin/master at least once per day
    • use rebase when getting the latest
  • optionally, and carefully, use origin as a place to backup your work
    • a good idea if you’ll be several days away from master

3.5) Update from origin/master using rebase

git fetch origin
git rebase origin/master

4) When finished, rebase the work to master

git fetch origin
git rebase -i origin/master

5) Merge to master and push to origin

rspec spec
# add any additional tests here, point is, everything works before commit to master
git checkout master
# we just fetched from origin a second ago, right?
git merge origin/master
git merge fixin_broken_stuff
rspec spec
git push origin master

6) Mark the tasks in Pivotal as Finished once the code is in and pushed to origin/master

  • accepted once they’re in production
  • between ‘finished’ and ‘accepted’ is our deployments and verification testing, which will have to be in another post

General principles with this workflow are

  • commit early and often while working
  • favor larger commits in master over many smaller commits
  • 1 branch per developer, push to origin if you want, but the only source of collaboration is through master
  • master should pass tests at all times
  • master should be ‘pushable-to-prod’ at any time
  • code reviews happen retrospectively through comments in github
  • Pivotal should contain a record of all work completed and every commit should be somehow related to a pivotal story (see step 0)

Resources:

Git

Git Workflow

Permalink | Leave a comment  »

October 13, 01:03 PM

I’ve been working on making my specs faster for a while now. Now that my largest project has thousands of specs, they take minutes to run. I’ve found two easy fixes that I’m actively applying.

Avoid Create

A lot of my model specs center around creating objects and testing their methods in different states. I’ve found generally that 90% of a model can be tested without create. I now use new heavily.

As an example, I have a fairly simple class called TimePeriod. It does fairly obvious things, like having a duration. One thing we do is compare them. Here’s an excerpt from its spec file.

# before
describe "<=>" do
  it "first compares days" do
    l = TimePeriod.create(:start_day => 1, :start_time => 10.hours.to_i, :duration => 1.hours.to_i)
    r = TimePeriod.create(:start_day => 2, :start_time => 10.hours.to_i, :duration => 1.hours.to_i)
    (l <=> r).should == -1
    (r <=> l).should == 1
  end
end

# after
describe "<=>" do
  it "first compares days" do
    l = TimePeriod.new(:start_day => 1, :start_time => 10.hours.to_i, :duration => 1.hours.to_i)
    r = TimePeriod.new(:start_day => 2, :start_time => 10.hours.to_i, :duration => 1.hours.to_i)
    (l <=> r).should == -1
    (r <=> l).should == 1
  end
end

It’s actually a fairly complex class under the hood, and has a lot of edge cases. In total, I have 130 specs for this class, though most are fairly simple. In total, there were 83 instances where I could replace create with new.
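To make the point concrete outside of Rails, here is a minimal sketch using a plain-Ruby stand-in for TimePeriod. The Struct name, its fields, and the comparison rule (day first, then start time, then duration) are assumptions for illustration, not the real class. The idea it shows: ordering logic only needs in-memory attributes, so nothing has to be saved before it can be tested.

```ruby
# Hypothetical stand-in for the TimePeriod model: a plain Struct, no database.
TimePeriodSketch = Struct.new(:start_day, :start_time, :duration) do
  include Comparable

  # Assumed ordering: day first, then start time, then duration.
  def <=>(other)
    [start_day, start_time, duration] <=>
      [other.start_day, other.start_time, other.duration]
  end
end

HOUR = 3600 # seconds, standing in for 1.hours.to_i

l = TimePeriodSketch.new(1, 10 * HOUR, 1 * HOUR)
r = TimePeriodSketch.new(2, 10 * HOUR, 1 * HOUR)

puts(l <=> r)
puts(r <=> l)
```

Nothing here touches a database, which is the same property that makes new cheaper than create in the real spec.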

Over 3 runs, the 130 specs took 3.61, 3.67, and 3.9 seconds before the change. After the change, they took 3.08, 3.15, and 3.1 seconds. All times are as reported by rspec, running inside guard, against spork, with Pandora running in the background and multiple browser windows with dozens of tabs each: a fair approximation of my normal development conditions. From my sample, that’s a 17% improvement.
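For anyone who wants to check the arithmetic, here is a small sketch that averages the three timings on each side and computes the relative improvement. The helper name mean is mine; the numbers are the ones quoted above. It works out to roughly 16.5%, which rounds to the 17% quoted.

```ruby
# Run times in seconds, copied from the measurements above.
before = [3.61, 3.67, 3.9]
after  = [3.08, 3.15, 3.1]

# Arithmetic mean of a list of timings.
def mean(times)
  times.inject(:+) / times.size
end

# Relative improvement of the average run time, as a percentage.
improvement = (mean(before) - mean(after)) / mean(before) * 100

puts "before: #{mean(before).round(2)}s, after: #{mean(after).round(2)}s"
puts "improvement: #{improvement.round}%"
```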

Avoid Create 2: Avoiding Factory

Turns out this was a very powerful idea that I can apply more generally: there are hidden calls to create all over the place. Let’s take a look at my business class. In my project, a business belongs to a user. Now, the user isn’t actually being tested in my business_spec, and there is no behavior of my business that depends on any state in the user, but my business is not valid without a user. Turns out I was creating a user (yeah, in the db) for every spec. My business has 138 specs. On 3 runs, they took 11.01, 10.95, and 10.84 seconds.

So I made the following change to avoid the ‘create’ call implicit in the factory:

# before
let (:user) { Factory(:user) }


# after
before :all do
  @user = Factory(:user)
end
after :all do
  @user.destroy
end
let (:user) { @user }

After the change, they took 6.03, 6.45, and 6.3 seconds. From this sample, that’s a 43% improvement. Here’s my factory:

Factory.define :user do |user|
  user.sequence(:email) {|n| "1test#{n}@sample.com"}
  user.password 'secret'
  user.password_confirmation 'secret'
end
 

Pretty simple, eh?

Quick Summary

Applying these two principles will give differing results depending on the object under test, but in general, it’s worth being mindful of object creation. The obvious alternative that would have improved speed greatly is to abandon the principle of 1 should per spec, a mantra I follow to a very large degree. Combining my 130 specs (many of which are 1 line) into 30 big specs, or even 20 mega specs, would certainly have yielded at least as great a speed-up. Even then, these two techniques would help avoid some costly creates.

  • rails 3.0.10
  • rspec 2.6.0
  • factory_girl 2.1.2
  • spork 0.9.9rc9
  • guard 0.6.3

Someone appears to be ahead of me https://github.com/pcreux/rspec-set

Permalink | Leave a comment  »

August 29, 11:16 PM

I was in a partial for the fields of a form today, where f was my form builder. One of the fields on my model was ‘text’, but I am using the serialize method, so I actually needed to take in an array of values. I couldn’t figure out any method of f that could give me access to the array, so I took the HTML normally generated by f.text_field and just added [] to the end of the name in order to pass it as an array value. But with just the raw HTML, I needed to populate the value, and all I had was a reference to f. Turns out, the object being built by f is available as f.object.

Here you can see a contrived example with a model, the view before I made the modification, and the view that now lets me edit elements in the serialized array.

# my_value_array :text
class MyModel < ActiveRecord::Base
  serialize :my_value_array, Array
end

Since I wasn’t being clever, I just gave myself space to add at most 2 values to the array at a time. However, this meant I needed to clean up empty values in my controller.

def update
  @my_model = MyModel.find_by_id(params[:id])
  params[:my_model].try(:[], "my_value_array").try(:reject!){|v| v.blank? }
  if @my_model.update_attributes(params[:my_model])
   # blah blah blah
  end
end

Full disclosure, I boiled the sample code down from a much more complex example. I think it should work, or at least get you on the right track.
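The clean-up step can be tried outside of Rails. The following is a plain-Ruby sketch, not the controller code itself: blank_value? is a hypothetical stand-in for Rails' blank?, and strip_blank_array_values is a helper name I made up. It returns a cleaned copy rather than mutating the params in place, which is a design choice of the sketch.

```ruby
# Stand-in for Rails' blank?: nil, empty, or whitespace-only values count as blank.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Returns a copy of the params hash with blank entries dropped from the
# array stored under key; params without an array there are returned as-is.
def strip_blank_array_values(params, key)
  values = params[key]
  return params unless values.is_a?(Array)
  params.merge(key => values.reject { |v| blank_value?(v) })
end

submitted = { "name" => "example", "my_value_array" => ["first", "", "  ", "second"] }
cleaned = strip_blank_array_values(submitted, "my_value_array")

puts cleaned.inspect
```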

Permalink | Leave a comment  »

August 06, 11:55 PM

I recently built an app with a multistep, implicit registration. By that I mean that there are multiple pages of the registration process, and we create an account for the user implicitly (we just email them their password after we create an account for them). Someone asked me about it, so I created a little app to demonstrate roughly what I implemented in my real app.

The app is online here. The source is available here.

I used implicit registration. Good Idea

I do like the implicit registration bit, but I do not like the multistep registration flow. The implicit registration is just great from a UX perspective. A bit weaker from the security perspective, but still not terrible. The security weakness could be greatly mitigated by requiring a password change on a subsequent log in, or otherwise expiring it after a certain period of time.

  def register_user
    password = User.send(:generate_token, 'encrypted_password').slice(0, 6)
    user = User.create!(:name => name, :email => email, :age => age,
                            :password => password, :password_confirmation => password)
    Notifications.signup(user, password).deliver
    self.user = user
    self.save!
  end
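The expiry mitigation mentioned above could look something like the sketch below. It is plain Ruby and not what the demo app does: TemporaryPassword, generate_temporary_password, and the 24-hour default are all hypothetical names and values chosen for illustration.

```ruby
require 'securerandom'

# Hypothetical holder for a generated password and its expiry time.
TemporaryPassword = Struct.new(:value, :expires_at) do
  # True once the validity window has passed.
  def expired?(now = Time.now)
    now >= expires_at
  end
end

# Generates a random password that is only valid for ttl_seconds.
def generate_temporary_password(ttl_seconds = 24 * 60 * 60, now = Time.now)
  # 9 random bytes encode to 12 URL-safe base64 characters.
  TemporaryPassword.new(SecureRandom.urlsafe_base64(9), now + ttl_seconds)
end

issued_at = Time.now
temp = generate_temporary_password(3600, issued_at)

puts temp.value.length
puts temp.expired?(issued_at + 60)
puts temp.expired?(issued_at + 7200)
```

A login check would then refuse, or force a password change, once expired? returns true.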

I used a state machine to drive the process. Bad Idea

  state_machine do
    state :email, :exit => lambda {|reg| reg.errors.clear }
    state :age, :exit => lambda {|reg| reg.errors.clear }
    state :complete, :enter => :register_user

    event :next do
      transitions :from => :email, :to => :age, :guard => :guard_to_age
      transitions :from => :age, :to => :complete, :guard => :guard_to_complete
      # do not allow complete => complete
    end
  end

This was a lot more effort than it was worth. I saw something similar used while browsing the source code of spree and I wanted to give it a try. It seems so clean at first, but then you have to start building in one-off cases here and there, and soon your controller is a mess because everything is in update. The update is already a mess in this extremely simple example, it just gets worse.

  def update
    @registration = Registration.find(params[:id])
    redirect_to users_path(@registration.user_id) and return if @registration.state == 'complete'

    if @registration.update_attributes(params[:registration])
      if @registration.next!
        if @registration.state == 'complete'
          #can only reach this block on first completion -- or next will have failed
          sign_in(:user, @registration.user)
          redirect_to user_show_path and return
        else
          redirect_to registration_state_path(@registration, @registration.state) and return
        end
      else
        flash[:alert] = @registration.errors
        render :template => get_template_for_state(@registration, @registration.state) and return
      end
    else
      flash[:alert] = @registration.errors
      respond_with(@registration, :location => registration_state_path(@registration, @registration.state))
    end
  end

I strongly regret not just having a different action for each view, with a different action for each update of the step. Live and learn.

Duplicate Data or Manual Validations or Duplicate Manual Validation

With a multistep registration flow, I realized that you’re left with a tradeoff: you can either get into validations manually or you can duplicate data. If you use a registration object, like I did in the example, I’m collecting data from the user which I will use to build the actual model. The result is that I need to perform any data validations that the User model wants on the data as I collect it. The example from my code is that I need to check on the uniqueness of an email, required by the User, during the registration.

  def require_email
    errors.add(:email, "Email is required") unless email.present?
    if email.present?
      errors.add(:email, "Email is already taken") if email_in_use?(email)
    end
    errors.empty?
  end

The alternative is to put the data you’re collecting directly into the model as you collect it so the validations happen as you go, but then you probably need to have a ‘not ready’ state/flag on the model. To illustrate, suppose we create a User at the time we create the Registration. However, to be valid, our User requires a name, email, and age. Until the User has all those values, they are not valid. If users have only given us their name and email, but not their age, they should not be able to log in or use the system. This means that you’ll need to account for this everywhere you interact with the User model. You could create a default scope, but then you need to be aware when to undo that.
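As a rough illustration of the ‘not ready’ idea, here is a plain-Ruby sketch. UserSketch and its methods are hypothetical and leave out everything ActiveRecord would provide (validations, scopes, persistence); it only shows a model that knows which required values are still missing.

```ruby
# Hypothetical user record that is filled in step by step.
class UserSketch
  REQUIRED_FIELDS = [:name, :email, :age]

  attr_accessor :name, :email, :age

  # Fields the registration flow has not collected yet.
  def missing_fields
    REQUIRED_FIELDS.select do |field|
      value = send(field)
      value.nil? || value.to_s.strip.empty?
    end
  end

  # The 'not ready' flag: only complete users should be able to log in.
  def ready?
    missing_fields.empty?
  end
end

user = UserSketch.new
user.name = "Pat"
user.email = "pat@example.com"
puts user.ready?
puts user.missing_fields.inspect

user.age = 30
puts user.ready?
```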

Between the two options, I lean towards the ‘not ready’ flag. This will scale as you add values, data and validations live in one model, and, if you use default scoping, you should be able to isolate the knowledge of the flag to the Registration object. However, if you cannot do away with the registration object, then you will have large, ugly update methods, because they will have to update both the Registration and the User.

Conclusion

The final conclusion is that I really dislike multistep registration. All the approaches I’ve found have tradeoffs. Having built a complex registration flow once, I’m much stronger for it.

Permalink | Leave a comment  »

July 24, 01:18 AM

I really do not like Rails Time Zones. More specifically, I do not like the questions and solutions that are easy to find on Google. I'm sending an email, and had the following string in the text: "... is only valid until 2011-06-10 03:30:00 UTC". Not what I want to be sending. So how do I convert that? Googling is a pain because most people seem to be handling this in a 'global' way by setting Time.zone. Ryan Bates does it here, http://railscasts.com/episodes/106-time-zones-in-rails-2-1. Here's a typical answer on StackOverflow. They're all just doing a `Time.zone=`.

Why don't I like that approach? Well, at first it was setting off my concurrency spidey sense. After seeing the solution everywhere, I checked the source, and that method is (unintuitively) setting a thread-local variable. All I wanted to do was change the view! I don't want to have to worry about whether this could impact other parts of my code, even if it was just in that request. (The root cause of this is that I have a date-heavy part of my code which I know is not time zone aware. Yes, I have to fix it.)

Anyway, after hunting through the source, I finally found in_time_zone. Now all you need is the magic string to pass into it. I wanted Eastern, so I needed

Time.now.in_time_zone("Eastern Time (US & Canada)")

All magic strings available in ActiveSupport::TimeZone.us_zones

$ ActiveSupport::TimeZone.us_zones => [(GMT-10:00) Hawaii, (GMT-09:00) Alaska, (GMT-08:00) Pacific Time (US & Canada), (GMT-07:00) Arizona, (GMT-07:00) Mountain Time (US & Canada), (GMT-06:00) Central Time (US & Canada), (GMT-05:00) Eastern Time (US & Canada), (GMT-05:00) Indiana (East)]
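The same idea, converting one value for display without touching any global setting, can be shown in plain Ruby without ActiveSupport. This is a sketch under a loud assumption: it uses a fixed UTC-04:00 offset (Eastern Daylight Time), so unlike in_time_zone it knows nothing about daylight saving rules.

```ruby
# The UTC timestamp from the email text above.
expires_at = Time.utc(2011, 6, 10, 3, 30, 0)

# Eastern Daylight Time is UTC-04:00. The fixed offset is an assumption that
# only holds while daylight saving is in effect.
eastern = expires_at.getlocal("-04:00")

# getlocal returns a new Time; expires_at itself is still in UTC.
puts expires_at.strftime("%Y-%m-%d %H:%M:%S %Z")
puts eastern.strftime("%Y-%m-%d %H:%M:%S %z")
```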

Great learning experience for me.

Permalink | Leave a comment  »

July 20, 09:14 AM

Today I spent much of the day trying to get a migration to run in Rails. It's on a table with hundreds of thousands of rows and a couple hundred MB (not really that big, IMHO). The job consistently failed with a 'failed to allocate memory' error at approximately row 150 000.

The migration is a logical one -- we've added a new column, and this migration is a backfill. We have some business logic around what we want in this column, so it's something I want ruby code to set rather than recreate our business logic in SQL.  I spent a lot of time with find_each, trying to wrap the jobs, pulling the logic into its own rake job, and anything else I could find. No go.

In the end, I can get 100 000 rows to migrate (back-fill), so here we are running a script locally to migrate our table 100K rows at a time. In case anyone wants to view the horribleness that is how we're doing our backfill, it's included below. What's going wrong in find_each that it's not able to operate over more than 150K rows? Since I'm saving back during each block, is that causing issues? Here's the (scrubbed) code:

namespace :logical_migration do
  task :task_name, [:from_index, :to_index] => :environment do |t, args|
    last_min = args.from_index.to_i
    # walk the id range in steps of 1000 so only one slice is loaded at a time
    (args.from_index.to_i..args.to_index.to_i).step(1000).each do |this_max|
      Model.find(:all, :conditions => ["id >= ? and id < ?", last_min, this_max]).each do |m|
        m.make_our_change
        m.save!
        puts m.id.to_s if m.id % 500 == 0
        last_min = this_max
      end
    end
  end
end

The puts there is because this job runs for quite a while, and I like feedback so I don't panic and kill it.
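The batching itself can be sketched in plain Ruby, separate from the rake task. The helper below is hypothetical (id_batches is my name, not something in the task) and only computes half-open [low, high) id ranges. Compared with stepping through the range directly, it also covers the final partial batch up to and including to_index.

```ruby
# Splits the inclusive id range from_index..to_index into half-open
# [low, high) batches of at most batch_size ids each.
def id_batches(from_index, to_index, batch_size = 1000)
  batches = []
  low = from_index
  while low <= to_index
    # Cap the upper bound so the last batch ends just past to_index.
    high = [low + batch_size, to_index + 1].min
    batches << [low, high]
    low = high
  end
  batches
end

id_batches(0, 2500).each do |low, high|
  puts "id >= #{low} and id < #{high}"
end
```

Each pair maps directly onto the "id >= ? and id < ?" condition used in the task.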

Permalink | Leave a comment  »

July 03, 08:31 AM

This one caught me totally off-guard. My error definitely comes from ignorance about how the built-in Rails cache works. The built-in Rails cache, reachable with `Rails.cache`, is file-based by default. Now that I'm at this juncture, it was wrong of me to have assumed anything; the Rails Guides clearly name the default. So, if you're doing `Rails.cache.fetch` anywhere in your code-base, you've got to be aware that this causes caching between runs of rspec. This means that if, say, you were trying to simulate a good result on one run and an error case on another, you'd have trouble reproducing the error case. I've written up an example to illustrate. The change is available on github.

UPDATE: So, I've decided that there's something up since there's a flag set in the env test.rb file that should have disabled caching for tests. I'm in the process of following up with rspec-users and, if needed, a rails group.  

UPDATE 2: The flag in test.rb is for the controller level caching.  I do a Rails.cache.clear before every test now.
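To see why a cache that survives between tests hides the error case, here is a tiny plain-Ruby sketch. TinyCache is a hypothetical in-memory stand-in, not Rails.cache; it only mimics the fetch-with-block and clear behavior the post relies on.

```ruby
# Hypothetical stand-in for a cache that outlives a single test.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value, or runs the block and stores its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def clear
    @store.clear
  end
end

cache = TinyCache.new

# "Test 1" simulates a good result.
first = cache.fetch("status") { "ok" }

# "Test 2" tries to simulate an error, but the block never runs.
second = cache.fetch("status") { "error" }

# Clearing between tests lets the error case actually run.
cache.clear
third = cache.fetch("status") { "error" }

puts [first, second, third].inspect
```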

 

 

Permalink | Leave a comment  »

June 23, 01:31 AM

How often do you write code like this?

if x
  x.do_something_awesome
end

Or maybe you do stuff like:

if my_hash && my_hash["key"]
  # do something awesome
end

The point is, I very routinely use conditional statements that rely on nil evaluating to false. This is all fantastic, and very convenient, but it’s important to note that nil != false.

ruby-1.9.2-p180 :055 > nil == false
=> false

Let’s do an example:

ruby-1.9.2-p180 :069 > nothing = nil
=> nil
ruby-1.9.2-p180 :070 > if nothing
ruby-1.9.2-p180 :071?> puts "turns out we have something out of nothing"
ruby-1.9.2-p180 :072?> else
ruby-1.9.2-p180 :073 > puts "sure enough, nothing is nothing"
ruby-1.9.2-p180 :074?> end
sure enough, nothing is nothing
=> nil

Hopefully this was the result you were expecting. Now, let’s add a very slight twist. Let’s try to use our nothing in other logical operations.

ruby-1.9.2-p180 :106 > new_nothing = true && nil
=> nil

We still have nil. I would have expected this operation to return false, but nil is still falsy in a conditional, so maybe we’re good.

Let’s look at an example where using nil as if it were false can get us into trouble:

ruby-1.9.2-p180 :107 > {"my_param_1" => false, "my_param_2" => nothing}
=> {"my_param_1"=>false, "my_param_2"=>nil}
ruby-1.9.2-p180 :108 > {"my_param_1" => false, "my_param_2" => nothing}.to_json
=> "{\"my_param_1\":false,\"my_param_2\":null}"

This seems like a contrived example, but if your controllers support json or xml responses, it might be more of a danger than you think. The worst part is if you have metadata, and you want to do something like:

ruby-1.9.2-p180 :109 > {"my_param_3" => true && nothing}.to_json
=> "{\"my_param_3\":null}"

Now hopefully we see the danger. The way around this is to force an evaluation to false. The easiest way that I know of is to use the double-bang operator (which is really just the bang operator twice): !!

ruby-1.9.2-p180 :110 > {"my_param_3" => !!(true && nothing)}.to_json
=> "{\"my_param_3\":false}"
ruby-1.9.2-p180 :111 > {"my_param_3" => !!(true && true)}.to_json
=> "{\"my_param_3\":true}"

Now we can rest easily.

Also, this is a good lesson on why testing trivial things from your models and controllers is often not a waste of time.

A slightly more concrete example? Okay. How about sharks with lasers attached to their head? … well, couldn’t find any of those, so:

class SeaBass < ActiveRecord::Base  # assumed: attr_accessible is an ActiveRecord macro
  attr_accessible :mutated, :ill_tempered, :laser_equipped
  def ready_to_impersonate_shark?
    self.mutated && self.ill_tempered && self.laser_equipped
  end
end

Now let’s say we were doing an inventory of our SeaBass from our web console, and want to have a simple 2 column view:

SeaBass ID    Ready to Impersonate Shark
1             true
2             false
3             true
4             false

You might be tempted to write something like:

<table>
  <tr>
    <th>SeaBass ID</th>
    <th>Ready to Impersonate Shark</th>
  </tr>
  <% @seabass.each do |seabass| %>
  <tr>
    <td><%= seabass.id %></td>
    <td><%= seabass.ready_to_impersonate_shark? %></td>
  </tr>
  <% end %>
</table>

 

This would leave you with

SeaBass ID    Ready to Impersonate Shark
1             true
2
3             true
4

So it doesn’t look perfect, so now you’re in the fire. What a great place where the !! operator could have been used.
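Here is a minimal, Rails-free sketch of the fix. SeaBassSketch is a plain Struct standing in for the model, and the _raw method exists only to show the contrast; neither name comes from the real code. It prints {"raw":null} followed by {"strict":false}.

```ruby
require 'json'

# Plain-Ruby stand-in for the SeaBass model (no Rails involved).
SeaBassSketch = Struct.new(:mutated, :ill_tempered, :laser_equipped) do
  # Returns whatever the && chain yields, which may be nil.
  def ready_to_impersonate_shark_raw
    mutated && ill_tempered && laser_equipped
  end

  # Double-bang forces a strict true or false.
  def ready_to_impersonate_shark?
    !!(mutated && ill_tempered && laser_equipped)
  end
end

bass = SeaBassSketch.new(true, true, nil)

puts({ "raw" => bass.ready_to_impersonate_shark_raw }.to_json)
puts({ "strict" => bass.ready_to_impersonate_shark? }.to_json)
```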

Didn’t get the jokes?

Permalink | Leave a comment  »

June 21, 06:36 PM

Firefox 5 Looks Great, but ... I use a tonne of plugins. Which ones work with Firefox 5? Is there a way to know how horribly broken I will be if I upgrade, before I upgrade? Maybe I'll just wait for 6 to come out before upgrading to 5.

Permalink | Leave a comment  »

June 21, 09:34 AM

It's been a little over two weeks since I upgraded my MacBook Pro to 8 GBs of RAM and a SSD. I have been extremely pleased.

The Details

I have a 15" 2.66 Core 2 Duo MacBook Pro from 2009. It's detailed specs are here.

I first went to a Fry's store nearby to buy the memory and hard drive. I went under-prepared thinking the technicians there would be knowledgeable and helpful. They were not. Returned everything, and tried again.

My second attempt went a lot better, and I recommend following in my footsteps. I went through Other World Computing, and I bought:

  • 1 of 115GB Mercury EXTREME Pro SSD 2.5" Serial-ATA 9.5mm Solid State Drive
  • 1 of DIY KIT: 115GB Mercury EXTREME Pro SSD +OWC USB 2.0 Express 2.5" Enclosure Kit
  • 1 of 8.0GB (4.0GB + 4.0GB Kit) PC-8500 DDR3 kit

The DIY kit for the SSD is worth it. It comes with a USB enclosure for your old (or backup) HD as well as the screwdrivers needed to get into a MacBook. Note that there are 2 really obnoxious 5-pointed screws holding the battery in place. If you're thinking of putting a hard drive into your optical drive bay, you'll need different equipment than what comes in the DIY kit. The word 'kit' for the RAM is misleading; it is just the RAM. I also already had another external drive available. If you will need more than 100 GB, I suggest getting another external drive.

Step 1: Install Carbon Copy Cloner. What a GREAT product! Seriously!
Step 2: If needed, reduce the contents of your primary hard drive to under 100 GB. I did this by moving movies onto my external drive.
Step 3: Hook up one of your new SSDs into the external USB enclosure that came in the kit, and format it. (Use the built-in Disk Utility program: Applications > Utilities > Disk Utility. Here's a how-to.)
Step 4: Clone your hard drive to the external drive with Carbon Copy.
Step 4.5: At this point, you can try out booting from your newly cloned HD to make sure it works before you replace your existing drive. (Hold the option key as you restart to get the choice of which to boot from. Took me a few tries to get it. Apple's article.)
Step 5: Shut down your MacBook.
Step 6: Take off the bottom. (The screws were really really tight for me. Took a while to get them out. Keep a note of where they go, 3 are significantly bigger.)
Step 7: Swap out the RAM. Here's a video of someone doing that.
Step 8: Swap out your HD for the new SSD that you just cloned to.
Step 9: Re-assemble and power on.

Potential problem:

  • it doesn't turn on, it just beeps. This happened to me on my first attempt. The cause was that the memory the guy at Fry's sold me wasn't compatible with my MacBook.

Now, why did I buy the second SSD? Because I heard that SSDs fail like mad. So I back up my HD every night using a scheduled Carbon Copy job. If anything ever happens, I'll just be able to swap out my SSD.

The Results

First off, I am a power user. On a regular day, I'm writing software, using numerous terminal windows, I often have multiple servers running, and maybe there's also a database in there. I also have both Chrome and Firefox open, each with a bagillion tabs open. On top of that, I'm usually listening to music, reviewing something in Preview, skyping, gchatting, etc etc. Secondly, this is totally anecdotal and non-scientific.

Under my old setup, startup was slow. Every couple of days I'd need to reboot, and Firefox would routinely spiral out of control and need to be restarted. That is to say, my computer would just get really sluggish. Firefox would show the worst of the symptoms, but it wasn't just Firefox. Restarting Firefox buys some time, restarting the computer is 'the cure'.

In my current setup, startup is almost instant. That is to say, from logging in to ready to go with browsers open is super fast. I also went almost 2 weeks without needing to reboot. Today was the first time I needed to shut down due to sluggishness. I was hoping to notice increased speed in my tests, but I do not. However, I have noticed that I am able to run 'significantly' more servers. In my old setup, I remember looking through terminals to find a Rails process that was still running because I suspected it of causing other activities to slow down. I have not done that yet -- usually I'm running out of open ports on which to start listening.

Anyway, bottom line is: was it money well spent? and Would I do it again? Yes and Yes. I really like this new setup, and I wish I had made the upgrade sooner.

This journey inspired by Coding Horror.

Permalink | Leave a comment  »

June 18, 09:20 PM

The answer is: a free mug. Yesterday, The Daily WTF had mugs, and then ran out. I was bummed. I need a new coffee mug, and I've been looking around online. I was all set to order a vi reference mug, since I have a $5 gift card with ThinkGeek, but they've been out of stock for a while now. Today I see that Microsoft is offering a free Daily WTF mug for signing up for a no-obligation trial of Azure. And you know what? I plan on at least digging through the docs and spinning up an instance or two to see how it works. It's incredible how much power a promotion specifically tied to a site I visit regularly can have. For ~$15 per relevant person, it's probably money well spent. If they could tie the promotion to actually spinning up an instance, then they could probably offer bigger incentives. I think they should work on that so that I can get a Hacker News Hoodie.

Permalink | Leave a comment  »

June 18, 08:12 AM

The other day I tweeted that Paypal is now threatening to cut me off if I don't agree to accept any statements they want to send me by email. No, I don't think that Paypal has ever mailed me anything, but there it is nonetheless.

Today, I was on Netflix to reactivate my account, and chose to pay by Paypal. I've done this before, and I noted then that Paypal tries to get you to use direct debit from banks over other forms of payment. There'$ an obviou$ reason for this: direct debit is significantly cheaper for Paypal than processing credit cards is. An obvious way for Paypal to boost margins is to drive more traffic to direct debit and away from credit cards.

Anyway, as of today, I cannot choose my payment method. I can simply agree to pay by Paypal, and they will choose my payment method for me each month as my subscription fee is due. Here's what that looks like. Note that the "Payment Method" is simply "Available Funding Sources" with no options there. (I clicked the other links on the page, none would allow me to change the method.)

So I checked out their policies. The relevant section of the policies is listed below. Suffice it to say that Paypal will default to using the payment methods most favorable to them. Specifically, if you have a balance, then you must use that balance, then on to Paypal-affiliated payment methods, and so on. I do not want to have my Netflix subscription randomly withdrawing from money someone else has sent me through Paypal, withdrawing from my bank, or charging my credit card, depending on my respective balances. (Also, were I running close to zero in my bank account, I'm sure my bank would be happy to approve an $8 auto-debit and then charge me a $30 overdraft fee. Any reporter or litigious lawyer want to dig for a conspiracy here?)

The result of this might be nothing. I'm picky and whiny by nature, and I like to do things my way. However, what I see is the dominant player in an industry restricting the flexibility of its products, reducing the satisfaction they give to their customers, and lowering the value of their products. The last point is because I didn't sign up for Netflix as a result -- I probably will later, but my wallet is somewhere in the other room and I'm THAT lazy. Netflix will undoubtedly see me as a statistic of abandonment in the Paypal checkout pipeline, but maybe others are uncomfortable giving Paypal that kind of leeway, too. Either way, every extra day it takes me to go sign up for Netflix is lost revenue which can be attributed to a Paypal product.

Paypal looks very vulnerable to me. They had a very innovative service 10 years ago, and it hasn't materially changed much. They had a big lead, and are milking it quarter by quarter, but they seem to be really lacking in innovation. There are a tonne of start ups getting traction in the payment space *cough*Square*cough* while Paypal's bread and butter eBay play is eroding (eBay's "growing" at 13% vs. Amazon's 84%). So what's Paypal's strategy to defend and extend their lead? Make their customer experience worse in order to improve margins! Will Paypal roll over and die tomorrow? Certainly not; but it's no growth play. It's going to do exactly what Microsoft did 10 years ago, and just go sideways (exactly as it has been doing for 5 years now). It will come out with some interesting tidbits here and there, but just keep relying on its old trick. Square (or someone else) is going to be the Google or Apple to Paypal's Microsoft.

So ... going to go short on eBay. I figure I can't lose -- the stock is certainly not going up.

Default Payment Methods

PayPal will fund your transaction from your payment sources on file with PayPal in this order, unless you make a change as described below:

1) PayPal Balance
2) Instant transfer from your bank account (if eligible)
3) PayPal Credit (Bill Me Later, PayPal Extras Card, or PayPal Smart Connect)
4) Debit card
5) Credit card
6) eCheck (a delayed transfer from your bank account - may result in significantly slower shipping by seller)

Changing the Payment Method

You may change the Payment Method at the time you make a payment by clicking the 'More Options'/'Change' link on the Confirm Your Payment/Review Your Information page and then selecting a payment method on the "More Funding Options" page. You may do this each time you make a payment if you do not have a Balance. If you have a Balance, you must use your entire Balance before you can change the payment method. You cannot select a payment method for all future transactions, except that if you have been approved for PayPal Credit you may select PayPal Credit as your preferred payment method. You may do so by logging in to your Account, selecting “Profile”, selecting your PayPal Credit product, and then setting it as your preferred funding source.

Payment Methods may be limited for a transaction, including if you make a PayPal payment through certain third party websites or applications. For Business Payments, you are limited to funding your PayPal payment with either (or both) your Balance or by eCheck.

June 16, 06:57 PM

Someone was asking about how to write a compiler. An excerpt of an answer was "Go to college, specialize in software engineering." LOL http://programmers.stackexchange.com/questions/84278/how-do-i-create-my-own-p...

Posts

March 13, 04:31 PM

elastic-beanstalk-describe-application-versions -a APPLICATION_NAME | sed '1,2d' | sed '/CURR_APP_VERSION/d' | cut -d '|' -f 6 | xargs -L 1 elastic-beanstalk-delete-application-version -a APPLICATION_NAME -l

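Substitute your application name for APPLICATION_NAME, and put the currently-used version in place of CURR_APP_VERSION so that sed filters it out before the deletes run.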

March 06, 10:34 PM

plutil -lint filename.plist

January 08, 11:01 PM

I deploy to AWS Elastic Beanstalk via API using scripted/templated config files.

This is the entry that controls logs being pushed to S3.  It is disabled by default, and there does not appear to be anywhere to specify the bucket.

{
  "Namespace": "aws:elasticbeanstalk:hostmanager",
  "OptionName": "LogPublicationControl",
  "Value": "true"                                                                                                                                                                                                                                                             
},

January 03, 10:48 PM

I had a problem with a UniformInterfaceException today.  Took me a while to track down.  The problem was that I have nested Java objects in my response object getting serialized, and I had deleted the default constructor of one of them.  I'm not exactly sure what Jersey needs the default constructor for when serializing (it makes total sense for deserialization), but that's what caused the error.  It was particularly nefarious to track down because everything in my code worked great, but my client was getting 500s.  This was because the error happens on serialization, which happens after my code returns.

November 24, 01:42 AM

I can't believe this page exists and I'm just finding out about it.

November 23, 06:13 PM

For some reason, I have trouble finding this list.

November 23, 02:50 PM

Quite possibly the best tool of all time when working with AWS.

JSONViewer

October 13, 04:20 PM

For search and replace, to get it to be interactive (prompt at each occurrence whether to make the substitution or not), add the c flag:

%s/old/new/gc

October 08, 01:43 AM

I feel like I matured some time in the last few years and never realized it. I've been working full time with Ruby for more than a year now, and I just learned how to create global variables -- prefix them with $. So $var is accessible from everywhere. I have not used them yet, and I don't intend to start, still, TIL.
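
For the record, here is a minimal sketch of what that looks like. The names ($request_count, record_request, Worker) are made up for illustration; the only real point is the $ prefix.

```ruby
# Hypothetical names throughout; a sketch of Ruby's $-prefixed globals.
$request_count = 0            # global: visible from any scope

def record_request
  $request_count += 1         # no need to pass it in or declare it
end

class Worker
  def run
    record_request
    $request_count            # readable inside a class as well
  end
end

record_request
puts Worker.new.run           # prints 2
p defined?($never_assigned)   # prints nil: the global was never set
```

The same reach that makes this convenient is why I plan to keep avoiding it: any code anywhere can change the value.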

September 11, 06:19 PM

When I’m coding up a feature that accepts options passed in a hash, I often take a shortcut. Rather than check that the key is there and that it has a value, I just test the value's truthiness, like so:

# The way I basically always check
def my_method(options = {})
  if options[:key]
    # do something using the option here
  end
end

I would go out on a limb and say that this (or something very close) is the proper way to check this:

# Probably the proper way to check a hash for options
def my_method(options = {})
  if options.has_key?(:key) && !options[:key].nil?
    # do something using the option here
  end
end

Today, the shortcut (or my poor option-naming skills) cost me a bit of time. I had an option which was either true or false. The problem is that the first form of my option checking won’t execute when the option is set to false, even though I want to take action whenever the option in question is set (to true or false). Here’s what I really wanted my logic to be:

# What I really wanted my logic to be
def my_method(options = {})
  if options.has_key?(:key) && !options[:key].nil?
    if options[:key]
      # do something using the option here
    else
      # do something else
    end
  end
end

I’ve decided my true error was in the way I named my option. The shortcuts work if you follow convention, and the amount of effort to force the rest of the program to conform to my first instinct in naming is big.

Something to watch out for.
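
To make the difference concrete, here is a small sketch that puts the two checks side by side. The method names and return symbols are hypothetical, and I've used key? (the shorter alias of has_key?) in the explicit version.

```ruby
# Hypothetical helpers contrasting the two checks from this post.

# Shortcut: a plain truthiness test. A false (or nil) value skips the branch.
def shortcut_check(options = {})
  options[:key] ? :acted : :skipped
end

# Explicit: act whenever the key is present with a non-nil value,
# then branch on true vs. false.
def explicit_check(options = {})
  return :skipped unless options.key?(:key) && !options[:key].nil?
  options[:key] ? :acted_on_true : :acted_on_false
end

p shortcut_check(key: true)    # => :acted
p shortcut_check(key: false)   # => :skipped  (the case that bit me)
p explicit_check(key: false)   # => :acted_on_false
p explicit_check({})           # => :skipped
```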

August 23, 02:26 PM

http://www.sqlite.org/lang_expr.html
http://www.postgresql.org/docs/8.2/static/functions-comparison.html

Long story short, doing something like this in SQLite works just fine:  `SELECT * FROM table_name WHERE column_name IS NOT 'sample string'`.  The same query in PostgreSQL raises an error.  In this case, I should have been using either != or <>.  In Postgres, IS NOT is used for comparisons to NULL.  Fixed by using:  `SELECT * FROM table_name WHERE column_name <> 'sample string'` which is working for me now in both databases.  Note the single quotes: PostgreSQL treats double-quoted text as an identifier, not a string literal.

August 16, 02:49 PM

https://github.com/rspec/rspec-core/blob/master/lib/rspec/core/hooks.rb#L146

As if ?!?  I feel like I've written a million (:each) statements for nothing.

August 15, 05:20 PM

Ever have multiple callbacks? Ever have them depend on each other? I have, sometimes without realizing it. You can put them in order, but it’s not obvious to someone coming along later that there’s any dependency, and breaking it could be a nasty bug to track down. Here’s an example of order-dependent callbacks.

class MyModel < ActiveRecord::Base
  before_create :action_1
  before_create :action_2 # might depend on action_1

  def action_1
    self.mydata = "default"
  end

  def action_2
    self.mycomplexdata = mydata + "more data"
  end
end

I used to wrap the two methods inside another method, but that required an extra method, often with a much less descriptive name than the individual actions, and required you to navigate the file looking for callback definitions. A much better way, IMHO, is to use a lambda.

class MyModel < ActiveRecord::Base
  before_create lambda {
    action_1
    action_2
  }

  def action_1
    self.mydata = "default"
  end

  def action_2
    self.mycomplexdata = mydata + "more data"
  end
end

Clean callbacks at the top of the file, and order is less likely to be broken by ‘standard’ refactoring.
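
If you want to poke at the idea without a Rails app, here is a rough plain-Ruby stand-in. TinyModel and its before_create are my own toy assumptions, not ActiveRecord's implementation; the sketch only shows a lambda callback running in the instance's context, with the two actions in a fixed order.

```ruby
# Toy stand-in for ActiveRecord callbacks (an assumption for illustration
# only; real Rails callbacks do far more than this).
class TinyModel
  # Register a callback: either a method name (Symbol) or a lambda.
  def self.before_create(callback)
    (@before_create ||= []) << callback
  end

  def self.before_create_callbacks
    @before_create || []
  end

  # Run the callbacks in registration order, then return the instance.
  def create
    self.class.before_create_callbacks.each do |callback|
      if callback.is_a?(Proc)
        instance_exec(&callback)   # lambda body runs with self = the instance
      else
        send(callback)
      end
    end
    self
  end
end

class MyModel < TinyModel
  attr_accessor :mydata, :mycomplexdata

  # The dependency is visible in one place: action_1, then action_2.
  before_create lambda {
    action_1
    action_2
  }

  def action_1
    self.mydata = "default"
  end

  def action_2
    self.mycomplexdata = mydata + "more data"
  end
end

puts MyModel.new.create.mycomplexdata   # prints defaultmore data
```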

August 15, 02:00 AM

I’m developing a feature tonight in which we have a new email that we’re going to start sending. I’m developing on my own branch.

1) I like just linking to images rather than shipping them as payload
2) I want to test what my email looks like fully rendered
3) The images aren’t in my master branch of my code (and thus, not deployed)

Thanks to Jason Rudolph for this one http://jasonrudolph.com/blog/2009/02/25/git-tip-how-to-merge-specific-files-f...

Now I just git checkout my images into master ahead of my features. They can go to production ahead of my changes going to master. When I’m ready, I can merge and deploy. I haven’t tested this, but since I’m doing this through git, I would fully expect that, should I choose to change or delete these images, when I eventually merge my branch to master everything will get cleaned up.

Joy

August 12, 05:54 PM

array.should =~ another_array

Update: mmm, I've just run another test with this and have == returning true, but =~ returning false.  
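
For context, RSpec's =~ matcher on arrays passes when both arrays hold the same elements, regardless of order. Below is a plain-Ruby approximation of that check. same_elements? is a hypothetical helper of mine, not RSpec's code, and because it counts elements with a Hash it may treat some values (1 vs. 1.0, say) differently than RSpec does.

```ruby
# Hypothetical helper approximating an order-independent array comparison.
def same_elements?(left, right)
  # Count how many times each element occurs in a list.
  count = lambda do |list|
    list.each_with_object(Hash.new(0)) { |item, tally| tally[item] += 1 }
  end
  count.call(left) == count.call(right)
end

p same_elements?([1, 2, 3], [3, 1, 2])   # => true
p same_elements?([1, 1, 2], [1, 2, 2])   # => false
p [1, 2, 3] == [3, 1, 2]                 # => false (== cares about order)
```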

Posts

May 19, 08:39 PM

number, I know, but that's the way it's worked out.

May 19, 05:14 PM

ms the body through its role in diabetes, obesity and fatty liver, this study is the first to uncover how the sweetener influences the brain.    Sources of fructose in the Western diet include cane sugar (sucrose) and high-fructose corn syrup, an inexpensive liquid sweetener. The syrup is widely added to processed foods, including soft drinks, condiments, applesauce and baby food. The average American consumes roughly 47 pounds of cane sugar and 35 pounds of high-fructose corn syrup per year, according to the U.S. Department of Agriculture.

May 18, 01:38 PM
May 18, 11:55 AM

ontinues to get stronger every year?  It doesn’t have to be that way.  I was wearing progressively stronger lenses for my nearsightedness until ten years ago I accidentally stumbled upon a method that allowed me to achieve 20/20 vision and throw away my glasses within a year.  For the past decade I have not worn glasses or contacts, but I am able to drive, read, and see everything clearly and sharply.  The secret was learning how to actually change my eyes so that they could focus clearly on any objects — near or far, without wearing glasses.  The method I used is one of the best examples of the self-strengthening technique called Hormetism, the focus of my blog, which I’ve ap
