credit: https://twitter.com/winsontang/status/1118062617661071360 (via Noam Ross?)

Don’t use DataCamp

A timeline of transgressions & community response

Daniel McNichol
Jul 6, 2020


DataCamp has repeatedly shown itself to be an unethical company acting in bad faith & causing harm in the community which it exploits to build its brand. It doesn’t deserve our support, is easily replaced & is frankly not a great product anyway.

If you care, & can: boycott this unethical, negative force in data science.

TL;DR

After an initial incident involving the CEO’s sexual harassment of an employee, DataCamp had multiple opportunities to do right, regain trust & attempt to rectify the harm done. Instead they perpetuated a string of disingenuous, unethical actions, compounding past transgressions & violating the trust of the community they exploited to build their brand.

The latest of these transgressions is DataCamp’s decision to sue RStudio for ‘defamation’, deceitfully describing them as ‘competitors’ when in fact they were partners across distinctly different industries, prior to RStudio severing their relationship due to DataCamp’s unethical actions*, which endanger the community.

* as RStudio’s initial statement indicates, this occurred during contract renegotiations, which DataCamp is apparently now using to distort RStudio’s ethical stance as anti-competitive behavior, which is (imo) preposterous given RStudio & DC’s respective business interests.

This post will lay out the timeline of these events as concisely, exhaustively & objectively as possible, with evidence, to paint the full picture of DataCamp’s poisonous actions in the data science community.

But don’t just take my word for it. A broad group of long-standing trusted & respected community members have spoken out on this issue, including dozens of DataCamp instructors, so I’ll collect many of those statements first.

As a rule, I won’t include direct links to the victim’s personal accounts, or those close to her, although she has bravely endured re-traumatization time & time again by publicly identifying herself & commenting repeatedly. I don’t want to add to that cycle or make her any more of a target for backlash, but think it is important to foreground her experience & statements. (Open to feedback here)

If you’re a community member or organization of conscience in the data world & have a similar reaction to DataCamp’s behavior & endangerment of the community, please speak out publicly as well. We can’t allow unethical profit-driven companies to succeed in silencing criticism of their pernicious actions, as they are attempting to do with this frivolous lawsuit, & by sending their “general counsel” out on Twitter to scaremonger individuals with threats of defamation from an apparent burner account.

Disclaimer & Motivation

I have no inside information or perspective on this issue. I have never worked for DataCamp or RStudio, but have been a customer of both. I’m just a data nerd, writing as a private citizen of a community that I’d like to be as ethical and humane as possible. My motivation here, & in most of my life, is to apply effort to areas of potential leverage in contributing to public goods. This seems worth the effort, for whatever leverage it can muster.

Data, particularly as it increasingly overlaps with Tech, suffers from lack of diversity on several fronts: race, gender, class etc. This makes it prone to harmful power dynamics in the form of racism, anti-blackness, sexism, misogyny etc. These are often systemic, subtle & insidiously perpetuated. So it’s important to vigorously oppose these flagrant, highly visible instances, & ensure continued transgression is not tolerated. Otherwise, there’s no hope of building more just & equitable communities.

Public statements from trusted orgs & individuals

It’s hard to get the full context of these complex situations. But when a critical mass of serious, compassionate, notable figures in the community (including dozens of DataCamp instructors) speak out on an issue instead of taking the path of least resistance & “keeping their head down”, maybe it’s not “groupthink”, maybe the actions in question were truly condemnable & worthy of being identified as such.

Journalistic accounts

I’ll start with several news accounts from publications with serious editorial standards which cover various aspects of the saga. Whatever your opinions of ‘the media’, serious journalistic institutions (imperfect as they are) have the most evolved standards & practices for general public truth-telling in our society. (Let’s just ignore cable news.)

  • Buzzfeed News (the Pulitzer finalists, not the cat quiz dept.) published a major investigative piece about the incident & aftermath, including a description of the original offense. Though DataCamp now claims that the article ‘misrepresents’ the incident, its own 3rd party ‘independent investigation’ [pdf] basically confirms Buzzfeed’s account, albeit in more anodyne, legalistic language (see screenshots below). The author of the Buzzfeed piece (now, appropriately enough, covering misinformation & tech at the New York Times) stands by the piece & gave some background on her reporting process. The piece gives a crucial account of the early timeline of events, including DataCamp’s self-serving half-steps, callousness, employee retaliation & mismanagement of the scandal. However, since it was published in May 2019, it doesn’t cover the more recent harms perpetrated by DC; see below for the full timeline.
Buzzfeed account (1st pic) foregrounds the victim’s experience. DataCamp’s independent 3rd party review (2nd pic) confirms the substance of Buzzfeed’s account, but uses more clinical, euphemistic & minimizing language, foregrounding the perpetrator’s perspective. One apparent discrepancy is the latter’s weird choice to note that no one used the word “grope” in their interviews, as if that’s a meaningful fact, given that their own description of events fits the dictionary definition of that word. DataCamp’s public statements regularly tout this report as being more exculpatory than it actually is, probably because they figure the 30-page length & dry, bureaucratic style will deter casual readers & dilute the severity of the described transgressions, so they can project whatever ‘reality’ they want onto it, Trump-style. This will become a hallmark of DataCamp’s harmful, disingenuous, self-serving behavior.

Organizational responses

  • R-Ladies is a “world-wide organization to promote gender diversity in the R community”, & frankly a beacon of goodness in the generally monolithic & amoral tech/data/stats world. They were among the first organizations to speak out about DataCamp’s behavior, first on Twitter, then in two detailed blog posts: the first (April 2019), a general response to the newly public incident with many links to further reactions from the community; the second (Oct 2019), a comprehensive response to & critique of DataCamp’s problematic 3rd party assessment report mentioned above.
  • Women+ in ML/DS is a non-profit whose mission is “to support and promote women and gender minorities who are practicing, studying or are interested in the fields of machine learning and data science”. They put out a similar statement in early April 2019.
  • RStudio are the developers of the eponymous IDE & innumerable other open source data science tools. They recently incorporated as a Public Benefit Corporation & Certified “B” Corp to enshrine their public-interest mission & values. More on DataCamp’s (imo) frivolous & laughable lawsuit at the bottom of this article. For now: RStudio announced in early April 2019 that they would sever their partnership with DataCamp on their “RStudio track” of courses.

We felt then, and still do, that DataCamp took insufficient action and they have sent the message that you can get away with sexual assault and sexual harassment if you’re in a position of power.

  • PyCon, a conference for Python users, did not have a policy in place to remove sponsors, but released a statement saying:

The PyCon staff is saddened to hear that one of our sponsors, DataCamp, had an incident where one of their employees was sexually harassed. We were also distressed to find it was unclear if Datacamp had addressed this incident with the seriousness it requires.

Individual responses (including DC instructors & employees)

Mostly random via my own timeline, in no particular order except leading with Julia. (And again, intentionally not linking directly to the victim’s profile)

  • Julia Silge is a superstar data scientist in the R community with a PhD in astronomy who co-wrote the book on Tidy Text Mining in R. She was working at Stack Overflow & as an instructor for DataCamp during the initial phase of the scandal, & now works at RStudio. She’s been one of the foremost voices trying to hold DataCamp accountable, from both (semi)inside & out. Her humane, tireless & dauntless efforts have been an inspiration to many, including me. Her initial thread on the matter raised my awareness of the issue & linked to statements of others. But her letter to DataCamp is probably the most powerful & illuminating depiction of all of the machinations & underhanded behavior (e.g. exploitative instructor contracts & hiding their “public apology” from search engines) that would come to typify all of DataCamp’s known actions in the year that followed. She wrote in part [emphases added]:

I am deeply disappointed that this was their response. The main problem is how the leadership of DataCamp has chosen to deal with and disclose an incident like this. Although the post does clearly say that what happened was inappropriate and that the dynamic between an executive and an employee makes that particularly egregious, detail is used in harmful and victim-blaming ways. Every detail that might possibly put DataCamp in a better light is included, and details that provide a counternarrative are excluded. This is particularly frustrating to me because I have given feedback to multiple individuals at DataCamp that this kind of language is unproductive and unhelpful for rebuilding trust with instructors and the broader community, as well as largely unpersuasive to most readers.

Perhaps you have noticed that searching for information about sexual misconduct at DataCamp does not surface their own post. This is because the company added a noindex flag to this post (and only this post, unlike their other blog posts) so that it would not be indexed by search engines like Google.
This particular choice on the part of DataCamp is true to the character of the rest of my interactions with the company over the past year or so. I have hesitated to go into a lot of detail publicly about what’s happened with me because others have experienced much worse, but I will share a few things for context. Employees refused to respond in email/writing about concerns I raised, and instead always deferred to scheduling (time-consuming and yet unproductive) one-on-one calls. There was one group meeting for instructors who had raised concerns, but it was organized as a webinar where instructors could not speak, could not see who else was in the meeting, and could not see questions typed by other participants.

Highly recommend reading the whole post if you really want to understand the context here.

Two DataCamp employees who spoke out internally were fired under extremely sketchy circumstances & were offered more money in exchange for signing NDAs replete with threats that any mention of the sexual harassment & assault would be “at your own peril”. Both refused & spoke out publicly:

[July 2020 Update] Dhavide continues to courageously speak out about DataCamp’s behavior, on Twitter & most recently at SciPy2020 on a diversity & inclusion panel:

Many other DC instructors spoke out & asked the community to boycott their own courses on DataCamp (as they were unable to remove the content):

  • Noam Ross, PhD & Principal Scientist for Computational Research at EcoHealth Alliance, was another early galvanizing force behind the community response, & his first blog post has been updated with TONs of info & links to other responses & resources.
  • Ines Montani, another data-sci luminary, co-founder of Explosion, the makers of spaCy, a leading open-source library for Natural Language Processing in Python, and Prodigy, a machine teaching tool powered by active learning:
  • Dhavide Aruliah (PhD researcher & former DC employee mentioned above, fired under sketchy circumstances) poignantly captures the poisonous dynamics of the larger situation:
  • Os Keyes, PhD student researching gender, disability, technology and power:
  • …realizing that Noam Ross did a better job at this than I can hope to, so check out the bottom of his post for a fuller picture:
screenshots from Noam Ross’ live link collage of DC instructors urging a boycott of DC

Note what happened here: all of these instructors had to appeal to the community to boycott their own course, because DataCamp would not honor their requests to take down the content they created.

DataCamp sought out respected community members to create content, leveraged the general good will within the community to sign them to (often unknowingly) unfavorable contracts, exploited their reputations to build DC’s brand, then ignored their ethically-driven concerns & requests to remove their content.

Exploitative, predatory behavior.

DataCamp employees likely silenced by NDA

We’ve already seen DataCamp’s paranoid, vindictive predilection for opacity, deflection & mendacity, including the weaponization of NDAs. So it’s unsurprising that otherwise empathetic & outspoken DC employees (as opposed to instructor contractors) were conspicuously silent on the matter for most of the saga. There was a brief exception in late April 2019, which had the feel of something pre-approved by internal HR/PR: (notably, both Hugo & Dave have since left DC)

Responses from the broader data community

Again, this outpouring was (& continues to be) too broad & varied to fully capture here, so I’ll just leave some of the higher profile instances that I’ve noticed:

  • Angela Bassa, head of Analytics, Data Science, & Machine Learning at iRobot, MIT-trained mathematician & IMO all-around badass & impeccably correct person, has spoken out often:
  • Mara Averick, data influencer extraordinaire, Tidyverse advocate at RStudio, has been regularly outspoken & on-point, especially re: the typical yet insidious patterns of bad faith & misogyny reflected in DataCamp’s behavior:
  • Jesse Mostipak, educator interested in data science & culturally responsive pedagogy, founder of the R4DS learning community, currently Community Advocate at Kaggle, has also been a consistent, courageous voice situating DataCamp’s transgressions within the larger context of misogyny in tech:
  • Hilary Parker, PhD, Data Scientist at Stitch Fix, co-host of popular data science podcast @NSSDeviations. (To be fair, Hilary has mostly been expressing genuine WTF!??ness, but her reaction also illustrates the growing recognition of DataCamp’s malicious nature, IMO.)

Again, this is a grossly incomplete, mostly random (via my own TL) sampling of responses, meant to show that the uproar is not your typical “twitter mob” of faceless bots & burner accounts. This group of compassionate, ethically-minded, well-established orgs & individuals reflect a tremendous resource of the R & data-science community, & should be celebrated, supported & emulated by aspiring folks in this space (& others).

A timeline & list of DataCamp transgressions

This is the best timeline I can establish based on publicly available information. Much of it is covered in more detail in links provided above. Please comment with any corrections, qualifications or additions.

Timeline TL;DR

As previously said by myself & others (including the victim herself), the original incident, though egregious & unacceptable, could have been reasonably rectified by good faith accountability, apology & genuine effort to improve. Many long-standing, well-respected community members were employees or contributors to DataCamp, so they started with tremendous good will from the community. Instead, they systematically squandered that good will, revealing a rotten core of self-interest above all, amorality, deflection, minimization, deception, victim-blaming, gaslighting, retaliation & playing-the-victim, etc.

To be clear, I don’t think DataCamp set out to be malicious or cause harm. But that is, unmistakably IMO, where they ended up & continue to entrench themselves, whatever the motivation or thought-processes at work. They did some very basic, bare minimum things right. But when you’re at fault for a fatal car crash, you don’t get extra points for wearing a seatbelt or paying your taxes.

Timeline of DataCamp’s transgressions

With sources hyperlinked. Transgressions bolded.

October 2017

January 2018

January 25 - March 6, 2018

June 2018

  • Two employees who had raised concerns internally about the incident were fired on the same day. Both were offered extra pay in exchange for strict NDAs, which would forbid them from speaking publicly about the incident (albeit via abstruse, ominous language). Both refused & later wrote about their experience (1a, 1b, 2). In one of DataCamp’s typically obfuscating & nakedly liability-mitigating blog posts, they…obfuscate & mitigate liability.
    (A clear picture of the company’s core ethos begins to come into focus here.)

December 2018 - April 2019

April 2019

  • DataCamp posted the blog post which set off the initial public furor.
  • (the following verbatim from DataCamp’s 3rd party report):
  • The note was the first public acknowledgement of the incident and the company’s response and was emailed to every DataCamp instructor.
  • The statement did not specifically identify Mr. Cornelissen [CEO] as the executive in question.
  • The company did not inform Ms. Woo or DataCamp employees prior to posting the note,
  • — and the note initially contained a no-index code which meant it would not appear in general online searches about the company. This generated a new controversy over whether this was an effort by the company to hide the note from public view.
  • The note also included language and characterizations that were critiqued in some online postings.
  • (^^ that’s a characteristic understatement, see above, particularly the collage from the bottom of Noam Ross’ blog post.)

April 24, 2019

  • DataCamp announces:
  • — CEO will step down “for an indefinite leave of absence without pay”, but remained as board chair at this time
  • — it will bring in the 3rd party which eventually produces this report
  • — it will establish Instructor Advisory Board to “better hear and integrate the concerns and recommendations raised by the instructor community, and help hold the company and leadership team accountable to that input.” (this will never happen to the satisfaction of most observers)

April 30, 2019

  • After pressure, an update to the April 24th blog post announces that CEO will also step down as board chair & “be recused from the [3rd party review] and any decisions relating to his future role at DataCamp.” (stay tuned)

May 2019 - Oct 2019

Oct 2019

  • DataCamp posts the 3rd party report to its blog, ‘recapping’ selective findings which cast it in the most favorable light. Many in the community again speak out about some of the flagrant issues with the report, its framing & the lack of concrete accountability attached. Sources: 1, 2, 3, 4, 5, 6 etc.
  • Allen Downey, a member of the Instructor Advisory Board, put out an open call for feedback, and the community obliged at length.

Nov 2019

  • DataCamp announced that its interim CEO was now its permanent CEO, & that the former CEO (perpetrator) will “support DataCamp with regular advice”. There is no explicit mention of the events leading up to this, nor any clarity around the former CEO’s continued role or compensation. (continued obfuscation, lack of accountability)

Dec 2019

  • DataCamp obscurely publishes an “IAB Q4 2019 Update” which is effectively void of content. (continued obfuscation, lack of accountability)

Jan 2020

May 2020

June 2020

July 2020

Totally normal non-comicbook-villain stuff.

Dec 2020

Wayback Machine snapshot
  • In response, Anaconda Inc announced that they “officially terminated our content license agreement with DataCamp”, earning applause from Dhavide Aruliah, former employee of both companies & DataCamp whistleblower:

…I might come back to write more about the silly lawsuit, but that’s enough for now.

If you appear in this post & don’t want to, please DM me on Twitter or email me: daniel.mcnichol at gmail. Same for any other sensitive feedback. Otherwise, please comment with feedback.

Daniel McNichol

Founder & Chief Scientist @ Coεmeta (coemeta.xyz) | formerly Associate Director of Analytics & Decision Science @ the Philadelphia Inquirer