Showing 761 posts tagged data

Part of Spark’s appeal is that it can process data in computer memory, as opposed to just using hard disks, which move at much slower speeds. But because the amount of data that can fit in memory is limited, the tool can process data on disks as well, and that’s what Databricks was trying to highlight as it sought to break Yahoo’s record on the Gray Sort, which measures the time it takes to sort 100 terabytes of data (100,000 gigabytes). Yahoo did the sort in 72 minutes with a cluster of 2,100 machines using Hadoop MapReduce last year. Databricks was able to process the same amount of data in 23 minutes with Spark, using only 206 virtual machines running on Amazon’s cloud service. It also sorted a petabyte of data (about 1,000 terabytes) in less than four hours using 190 machines.

Startup Crunches 100 Terabytes of Data in a Record 23 Minutes | WIRED
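
The headline numbers are easier to compare as ratios. The sketch below simply redoes the arithmetic on the figures quoted above. It treats a "machine" as the unit of work for both runs, which is an assumption: the excerpt only says the Databricks machines were virtual, so per-machine figures are a rough comparison, not a hardware benchmark.

```python
# Figures as quoted in the WIRED excerpt. Treating each run's "machines"
# as interchangeable units is an assumption made only for this comparison.
DATA_GB = 100_000  # 100 TB expressed in gigabytes

yahoo_minutes, yahoo_machines = 72, 2100  # Hadoop MapReduce run
spark_minutes, spark_machines = 23, 206   # Spark run on Amazon's cloud


def gb_per_machine_minute(minutes: float, machines: int) -> float:
    """Average data sorted per machine per minute of wall-clock time."""
    return DATA_GB / (minutes * machines)


speedup = yahoo_minutes / spark_minutes          # wall-clock ratio
machine_ratio = yahoo_machines / spark_machines  # how many fewer machines

print(f"wall-clock speedup: {speedup:.1f}x")
print(f"machines used: {machine_ratio:.1f}x fewer")
print(f"Yahoo: {gb_per_machine_minute(yahoo_minutes, yahoo_machines):.2f} GB/machine-minute")
print(f"Spark: {gb_per_machine_minute(spark_minutes, spark_machines):.2f} GB/machine-minute")
```

Roughly three times faster on about a tenth of the machines works out to around 30 times more data sorted per machine-minute, again with the caveat that the machines themselves were not identical.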

In successive waves, innovation pioneered in the financial markets has been adapted to dating. Online dating’s initial trading platforms (Match, created in 1995, JDate in 1997, etc.) were the relationship equivalent of the online trading sites that first allowed investors to directly manage their own portfolios….

Then came quantitative trading. eHarmony’s “scientific approach” came out in 2000, with later editions augmented by an “algorithm of love.” OkCupid, launched in 2004, has brought us big-data dating…

Then came high-frequency trading. Sites like Grindr, launched in 2009, or Tinder, launched in 2012, give a whole new meaning to what Michael Lewis has described as “flash boys” in the financial markets. We now swipe left or right so quickly that we can’t even fully process the transactions—in this case, people—flashing across our screens.

How both dating and finance have been screwed by the Internet. (via interestingsnippets)

So, here’s an idea: Any library that would like to make its usage data public is encouraged to create a “stackscore” for each item in its collection. A stackscore is a number from 1 to 100 that represents how relevant an item is to the library’s patrons as measured by how they’ve used it. There are many types of relevant data: Check-ins. Usage broken down by class of patron (faculty? grad student? undergrad?). Renewals. Number of copies in the collection. Whether an item has been put on reserve for a course. Inclusion in a librarian-created guide. Ratings by users on the library’s website. Early call-backs from loans. Citations. Being listed on a syllabus. Being added to a user-created list. Which of these factors should figure into stackscore? It’s the sort of question standards committees argue about until they are red in the face. There is no right answer. So, stackscore gives up. Each library is left to compute its stackscore using whatever metrics it wants, giving factors whatever relative values they want. In the interest of transparency, libraries should publish their formulae, but they are not beholden to any other library’s idea of relevance.

A Good, Dumb Way to Learn From Libraries – The Conversation - Blogs - The Chronicle of Higher Education
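
Because stackscore leaves the formula to each library, any implementation is just one library's choice. The sketch below shows a single possible approach: a published weighted sum of usage counts, rescaled so the least-used item gets 1 and the most-used gets 100 (min-max scaling; a percentile rank would be an equally valid choice). The item IDs, metric names, and weights are all hypothetical and are not taken from the article or from any real library system.

```python
# Hypothetical usage counts per item; names and numbers are illustrative only.
USAGE = {
    "item-a": {"checkouts": 40, "renewals": 10, "on_reserve": 1, "syllabus_listings": 3},
    "item-b": {"checkouts": 5, "renewals": 0, "on_reserve": 0, "syllabus_listings": 0},
    "item-c": {"checkouts": 12, "renewals": 4, "on_reserve": 0, "syllabus_listings": 1},
}

# Hypothetical weights: each library picks (and ideally publishes) its own.
WEIGHTS = {"checkouts": 1.0, "renewals": 0.5, "on_reserve": 20.0, "syllabus_listings": 5.0}


def raw_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of an item's usage metrics; unweighted metrics count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())


def stackscores(usage: dict[str, dict[str, float]], weights: dict[str, float]) -> dict[str, int]:
    """Rescale raw scores linearly onto the integer range 1..100."""
    raw = {item: raw_score(m, weights) for item, m in usage.items()}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Every item is equally used; assigning the midpoint is an arbitrary choice.
        return {item: 50 for item in raw}
    return {item: 1 + round(99 * (r - lo) / (hi - lo)) for item, r in raw.items()}


scores = stackscores(USAGE, WEIGHTS)
print(scores)
```

Publishing `WEIGHTS` alongside the scores is what the article's call for transparency would look like in practice: another library can see exactly how the numbers were produced without having to adopt the same formula.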

“When government doesn’t keep its own records well enough, it’s fundamentally corrosive to people’s faith in government,” said William B. McAllister, director of special projects in the U.S. Department of State’s Office of the Historian, who stressed that he was sharing his own views, not the government’s. “If you don’t have somebody ensuring accountability, then you’re almost always going to have a problem…. I would pressure [the ARL] to consider advocacy and creating consortia with other groups that are interested in these issues to see if you can get a place at the table when the specs are made in the first place.”

At research library conference, speakers push for advocacy @insidehighered

“We wanted to say to parents: ‘No one’s going to sell your kids’ data; nobody’s going to track your child around the Internet; no one’s going to compile a profile that is used against your child when they apply for a job 20 years later,’ ” said Jules Polonetsky, executive director of the Future Privacy Forum, which has received financing from technology companies, including some of the signatories to the privacy pledge. “We hope this is a useful way for companies that want to be trusted partners in schools to make it clear they are on the side of responsible data use.”

Microsoft and Other Firms Pledge to Protect Student Data

According to the US Census Bureau’s world population counter, we are currently just a few million shy of 7.2 billion people on the planet. That figure is growing by about 2 people per second, or 1.2 percent annually. But head over to the counter of active mobile connections maintained by mobile analysis firm GSMA Intelligence and you’ll see that there are currently over 7.2 billion SIM cards operating in the world, and that figure is growing over five times faster than the population counter.

There are now more gadgets on Earth than people - CNET

Invisibles will create a world in which we don’t see technology or sensors; they are seamlessly integrated into the human body. We won’t worry about slick aluminum, glass, or steel. Technology will become human. We will return to ourselves. We have projects in which minimally invasive sensors are implanted into the human body and the biometric data connects seamlessly to a mobile device. Medical device innovators are betting millions of dollars in the belief that invisibles will change behavior, help people adhere to new treatments, and create a better dialogue between caregivers and patients.

Invisibles, Not Wearables, Will Profoundly Change Health Care | Co.Exist | ideas impact

By using GPS coordinates, Tockar was able to track cab traffic to and from strip clubs located in Hell’s Kitchen between the hours of midnight and 6 a.m. By pinpointing the pickup and drop-off zones, Tockar could tell, with frightening precision, where a loyal customer of Larry Flynt’s Hustler club might reside. “The potential consequences of this analysis cannot be overstated,” writes Tockar. “Using this freely-obtainable, easily-created map, one can find out where many of Hustler’s customers live, as there are only a handful of locations possible for each point.”

NYC Taxi Data Blunder Reveals Which Celebs Don’t Tip—And Who Frequents Strip Clubs | Fast Company | Business Innovation

While most providers have installed some kind of electronic record system, two recent studies have found that fewer than half of the nation’s hospitals can transmit a patient care document, while only 14 percent of physicians can exchange patient data with outside hospitals or other providers. “We’ve spent half a million dollars on an electronic health record system about three years ago, and I’m faxing all day long. I can’t send anything electronically over it,” said Dr. William L. Rich III, a member of a nine-person ophthalmology practice in Northern Virginia and medical director of health policy for the American Academy of Ophthalmology.

Doctors Hit a Snag In the Rush to Connect

Already the new phone has led to an eruption from the director of the F.B.I., James B. Comey. At a news conference on Thursday devoted largely to combating terror threats from the Islamic State, Mr. Comey said, “What concerns me about this is companies marketing something expressly to allow people to hold themselves beyond the law.” He cited kidnapping cases, in which exploiting the contents of a seized phone could lead to finding a victim, and predicted there would be moments when parents would come to him “with tears in their eyes, look at me and say, ‘What do you mean you can’t’ ” decode the contents of a phone. “The notion that someone would market a closet that could never be opened — even if it involves a case involving a child kidnapper and a court order — to me does not make any sense.”

Signaling Post-Snowden Era, New iPhone Locks Out N.S.A.