Not All Data Is Created Equal

High-quality and relevant data can be a powerful force for good, but flawed data only perpetuates inequalities under the guise of fairness.

At its best, data science can impact global societies in incredible ways. It can work to enhance ocean health, identify and deliver food surpluses to feed the hungry, and use cellphone data to standardize public transportation routes in developing areas like Nairobi.

Data scientists in both the public and private sectors must understand the underlying opportunities to use data in new applications, address potential ethical and bias risks, and weigh the need for data regulation.

Before algorithms can be used appropriately, it’s necessary to access good data sources and evaluate the quality of all available data. According to Vinod Bakthavachalam, a senior data scientist at Coursera, critical questions to ask before using a data set in any application include: Is there measurement error? Do I understand how the data was captured? Are there weird outliers or other abnormal numbers?

“Even if the data on its own is good, there’s always a chance it may be unusable if it’s not right for a specific purpose,” he says.

For example, you may have high-quality data on a consumer’s willingness to spend over $100 on shoes, but perhaps that data was collected during the holiday season when shoppers traditionally spend more and is thus inapplicable to predicting year-round shopping trends. In other words, it may be the best data in the world, but whether it’s the most relevant data is an entirely different matter.

Data scientists must also understand that although algorithms can make a positive difference in society, there is a risk that some algorithms instead further entrench cultural prejudice and bias.

Machine learning algorithms are among the most common algorithms in daily life. They are frequently used to suggest products to consumers on e-commerce sites, and they’re increasingly applied to decisions such as hiring or lending. Used correctly, such algorithms can remove racial or gender bias by focusing on internal characteristics that predict success, sidestepping the human tendency to prefer people who are similar to ourselves.

However, used incorrectly, these models simply provide a veneer of respectability to an otherwise unethical process. An algorithm trained on biased data will produce biased conclusions when fed new data, because machine learning algorithms don’t make the best decisions; they make the decisions that the humans who “trained” them would have made. For example, if a company has hired only white males in the past and trains its hiring algorithm on that data, the algorithm will perpetuate those hiring practices. Biased data, then, leads to biased results.
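
The point that a model reproduces the decisions in its training data can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: the group labels "A" and "B", the made-up history, and the `train_majority_model` helper, which stands in for a real learning algorithm by memorizing the most common past decision for each group.

```python
from collections import defaultdict


def train_majority_model(history):
    """Learn, for each group, the most common past decision.

    `history` is a list of (group, hired) pairs. This is a toy stand-in
    for a real learning algorithm, not a production technique.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [rejections, hires]
    for group, hired in history:
        counts[group][int(hired)] += 1
    # Predict "hire" only where past hires outnumber past rejections.
    return {group: c[1] > c[0] for group, c in counts.items()}


# Hypothetical past decisions: group A was mostly hired, group B never was.
history = [("A", True)] * 8 + [("A", False)] * 2 + [("B", False)] * 10

model = train_majority_model(history)
print(model)  # {'A': True, 'B': False}
```

Nothing in the code refers to a candidate's ability. The model recommends against group B only because that is what the historical decisions looked like.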

To avoid such biases, Coursera deliberately chose to ignore gender when training its machine learning algorithms to recommend classes to potential students.

“In the U.S., women are less likely to enroll in STEM classes, so if we used gender, it wouldn’t recommend certain courses to women,” Bakthavachalam says. “We want to encourage women to enroll in STEM classes and avoid any biases in the algorithms.”

Coursera’s experience underscores that although there is no silver bullet for avoiding algorithmic bias, it is not too complicated a problem to fix. It’s more a matter of awareness than a difficult engineering problem, and it begins with the knowledge that artificial intelligence is by no means perfect. According to Bakthavachalam, data scientists must avoid treating machine learning algorithms as black boxes because “if you don’t know what’s going on under the hood, it’s hard to imagine and diagnose issues.”

Data scientists must also be vigilant in their initial examination of training data, a process that calls for a diverse team and, in some situations, outside reviewers. The biggest risk, according to Bakthavachalam, is that data scientists recognize the potential for data misuse but don’t put in the work needed to fix the issues.

“Everyone has different value systems, and being open and upfront about the algorithm can lead collectively to the right decision,” says Bakthavachalam.

On a positive note, data science makes it easier to eliminate bias by quantifying prejudices and highlighting trends that may otherwise go unnoticed. This allows data scientists to remove bias by analyzing only legitimately relevant information, therefore empowering companies to provide services to previously underserved populations, especially in the financial services realm.

An example is MyBucks, a fintech company powered by a machine learning-enabled credit-scoring engine that serves the underbanked in 11 African nations. By aggregating large amounts of data, MyBucks gains greater insight into which individuals are likely to default, allowing it to move beyond reliance on more simplistic predictors like credit score.

In Kenya, for instance, data is pulled solely from an individual’s phone, and loans are paid directly into mobile wallets within minutes.

This service is especially important in nations where schools require full tuition payment upfront, historically a significant barrier to pursuing an education in some poorer countries.

Above all, data scientists must avoid getting lost in the techniques and methods of their trade. They must ask who will be affected by their work and how they are ensuring that, by doing “good” for one group, they don’t inadvertently harm another.

It’s through transparency about how data is collected, how it’s defined, and its limitations that analysts working together can get the most impactful results. Machines can learn, but it’s the human insights and supervision that enable organizations to balance power and fairness.

 

25 Terms Every Data Scientist Should Know

Common data science terms your manager will expect you to know.

Data science is, among other things, a language, according to Robert Brunner, a professor in the School of Information Sciences at the University of Illinois. This concept might come as a shock to those who associate data science jobs with numbers alone.

Data scientists increasingly work across entire organizations, and communication skills are as important as technical ability. Data science is booming in every industry, as more people and companies are investing their time to better understand this constantly expanding field. The ability to communicate effectively is a key talent differentiator.

Whether you pursue a deeper knowledge of data science by learning a specialty, or simply want to gain a smart overview of the field, mastering the right terms will fast-track you to success on your educational and professional journey.

According to Vinod Bakthavachalam, a senior data scientist at Coursera, using the following data science terms accurately will help you stand out from the crowd:

  1. Business Intelligence (BI). BI is the process of analyzing and reporting historical data to guide future decision-making. BI helps leaders make better strategic decisions moving forward by determining what happened in the past using data, like sales statistics and operational metrics.
  2. Data Engineering. Data engineers build the infrastructure through which data is gathered, cleaned, stored and prepped for use by data scientists. Good engineers are invaluable, and building a data science team without them is a “cart before the horse” approach.
  3. Decision Science. Under the umbrella of data science, decision scientists apply math and technology to solve business problems and add in behavioral science and design thinking (a process that aims to better understand the end user).
  4. Artificial Intelligence (AI). AI computer systems can perform tasks that normally require human intelligence. This doesn’t necessarily mean replicating the human mind, but instead involves using human reasoning as a model to provide better services or create better products, such as speech recognition, decision-making and language translation.
  5. Machine Learning. A subset of AI, machine learning refers to the process by which a system learns from inputted data by identifying patterns in that data, and then applying those patterns to new problems or requests. It allows data scientists to teach a computer to carry out tasks, rather than programming it to carry out each task step-by-step. It’s used, for example, to learn a consumer’s preferences and buying patterns to recommend products on Amazon or sift through resumes to identify the highest-potential job candidates based on key words and phrases.
  6. Supervised Learning. This is a specific type of machine learning in which the data scientist acts as a guide, teaching the desired conclusion to the algorithm. For instance, the computer learns to identify animals by being trained on a dataset of images that are properly labeled with each species and its characteristics.
  7. Classification. Classification is an example of supervised learning in which an algorithm assigns a new piece of data to a pre-existing category, based on a set of characteristics for which the category is already known. For example, it can be used to determine if a customer is likely to spend over $20 online, based on their similarity to other customers who have previously spent that amount.
  8. Cross Validation. Cross validation is a method to validate the stability, or accuracy, of your machine-learning model. Although there are several types of cross validation, the most basic one involves splitting your training set in two and training the algorithm on one subset before applying it to the second subset. Because you know what output you should receive, you can assess the model’s validity.
  9. Clustering. Clustering is classification without the supervised learning aspect. With clustering, the algorithm receives inputted data and finds similarities in the data itself by grouping together data points that are alike.
  10. Deep Learning. A more advanced form of machine learning, deep learning refers to systems that pass data through multiple layers between input and output, as opposed to shallow systems with a single layer. Each layer’s output becomes the next layer’s input, which helps computers solve complex, real-world problems.
  11. Linear Regression. Linear regression models the relationship between two variables by fitting a linear equation to the observed data. By doing so, you can predict an unknown variable based on its related known variable. A simple example is the relationship between an individual’s height and weight.
  12. A/B Testing. Generally used in product development, A/B testing is a randomized experiment in which you test two variants to determine the best course of action. For example, Google famously tested various shades of blue to determine which shade earned the most clicks.
  13. Hypothesis Testing. Hypothesis testing is the use of statistics to judge whether observed data are consistent with a given hypothesis or provide enough evidence to reject it. It’s frequently used in clinical research.
  14. Statistical Power. Statistical power is the probability of making the correct decision to reject the null hypothesis when the null hypothesis is false. In other words, it’s the likelihood a study will detect an effect when there is an effect to be detected. A high statistical power means a lower likelihood of concluding incorrectly that a variable has no effect.
  15. Standard Error. Standard error is the measure of the statistical accuracy of an estimate. A larger sample size decreases the standard error.
  16. Causal Inference. Causal inference is a process that tests whether there is a relationship between cause and effect in a given situation, which is the goal of many data analyses in the social and health sciences. It typically requires not only good data and algorithms, but also subject-matter expertise.
  17. Exploratory Data Analysis (EDA). EDA is often the first step when analyzing datasets. With EDA techniques, data scientists can summarize a dataset’s main characteristics and inform the development of more complex models or logical next steps.
  18. Data Visualization. A key component of data science, data visualization is the visual representation of information to make patterns, trends and correlations easier to detect and recognize. It helps people understand the significance of data by placing it in a visual context.
  19. R. R is a programming language and software environment for statistical computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
  20. Python. Python is a general-purpose programming language and is one language used to manipulate and store data. Many highly trafficked websites, such as YouTube, are created using Python.
  21. SQL. Structured Query Language, or SQL, is another programming language that is used to perform tasks, such as updating or retrieving data for a database.
  22. ETL. ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources. It’s often deployed to build a data warehouse. An important aspect of this data warehousing is that it consolidates data from multiple sources and transforms it into a common, useful format. For example, ETL normalizes data from multiple business departments and processes to make it standardized and consistent.
  23. GitHub. GitHub is a code-sharing and publishing service, as well as a community for developers. It provides access control and several collaboration features, such as bug tracking, feature requests, task management and wikis for every project. GitHub offers both private repositories and free accounts, which are commonly used to host open-source software projects.
  24. Data Models. Data models define how datasets are connected to each other and how they are processed and stored inside a system. Data models show the structure of a database, including the relationships and constraints, which helps data scientists understand how the data can best be stored and manipulated.
  25. Data Warehouse. A data warehouse is a repository where all the data collected by an organization is stored and used as a guide to make management decisions.
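
Two of the terms above, linear regression (11) and the basic split-in-two form of cross validation (8), can be sketched together in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a recommended workflow: the height and weight figures are made up, and `fit_line` and `mean_squared_error` are hypothetical helper names rather than functions from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's predictions and known values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up heights (cm) and weights (kg), for illustration only.
heights = [150, 155, 160, 165, 170, 175, 180, 185]
weights = [50, 54, 59, 62, 68, 71, 77, 80]

# Split the data in two: fit on one half, check on the other.
train_x, train_y = heights[0::2], weights[0::2]
test_x, test_y = heights[1::2], weights[1::2]

slope, intercept = fit_line(train_x, train_y)
holdout_mse = mean_squared_error(test_x, test_y, slope, intercept)

print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"error on the held-out half: {holdout_mse:.2f}")
```

Because the true weights for the held-out half are known, the error on that half gives a rough sense of how well the fitted line would generalize to people the model has not seen.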

Mastering these terms is an excellent first step towards a durable data science career. Equally important is ensuring they’re understood throughout your organization so that data scientists can operate more efficiently and effectively with their non-data science partners. Like anything, this takes practice, but by putting these data science building blocks in place, you’ll be at a natural advantage when opportunities arise.

How Data Analytics Is Revolutionizing Work

Transformation is achieved through numbers, and nowhere is that understanding more important than in the exploding field of data science.

Data is to this century what fossil fuel was to the last: an accelerator of growth, disruption and change. Today’s world bristles with connected sensors in everything from wristwatches to wind turbines, which collect and transmit steady streams of data. The insights and knowledge pulled from those rapid, real-time flows of structured and unstructured data—which includes words, numbers, videos and photos—are creating new infrastructures, new businesses, new industries and new job descriptions. “Increasingly, you’ll see the merging of data science into the very patterns of our codes, creating new ways of solving really hard problems,” says Bob Lord, chief digital officer of IBM.

As more businesses look to extract value from data-driven technologies like artificial intelligence, the need for talented workers who can interpret the data is expected to rise across all industries. In fact, IBM predicts that the demand for data scientists will soar 28 percent by 2020. The responsibilities of those data scientists are shifting as well: like the civil engineers of the 1940s and 1950s who designed bridges and roads, they are creating our nation’s latest form of infrastructure.

Technology will continue to accelerate transformation as cloud-based solutions allow easier and more secure access to big data sets, which in turn arm companies with the information they need to bring relevant products to market. As all of these changes come online, those who understand the underlying algorithms can have a huge effect on business and society. Here are some examples of those who are already making a difference.

Creating Customized Care

Our bodies are a complex biological roadmap, as unique as our fingerprints. “Our current one-size-fits-all healthcare mentality is often one-size-fits-none,” says Daniel Kraft, a physician and scientist who explores developing technologies in biomedicine and healthcare as chair of Exponential Medicine at Singularity University. “Right now, our healthcare is primarily based on intermittent data—the doctor checking your blood pressure and cholesterol levels in the office during a visit—and it’s very reactive. The move is to look at larger data sets to pick up disease early, then provide personalized care and therapy that maps to exactly what you need.”

Precision healthcare actually tailors treatments to a patient’s unique genetic code and can turn that data into meaningful actions. This data—gathered by connected devices ranging from blood pressure monitors and scales to smart watches and thermometers—will form the basis for personalized and proactive wellness plans that can even include recommended vitamins and exercise. Data scientists will store it on a cloud-based platform, where doctors can also upload the latest related information, such as lab results, to create a more complete health profile. This lays the groundwork for a revolutionary opportunity in personalized treatments—bespoke medicine, if you will—that will drive the next wave of medical breakthroughs.

Improving Urban Logistics

Every city relies on a complex web of data-driven systems and services to survive. And yet, myriad problems still plague the most advanced cities, including bad road quality. The data scientists in Kansas City, MO, however, are doing something about it. Their latest gambit: the development of “pothole prediction” technology.

Bob Bennett, the city’s chief innovation officer, says his teams have worked with Chicago-based Xaqt to create a system that uses various data streams to dynamically plan city operations. Predictor variables include the number of freeze-and-thaw cycles, traffic counts, bus routes and pavement conditions.

The hope is that work crews can focus on more preventative maintenance—stopping a pothole before it starts—rather than a full-scale street repair after a pothole has occurred. In other words, now, with the help of data scientists, your ride across town might be a lot less bumpy.

Up-Leveling Public Health

Data can be just as valuable as cash or equipment in slowing the spread of some of the world’s most vexing problems. When halting the spread of infectious disease, for example, the rapid scraping and analysis of digital data is literally a life-saving tool. Governments, companies, NGOs and specialists need to obtain good data quickly to know where the outbreaks are, how to target them and if the solutions are working.  

Nations work side by side with telecom companies worldwide, linking mobile networks with health services to create a powerful disease detection system. In Pakistan, for instance, health officials partnering with data scientists used smartphone data to predict local dengue fever outbreaks weeks earlier than traditional means would have allowed. The network used anonymized electronic surveys to create accurate predictive models that allowed for epidemic preparedness and containment of the virus.

Refining Education

Data scientists at the University of Maryland have begun to use predictive analytics to analyze student behavior, searching for undergraduates who are at risk of dropping out. University system officials say the practice—which may review anything from grades and financial aid information to how often students swipe their ID cards at the library or the dining hall—could help educators assist struggling undergrads. It could also help them identify roadblocks, such as a single difficult class or a combination of pressures that hit at the same time, that lead students to drop out.

The enormous value of data science is becoming clearer every day, and so are the opportunities to directly impact people, companies and society. If you love solving problems or transforming the ordinary into the extraordinary, data science is for you.

 

Weekly Digest #137: Lessons Learned From Learning Scientists Teacher Workshops

At the beginning of January, we toured England to provide workshops for teachers. We enjoyed this opportunity tremendously: it let us reach out to teachers and share knowledge about learning and teaching strategies from Cognitive Psychology, and it also let us learn which strategies teachers are currently using in their classrooms. Furthermore, we had engaging Q&A sessions with some quite hard questions from the audience. We did our best to provide answers, but in many cases it became apparent that research is still a long way from addressing all the important practical questions and that further work is needed to close knowledge gaps. This was an exciting experience. In today’s weekly digest we want, first of all, to thank all the teachers who attended our workshops for their input and questions: Thank You! Second, we would like to highlight blog posts by teachers who took the time to write reflections on the lessons they learned from our workshops. Enjoy!

*Header image by Mark Miller (@MarkMillerTeach)

Cheers to 2019!

Since we were all in the same place – a rare occurrence – we were able to talk about the Learning Scientists Project going forward. To that end, we have a few announcements.

Firstly, Yana Weinstein is no longer with the Learning Scientists team. Yana is taking on some exciting new business opportunities of her own, and we wish her all the best. 

We’ve decided to continue to create new content while focusing on some of the things that we do best. New content will now come out on the blog on Thursdays (emails on Fridays) for consistency. We will be rotating between blogs by us, guest blogs, digests, and podcasts!

We are looking forward to all the adventures awaiting us this year and hope to interact with many of you via different channels. We will keep running our #LrnSciChat on Twitter at the end of each month. So, if you are on Twitter, keep your eyes peeled for it. Our next #LrnSciChat will take place on 23 January at 8pm (UK time) | 3pm (Boston time). We are also continuing research projects investigating the best way to teach students to utilize effective learning strategies.

Thank you for your continued support of the Learning Scientists Project. We wish you a successful year.

See you in 2019!

Left: Althea on her wedding day with her husband, September 2018

Top center: Cindy with her family

Bottom center: Carolina and her family

Right: Megan and her husband

Not pictured: Yana

We hope you’re enjoying this holiday season, whether you’re celebrating a holiday, the end of the year, or just time away from the typical grind. We’re taking a break to spend some time with family and friends. In early January, 2019, the Learning Scientists will be touring England, going to Great Yarmouth, Bedford, and London! When we return from our tour we’ll be back to blogging. Our next post will be on January 17, 2019! Happy holidays, and a very happy new year to all.

Weekly Digest #136: Optimizing Lecture Capture

Today’s weekly digest is motivated by a paper on lecture capture that I (Carolina) am currently co-writing (1). I thought it would be a good idea to put together a digest summarizing the evidence on the benefits and pitfalls of recording university lectures. While students are eager to get their hands on lecture recordings, lecturers are usually more hesitant to provide them, fearing that attendance rates will drop substantially. However, it does look like Higher Education is moving towards lecture recordings as standard practice, so it is important to understand how to optimize their use. Essentially, it will boil down to informing students and lecturers how to make the best use of lecture recordings (1).

Image from Pixabay

1)      Capturing The Lecture by Emily Nordmann, @emilynordmann

In this post some of the main fears that come with lecture capture are described and solutions are proposed. The focus here is on creating good policies that facilitate the implementation of effective use of lecture capture.

 

2)      The Complete Guide To Lecture Capture by Justin Simon via TechSmith, @TechSmith

In this guide the whys and hows of lecture capture are described. It also contains information on software to use and tips on making effective recordings.

 

3)      Lecture Capture: What Can We Learn From The Research? by Gabi Witthaus, @twitthaus

This article gives a research overview of the effects of lecture capture on student learning and student perception. The author provides a wonderful account of the literature, which holds important practical implications.

Image from Pixabay

 

4)      Lecture Recording: What Does Research Say About Its Effect On Attendance? by Karoline Nanfeldt, @knanfeldt

This is a post by a former 4th Year student at the University of Edinburgh. She provides a brief summary of the effects of lecture capture on lecture attendance. This account is particularly interesting because it captures the student voice.

 

5)      Lecture Attendance, Lecture Recordings, And Student Performance: A Complex, But Noteworthy Relationship by Carolina Kuepper-Tetzel, @pimpmymemory

This blog post summarizes a study that looked into the complicated relationship between lecture recordings, attendance, student characteristics, and student performance. It provides a good idea of the many factors that play a role in investigating the benefits of lecture capture.


References

(1) Nordmann, E., Kuepper-Tetzel, C. E., Robson, L., Phillipson, S., Lipan, G. I., & McGeorge, P. (2018, December 11). Lecture capture: Practical recommendations for students and lecturers. Retrieved from psyarxiv.com/sd7u4.


Every Sunday, we pick a theme and provide a curated list of links. If you have a theme suggestion, please don’t hesitate to contact us! Occasionally we publish a guest digest, and we welcome proposals for one. Our 5 most recent digests are:

Weekly Digest #131: Increasing Grading/Marking Efficiency

Weekly Digest #132: Dual Coding, Visual Note Taking, and Sketchnoting

Weekly Digest #133: Technology for Math Learning

Weekly Digest #134: How to Sleep Well

Weekly Digest #135: SoTL Researcher Spotlight II

Weekly Digest #135: SoTL Researcher Spotlight II

Dr. Henry L. Roediger, III is a cognitive psychologist recognized for his work on human learning and memory. He is known for developing techniques to study false memories, the power of retrieval practice in enhancing learning and retention, and a theory to explain differences observed between explicit and implicit memory tests. Dr. Roediger has served as President of the Association for Psychological Science and several other organizations of psychologists. He received the William James Lifetime Achievement Award from APS as well as numerous other awards. For information about Roddy’s research, visit his lab website.

Word + Quiz: yurt

Note: Our Sixth Annual 15-Second Vocabulary Video Challenge is underway. It will run until Feb. 18.

: a circular domed dwelling that is portable and self-supporting; originally used by nomadic Mongol and Turkic people of Central Asia but now used as inexpensive alternative or temporary housing

_________

The word yurt has appeared in 19 articles on NYTimes.com in the past year, including on Feb. 2 in “A Room (or a Ryokan, Yurt or R.V.) With a View” by Stephanie Rosenbloom:

Airbnb’s booking data for the beginning of this year suggests that more travelers are interested in spending their vacations in what the short-term rental site calls “nontraditional” spaces, particularly those that allow travelers to be or feel closer to nature. Bookings for nature lodges and ryokans (traditional Japanese inns) have skyrocketed since last year. Reservations for yurts and recreational vehicles (R.V.s) have also spiked.

These are hardly new or nontraditional forms of shelter. The ryokan is centuries old. Yurts have been used by nomads for decades. Yet it seems interest in such lodgings has prompted more places to not only offer them, but reimagine them, too. The latest iterations have modern comforts and deluxe trappings even as they aim to retain some of the minimalism and spirit of their predecessors.

Learning With: ‘These Whales Are Serenaders of the Seas. It’s Quite a Racket.’

4. How do scientists study whale sounds? What challenges do they face in tracking whale sounds over time?

5. How is the development of an individual whale’s song “one of the best examples of cultural evolution in the animal kingdom”? What are some of the hypotheses to explain why whales repeat, alter or begin new songs? Which do you find most convincing?

6. How have changes to the environment affected whales’ singing? What role has human behavior played?

Finally, tell us more about what you think:

— What did you learn from the article? What was most fascinating, surprising or intriguing? Tell us why. What questions do you still have?

— Does the article make you think differently about whales? Do you have a greater appreciation for the mysteries and diversity of life?

— Do you think research like this is important? How is the study of whale vocalizations helpful? Why or why not?

— In a related article, “Oceans Are Getting Louder, Posing Potential Threats to Marine Life,” Jim Robbins writes about the damage caused by noises from air guns, ship sonar and general tanker traffic:

Aside from the seismic noise, compounded sounds from container ships to navy sonar are posing a problem for marine life. As the number of ships moving around the world has increased significantly in recent years, cavitation, the noise from the synchronous collapse of bubbles created by a ship’s propeller, as well as the rumble of ship engines, poses a bigger and bigger problem. A recent study found that shipping noise could double by 2030.

Noise masks whale expressions between families, which can affect orientation, feeding, care of young, detection of prey and even increase aggression. Already 80 percent of communications of some species of whales is masked by noise, according to models assessed by a team of biologists.

“It’s ripping the communications system apart,” Dr. Clark said. “And every aspect of their lives is dependent on sound, including finding food.”

What is your reaction to this information? What should be done to better protect whales and their ability to communicate with one another?